Merge branch 'master' of https://github.com/DarkstaIkers/youtube-dl into DarkstaIkers...

author Yen Chi Hsuan <yan12125@gmail.com>

Sat, 19 Nov 2016 16:05:11 +0000 (00:05 +0800)

committer Yen Chi Hsuan <yan12125@gmail.com>

Sat, 19 Nov 2016 16:05:11 +0000 (00:05 +0800)
diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md

index 5b1f573e788bc8b81ef2daa65fa89defb2488e6c..85ac137a1bf17798f20483bdfa5232a94da868a1 100644
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -6,8 +6,8 @@
  
  ---
  
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.03.27*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.03.27**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.11.18*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.11.18**
  
  ### Before submitting an *issue* make sure you have:
  - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
  [debug] User config: []
  [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
  [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.03.27
+[debug] youtube-dl version 2016.11.18
  [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
  [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
  [debug] Proxy map: {}
@@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
  ### Description of your *issue*, suggested solution and other information
  
  Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
-If work on your *issue* required an account credentials please provide them or explain how one can obtain them.
+If work on your *issue* requires account credentials please provide them or explain how one can obtain them.
diff --git a/.github/ISSUE_TEMPLATE_tmpl.md b/.github/ISSUE_TEMPLATE_tmpl.md

index a5e6a4233d88a2b5cdae3d668d2874fb39e6adf0..ab9968129f33790aaf6471f0f41f6b21164fe0a7 100644
--- a/.github/ISSUE_TEMPLATE_tmpl.md
+++ b/.github/ISSUE_TEMPLATE_tmpl.md
@@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
  ### Description of your *issue*, suggested solution and other information
  
  Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
-If work on your *issue* required an account credentials please provide them or explain how one can obtain them.
+If work on your *issue* requires account credentials please provide them or explain how one can obtain them.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md

new file mode 100644

index 0000000..46fa26f
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,27 @@
+## Please follow the guide below
+
+- You will be asked some questions; please read them **carefully** and answer honestly
+- Put an `x` into all the boxes [ ] relevant to your *pull request* (like that [x])
+- Use the *Preview* tab to see what your *pull request* will actually look like
+
+---
+
+### Before submitting a *pull request* make sure you have:
+- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
+- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
+
+### In order to be accepted and merged into youtube-dl each piece of code must be in the public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
+- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
+- [ ] I am not the original author of this code but it is in the public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
+
+### What is the purpose of your *pull request*?
+- [ ] Bug fix
+- [ ] Improvement
+- [ ] New extractor
+- [ ] New feature
+
+---
+
+### Description of your *pull request* and other information
+
+Explanation of your *pull request* in arbitrary form goes here. Please make sure the description explains the purpose and effect of your *pull request* and is worded well enough to be understood. Provide as much context and examples as possible.
diff --git a/.gitignore b/.gitignore

index 26dbde73d412673ee9c53ee06a476a803a92edc7..354505d66d61d6416b3700da1587ea4d4970fbc3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,6 +13,7 @@ README.txt
  youtube-dl.1
  youtube-dl.bash-completion
  youtube-dl.fish
+youtube_dl/extractor/lazy_extractors.py
  youtube-dl
  youtube-dl.exe
  youtube-dl.tar.gz
@@ -27,10 +28,18 @@ updates_key.pem
  *.mp4
  *.m4a
  *.m4v
+*.mp3
+*.3gp
+*.wav
  *.part
  *.swp
  test/testdata
+test/local_parameters.json
  .tox
  youtube-dl.zsh
+
+# IntelliJ related files
  .idea
-.idea/*
+*.iml
+
+tmp/
diff --git a/.travis.yml b/.travis.yml

index cc21fae8f41ca567a2367d3515f981d8ec0af759..c74c9cc1295ba7c54228c5a25a71e469f8ea8026 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -11,7 +11,6 @@ script: nosetests test --verbose
  notifications:
    email:
      - filippo.valsorda@gmail.com
-    - phihag@phihag.de
      - yasoob.khld@gmail.com
  #  irc:
  #    channels:
diff --git a/AUTHORS b/AUTHORS

index ea8d399785602b268cd1de4b78487b85ad1cd63a..4a6f7e13f45fd72ae3da87c475fadf892d2f7a4f 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -26,7 +26,7 @@ Albert Kim
  Pierre Rudloff
  Huarong Huo
  Ismael Mejía
-Steffan 'Ruirize' James
+Steffan Donal
  Andras Elso
  Jelle van der Waa
  Marcin Cieślak
@@ -167,3 +167,26 @@ Kacper Michajłow
  José Joaquín Atria
  Viťas Strádal
  Kagami Hiiragi
+Philip Huppert
+blahgeek
+Kevin Deldycke
+inondle
+Tomáš Čech
+Déstin Reed
+Roman Tsiupa
+Artur Krysiak
+Jakub Adam Wieczorek
+Aleksandar Topuzović
+Nehal Patel
+Rob van Bekkum
+Petr Zvoníček
+Pratyush Singh
+Aleksander Nitecki
+Sebastian Blunt
+Matěj Cepl
+Xie Yanbo
+Philip Xu
+John Hawkinson
+Rich Leeper
+Zhong Jianxin
+Thor77
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md

index 0df6193fb3cdc13f415d2dc6d71c3c82074bb259..0b5a5c1f81b791ac7d37361845e400617c9e084e 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -12,7 +12,7 @@ $ youtube-dl -v <your command line>
  [debug] Proxy map: {}
  ...
  ```
-**Do not post screenshots of verbose log only plain text is acceptable.**
+**Do not post screenshots of verbose logs; only plain text is acceptable.**
  
  The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
  
@@ -46,7 +46,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
  
  ###  Why are existing options not enough?
  
-Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
+Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
  
  ###  Is there enough context in your bug report?
  
@@ -66,7 +66,7 @@ Only post features that you (or an incapacitated friend you can personally talk
  
  ###  Is your question about youtube-dl?
  
-It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
+It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
  
  # DEVELOPER INSTRUCTIONS
  
@@ -85,7 +85,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
  If you want to create a build of youtube-dl yourself, you'll need
  
  * python
-* make (both GNU make and BSD make are supported)
+* make (only GNU make is supported)
  * pandoc
  * zip
  * nosetests
@@ -97,9 +97,17 @@ If you want to add support for a new site, first of all **make sure** this site
  After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):
  
  1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
-2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
-3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
+2. Check out the source code with:
+
+        git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
+
+3. Start a new git branch with:
+
+        cd youtube-dl
+        git checkout -b yourextractor
+
  4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
+
      ```python
      # coding: utf-8
      from __future__ import unicode_literals
@@ -140,19 +148,151 @@ After you have ensured this site is distributing it's content legally, you can f
                  # TODO more properties (see youtube_dl/extractor/common.py)
              }
      ```
-5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
+5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
  6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
-8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
-9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
-10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
+9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
  
-        $ git add youtube_dl/extractor/__init__.py
+        $ git add youtube_dl/extractor/extractors.py
          $ git add youtube_dl/extractor/yourextractor.py
          $ git commit -m '[yourextractor] Add new extractor'
          $ git push origin yourextractor
  
-11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
+10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
  
  In any case, thank you very much for your contributions!
  
+## youtube-dl coding conventions
+
+This section introduces guidelines for writing idiomatic, robust and future-proof extractor code.
+
+Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters, which is out of your control and tends to change. As an extractor implementer, your task is not only to write code that extracts media links and metadata correctly but also to minimize dependency on the source's layout, and even to anticipate potential future changes. This is important because it allows the extractor not to break on minor layout changes, thus keeping old youtube-dl versions working. Even though this breakage is easily fixed by releasing a new version of youtube-dl with the fix incorporated, all previous versions remain broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non-rolling-release distros may never receive an update at all.
+
+### Mandatory and optional metafields
+
+For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
+
+ - `id` (media identifier)
+ - `title` (media title)
+ - `url` (media download URL) or `formats`
+
+In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
+
+[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones is considered **optional**. That means that extraction should be **tolerant** of situations where sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
+
+#### Example
+
+Say you have some source dictionary `meta` that you've fetched as JSON with an HTTP request and it has a key `summary`:
+
+```python
+meta = self._download_json(url, video_id)
+```
+    
+Assume at this point `meta`'s layout is:
+
+```python
+{
+    ...
+    "summary": "some fancy summary text",
+    ...
+}
+```
+
+Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield, you should be prepared for this key to be missing from the `meta` dict, so you should extract it like:
+
+```python
+description = meta.get('summary')  # correct
+```
+
+and not like:
+
+```python
+description = meta['summary']  # incorrect
+```
+
+The latter will break the extraction process with a `KeyError` if `summary` disappears from `meta` at some later time, but with the former approach extraction will just go ahead with `description` set to `None`, which is perfectly fine (remember that `None` is equivalent to the absence of data).
+
+Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
+
+```python
+description = self._search_regex(
+    r'<span[^>]+id="title"[^>]*>([^<]+)<',
+    webpage, 'description', fatal=False)
+```
+
+With `fatal` set to `False`, if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
+
+You can also pass `default=<some fallback value>`, for example:
+
+```python
+description = self._search_regex(
+    r'<span[^>]+id="title"[^>]*>([^<]+)<',
+    webpage, 'description', default=None)
+```
+
+On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
+ 
+### Provide fallbacks
+
+When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
+
+#### Example
+
+Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
+
+```python
+title = meta['title']
+```
+
+If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
+
+Assume that you have another source you can extract `title` from, for example the `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
+
+```python
+title = meta.get('title') or self._og_search_title(webpage)
+```
+
+This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
+
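When more than two sources are available, the same fallback idea can be written as a small loop. The following is only a sketch: `first_available` is a hypothetical helper, not part of youtube-dl, and the `og_title` string stands in for what `self._og_search_title(webpage)` might return.

```python
def first_available(*candidates):
    """Call each candidate in order and return the first truthy result.

    Hypothetical helper for illustration; it is not a youtube-dl API.
    Candidates are callables so that later sources are only consulted
    when earlier ones yield nothing.
    """
    for candidate in candidates:
        value = candidate()
        if value:
            return value
    return None


# Assumed sample data: `meta` has lost its 'title' key on the hoster's side.
meta = {'summary': 'some fancy summary text'}
og_title = 'some fancy title'  # stand-in for self._og_search_title(webpage)

title = first_available(
    lambda: meta.get('title'),
    lambda: og_title,
)
```

Here `title` ends up taken from the second source because the first one is missing. For a mandatory field such as `title`, the extractor would still need to fail if every source comes back empty.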
+### Make regular expressions flexible
+
+When using regular expressions try to write them fuzzy and flexible.
+ 
+#### Example
+
+Say you need to extract `title` from the following HTML code:
+
+```html
+<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
+```
+
+The code for that task should look similar to:
+
+```python
+title = self._search_regex(
+    r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
+```
+
+Or even better:
+
+```python
+title = self._search_regex(
+    r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
+    webpage, 'title', group='title')
+```
+
+Note how you tolerate potential changes in the `style` attribute's value or a switch from double quotes to single quotes for the `class` attribute.
+
+The code definitely should not look like:
+
+```python
+title = self._search_regex(
+    r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
+    webpage, 'title')
+```
+
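The difference between the flexible and the rigid pattern can be checked with the standard `re` module alone. This is a minimal sketch: the `changed` markup is an invented example of what a hoster might switch to, and `extract` is a hypothetical stand-in for `_search_regex`, not its real implementation.

```python
import re

FLEXIBLE = r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)'
RIGID = (r'<span style="position: absolute; left: 910px; width: 90px; '
         r'float: right; z-index: 9999;" class="title">(.*?)</span>')

original = ('<span style="position: absolute; left: 910px; width: 90px; '
            'float: right; z-index: 9999;" class="title">some fancy title</span>')
# Hypothetical future markup: new style value and a single-quoted class.
changed = "<span style=\"left: 10px;\" class='title'>some fancy title</span>"


def extract(pattern, html):
    """Return the captured title, or None when the pattern does not match."""
    match = re.search(pattern, html)
    if match is None:
        return None
    # Prefer the named group when the pattern defines one.
    if 'title' in match.groupdict():
        return match.group('title')
    return match.group(1)


flexible_results = [extract(FLEXIBLE, original), extract(FLEXIBLE, changed)]
rigid_results = [extract(RIGID, original), extract(RIGID, changed)]
```

The flexible pattern finds the title in both versions of the markup, while the rigid one only matches the original and returns nothing once the attributes change.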
+### Use safe conversion functions
+
+Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
+
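To illustrate why safe conversion matters, here is a sketch using simplified stand-ins. The functions below are assumptions written for this example and are not the implementations in `youtube_dl/utils.py`, which may accept further arguments; the `meta` values are invented.

```python
def int_or_none_sketch(value, default=None):
    """Convert to int, returning `default` instead of raising on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def float_or_none_sketch(value, default=None):
    """Convert to float, returning `default` instead of raising on bad input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


# Assumed sample metadata with a missing and a malformed numeric field.
meta = {'duration': '123', 'view_count': None, 'rating': 'n/a'}

duration = int_or_none_sketch(meta.get('duration'))
view_count = int_or_none_sketch(meta.get('view_count'))
rating = float_or_none_sketch(meta.get('rating'))
like_count = int_or_none_sketch(meta.get('like_count'))
```

A plain `int(meta['view_count'])` would raise here and abort extraction over an optional field, whereas the safe variants simply leave the field as `None`.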
diff --git a/ChangeLog b/ChangeLog

new file mode 100644

index 0000000..1512941
--- /dev/null
+++ b/ChangeLog
@@ -0,0 +1,978 @@
+version 2016.11.18
+
+Extractors
+* [youtube:live] Relax URL regular expression (#11164)
+* [openload] Fix extraction (#10408, #11122)
+* [vlive] Prefer locale over language for subtitles id (#11203)
+
+
+version 2016.11.14.1
+
+Core
++ [downloader/fragment,f4m,hls] Respect HTTP headers from info dict
+* [extractor/common] Fix media templates with Bandwidth substitution pattern in
+  MPD manifests (#11175)
+* [extractor/common] Improve thumbnail extraction from JSON-LD
+
+Extractors
++ [nrk] Workaround geo restriction
++ [nrk] Improve error detection and messages
++ [afreecatv] Add support for vod.afreecatv.com (#11174)
+* [cda] Fix and improve extraction (#10929, #10936)
+* [plays] Fix extraction (#11165)
+* [eagleplatform] Fix extraction (#11160)
++ [audioboom] Recognize /posts/ URLs (#11149)
+
+
+version 2016.11.08.1
+
+Extractors
+* [espn:article] Fix support for espn.com articles
+* [franceculture] Fix extraction (#11140)
+
+
+version 2016.11.08
+
+Extractors
+* [tmz:article] Fix extraction (#11052)
+* [espn] Fix extraction (#11041)
+* [mitele] Fix extraction after website redesign (#10824)
+- [ard] Remove age restriction check (#11129)
+* [generic] Improve support for pornhub.com embeds (#11100)
++ [generic] Add support for redtube.com embeds (#11099)
++ [generic] Add support for drtuber.com embeds (#11098)
++ [redtube] Add support for embed URLs
++ [drtuber] Add support for embed URLs
++ [yahoo] Improve content id extraction (#11088)
+* [toutv] Relax URL regular expression (#11121)
+
+
+version 2016.11.04
+
+Core
+* [extractor/common] Tolerate malformed RESOLUTION attribute in m3u8
+  manifests (#11113)
+* [downloader/ism] Fix AVC Decoder Configuration Record
+
+Extractors
++ [fox9] Add support for fox9.com (#11110)
++ [anvato] Extract more metadata and improve formats extraction
+* [vodlocker] Improve removed videos detection (#11106)
++ [vzaar] Add support for vzaar.com (#11093)
++ [vice] Add support for uplynk preplay videos (#11101)
+* [tubitv] Fix extraction (#11061)
++ [shahid] Add support for authentication (#11091)
++ [radiocanada] Add subtitles support (#11096)
++ [generic] Add support for ISM manifests
+
+
+version 2016.11.02
+
+Core
++ Add basic support for Smooth Streaming protocol (#8118, #10969)
+* Improve MPD manifest base URL extraction (#10909, #11079)
+* Fix --match-filter for int-like strings (#11082)
+
+Extractors
++ [mva] Add support for ISM formats
++ [msn] Add support for ISM formats
++ [onet] Add support for ISM formats
++ [tvp] Add support for ISM formats
++ [nicknight] Add support for nicknight sites (#10769)
+
+
+version 2016.10.30
+
+Extractors
+* [facebook] Improve 1080P video detection (#11073)
+* [imgur] Recognize /r/ URLs (#11071)
+* [beeg] Fix extraction (#11069)
+* [openload] Fix extraction (#10408)
+* [gvsearch] Modernize and fix search request (#11051)
+* [adultswim] Fix extraction (#10979)
++ [nobelprize] Add support for nobelprize.org (#9999)
+* [hornbunny] Fix extraction (#10981)
+* [tvp] Improve video id extraction (#10585)
+
+
+version 2016.10.26
+
+Extractors
++ [rentv] Add support for ren.tv (#10620)
++ [ard] Detect unavailable videos (#11018)
+* [vk] Fix extraction (#11022)
+
+
+version 2016.10.25
+
+Core
+* Running youtube-dl in the background is fixed (#10996, #10706, #955)
+
+Extractors
++ [jamendo] Add support for jamendo.com (#10132, #10736)
++ [pandatv] Add support for panda.tv (#10736)
++ [dotsub] Support Vimeo embed (#10964)
+* [litv] Fix extraction
++ [vimeo] Delegate ondemand redirects to ondemand extractor (#10994)
+* [vivo] Fix extraction (#11003)
++ [twitch:stream] Add support for rebroadcasts (#10995)
+* [pluralsight] Fix subtitles conversion (#10990)
+
+
+version 2016.10.21.1
+
+Extractors
++ [pluralsight] Process all clip URLs (#10984)
+
+
+version 2016.10.21
+
+Core
+- Disable thumbnails embedding in mkv
++ Add support for Comcast multiple-system operator (#10819)
+
+Extractors
+* [pluralsight] Adapt to new API (#10972)
+* [openload] Fix extraction (#10408, #10971)
++ [natgeo] Extract m3u8 formats (#10959)
+
+
+version 2016.10.19
+
+Core
++ [utils] Expose PACKED_CODES_RE
++ [extractor/common] Extract non smil wowza mpd manifests
++ [extractor/common] Detect f4m audio-only formats
+
+Extractors
+* [vidzi] Fix extraction (#10908, #10952)
+* [urplay] Fix subtitles extraction
++ [urplay] Add support for urskola.se (#10915)
++ [orf] Add subtitles support (#10939)
+* [youtube] Fix --no-playlist behavior for youtu.be/id URLs (#10896)
+* [nrk] Relax URL regular expression (#10928)
++ [nytimes] Add support for podcasts (#10926)
+* [pluralsight] Relax URL regular expression (#10941)
+
+
+version 2016.10.16
+
+Core
+* [postprocessor/ffmpeg] Return correct filepath and ext in updated information
+  in FFmpegExtractAudioPP (#10879)
+
+Extractors
++ [ruutu] Add support for supla.fi (#10849)
++ [theoperaplatform] Add support for theoperaplatform.eu (#10914)
+* [lynda] Fix height for prioritized streams
++ [lynda] Add fallback extraction scenario
+* [lynda] Switch to https (#10916)
++ [huajiao] New extractor (#10917)
+* [cmt] Fix mgid extraction (#10813)
++ [safari:course] Add support for techbus.safaribooksonline.com
+* [orf:tvthek] Fix extraction and modernize (#10898)
+* [chirbit] Fix extraction of user profile pages
+* [carambatv] Fix extraction
+* [canalplus] Fix extraction for some videos
+* [cbsinteractive] Fix extraction for cnet.com
+* [parliamentliveuk] Lower case URLs are now recognized (#10912)
+
+
+version 2016.10.12
+
+Core
++ Support HTML media elements without child nodes
+* [Makefile] Support for GNU make < 4 is fixed; BSD make dropped (#9387)
+
+Extractors
+* [dailymotion] Fix extraction (#10901)
+* [vimeo:review] Fix extraction (#10900)
+* [nhl] Correctly handle invalid formats (#10713)
+* [footyroom] Fix extraction (#10810)
+* [abc.net.au:iview] Fix for standalone (non series) videos (#10895)
++ [hbo] Add support for episode pages (#10892)
+* [allocine] Fix extraction (#10860)
++ [nextmedia] Recognize action news on AppleDaily
+* [lego] Improve info extraction and bypass geo restriction (#10872)
+
+
+version 2016.10.07
+
+Extractors
++ [iprima] Detect geo restriction
+* [facebook] Fix video extraction (#10846)
++ [commonprotocols] Support direct MMS links (#10838)
++ [generic] Add support for multiple vimeo embeds (#10862)
++ [nzz] Add support for nzz.ch (#4407)
++ [npo] Detect geo restriction
++ [npo] Add support for 2doc.nl (#10842)
++ [lego] Add support for lego.com (#10369)
++ [tonline] Add support for t-online.de (#10376)
+* [techtalks] Relax URL regular expression (#10840)
+* [youtube:live] Extend URL regular expression (#10839)
++ [theweatherchannel] Add support for weather.com (#7188)
++ [thisoldhouse] Add support for thisoldhouse.com (#10837)
++ [nhl] Add support for wch2016.com (#10833)
+* [pornoxo] Use JWPlatform to improve metadata extraction
+
+
+version 2016.10.02
+
+Core
+* Fix possibly lost extended attributes during post-processing
++ Support pyxattr as well as python-xattr for --xattrs and
+  --xattr-set-filesize (#9054)
+
+Extractors
++ [jwplatform] Support DASH streams in JWPlayer
++ [jwplatform] Support old-style JWPlayer playlists
++ [byutv:event] Add extractor
+* [periscope:user] Fix extraction (#10820)
+* [dctp] Fix extraction (#10734)
++ [instagram] Extract video dimensions (#10790)
++ [tvland] Extend URL regular expression (#10812)
++ [vgtv] Add support for tv.aftonbladet.se (#10800)
+- [aftonbladet] Remove extractor
+* [vk] Fix timestamp and view count extraction (#10760)
++ [vk] Add support for running and finished live streams (#10799)
++ [leeco] Recognize more Le Sports URLs (#10794)
++ [instagram] Extract comments (#10788)
++ [ketnet] Extract mzsource formats (#10770)
+* [limelight:media] Improve HTTP formats extraction
+
+
+version 2016.09.27
+
+Core
++ Add hdcore query parameter to akamai f4m formats
++ Delegate HLS live streams downloading to ffmpeg
++ Improved support for HTML5 subtitles
+
+Extractors
++ [vk] Add support for dailymotion embeds (#10661)
+* [promptfile] Fix extraction (#10634)
+* [kaltura] Speed up embed regular expressions (#10764)
++ [npo] Add support for anderetijden.nl (#10754)
++ [prosiebensat1] Add support for advopedia sites
+* [mwave] Relax URL regular expression (#10735, #10748)
+* [prosiebensat1] Fix playlist support (#10745)
++ [prosiebensat1] Add support for sat1gold sites (#10745)
++ [cbsnews:livevideo] Fix extraction and extract m3u8 formats
++ [brightcove:new] Add support for live streams
+* [soundcloud] Generalize playlist entries extraction (#10733)
++ [mtv] Add support for new URL schema (#8169, #9808)
+* [einthusan] Fix extraction (#10714)
++ [twitter] Support Periscope embeds (#10737)
++ [openload] Support subtitles (#10625)
+
+
+version 2016.09.24
+
+Core
++ Add support for watchTVeverywhere.com authentication provider based MSOs for
+  Adobe Pass authentication (#10709)
+
+Extractors
++ [soundcloud:playlist] Provide video id for early playlist entries (#10733)
++ [prosiebensat1] Add support for kabeleinsdoku (#10732)
+* [cbs] Extract info from thunder videoPlayerService (#10728)
+* [openload] Fix extraction (#10408)
++ [ustream] Support the new HLS streams (#10698)
++ [ooyala] Extract all HLS formats
++ [cartoonnetwork] Add support for Adobe Pass authentication
++ [soundcloud] Extract license metadata
++ [fox] Add support for Adobe Pass authentication (#8584)
++ [tbs] Add support for Adobe Pass authentication (#10642, #10222)
++ [trutv] Add support for Adobe Pass authentication (#10519)
++ [turner] Add support for Adobe Pass authentication
+
+
+version 2016.09.19
+
+Extractors
++ [crunchyroll] Check if already authenticated (#10700)
+- [twitch:stream] Remove fallback to profile extraction when stream is offline
+* [thisav] Improve title extraction (#10682)
+* [vyborymos] Improve station info extraction
+
+
+version 2016.09.18
+
+Core
++ Introduce manifest_url and fragments fields in formats dictionary for
+  fragmented media
++ Provide manifest_url field for DASH segments, HLS and HDS
++ Provide fragments field for DASH segments
+* Rework DASH segments downloader to use fragments field
++ Add helper method for Wowza Streaming Engine formats extraction
+
+Extractors
++ [vyborymos] Add extractor for vybory.mos.ru (#10692)
++ [xfileshare] Add title regular expression for streamin.to (#10646)
++ [globo:article] Add support for multiple videos (#10653)
++ [thisav] Recognize HTML5 videos (#10447)
+* [jwplatform] Improve JWPlayer detection
++ [mangomolo] Add support for Mangomolo embeds
++ [toutv] Add support for authentication (#10669)
+* [franceinter] Fix upload date extraction
+* [tv4] Fix HLS and HDS formats extraction (#10659)
+
+
+version 2016.09.15
+
+Core
+* Improve _hidden_inputs
++ Introduce improved explicit Adobe Pass support
++ Add --ap-mso to provide multiple-system operator identifier
++ Add --ap-username to provide MSO account username
++ Add --ap-password to provide MSO account password
++ Add --ap-list-mso to list all supported MSOs
++ Add support for Rogers Cable multiple-system operator (#10606)
+
+Extractors
+* [crunchyroll] Fix authentication (#10655)
+* [twitch] Fix API calls (#10654, #10660)
++ [bellmedia] Add support for more Bell Media Television sites
+* [franceinter] Fix extraction (#10538, #2105)
+* [kuwo] Improve error detection (#10650)
++ [go] Add support for free full episodes (#10439)
+* [bilibili] Fix extraction for specific videos (#10647)
+* [nhk] Fix extraction (#10633)
+* [kaltura] Improve audio detection
+* [kaltura] Skip chun format
++ [vimeo:ondemand] Pass Referer along with embed URL (#10624)
++ [nbc] Add support for NBC Olympics (#10361)
+
+
+version 2016.09.11.1
+
+Extractors
++ [tube8] Extract categories and tags (#10579)
++ [pornhub] Extract categories and tags (#10499)
+* [openload] Temporary fix (#10408)
++ [foxnews] Add support for Fox News articles (#10598)
+* [viafree] Improve video id extraction (#10615)
+* [iwara] Fix extraction after relaunch (#10462, #3215)
++ [tfo] Add extractor for tfo.org
+* [lrt] Fix audio extraction (#10566)
+* [9now] Fix extraction (#10561)
++ [canalplus] Add support for c8.fr (#10577)
+* [newgrounds] Fix uploader extraction (#10584)
++ [polskieradio:category] Add support for category lists (#10576)
++ [ketnet] Add extractor for ketnet.be (#10343)
++ [canvas] Add support for een.be (#10605)
++ [telequebec] Add extractor for telequebec.tv (#1999)
+* [parliamentliveuk] Fix extraction (#9137)
+
+
+version 2016.09.08
+
+Extractors
++ [jwplatform] Extract height from format label
++ [yahoo] Extract Brightcove Legacy Studio embeds (#9345)
+* [videomore] Fix extraction (#10592)
+* [foxgay] Fix extraction (#10480)
++ [rmcdecouverte] Add extractor for rmcdecouverte.bfmtv.com (#9709)
+* [gamestar] Fix metadata extraction (#10479)
+* [puls4] Fix extraction (#10583)
++ [cctv] Add extractor for CCTV and CNTV (#8153)
++ [lci] Add extractor for lci.fr (#10573)
++ [wat] Extract DASH formats
++ [viafree] Improve video id detection (#10569)
++ [trutv] Add extractor for trutv.com (#10519)
++ [nick] Add support for nickelodeon.nl (#10559)
++ [abcotvs:clips] Add support for clips.abcotvs.com
++ [abcotvs] Add support for ABC Owned Television Stations sites (#9551)
++ [miaopai] Add extractor for miaopai.com (#10556)
+* [gamestar] Fix metadata extraction (#10479)
++ [bilibili] Add support for episodes (#10190)
++ [tvnoe] Add extractor for tvnoe.cz (#10524)
+
+
+version 2016.09.04.1
+
+Core
+* In DASH downloader if the first segment fails, abort the whole download
+  process to prevent throttling (#10497)
++ Add support for --skip-unavailable-fragments and --fragment-retries in
+  hlsnative downloader (#10165, #10448)
++ Add support for --skip-unavailable-fragments in DASH downloader
++ Introduce --skip-unavailable-fragments option for fragment based downloaders
+  that allows skipping fragments unavailable due to an HTTP error
+* Fix extraction of video/audio entries with src attribute in
+  _parse_html5_media_entries (#10540)
+
+Extractors
+* [theplatform] Relax URL regular expression (#10546)
+* [youtube:playlist] Extend URL regular expression
+* [rottentomatoes] Delegate extraction to internetvideoarchive extractor
+* [internetvideoarchive] Extract all formats
+* [pornvoisines] Fix extraction (#10469)
+* [rottentomatoes] Fix extraction (#10467)
+* [espn] Extend URL regular expression (#10549)
+* [vimple] Extend URL regular expression (#10547)
+* [youtube:watchlater] Fix extraction (#10544)
+* [youjizz] Fix extraction (#10437)
++ [foxnews] Add support for FoxNews Insider (#10445)
++ [fc2] Recognize Flash player URLs (#10512)
+
+
+version 2016.09.03
+
+Core
+* Restore usage of NAME attribute from EXT-X-MEDIA tag for formats codes in
+  _extract_m3u8_formats (#10522)
+* Handle semicolon in mimetype2ext
+
+Extractors
++ [youtube] Add support for rental videos' previews (#10532)
+* [youtube:playlist] Fallback to video extraction for video/playlist URLs when
+  no playlist is actually served (#10537)
++ [drtv] Add support for dr.dk/nyheder (#10536)
++ [facebook:plugins:video] Add extractor (#10530)
++ [go] Add extractor for *.go.com sites
+* [adobepass] Check for authz_token expiration (#10527)
+* [nytimes] Improve extraction
+* [thestar] Fix extraction (#10465)
+* [glide] Fix extraction (#10478)
+- [exfm] Remove extractor (#10482)
+* [youporn] Fix categories and tags extraction (#10521)
++ [curiositystream] Add extractor for app.curiositystream.com
+- [thvideo] Remove extractor (#10464)
+* [movingimage] Fix for the new site name (#10466)
++ [cbs] Add support for once formats (#10515)
+* [limelight] Skip ism and duplicate manifests
++ [porncom] Extract categories and tags (#10510)
++ [facebook] Extract timestamp (#10508)
++ [yahoo] Extract more formats
+
+
+version 2016.08.31
+
+Extractors
+* [soundcloud] Fix URL regular expression to avoid clashes with sets (#10505)
+* [bandcamp:album] Fix title extraction (#10455)
+* [pyvideo] Fix extraction (#10468)
++ [ctv] Add support for tsn.ca, bnn.ca and thecomedynetwork.ca (#10016)
+* [9c9media] Extract more metadata
+* [9c9media] Fix multiple stacks extraction (#10016)
+* [adultswim] Improve video info extraction (#10492)
+* [vodplatform] Improve embed regular expression
+- [played] Remove extractor (#10470)
++ [tbs] Add extractor for tbs.com and tntdrama.com (#10222)
++ [cartoonnetwork] Add extractor for cartoonnetwork.com (#10110)
+* [adultswim] Rework in terms of turner extractor
+* [cnn] Rework in terms of turner extractor
+* [nba] Rework in terms of turner extractor
++ [turner] Add base extractor for Turner Broadcasting System based sites
+* [bilibili] Fix extraction (#10375)
+* [openload] Fix extraction (#10408)
+
+
+version 2016.08.28
+
+Core
++ Add warning message that ffmpeg doesn't support SOCKS
+* Improve thumbnail sorting
++ Extract formats from #EXT-X-MEDIA tags in _extract_m3u8_formats
+* Fill IV with leading zeros for IVs shorter than 16 octets in hlsnative
++ Add ac-3 to the list of audio codecs in parse_codecs
+
+Extractors
+* [periscope:user] Fix extraction (#10453)
+* [douyutv] Fix extraction (#10153, #10318, #10444)
++ [nhk:vod] Add extractor for www3.nhk.or.jp on demand (#4437, #10424)
+- [trutube] Remove extractor (#10438)
++ [usanetwork] Add extractor for usanetwork.com
+* [crackle] Fix extraction (#10333)
+* [spankbang] Fix description and uploader extraction (#10339)
+* [discoverygo] Detect cable provider restricted videos (#10425)
++ [cbc] Add support for watch.cbc.ca
+* [kickstarter] Silence the warning for og:description (#10415)
+* [mtvservices:embedded] Fix extraction for the new 'edge' player (#10363)
+
+
+version 2016.08.24.1
+
+Extractors
++ [pluralsight] Add support for subtitles (#9681)
+
+
+version 2016.08.24
+
+Extractors
+* [youtube] Fix authentication (#10392)
+* [openload] Fix extraction (#10408)
++ [bravotv] Add support for Adobe Pass (#10407)
+* [bravotv] Fix clip info extraction (#10407)
+* [eagleplatform] Improve embedded videos detection (#10409)
+* [awaan] Fix extraction
+* [mtvservices:embedded] Update config URL
++ [abc:iview] Add extractor (#6148)
+
+
+version 2016.08.22
+
+Core
+* Improve formats and subtitles extension auto calculation
++ Recognize full unit names in parse_filesize
++ Add support for m3u8 manifests in HTML5 multimedia tags
+* Fix octal/hexadecimal number detection in js_to_json
+
+Extractors
++ [ivi] Add support for 720p and 1080p
++ [charlierose] Add new extractor (#10382)
+* [1tv] Fix extraction (#9249)
+* [twitch] Renew authentication
+* [kaltura] Improve subtitles extension calculation
++ [zingmp3] Add support for video clips
+* [zingmp3] Fix extraction (#10041)
+* [kaltura] Improve subtitles extraction (#10279)
+* [cultureunplugged] Fix extraction (#10330)
++ [cnn] Add support for money.cnn.com (#2797)
+* [cbsnews] Fix extraction (#10362)
+* [cbs] Fix extraction (#10393)
++ [litv] Support 'promo' URLs (#10385)
+* [snotr] Fix extraction (#10338)
+* [n-tv.de] Fix extraction (#10331)
+* [globo:article] Relax URL and video id regular expressions (#10379)
+
+
+version 2016.08.19
+
+Core
+- Remove output template description from --help
+* Recognize lowercase units in parse_filesize
+
+Extractors
++ [porncom] Add extractor for porn.com (#2251, #10251)
++ [generic] Add support for DBTV embeds
+* [vk:wallpost] Fix audio extraction for new site layout
+* [vk] Fix authentication
++ [hgtvcom:show] Add extractor for hgtv.com shows (#10365)
++ [discoverygo] Add support for other GO network sites
+
+
+version 2016.08.17
+
+Core
++ Add _get_netrc_login_info
+
+Extractors
+* [mofosex] Extract all formats (#10335)
++ [generic] Add support for vbox7 embeds
++ [vbox7] Add support for embed URLs
++ [viafree] Add extractor (#10358)
++ [mtg] Add support for viafree URLs (#10358)
+* [theplatform] Extract all subtitles per language
++ [xvideos] Fix HLS extraction (#10356)
++ [amcnetworks] Add extractor
++ [bbc:playlist] Add support for pagination (#10349)
++ [fxnetworks] Add extractor (#9462)
+* [cbslocal] Fix extraction for SendtoNews-based videos
+* [sendtonews] Fix extraction
+* [jwplatform] Extract video id from JWPlayer data
+- [zippcast] Remove extractor (#10332)
++ [viceland] Add extractor (#8799)
++ [adobepass] Add base extractor for Adobe Pass Authentication
+* [life:embed] Improve extraction
+* [vgtv] Detect geo restricted videos (#10348)
++ [uplynk] Add extractor
+* [xiami] Fix extraction (#10342)
+
+
+version 2016.08.13
+
+Core
+* Show progress for curl external downloader
+* Forward more options to curl external downloader
+
+Extractors
+* [pbs] Fix description extraction
+* [franceculture] Fix extraction (#10324)
+* [pornotube] Fix extraction (#10322)
+* [4tube] Fix metadata extraction (#10321)
+* [imgur] Fix width and height extraction (#10325)
+* [expotv] Improve extraction
++ [vbox7] Fix extraction (#10309)
+- [tapely] Remove extractor (#10323)
+* [muenchentv] Fix extraction (#10313)
++ [24video] Add support for .me and .xxx TLDs
+* [24video] Fix comment count extraction
+* [sunporno] Add support for embed URLs
+* [sunporno] Fix metadata extraction (#10316)
++ [hgtv] Add extractor for hgtv.ca (#3999)
+- [pbs] Remove request to unavailable API
++ [pbs] Add support for high quality HTTP formats
++ [crunchyroll] Add support for HLS formats (#10301)
+
+
+version 2016.08.12
+
+Core
+* Subtitles are now written as is. Newline conversions are disabled. (#10268)
++ Recognize more formats in unified_timestamp
+
+Extractors
+- [goldenmoustache] Remove extractor (#10298)
+* [drtuber] Improve title extraction
+* [drtuber] Make dislike count optional (#10297)
+* [chirbit] Fix extraction (#10296)
+* [francetvinfo] Relax URL regular expression
+* [rtlnl] Relax URL regular expression (#10282)
+* [formula1] Relax URL regular expression (#10283)
+* [wat] Improve extraction (#10281)
+* [ctsnews] Fix extraction
+
+
+version 2016.08.10
+
+Core
+* Make --metadata-from-title non fatal when title does not match the pattern
+* Introduce options for randomized sleep before each download
+  --min-sleep-interval and --max-sleep-interval (#9930)
+* Respect default in _search_json_ld
+
+Extractors
++ [uol] Add extractor for uol.com.br (#4263)
+* [rbmaradio] Fix extraction and extract all formats (#10242)
++ [sonyliv] Add extractor for sonyliv.com (#10258)
+* [aparat] Fix extraction
+* [cwtv] Extract HTTP formats
++ [rozhlas] Add extractor for prehravac.rozhlas.cz (#10253)
+* [kuwo:singer] Fix extraction
+
+
+version 2016.08.07
+
+Core
++ Add support for TV Parental Guidelines ratings in parse_age_limit
++ Add decode_png (#9706)
++ Add support for partOfTVSeries in JSON-LD
+* Lower master M3U8 manifest preference for better format sorting
+
+Extractors
++ [discoverygo] Add extractor (#10245)
+* [flipagram] Make JSON-LD extraction non fatal
+* [generic] Make JSON-LD extraction non fatal
++ [bbc] Add support for morph embeds (#10239)
+* [tnaflixnetworkbase] Improve title extraction
+* [tnaflix] Fix metadata extraction (#10249)
+* [fox] Fix theplatform release URL query
+* [openload] Fix extraction (#9706)
+* [bbc] Skip duplicate manifest URLs
+* [bbc] Improve format code
++ [bbc] Add support for DASH and F4M
+* [bbc] Improve format sorting and listing
+* [bbc] Improve playlist extraction
++ [pokemon] Add extractor (#10093)
++ [condenast] Add fallback scenario for video info extraction
+
+
+version 2016.08.06
+
+Core
+* Add support for JSON-LD root list entries (#10203)
+* Improve unified_timestamp
+* Lower preference of RTSP formats in generic sorting
++ Add support for multiple properties in _og_search_property
+* Improve password hiding from verbose output
+
+Extractors
++ [adultswim] Add support for trailers (#10235)
+* [archiveorg] Improve extraction (#10219)
++ [jwplatform] Add support for playlists
++ [jwplatform] Add support for relative URLs
+* [jwplatform] Improve audio detection
++ [tvplay] Capture and output native error message
++ [tvplay] Extract series metadata
++ [tvplay] Add support for subtitles (#10194)
+* [tvp] Improve extraction (#7799)
+* [cbslocal] Fix timestamp parsing (#10213)
++ [naver] Add support for subtitles (#8096)
+* [naver] Improve extraction
+* [condenast] Improve extraction
+* [engadget] Relax URL regular expression
+* [5min] Fix extraction
++ [nationalgeographic] Add support for Episode Guide
++ [kaltura] Add support for subtitles
+* [kaltura] Optimize network requests
++ [vodplatform] Add extractor for vod-platform.net
+- [gamekings] Remove extractor
+* [limelight] Extract HTTP formats
+* [ntvru] Fix extraction
++ [comedycentral] Re-add :tds and :thedailyshow shortnames
+
+
+version 2016.08.01
+
+Fixed/improved extractors
+- [yandexmusic:track] Adapt to changes in track location JSON (#10193)
+- [bloomberg] Support another form of player (#10187)
+- [limelight] Skip DRM protected videos
+- [safari] Relax regular expressions for URL matching (#10202)
+- [cwtv] Add support for cwtvpr.com (#10196)
+
+
+version 2016.07.30
+
+Fixed/improved extractors
+- [twitch:clips] Sort formats
+- [tv2] Use m3u8_native
+- [tv2:article] Fix video detection (#10188)
+- rtve (#10076)
+- [dailymotion:playlist] Optimize download archive processing (#10180)
+
+
+version 2016.07.28
+
+Fixed/improved extractors
+- shared (#10170)
+- soundcloud (#10179)
+- twitch (#9767)
+
+
+version 2016.07.26.2
+
+Fixed/improved extractors
+- smotri
+- camdemy
+- mtv
+- comedycentral
+- cmt
+- cbc
+- mgtv
+- orf
+
+
+version 2016.07.24
+
+New extractors
+- arkena (#8682)
+- lcp (#8682)
+
+Fixed/improved extractors
+- facebook (#10151)
+- dailymail
+- telegraaf
+- dcn
+- onet
+- tvp
+
+Miscellaneous
+- Support $Time$ in DASH manifests
+
+
+version 2016.07.22
+
+New extractors
+- odatv (#9285)
+
+Fixed/improved extractors
+- bbc
+- youjizz (#10131)
+- youtube (#10140)
+- pornhub (#10138)
+- eporner (#10139)
+
+
+version 2016.07.17
+
+New extractors
+- nintendo (#9986)
+- streamable (#9122)
+
+Fixed/improved extractors
+- ard (#10095)
+- mtv
+- comedycentral (#10101)
+- viki (#10098)
+- spike (#10106)
+
+Miscellaneous
+- Improved twitter player detection (#10090)
+
+
+version 2016.07.16
+
+New extractors
+- ninenow (#5181)
+
+Fixed/improved extractors
+- rtve (#10076)
+- brightcove
+- 3qsdn
+- syfy (#9087, #3820, #2388)
+- youtube (#10083)
+
+Miscellaneous
+- Fix subtitle embedding for video-only and audio-only files (#10081)
+
+
+version 2016.07.13
+
+New extractors
+- rudo
+
+Fixed/improved extractors
+- biobiochiletv
+- tvplay
+- dbtv
+- brightcove
+- tmz
+- youtube (#10059)
+- shahid (#10062)
+- vk
+- ellentv (#10067)
+
+
+version 2016.07.11
+
+New extractors
+- roosterteeth (#9864)
+
+Fixed/improved extractors
+- miomio (#9605)
+- vuclip
+- youtube
+- vidzi (#10058)
+
+
+version 2016.07.09.2
+
+Fixed/improved extractors
+- vimeo (#1638)
+- facebook (#10048)
+- lynda (#10047)
+- animeondemand
+
+Fixed/improved features
+- Embedding subtitles no longer throws an error with problematic inputs (#9063)
+
+
+version 2016.07.09.1
+
+Fixed/improved extractors
+- youtube
+- ard
+- srmediatek (#9373)
+
+
+version 2016.07.09
+
+New extractors
+- Flipagram (#9898)
+
+Fixed/improved extractors
+- telecinco
+- toutv
+- radiocanada
+- tweakers (#9516)
+- lynda
+- nick (#7542)
+- polskieradio (#10028)
+- le
+- facebook (#9851)
+- mgtv
+- animeondemand (#10031)
+
+Fixed/improved features
+- `--postprocessor-args` and `--downloader-args` now accept non-ASCII inputs
+  on non-Windows systems
+
+
+version 2016.07.07
+
+New extractors
+- kamcord (#10001)
+
+Fixed/improved extractors
+- spiegel (#10018)
+- metacafe (#8539, #3253)
+- onet (#9950)
+- francetv (#9955)
+- brightcove (#9965)
+- daum (#9972)
+
+
+version 2016.07.06
+
+Fixed/improved extractors
+- youtube (#10007, #10009)
+- xuite
+- stitcher
+- spiegel
+- slideshare
+- sandia
+- rtvnh
+- prosiebensat1
+- onionstudios
+
+
+version 2016.07.05
+
+Fixed/improved extractors
+- brightcove
+- yahoo (#9995)
+- pornhub (#9997)
+- iqiyi
+- kaltura (#5557)
+- la7
+
+Changed features
+- Rename --cn-verification-proxy to --geo-verification-proxy
+
+Miscellaneous
+- Add script for displaying downloads statistics
+
+
+version 2016.07.03.1
+
+Fixed/improved extractors
+- theplatform
+- aenetworks
+- nationalgeographic
+- hrti (#9482)
+- facebook (#5701)
+- buzzfeed (#5701)
+- rai (#8617, #9157, #9232, #8552, #8551)
+- nationalgeographic (#9991)
+- iqiyi
+
+
+version 2016.07.03
+
+New extractors
+- hrti (#9482)
+
+Fixed/improved extractors
+- vk (#9981)
+- facebook (#9938)
+- xtube (#9953, #9961)
+
+
+version 2016.07.02
+
+New extractors
+- fusion (#9958)
+
+Fixed/improved extractors
+- twitch (#9975)
+- vine (#9970)
+- periscope (#9967)
+- pornhub (#8696)
+
+
+version 2016.07.01
+
+New extractors
+- 9c9media
+- ctvnews (#2156)
+- ctv (#4077)
+
+Fixed/improved extractors
+- rds
+- meta (#8789)
+- pornhub (#9964)
+- sixplay (#2183)
+
+New features
+- Accept quoted strings across multiple lines (#9940)
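The ChangeLog added above follows a regular plain-text layout: a `version X` line, then section headings (`Core`, `Extractors`, or the older `New extractors` / `Fixed/improved extractors` style), then one entry per line. In the newer sections the leading marker appears to encode the kind of change (`+` added, `*` changed or fixed, `-` removed), while the older sections use `-` as a plain bullet. The sketch below groups entries accordingly. It is a minimal illustration, not part of youtube-dl: `parse_changelog` and the `MARKERS` mapping are hypothetical names, and the marker meanings are inferred from the entries rather than documented in this diff.

```python
import re
from collections import OrderedDict

# Assumed meaning of the leading markers, inferred from the entries themselves.
MARKERS = {'+': 'added', '*': 'changed', '-': 'removed'}
# Section headings under which the markers appear to carry that meaning.
TYPED_SECTIONS = ('Core', 'Extractors')


def parse_changelog(text):
    """Group ChangeLog entries by version, then by section heading."""
    versions = OrderedDict()
    version = section = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        header = re.match(r'version (\S+)$', line)
        if header:
            version = versions.setdefault(header.group(1), OrderedDict())
            section = None
        elif version is None:
            continue  # ignore anything before the first version header
        elif line[0] in MARKERS and line[1:2] == ' ':
            # Older sections use "-" as a plain bullet, so only typed
            # sections get an added/changed/removed label.
            kind = MARKERS[line[0]] if section in TYPED_SECTIONS else 'item'
            version.setdefault(section, []).append((kind, line[2:]))
        elif line.startswith('  ') and version.get(section):
            # Indented line: continuation of the previous entry.
            kind, body = version[section][-1]
            version[section][-1] = (kind, body + ' ' + line.strip())
        else:
            section = line  # any other non-empty line is a section heading
    return versions
```

Calling `parse_changelog(open('ChangeLog').read())` would return a mapping from version string to sections, each section holding `(kind, text)` pairs with wrapped entries joined into one line.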
diff --git a/Makefile b/Makefile

index 3a6c3794438b57f9ebfdc9197a3a21f0dd7505ce..b7cec16669fff13abde79db73501853ac155cf5b 100644 (file)
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
-all: youtube-dl README.md CONTRIBUTING.md ISSUE_TEMPLATE.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
+all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
  
  clean:
-       rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
+       rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
         find . -name "*.pyc" -delete
         find . -name "*.class" -delete
  
@@ -12,7 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
  PYTHON ?= /usr/bin/env python
  
  # set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
-SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi
+SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)
  
  install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
         install -d $(DESTDIR)$(BINDIR)
@@ -37,7 +37,7 @@ test:
  ot: offlinetest
  
  offlinetest: codetest
-       $(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
+       $(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py --exclude test_socks.py
  
  tar: youtube-dl.tar.gz
  
@@ -59,7 +59,7 @@ README.md: youtube_dl/*.py youtube_dl/*/*.py
  CONTRIBUTING.md: README.md
         $(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
  
-ISSUE_TEMPLATE.md:
+.github/ISSUE_TEMPLATE.md: devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md  youtube_dl/version.py
         $(PYTHON) devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md .github/ISSUE_TEMPLATE.md
  
  supportedsites:
@@ -69,7 +69,7 @@ README.txt: README.md
         pandoc -f markdown -t plain README.md -o README.txt
  
  youtube-dl.1: README.md
-       $(PYTHON) devscripts/prepare_manpage.py >youtube-dl.1.temp.md
+       $(PYTHON) devscripts/prepare_manpage.py youtube-dl.1.temp.md
         pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
         rm -f youtube-dl.1.temp.md
  
@@ -88,7 +88,13 @@ youtube-dl.fish: youtube_dl/*.py youtube_dl/*/*.py devscripts/fish-completion.in
  
  fish-completion: youtube-dl.fish
  
-youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
+lazy-extractors: youtube_dl/extractor/lazy_extractors.py
+
+_EXTRACTOR_FILES = $(shell find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py')
+youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
+       $(PYTHON) devscripts/make_lazy_extractors.py $@
+
+youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog
         @tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
                 --exclude '*.DS_Store' \
                 --exclude '*.kate-swp' \
@@ -101,7 +107,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
                 --exclude 'docs/_build' \
                 -- \
                 bin devscripts test youtube_dl docs \
-               LICENSE README.md README.txt \
+               ChangeLog LICENSE README.md README.txt \
                 Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \
                 youtube-dl.zsh youtube-dl.fish setup.py \
                 youtube-dl
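The `SYSCONFDIR` hunk above swaps the `!=` shell assignment for `$(shell ...)`, presumably so that the rule also works with make implementations that lack `!=`; the selection logic itself is unchanged. As a minimal sketch of that logic in Python (the function name `sysconfdir` is hypothetical and only mirrors the shell test in the Makefile):

```python
def sysconfdir(prefix):
    """Mirror the Makefile rule: /etc for /usr and /usr/local, else PREFIX/etc."""
    if prefix in ('/usr', '/usr/local'):
        return '/etc'
    # Plain concatenation, matching the shell's `echo $(PREFIX)/etc`.
    return prefix + '/etc'
```

For example, `sysconfdir('/usr/local')` yields `/etc`, while a private prefix such as `/opt/youtube-dl` keeps its configuration under `/opt/youtube-dl/etc`.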
diff --git a/README.md b/README.md

index e972bf69f8aedc6ec235f7f50a83e8ac5b951994..98e37442070128aefdf0109beec8d69e0fbffaf7 100644 (file)
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@ youtube-dl - download videos from youtube.com or other video platforms
  
  To install it right away for all UNIX users (Linux, OS X, etc.), type:
  
-    sudo curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
+    sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
      sudo chmod a+rx /usr/local/bin/youtube-dl
  
  If you do not have curl, you can alternatively use a recent wget:
@@ -25,20 +25,26 @@ If you do not have curl, you can alternatively use a recent wget:
      sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
      sudo chmod a+rx /usr/local/bin/youtube-dl
  
-Windows users can [download a .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in their home directory or any other location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29).
+Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
  
-OS X users can install **youtube-dl** with [Homebrew](http://brew.sh/).
+You can also use pip:
+
+    sudo pip install --upgrade youtube-dl
+    
+This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
+
+OS X users can install youtube-dl with [Homebrew](http://brew.sh/):
  
      brew install youtube-dl
  
-You can also use pip:
+Or with [MacPorts](https://www.macports.org/):
  
-    sudo pip install youtube-dl
+    sudo port install youtube-dl
  
  Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
  
  # DESCRIPTION
-**youtube-dl** is a small command-line program to download videos from
+**youtube-dl** is a command-line program to download videos from
  YouTube.com and a few more sites. It requires the Python interpreter, version
  2.6, 2.7, or 3.2+, and it is not platform specific. It should work on
  your Unix box, on Windows or on Mac OS X. It is released to the public domain,
@@ -73,8 +79,8 @@ which means you can modify it, redistribute it or use it however you like.
                                       repairs broken URLs, but emits an error if
                                       this is not possible instead of searching.
      --ignore-config                  Do not read configuration files. When given
-                                     in the global configuration file /etc
-                                     /youtube-dl.conf: Do not read the user
+                                     in the global configuration file
+                                     /etc/youtube-dl.conf: Do not read the user
                                       configuration in ~/.config/youtube-
                                       dl/config (%APPDATA%/youtube-dl/config.txt
                                       on Windows)
@@ -83,11 +89,15 @@ which means you can modify it, redistribute it or use it however you like.
      --mark-watched                   Mark videos watched (YouTube only)
      --no-mark-watched                Do not mark videos watched (YouTube only)
      --no-color                       Do not emit color codes in output
+    --abort-on-unavailable-fragment  Abort downloading when some fragment is not
+                                     available
  
  ## Network Options:
-    --proxy URL                      Use the specified HTTP/HTTPS proxy. Pass in
-                                     an empty string (--proxy "") for direct
-                                     connection
+    --proxy URL                      Use the specified HTTP/HTTPS/SOCKS proxy.
+                                     To enable experimental SOCKS proxy, specify
+                                     a proper scheme. For example
+                                     socks5://127.0.0.1:1080/. Pass in an empty
+                                     string (--proxy "") for direct connection
      --socket-timeout SECONDS         Time to wait before giving up, in seconds
      --source-address IP              Client-side IP address to bind to
                                       (experimental)
@@ -95,9 +105,9 @@ which means you can modify it, redistribute it or use it however you like.
                                       (experimental)
      -6, --force-ipv6                 Make all connections via IPv6
                                       (experimental)
-    --cn-verification-proxy URL      Use this proxy to verify the IP address for
-                                     some Chinese sites. The default proxy
-                                     specified by --proxy (or none, if the
+    --geo-verification-proxy URL     Use this proxy to verify the IP address for
+                                     some geo-restricted sites. The default
+                                     proxy specified by --proxy (or none, if the
                                       options is not present) is used for the
                                       actual downloading. (experimental)
  
@@ -160,12 +170,15 @@ which means you can modify it, redistribute it or use it however you like.
                                       (experimental)
  
  ## Download Options:
-    -r, --rate-limit LIMIT           Maximum download rate in bytes per second
+    -r, --limit-rate RATE            Maximum download rate in bytes per second
                                       (e.g. 50K or 4.2M)
      -R, --retries RETRIES            Number of retries (default is 10), or
                                       "infinite".
      --fragment-retries RETRIES       Number of retries for a fragment (default
-                                     is 10), or "infinite" (DASH only)
+                                     is 10), or "infinite" (DASH and hlsnative
+                                     only)
+    --skip-unavailable-fragments     Skip unavailable fragments (DASH and
+                                     hlsnative only)
      --buffer-size SIZE               Size of download buffer (e.g. 1024 or 16K)
                                       (default is 1024)
      --no-resize-buffer               Do not automatically adjust the buffer
@@ -176,7 +189,9 @@ which means you can modify it, redistribute it or use it however you like.
      --xattr-set-filesize             Set file xattribute ytdl.filesize with
                                       expected filesize (experimental)
      --hls-prefer-native              Use the native HLS downloader instead of
-                                     ffmpeg (experimental)
+                                     ffmpeg
+    --hls-prefer-ffmpeg              Use ffmpeg instead of the native HLS
+                                     downloader
      --hls-use-mpegts                 Use the mpegts container for HLS videos,
                                       allowing to play the video while
                                       downloading (some players may not be able
@@ -191,32 +206,8 @@ which means you can modify it, redistribute it or use it however you like.
      -a, --batch-file FILE            File containing URLs to download ('-' for
                                       stdin)
      --id                             Use only video ID in file name
-    -o, --output TEMPLATE            Output filename template. Use %(title)s to
-                                     get the title, %(uploader)s for the
-                                     uploader name, %(uploader_id)s for the
-                                     uploader nickname if different,
-                                     %(autonumber)s to get an automatically
-                                     incremented number, %(ext)s for the
-                                     filename extension, %(format)s for the
-                                     format description (like "22 - 1280x720" or
-                                     "HD"), %(format_id)s for the unique id of
-                                     the format (like YouTube's itags: "137"),
-                                     %(upload_date)s for the upload date
-                                     (YYYYMMDD), %(extractor)s for the provider
-                                     (youtube, metacafe, etc), %(id)s for the
-                                     video id, %(playlist_title)s,
-                                     %(playlist_id)s, or %(playlist)s (=title if
-                                     present, ID otherwise) for the playlist the
-                                     video is in, %(playlist_index)s for the
-                                     position in the playlist. %(height)s and
-                                     %(width)s for the width and height of the
-                                     video format. %(resolution)s for a textual
-                                     description of the resolution of the video
-                                     format. %% for a literal percent. Use - to
-                                     output to stdout. Can also be used to
-                                     download to a different directory, for
-                                     example with -o '/my/downloads/%(uploader)s
-                                     /%(title)s-%(id)s.%(ext)s' .
+    -o, --output TEMPLATE            Output filename template, see the "OUTPUT
+                                     TEMPLATE" for all the info
      --autonumber-size NUMBER         Specify the number of digits in
                                       %(autonumber)s when it is present in output
                                       filename template or --auto-number option
@@ -245,18 +236,19 @@ which means you can modify it, redistribute it or use it however you like.
      --write-info-json                Write video metadata to a .info.json file
      --write-annotations              Write video annotations to a
                                       .annotations.xml file
-    --load-info FILE                 JSON file containing the video information
+    --load-info-json FILE            JSON file containing the video information
                                       (created with the "--write-info-json"
                                       option)
      --cookies FILE                   File to read cookies from and dump cookie
                                       jar in
      --cache-dir DIR                  Location in the filesystem where youtube-dl
                                       can store some downloaded information
-                                     permanently. By default $XDG_CACHE_HOME
-                                     /youtube-dl or ~/.cache/youtube-dl . At the
-                                     moment, only YouTube player files (for
-                                     videos with obfuscated signatures) are
-                                     cached, but that may change.
+                                     permanently. By default
+                                     $XDG_CACHE_HOME/youtube-dl or
+                                     ~/.cache/youtube-dl . At the moment, only
+                                     YouTube player files (for videos with
+                                     obfuscated signatures) are cached, but that
+                                     may change.
      --no-cache-dir                   Disable filesystem caching
      --rm-cache-dir                   Delete all filesystem cache files
  
@@ -319,7 +311,15 @@ which means you can modify it, redistribute it or use it however you like.
                                       bidirectional text support. Requires bidiv
                                       or fribidi executable in PATH
      --sleep-interval SECONDS         Number of seconds to sleep before each
-                                     download.
+                                     download when used alone or a lower bound
+                                     of a range for randomized sleep before each
+                                     download (minimum possible number of
+                                     seconds to sleep) when used along with
+                                     --max-sleep-interval.
+    --max-sleep-interval SECONDS     Upper bound of a range for randomized sleep
+                                     before each download (maximum possible
+                                     number of seconds to sleep). Must only be
+                                     used along with --min-sleep-interval.
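
The interaction between the two sleep options can be illustrated with a minimal sketch. This is not youtube-dl's actual code; the function name `pick_sleep_seconds` is hypothetical, and drawing uniformly from the range is an assumption about what "randomized sleep" means here.

```python
import random


def pick_sleep_seconds(min_seconds, max_seconds=None):
    """Return how long to sleep before a download (illustrative only).

    With only a lower bound the sleep is fixed; with both bounds a value
    is drawn at random from the closed range [min_seconds, max_seconds].
    """
    if max_seconds is None:
        # Equivalent to passing --sleep-interval alone.
        return float(min_seconds)
    if max_seconds < min_seconds:
        raise ValueError('upper bound must not be below the lower bound')
    # Assumption: a uniform draw between the two bounds.
    return random.uniform(min_seconds, max_seconds)


print(pick_sleep_seconds(5))      # fixed sleep
print(pick_sleep_seconds(2, 10))  # randomized sleep between 2 and 10 seconds
```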
  
  ## Video Format Options:
      -f, --format FORMAT              Video format code, see the "FORMAT
@@ -358,6 +358,17 @@ which means you can modify it, redistribute it or use it however you like.
      -n, --netrc                      Use .netrc authentication data
      --video-password PASSWORD        Video password (vimeo, smotri, youku)
  
+## Adobe Pass Options:
+    --ap-mso MSO                     Adobe Pass multiple-system operator (TV
+                                     provider) identifier, use --ap-list-mso for
+                                     a list of available MSOs
+    --ap-username USERNAME           Multiple-system operator account login
+    --ap-password PASSWORD           Multiple-system operator account password.
+                                     If this option is left out, youtube-dl will
+                                     ask interactively.
+    --ap-list-mso                    List all supported multiple-system
+                                     operators
+
  ## Post-processing Options:
      -x, --extract-audio              Convert video files to audio-only files
                                       (requires ffmpeg or avconv and ffprobe or
@@ -413,13 +424,22 @@ which means you can modify it, redistribute it or use it however you like.
  
  # CONFIGURATION
  
-You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
+You can configure youtube-dl by placing any supported command line option in a configuration file. On Linux and OS X, the system-wide configuration file is located at `/etc/youtube-dl.conf` and the user-wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user-wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that the configuration file may not exist by default, so you may need to create it yourself.
  
  For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
  ```
+# Lines starting with # are comments
+
+# Always extract audio
  -x
+
+# Do not copy the mtime
  --no-mtime
+
+# Use this proxy
  --proxy 127.0.0.1:3128
+
+# Save all videos under Movies directory in your home directory
  -o ~/Movies/%(title)s.%(ext)s
  ```
  
@@ -429,12 +449,12 @@ You can use `--ignore-config` if you want to disable the configuration file for
  
  ### Authentication with `.netrc` file
  
-You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a`.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
+You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
  ```
  touch $HOME/.netrc
  chmod a-rwx,u+rw $HOME/.netrc
  ```
-After that you can add credentials for extractor in the following format, where *extractor* is the name of extractor in lowercase:
+After that you can add credentials for an extractor in the following format, where *extractor* is the name of the extractor in lowercase:
  ```
  machine <extractor> login <login> password <password>
  ```
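
As a rough illustration of how a line in this format can be read back, the sketch below uses Python's standard `netrc` module. The helper name `read_credentials`, the temporary file, and the placeholder login and password are assumptions made for the example; youtube-dl's own lookup may differ in detail.

```python
import netrc
import os
import tempfile


def read_credentials(netrc_path, extractor):
    """Return (login, password) for an extractor name, or None if absent."""
    auth = netrc.netrc(netrc_path).authenticators(extractor)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, '.netrc')
    with open(path, 'w') as f:
        # Placeholder credentials, written in the format shown above.
        f.write('machine youtube login example_login password example_password\n')
    os.chmod(path, 0o600)  # read/write for the owner only
    creds = read_credentials(path, 'youtube')
    missing = read_credentials(path, 'vimeo')

print(creds)
print(missing)
```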
@@ -463,7 +483,7 @@ The basic usage is not to set any template arguments when downloading a single f
   - `display_id`: An alternative identifier for the video
   - `uploader`: Full name of the video uploader
   - `license`: License name the video is licensed under
- - `creator`: The main artist who created the video
+ - `creator`: The creator of the video
   - `release_date`: The date (YYYYMMDD) when the video was released
   - `timestamp`: UNIX timestamp of the moment the video became available
   - `upload_date`: Video upload date (YYYYMMDD)
@@ -500,6 +520,9 @@ The basic usage is not to set any template arguments when downloading a single f
   - `autonumber`: Five-digit number that will be increased with each download, starting at zero
   - `playlist`: Name or id of the playlist that contains the video
   - `playlist_index`: Index of the video in the playlist padded with leading zeros according to the total length of the playlist
+ - `playlist_id`: Playlist identifier
+ - `playlist_title`: Playlist title
+
  
  Available for the video that belongs to some logical chapter or section:
   - `chapter`: Name or title of the chapter the video belongs to
@@ -515,18 +538,34 @@ Available for the video that is an episode of some series or programme:
   - `episode_number`: Number of the video episode within a season
   - `episode_id`: Id of the video episode
  
-Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`.
+Available for the media that is a track or a part of a music album:
+ - `track`: Title of the track
+ - `track_number`: Number of the track within an album or a disc
+ - `track_id`: Id of the track
+ - `artist`: Artist(s) of the track
+ - `genre`: Genre(s) of the track
+ - `album`: Title of the album the track belongs to
+ - `album_type`: Type of the album
+ - `album_artist`: List of all artists who appeared on the album
+ - `disc_number`: Number of the disc or other physical medium the track belongs to
+ - `release_year`: Year (YYYY) when the album was released
+
+Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with `NA`.
  
-For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
+For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
  
-Output template can also contain arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` that will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you.
+Output templates can also contain an arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'`, which will result in downloading each video into a directory corresponding to this path template. Any missing directory will be automatically created for you.
  
-To specify percent literal in output template use `%%`. To output to stdout use `-o -`.
+To use percent literals in an output template use `%%`. To output to stdout use `-o -`.
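
The template sequences follow Python's `%(name)s` string formatting, so the substitution rules described above can be sketched in a few lines. The helper `expand_template` is hypothetical and only models the behavior stated in this section (missing fields become `NA`, `%%` becomes a literal percent); it is not youtube-dl's implementation.

```python
import collections


def expand_template(template, info):
    """Fill an output template from a metadata dict (illustrative only)."""
    # Any field the extractor did not provide falls back to 'NA'.
    fields = collections.defaultdict(lambda: 'NA', info)
    return template % fields


info = {'title': 'youtube-dl test video', 'id': 'BaW_jenozKcj', 'ext': 'mp4'}

print(expand_template('%(title)s-%(id)s.%(ext)s', info))
# 'uploader' is absent from info, and '%%' yields a single percent sign.
print(expand_template('%(uploader)s 100%% %(title)s.%(ext)s', info))
```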
  
  The current default template is `%(title)s-%(id)s.%(ext)s`.
  
  In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
  
+#### Output template and Windows batch files
+
+If you are using an output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
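
The doubling rule can be applied mechanically. The sketch below is one possible helper, named `escape_for_batch` here purely for illustration; it assumes that the only plain percent characters in the template are the ones that start a `%(field)` sequence, and it leaves `%VARIABLE%` references untouched.

```python
def escape_for_batch(template):
    """Double the '%' of every '%(field)' sequence for use in a .bat file."""
    # Environment variables such as %HOMEPATH% are not followed by '(',
    # so they are left intact for the batch interpreter to expand.
    return template.replace('%(', '%%(')


print(escape_for_batch('%(title)s-%(id)s.%(ext)s'))
print(escape_for_batch(r'C:\%HOMEPATH%\Desktop\%(title)s.%(ext)s'))
```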
+
  #### Output template examples
  
  Note on Windows you may need to use double quotes instead of single.
@@ -558,7 +597,7 @@ $ youtube-dl -o - BaW_jenozKc
  
  By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**.
  
-But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more.
+But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so-called *format selection*, with which you can explicitly specify the desired format, select formats based on some criterion or criteria, set up precedence and much more.
  
  The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
  
@@ -566,21 +605,21 @@ The general syntax for format selection is `--format FORMAT` or shorter `-f FORM
  
  The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific. 
  
-You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download best quality format of particular file extension served as a single file, e.g. `-f webm` will download best quality format with `webm` extension served as a single file.
+You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format of a particular file extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file.
  
-You can also use special names to select particular edge case format:
- - `best`: Select best quality format represented by single file with video and audio
- - `worst`: Select worst quality format represented by single file with video and audio
- - `bestvideo`: Select best quality video only format (e.g. DASH video), may not be available
- - `worstvideo`: Select worst quality video only format, may not be available
- - `bestaudio`: Select best quality audio only format, may not be available
- - `worstaudio`: Select worst quality audio only format, may not be available
+You can also use special names to select particular edge case formats:
+ - `best`: Select the best quality format represented by a single file with video and audio.
+ - `worst`: Select the worst quality format represented by a single file with video and audio.
+ - `bestvideo`: Select the best quality video-only format (e.g. DASH video). May not be available.
+ - `worstvideo`: Select the worst quality video-only format. May not be available.
+ - `bestaudio`: Select the best quality audio-only format. May not be available.
+ - `worstaudio`: Select the worst quality audio-only format. May not be available.
  
-For example, to download worst quality video only format you can use `-f worstvideo`.
+For example, to download the worst quality video-only format you can use `-f worstvideo`.
  
  If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
  
-If you want to download several formats of the same video use comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or more sophisticated example combined with precedence feature `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
+If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
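
The slash and comma rules can be modelled with a small sketch. It is a deliberate simplification: the function `select_formats` is hypothetical, it understands only plain format codes (not names such as `mp4` or `bestvideo`, nor filters), and skipping a comma-separated group that has no available format is an assumption.

```python
def select_formats(selector, available):
    """Pick format codes from a selector such as '22/17/18' or '22,17,18'."""
    chosen = []
    for group in selector.split(','):      # comma: download each group
        for code in group.split('/'):      # slash: first available wins
            if code in available:
                chosen.append(code)
                break
    if not chosen:
        raise ValueError('no suitable formats are available for download')
    return chosen


print(select_formats('22/17/18', {'17', '18'}))  # 22 is missing, 17 is next
print(select_formats('22,17,18', {'22', '18'}))  # every available code is taken
```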
  
  You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
  
@@ -602,15 +641,15 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
   - `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
   - `format_id`: A short description of the format
  
-Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by video hoster.
+Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by a particular extractor, i.e. the metadata offered by the video hoster.
  
  Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
  
-You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download best video only format, best audio only format and mux them together with ffmpeg/avconv.
+You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg/avconv.
  
  Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
  
-Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
+Since the end of April 2015 and version 2015.04.26, youtube-dl uses `-f bestvideo+bestaudio/best` as the default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
  
  If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
  
@@ -630,7 +669,11 @@ $ youtube-dl -f 'best[filesize<50M]'
  
  # Download best format available via direct link over HTTP/HTTPS protocol
  $ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
+
+# Download the best video format and the best audio format without merging them
+$ youtube-dl -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s'
  ```
+Note that in the last example, an output template is recommended as bestvideo and bestaudio may have the same file name.
  
  
  # VIDEO SELECTION
@@ -677,11 +720,19 @@ hash -r
  
  Again, from then on you'll be able to update with `sudo youtube-dl -U`.
  
+### youtube-dl is extremely slow to start on Windows
+
+Add a file exclusion for `youtube-dl.exe` in Windows Defender settings.
+
  ### I'm getting an error `Unable to extract OpenGraph title` on YouTube playlists
  
  YouTube changed their playlist format in March 2014 and later on, so you'll need at least youtube-dl 2014.07.25 to download all YouTube videos.
  
-If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging guys](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
+If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging people](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
+
+### I'm getting an error when trying to use output template: `error: using output template conflicts with using title, video ID or auto number`
+
+Make sure you are not using `-o` together with any of the options `-t`, `--title`, `--id`, `-A` or `--auto-number`, whether on the command line or in a configuration file. If you are, remove those options.
  
  ### Do I always have to pass `-citw`?
  
@@ -703,11 +754,11 @@ Videos or video formats streamed via RTMP protocol can only be downloaded when [
  
  ### I have downloaded a video but how can I play it?
  
-Once the video is fully downloaded, use any video player, such as [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
+Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org/) or [mplayer](http://www.mplayerhq.hu/).
  
  ### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.
  
-It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies.  Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used, use `--dump-user-agent` to see the one in use by youtube-dl.
+It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies and/or HTTP headers. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used; use `--dump-user-agent` to see the one in use by youtube-dl. You can also get the necessary cookies and HTTP headers from the JSON output obtained with `--dump-json`.
  
  It may be beneficial to use IPv6; in some cases, the restrictions are only applied to IPv4. Some services (sometimes only for a subset of videos) do not restrict the video URL by IP address, cookie, or user-agent, but these are the exception rather than the rule.
  
@@ -760,9 +811,9 @@ means you're using an outdated version of Python. Please update to Python 2.6 or
  
  Since June 2012 ([#342](https://github.com/rg3/youtube-dl/issues/342)) youtube-dl is packed as an executable zipfile, simply unzip it (might need renaming to `youtube-dl.zip` first on some systems) or clone the git repository, as laid out above. If you modify the code, you can run it by executing the `__main__.py` file. To recompile the executable, run `make youtube-dl`.
  
-### The exe throws a *Runtime error from Visual C++*
+### The exe throws an error due to missing `MSVCR100.dll`
  
-To run the exe you need to install first the [Microsoft Visual C++ 2008 Redistributable Package](http://www.microsoft.com/en-us/download/details.aspx?id=29).
+To run the exe you first need to install the [Microsoft Visual C++ 2010 Redistributable Package (x86)](https://www.microsoft.com/en-US/download/details.aspx?id=5555).
  
  ### On Windows, how should I set up ffmpeg and youtube-dl? Where should I put the exe files?
  
@@ -785,10 +836,42 @@ Either prepend `http://www.youtube.com/watch?v=` or separate the ID from the opt
  
  ### How do I pass cookies to youtube-dl?
  
-Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
+Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
+
+In order to extract cookies from a browser, use any conforming browser extension for exporting cookies, for example [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
+
+Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
  
  Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
  
+### How do I stream directly to media player?
+
+You will first need to tell youtube-dl to stream media to stdout with `-o -`, tell your media player to read from stdin (it must be capable of this for streaming), and then pipe the former to the latter. For example, streaming to [vlc](http://www.videolan.org/) can be achieved with:
+
+    youtube-dl -o - "http://www.youtube.com/watch?v=BaW_jenozKcj" | vlc -
+
+### How do I download only new videos from a playlist?
+
+Use the download-archive feature. With this feature you should initially download the complete playlist with `--download-archive /path/to/download/archive/file.txt`, which will record the identifiers of all the videos in a special file. Each subsequent run with the same `--download-archive` will download only new videos and skip all videos that have been downloaded before. Note that only successful downloads are recorded in the file.
+
+For example, at first,
+
+    youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
+
+will download the complete `PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re` playlist and create a file `archive.txt`. Each subsequent run will only download new videos if any:
+
+    youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
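
The bookkeeping behind this feature can be sketched as follows. This is a simplified model rather than youtube-dl's code: the names `load_archive` and `download_new` are hypothetical, the `download` callback stands in for the real download step, and the sketch stores bare identifiers, which may not match the real archive file format.

```python
import os
import tempfile


def load_archive(path):
    """Return the set of identifiers already recorded in the archive file."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def download_new(video_ids, archive_path, download):
    """Download only videos not yet in the archive; record each success."""
    seen = load_archive(archive_path)
    fetched = []
    for video_id in video_ids:
        if video_id in seen:
            continue  # downloaded on an earlier run, so skip it
        if download(video_id):  # only successful downloads are recorded
            with open(archive_path, 'a') as f:
                f.write(video_id + '\n')
            seen.add(video_id)
            fetched.append(video_id)
    return fetched


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, 'archive.txt')
    always_ok = lambda video_id: True
    first_run = download_new(['a', 'b'], archive, always_ok)
    second_run = download_new(['a', 'b', 'c'], archive, always_ok)

print(first_run)   # the complete playlist
print(second_run)  # only the video added since the first run
```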
+
+### Should I add `--hls-prefer-native` into my config?
+
+When youtube-dl detects an HLS video, it can download it either with the built-in downloader or ffmpeg. Since many HLS streams are slightly invalid and ffmpeg/youtube-dl each handle some invalid cases better than the other, there is an option to switch the downloader if needed.
+
+When youtube-dl knows that one particular downloader works better for a given website, that downloader will be picked. Otherwise, youtube-dl will pick the best downloader for general compatibility, which at the moment happens to be ffmpeg. This choice may change in future versions of youtube-dl, with improvements of the built-in downloader and/or ffmpeg.
+
+In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](http://rg3.github.io/youtube-dl/supportedsites.html)) cannot mandate one specific downloader.
+
+If you put either `--hls-prefer-native` or `--hls-prefer-ffmpeg` into your configuration, a different subset of videos will fail to download correctly. Instead, it is much better to [file an issue](https://yt-dl.org/bug) or a pull request which details why the native or the ffmpeg HLS downloader is a better choice for your use case.
+
  ### Can you add support for this anime video site, or site which shows current movies for free?
  
  As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl.
@@ -817,6 +900,12 @@ It is *not* possible to detect whether a URL is supported or not. That's because
  
  If you want to find out whether a given URL is supported, simply call youtube-dl with it. If you get no videos back, chances are the URL is either not referring to a video or unsupported. You can find out which by examining the output (if you run youtube-dl on the console) or catching an `UnsupportedError` exception if you run it from a Python program.
  
+### Why do I need to go through that much red tape when filing bugs?
+
+Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was already reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download, and because of many more simple, easy-to-avoid problems, many of which were totally unrelated to youtube-dl.
+
+youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident that we can reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current.
+
  # DEVELOPER INSTRUCTIONS
  
  Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
@@ -834,7 +923,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
  If you want to create a build of youtube-dl yourself, you'll need
  
  * python
-* make (both GNU make and BSD make are supported)
+* make (only GNU make is supported)
  * pandoc
  * zip
  * nosetests
@@ -846,9 +935,17 @@ If you want to add support for a new site, first of all **make sure** this site
  After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):
  
  1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
-2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
-3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
+2. Check out the source code with:
+
+        git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
+
+3. Start a new git branch with:
+
+        cd youtube-dl
+        git checkout -b yourextractor
+
  4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
+
      ```python
      # coding: utf-8
      from __future__ import unicode_literals
@@ -889,22 +986,154 @@ After you have ensured this site is distributing it's content legally, you can f
                  # TODO more properties (see youtube_dl/extractor/common.py)
              }
      ```
-5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
+5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
  6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
-8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
-9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
-10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
+9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
  
-        $ git add youtube_dl/extractor/__init__.py
+        $ git add youtube_dl/extractor/extractors.py
          $ git add youtube_dl/extractor/yourextractor.py
          $ git commit -m '[yourextractor] Add new extractor'
          $ git push origin yourextractor
  
-11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
+10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
  
  In any case, thank you very much for your contributions!
  
+## youtube-dl coding conventions
+
+This section introduces guidelines for writing idiomatic, robust and future-proof extractor code.
+
+Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control, and this layout tends to change. As an extractor implementer, your task is not only to write code that extracts media links and metadata correctly, but also to minimize dependency on the source's layout and even to anticipate potential future changes and be ready for them. This is important because it allows the extractor not to break on minor layout changes, thus keeping old youtube-dl versions working. Even though such breakage is easily fixed by releasing a new version of youtube-dl with the fix incorporated, all the previous versions remain broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non-rolling-release distros may never receive an update at all.
+
+### Mandatory and optional metafields
+
+For extraction to work, youtube-dl relies on the metadata your extractor extracts and provides, expressed as an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
+
+ - `id` (media identifier)
+ - `title` (media title)
+ - `url` (media download URL) or `formats`
+
+In fact, only the last option is technically mandatory (i.e. if you can't figure out the download location of the media, the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data without which the extraction does not make any sense, and if any of them fails to be extracted, the extractor is considered completely broken.
+
+[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones is considered **optional**. That means that extraction should be **tolerant** of situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
+
+#### Example
+
+Say you have some source dictionary `meta` that you've fetched as JSON with an HTTP request and it has a key `summary`:
+
+```python
+meta = self._download_json(url, video_id)
+```
+    
+Assume at this point `meta`'s layout is:
+
+```python
+{
+    ...
+    "summary": "some fancy summary text",
+    ...
+}
+```
+
+Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield, you should be prepared for this key to be missing from the `meta` dict, so you should extract it like:
+
+```python
+description = meta.get('summary')  # correct
+```
+
+and not like:
+
+```python
+description = meta['summary']  # incorrect
+```
+
+The latter will break the extraction process with `KeyError` if `summary` disappears from `meta` at some later time, but with the former approach extraction will just go ahead with `description` set to `None`, which is perfectly fine (remember `None` is equivalent to the absence of data).
+
+Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
+
+```python
+description = self._search_regex(
+    r'<span[^>]+id="title"[^>]*>([^<]+)<',
+    webpage, 'description', fatal=False)
+```
+
+With `fatal` set to `False`, if `_search_regex` fails to extract `description`, it will emit a warning and continue extraction.
+
+You can also pass `default=<some fallback value>`, for example:
+
+```python
+description = self._search_regex(
+    r'<span[^>]+id="title"[^>]*>([^<]+)<',
+    webpage, 'description', default=None)
+```
+
+On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
+ 
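The difference between `fatal=False` and `default=...` can be sketched outside of youtube-dl. The `search_regex` function below is a simplified, hypothetical stand-in for the extractor helper (it raises a plain `ValueError` and prints its warning, which is an assumption of this sketch rather than the real behavior), and `meta` and `webpage` are made-up sample data:

```python
import re

NO_DEFAULT = object()  # sentinel meaning "the caller supplied no default"


def search_regex(pattern, string, name, default=NO_DEFAULT, fatal=True):
    """Simplified stand-in for the extractor helper; not youtube-dl code."""
    match = re.search(pattern, string)
    if match:
        return match.group(1)
    if default is not NO_DEFAULT:
        return default  # silent fallback, no warning
    if fatal:
        raise ValueError('Unable to extract %s' % name)
    print('WARNING: unable to extract %s' % name)
    return None


meta = {'title': 'some title'}   # hypothetical response lacking "summary"
webpage = '<h1>some title</h1>'  # hypothetical page lacking optional data

# Optional fields: none of these lines can abort the extraction.
description = meta.get('summary')
view_count = search_regex(r'(\d+) views', webpage, 'view count', default=None)
uploader = search_regex(r'by ([^<]+)<', webpage, 'uploader', fatal=False)
```

All three optional values end up as `None`, and only the `fatal=False` call reports a warning. Calling `search_regex` with neither argument on the same input would raise instead, which is the behavior you want for mandatory fields only.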
+### Provide fallbacks
+
+When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
+
+#### Example
+
+Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
+
+```python
+title = meta['title']
+```
+
+If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
+
+Assume that you have another source you can extract `title` from, for example the `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
+
+```python
+title = meta.get('title') or self._og_search_title(webpage)
+```
+
+This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
+
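The fallback chain can be tried in isolation. In this sketch, `og_search_title` is a hypothetical and heavily simplified stand-in for `_og_search_title`, and the sample `webpage` is invented for illustration:

```python
import re


def og_search_title(webpage):
    """Hypothetical stand-in for `_og_search_title`; not youtube-dl code."""
    match = re.search(
        r'<meta[^>]+property=(["\'])og:title\1[^>]+'
        r'content=(["\'])(?P<title>.+?)\2',
        webpage)
    return match.group('title') if match else None


def extract_title(meta, webpage):
    # Prefer the JSON metadata; fall back to the page's og:title.
    return meta.get('title') or og_search_title(webpage)


webpage = '<meta property="og:title" content="Fallback title">'
from_meta = extract_title({'title': 'Primary title'}, webpage)
from_page = extract_title({}, webpage)
```

Here `from_meta` comes from the dictionary and `from_page` from the HTML. Note that `or` also falls through when `meta['title']` is present but empty, which is usually what you want for a title.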
+### Make regular expressions flexible
+
+When using regular expressions try to write them fuzzy and flexible.
+ 
+#### Example
+
+Say you need to extract `title` from the following HTML code:
+
+```html
+<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
+```
+
+The code for that task should look similar to:
+
+```python
+title = self._search_regex(
+    r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
+```
+
+Or even better:
+
+```python
+title = self._search_regex(
+    r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
+    webpage, 'title', group='title')
+```
+
+Note how you tolerate potential changes in the `style` attribute's value or a switch from double quotes to single quotes for the `class` attribute.
+
+The code definitely should not look like:
+
+```python
+title = self._search_regex(
+    r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
+    webpage, 'title')
+```
+
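To see why the flexible pattern is preferable, both patterns can be run against the original markup and against a slightly changed variant. This is a standalone sketch using only the `re` module; the `changed` markup is a made-up example of a minor layout change:

```python
import re

FLEXIBLE = r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)'
RIGID = (r'<span style="position: absolute; left: 910px; width: 90px; '
         r'float: right; z-index: 9999;" class="title">(.*?)</span>')

original = ('<span style="position: absolute; left: 910px; width: 90px; '
            'float: right; z-index: 9999;" class="title">some fancy title</span>')
# Hypothetical layout change: new style value, single quotes, extra attribute.
changed = ('<span style="left: 900px;" class=\'title\' id="t">'
           'some fancy title</span>')

flexible_results = [
    re.search(FLEXIBLE, html).group('title') for html in (original, changed)]
rigid_matches = [
    re.search(RIGID, html) is not None for html in (original, changed)]
```

The flexible pattern yields the title for both inputs, while the rigid pattern matches only the original markup and would break the extractor after the change.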
+### Use safe conversion functions
+
+Wrap all extracted numeric data in safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
+
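The idea behind these helpers can be sketched as follows. These are simplified stand-ins written for illustration, not the actual implementations from `youtube_dl/utils.py` (which accept further parameters), and the `meta` dictionary is invented sample data:

```python
def int_or_none(value, default=None):
    """Simplified stand-in: convert to int, or return `default` on failure."""
    if value is None or value == '':
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def float_or_none(value, default=None):
    """Simplified stand-in: convert to float, or return `default` on failure."""
    if value is None or value == '':
        return default
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


# Hypothetical metadata with a numeric string, a missing value and junk.
meta = {'duration': '184', 'view_count': None, 'rating': 'n/a'}

duration = int_or_none(meta.get('duration'))
view_count = int_or_none(meta.get('view_count'))
rating = float_or_none(meta.get('rating'))
```

A bare `int(meta['rating'])` would raise and abort the whole extraction over an optional field; with the safe wrappers the field simply ends up as `None`.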
  # EMBEDDING YOUTUBE-DL
  
  youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/rg3/youtube-dl/issues/new).
@@ -920,7 +1149,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
      ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
  ```
  
-Most likely, you'll want to use various options. For a list of what can be done, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L121-L269). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
+Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L128-L278). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
  
  Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:
  
@@ -961,7 +1190,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
  
  # BUGS
  
-Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
+Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
  
  **Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
  ```
@@ -977,7 +1206,7 @@ $ youtube-dl -v <your command line>
  [debug] Proxy map: {}
  ...
  ```
-**Do not post screenshots of verbose log only plain text is acceptable.**
+**Do not post screenshots of verbose logs; only plain text is acceptable.**
  
  The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
  
@@ -1011,7 +1240,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
  
  ###  Why are existing options not enough?
  
-Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
+Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
  
  ###  Is there enough context in your bug report?
  
@@ -1031,7 +1260,7 @@ Only post features that you (or an incapacitated friend you can personally talk
  
  ###  Is your question about youtube-dl?
  
-It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
+It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
  
  # COPYRIGHT
  
diff --git a/devscripts/bash-completion.py b/devscripts/bash-completion.py

index ce68f26f9ca39bd298f5d4149346af686257e042..3d1391334bd38a23c7024192c6c36522acaa5613 100755 (executable)
--- a/devscripts/bash-completion.py
+++ b/devscripts/bash-completion.py
@@ -25,5 +25,6 @@ def build_completion(opt_parser):
          filled_template = template.replace("{{flags}}", " ".join(opts_flag))
          f.write(filled_template)
  
+
  parser = youtube_dl.parseOpts()[0]
  build_completion(parser)
diff --git a/devscripts/buildserver.py b/devscripts/buildserver.py

index 7c2f49f8bb63bbe2b47efca151129a7e6b49674d..fc99c3213dddf985cfcf4fe74584cc09eeaf3175 100644 (file)
--- a/devscripts/buildserver.py
+++ b/devscripts/buildserver.py
@@ -1,17 +1,38 @@
  #!/usr/bin/python3
  
-from http.server import HTTPServer, BaseHTTPRequestHandler
-from socketserver import ThreadingMixIn
  import argparse
  import ctypes
  import functools
+import shutil
+import subprocess
  import sys
+import tempfile
  import threading
  import traceback
  import os.path
  
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from youtube_dl.compat import (
+    compat_input,
+    compat_http_server,
+    compat_str,
+    compat_urlparse,
+)
+
+# These are not used outside of buildserver.py thus not in compat.py
+
+try:
+    import winreg as compat_winreg
+except ImportError:  # Python 2
+    import _winreg as compat_winreg
  
-class BuildHTTPServer(ThreadingMixIn, HTTPServer):
+try:
+    import socketserver as compat_socketserver
+except ImportError:  # Python 2
+    import SocketServer as compat_socketserver
+
+
+class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer):
      allow_reuse_address = True
  
  
@@ -191,7 +212,7 @@ def main(args=None):
                          action='store_const', dest='action', const='service',
                          help='Run as a Windows service')
      parser.add_argument('-b', '--bind', metavar='<host:port>',
-                        action='store', default='localhost:8142',
+                        action='store', default='0.0.0.0:8142',
                          help='Bind to host:port (default %(default)s)')
      options = parser.parse_args(args=args)
  
@@ -216,7 +237,7 @@ def main(args=None):
      srv = BuildHTTPServer((host, port), BuildHTTPRequestHandler)
      thr = threading.Thread(target=srv.serve_forever)
      thr.start()
-    input('Press ENTER to shut down')
+    compat_input('Press ENTER to shut down')
      srv.shutdown()
      thr.join()
  
@@ -231,8 +252,6 @@ def rmtree(path):
              os.remove(fname)
      os.rmdir(path)
  
-#==============================================================================
-
  
  class BuildError(Exception):
      def __init__(self, output, code=500):
@@ -249,15 +268,25 @@ class HTTPError(BuildError):
  
  class PythonBuilder(object):
      def __init__(self, **kwargs):
-        pythonVersion = kwargs.pop('python', '2.7')
-        try:
-            key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, r'SOFTWARE\Python\PythonCore\%s\InstallPath' % pythonVersion)
+        python_version = kwargs.pop('python', '3.4')
+        python_path = None
+        for node in ('Wow6432Node\\', ''):
              try:
-                self.pythonPath, _ = _winreg.QueryValueEx(key, '')
-            finally:
-                _winreg.CloseKey(key)
-        except Exception:
-            raise BuildError('No such Python version: %s' % pythonVersion)
+                key = compat_winreg.OpenKey(
+                    compat_winreg.HKEY_LOCAL_MACHINE,
+                    r'SOFTWARE\%sPython\PythonCore\%s\InstallPath' % (node, python_version))
+                try:
+                    python_path, _ = compat_winreg.QueryValueEx(key, '')
+                finally:
+                    compat_winreg.CloseKey(key)
+                break
+            except Exception:
+                pass
+
+        if not python_path:
+            raise BuildError('No such Python version: %s' % python_version)
+
+        self.pythonPath = python_path
  
          super(PythonBuilder, self).__init__(**kwargs)
  
@@ -305,8 +334,10 @@ class YoutubeDLBuilder(object):
  
      def build(self):
          try:
-            subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
-                                    cwd=self.buildPath)
+            proc = subprocess.Popen([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'], stdin=subprocess.PIPE, cwd=self.buildPath)
+            proc.wait()
+            #subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
+            #                        cwd=self.buildPath)
          except subprocess.CalledProcessError as e:
              raise BuildError(e.output)
  
@@ -369,12 +400,12 @@ class Builder(PythonBuilder, GITBuilder, YoutubeDLBuilder, DownloadBuilder, Clea
      pass
  
  
-class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
+class BuildHTTPRequestHandler(compat_http_server.BaseHTTPRequestHandler):
      actionDict = {'build': Builder, 'download': Builder}  # They're the same, no more caching.
  
      def do_GET(self):
-        path = urlparse.urlparse(self.path)
-        paramDict = dict([(key, value[0]) for key, value in urlparse.parse_qs(path.query).items()])
+        path = compat_urlparse.urlparse(self.path)
+        paramDict = dict([(key, value[0]) for key, value in compat_urlparse.parse_qs(path.query).items()])
          action, _, path = path.path.strip('/').partition('/')
          if path:
              path = path.split('/')
@@ -388,7 +419,7 @@ class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
                          builder.close()
                  except BuildError as e:
                      self.send_response(e.code)
-                    msg = unicode(e).encode('UTF-8')
+                    msg = compat_str(e).encode('UTF-8')
                      self.send_header('Content-Type', 'text/plain; charset=UTF-8')
                      self.send_header('Content-Length', len(msg))
                      self.end_headers()
@@ -400,7 +431,5 @@ class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
          else:
              self.send_response(500, 'Malformed URL')
  
-#==============================================================================
-
  if __name__ == '__main__':
      main()
diff --git a/devscripts/create-github-release.py b/devscripts/create-github-release.py

new file mode 100644 (file)

index 0000000..30716ad
--- /dev/null
+++ b/devscripts/create-github-release.py
@@ -0,0 +1,120 @@
+#!/usr/bin/env python
+from __future__ import unicode_literals
+
+import base64
+import io
+import json
+import mimetypes
+import netrc
+import optparse
+import os
+import re
+import sys
+
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from youtube_dl.compat import (
+    compat_basestring,
+    compat_input,
+    compat_getpass,
+    compat_print,
+    compat_urllib_request,
+)
+from youtube_dl.utils import (
+    make_HTTPS_handler,
+    sanitized_Request,
+)
+
+
+class GitHubReleaser(object):
+    _API_URL = 'https://api.github.com/repos/rg3/youtube-dl/releases'
+    _UPLOADS_URL = 'https://uploads.github.com/repos/rg3/youtube-dl/releases/%s/assets?name=%s'
+    _NETRC_MACHINE = 'github.com'
+
+    def __init__(self, debuglevel=0):
+        self._init_github_account()
+        https_handler = make_HTTPS_handler({}, debuglevel=debuglevel)
+        self._opener = compat_urllib_request.build_opener(https_handler)
+
+    def _init_github_account(self):
+        try:
+            info = netrc.netrc().authenticators(self._NETRC_MACHINE)
+            if info is not None:
+                self._username = info[0]
+                self._password = info[2]
+                compat_print('Using GitHub credentials found in .netrc...')
+                return
+            else:
+                compat_print('No GitHub credentials found in .netrc')
+        except (IOError, netrc.NetrcParseError):
+            compat_print('Unable to parse .netrc')
+        self._username = compat_input(
+            'Type your GitHub username or email address and press [Return]: ')
+        self._password = compat_getpass(
+            'Type your GitHub password and press [Return]: ')
+
+    def _call(self, req):
+        if isinstance(req, compat_basestring):
+            req = sanitized_Request(req)
+        # Authorizing manually since GitHub does not respond with 401 with
+        # WWW-Authenticate header set (see
+        # https://developer.github.com/v3/#basic-authentication)
+        b64 = base64.b64encode(
+            ('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
+        req.add_header('Authorization', 'Basic %s' % b64)
+        response = self._opener.open(req).read().decode('utf-8')
+        return json.loads(response)
+
+    def list_releases(self):
+        return self._call(self._API_URL)
+
+    def create_release(self, tag_name, name=None, body='', draft=False, prerelease=False):
+        data = {
+            'tag_name': tag_name,
+            'target_commitish': 'master',
+            'name': name,
+            'body': body,
+            'draft': draft,
+            'prerelease': prerelease,
+        }
+        req = sanitized_Request(self._API_URL, json.dumps(data).encode('utf-8'))
+        return self._call(req)
+
+    def create_asset(self, release_id, asset):
+        asset_name = os.path.basename(asset)
+        url = self._UPLOADS_URL % (release_id, asset_name)
+        # Our files are small enough to be loaded directly into memory.
+        data = open(asset, 'rb').read()
+        req = sanitized_Request(url, data)
+        mime_type, _ = mimetypes.guess_type(asset_name)
+        req.add_header('Content-Type', mime_type or 'application/octet-stream')
+        return self._call(req)
+
+
+def main():
+    parser = optparse.OptionParser(usage='%prog CHANGELOG VERSION BUILDPATH')
+    options, args = parser.parse_args()
+    if len(args) != 3:
+        parser.error('Expected a changelog, a version and a build directory')
+
+    changelog_file, version, build_path = args
+
+    with io.open(changelog_file, encoding='utf-8') as inf:
+        changelog = inf.read()
+
+    mobj = re.search(r'(?s)version %s\n{2}(.+?)\n{3}' % version, changelog)
+    body = mobj.group(1) if mobj else ''
+
+    releaser = GitHubReleaser()
+
+    new_release = releaser.create_release(
+        version, name='youtube-dl %s' % version, body=body)
+    release_id = new_release['id']
+
+    for asset in os.listdir(build_path):
+        compat_print('Uploading %s...' % asset)
+        releaser.create_asset(release_id, os.path.join(build_path, asset))
+
+
+if __name__ == '__main__':
+    main()
diff --git a/devscripts/fish-completion.py b/devscripts/fish-completion.py

index 41629d87d006fbaf4ba90cbb87bf60388fb7f7e5..51d19dd33d3bf5c05fc86f3c63e23c00871fda90 100755 (executable)
--- a/devscripts/fish-completion.py
+++ b/devscripts/fish-completion.py
@@ -44,5 +44,6 @@ def build_completion(opt_parser):
      with open(FISH_COMPLETION_FILE, 'w') as f:
          f.write(filled_template)
  
+
  parser = youtube_dl.parseOpts()[0]
  build_completion(parser)
diff --git a/devscripts/generate_aes_testdata.py b/devscripts/generate_aes_testdata.py

index 2e389fc8e742e26b0985f3492835ccb6790cef3e..e3df42cc2da6c99d9104c9bd2bac776af5a61c46 100644 (file)
--- a/devscripts/generate_aes_testdata.py
+++ b/devscripts/generate_aes_testdata.py
@@ -23,6 +23,7 @@ def openssl_encode(algo, key, iv):
      out, _ = prog.communicate(secret_msg)
      return out
  
+
  iv = key = [0x20, 0x15] + 14 * [0]
  
  r = openssl_encode('aes-128-cbc', key, iv)
diff --git a/devscripts/gh-pages/generate-download.py b/devscripts/gh-pages/generate-download.py

index 392e3ba21ab86070f2df164362fbd177b750c726..fcd7e1dff663f1607c2842081b40dc1890cf677a 100755 (executable)
--- a/devscripts/gh-pages/generate-download.py
+++ b/devscripts/gh-pages/generate-download.py
@@ -15,13 +15,9 @@ data = urllib.request.urlopen(URL).read()
  with open('download.html.in', 'r', encoding='utf-8') as tmplf:
      template = tmplf.read()
  
-md5sum = hashlib.md5(data).hexdigest()
-sha1sum = hashlib.sha1(data).hexdigest()
  sha256sum = hashlib.sha256(data).hexdigest()
  template = template.replace('@PROGRAM_VERSION@', version)
  template = template.replace('@PROGRAM_URL@', URL)
-template = template.replace('@PROGRAM_MD5SUM@', md5sum)
-template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
  template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
  template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
  template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])
diff --git a/devscripts/gh-pages/update-sites.py b/devscripts/gh-pages/update-sites.py

index 503c1372fd3589f45a207d043999a5286f6c5e1e..531c93c7089c1847a7e9018fcda5ca177f68547e 100755 (executable)
--- a/devscripts/gh-pages/update-sites.py
+++ b/devscripts/gh-pages/update-sites.py
@@ -32,5 +32,6 @@ def main():
      with open('supportedsites.html', 'w', encoding='utf-8') as sitesf:
          sitesf.write(template)
  
+
  if __name__ == '__main__':
      main()
diff --git a/devscripts/lazy_load_template.py b/devscripts/lazy_load_template.py

new file mode 100644 (file)

index 0000000..c4e5fc1
--- /dev/null
+++ b/devscripts/lazy_load_template.py
@@ -0,0 +1,19 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+
+class LazyLoadExtractor(object):
+    _module = None
+
+    @classmethod
+    def ie_key(cls):
+        return cls.__name__[:-2]
+
+    def __new__(cls, *args, **kwargs):
+        mod = __import__(cls._module, fromlist=(cls.__name__,))
+        real_cls = getattr(mod, cls.__name__)
+        instance = real_cls.__new__(real_cls)
+        instance.__init__(*args, **kwargs)
+        return instance
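
Note on `LazyLoadExtractor.__new__` above: the placeholder class defers the import of the real module until the first instantiation, then returns an instance of the real class. The sketch below shows the same mechanism against a standard-library class so that it runs on its own. `LazyPlaceholder` is a hypothetical stand-in, and `string.Template` was chosen only because it is always importable; neither is part of the commit.

```python
class LazyPlaceholder(object):
    # Name of the module that holds the real class with the same name.
    _module = None

    def __new__(cls, *args, **kwargs):
        # Import the module only now, when an instance is requested.
        mod = __import__(cls._module, fromlist=(cls.__name__,))
        real_cls = getattr(mod, cls.__name__)
        instance = real_cls.__new__(real_cls)
        instance.__init__(*args, **kwargs)
        # The returned object is not an instance of cls, so Python does
        # not call __init__ a second time.
        return instance


class Template(LazyPlaceholder):
    # Placeholder for string.Template (illustration only).
    _module = 'string'


greeting = Template('hello $name')
print(type(greeting).__module__)
print(greeting.substitute(name='world'))
```

The placeholder must carry the same class name as the real class, because `cls.__name__` is what gets looked up in the imported module.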
diff --git a/devscripts/make_contributing.py b/devscripts/make_contributing.py

index 5e454a429e46eeb108612690ae2b523ee98f30d5..226d1a5d6644953982db6346a00a21ec45f9b089 100755 (executable)
--- a/devscripts/make_contributing.py
+++ b/devscripts/make_contributing.py
@@ -28,5 +28,6 @@ def main():
      with io.open(outfile, 'w', encoding='utf-8') as outf:
          outf.write(out)
  
+
  if __name__ == '__main__':
      main()
diff --git a/devscripts/make_lazy_extractors.py b/devscripts/make_lazy_extractors.py

new file mode 100644 (file)

index 0000000..19114d3
--- /dev/null
+++ b/devscripts/make_lazy_extractors.py
@@ -0,0 +1,99 @@
+from __future__ import unicode_literals, print_function
+
+from inspect import getsource
+import os
+from os.path import dirname as dirn
+import sys
+
+print('WARNING: Lazy loading extractors is an experimental feature that may not always work', file=sys.stderr)
+
+sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
+
+lazy_extractors_filename = sys.argv[1]
+if os.path.exists(lazy_extractors_filename):
+    os.remove(lazy_extractors_filename)
+
+from youtube_dl.extractor import _ALL_CLASSES
+from youtube_dl.extractor.common import InfoExtractor, SearchInfoExtractor
+
+with open('devscripts/lazy_load_template.py', 'rt') as f:
+    module_template = f.read()
+
+module_contents = [
+    module_template + '\n' + getsource(InfoExtractor.suitable) + '\n',
+    'class LazyLoadSearchExtractor(LazyLoadExtractor):\n    pass\n']
+
+ie_template = '''
+class {name}({bases}):
+    _VALID_URL = {valid_url!r}
+    _module = '{module}'
+'''
+
+make_valid_template = '''
+    @classmethod
+    def _make_valid_url(cls):
+        return {valid_url!r}
+'''
+
+
+def get_base_name(base):
+    if base is InfoExtractor:
+        return 'LazyLoadExtractor'
+    elif base is SearchInfoExtractor:
+        return 'LazyLoadSearchExtractor'
+    else:
+        return base.__name__
+
+
+def build_lazy_ie(ie, name):
+    valid_url = getattr(ie, '_VALID_URL', None)
+    s = ie_template.format(
+        name=name,
+        bases=', '.join(map(get_base_name, ie.__bases__)),
+        valid_url=valid_url,
+        module=ie.__module__)
+    if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
+        s += '\n' + getsource(ie.suitable)
+    if hasattr(ie, '_make_valid_url'):
+        # search extractors
+        s += make_valid_template.format(valid_url=ie._make_valid_url())
+    return s
+
+
+# find the correct sorting and add the required base classes so that subclasses
+# can be correctly created
+classes = _ALL_CLASSES[:-1]
+ordered_cls = []
+while classes:
+    for c in classes[:]:
+        bases = set(c.__bases__) - set((object, InfoExtractor, SearchInfoExtractor))
+        stop = False
+        for b in bases:
+            if b not in classes and b not in ordered_cls:
+                if b.__name__ == 'GenericIE':
+                    exit()
+                classes.insert(0, b)
+                stop = True
+        if stop:
+            break
+        if all(b in ordered_cls for b in bases):
+            ordered_cls.append(c)
+            classes.remove(c)
+            break
+ordered_cls.append(_ALL_CLASSES[-1])
+
+names = []
+for ie in ordered_cls:
+    name = ie.__name__
+    src = build_lazy_ie(ie, name)
+    module_contents.append(src)
+    if ie in _ALL_CLASSES:
+        names.append(name)
+
+module_contents.append(
+    '_ALL_CLASSES = [{0}]'.format(', '.join(names)))
+
+module_src = '\n'.join(module_contents) + '\n'
+
+with open(lazy_extractors_filename, 'wt') as f:
+    f.write(module_src)
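
Note on the ordering loop in `make_lazy_extractors.py` above: the generated module can only define a class after all of its bases, so the classes are emitted in an order in which every base precedes its subclasses, and helper bases missing from the input list are pulled in. The sketch below is a simplified reading of that loop on toy classes. `order_by_bases` and the toy class names are hypothetical, and the `GenericIE` special case from the diff is left out.

```python
def order_by_bases(classes, roots):
    """Return classes (plus any missing helper bases) so bases come first."""
    pending = list(classes)
    ordered = []
    while pending:
        for c in pending[:]:
            bases = set(c.__bases__) - set(roots)
            missing = [b for b in bases
                       if b not in pending and b not in ordered]
            if missing:
                # A base class was not in the input list: queue it first
                # and rescan from the start.
                for b in missing:
                    pending.insert(0, b)
                break
            if all(b in ordered for b in bases):
                ordered.append(c)
                pending.remove(c)
                break
    return ordered


# Toy hierarchy standing in for extractor classes (illustration only).
class Base(object):
    pass


class Mixin(object):
    pass


class A(Base):
    pass


class B(A):
    pass


class C(Mixin, A):
    pass


result = order_by_bases([B, C, A, Base], roots=(object,))
print([cls.__name__ for cls in result])
```

`Mixin` is not in the input list but still appears in the result, which corresponds to the `classes.insert(0, b)` branch in the script.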
diff --git a/devscripts/make_supportedsites.py b/devscripts/make_supportedsites.py

index 8cb4a46380253643e6df2370058c433094cf159b..764795bc5b1e560b033c2e9a0c395cecb10b1242 100644 (file)
--- a/devscripts/make_supportedsites.py
+++ b/devscripts/make_supportedsites.py
@@ -41,5 +41,6 @@ def main():
      with io.open(outfile, 'w', encoding='utf-8') as outf:
          outf.write(out)
  
+
  if __name__ == '__main__':
      main()
diff --git a/devscripts/prepare_manpage.py b/devscripts/prepare_manpage.py

index 776e6556e5b2bd683acbcf79d7bc07431be6548a..f9fe63f1ffd5073b312f22e8f08fb7798fa3f7a4 100644 (file)
--- a/devscripts/prepare_manpage.py
+++ b/devscripts/prepare_manpage.py
@@ -1,13 +1,46 @@
  from __future__ import unicode_literals
  
  import io
+import optparse
  import os.path
-import sys
  import re
  
  ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
  README_FILE = os.path.join(ROOT_DIR, 'README.md')
  
+PREFIX = '''%YOUTUBE-DL(1)
+
+# NAME
+
+youtube\-dl \- download videos from youtube.com or other video platforms
+
+# SYNOPSIS
+
+**youtube-dl** \[OPTIONS\] URL [URL...]
+
+'''
+
+
+def main():
+    parser = optparse.OptionParser(usage='%prog OUTFILE.md')
+    options, args = parser.parse_args()
+    if len(args) != 1:
+        parser.error('Expected an output filename')
+
+    outfile, = args
+
+    with io.open(README_FILE, encoding='utf-8') as f:
+        readme = f.read()
+
+    readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
+    readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
+    readme = PREFIX + readme
+
+    readme = filter_options(readme)
+
+    with io.open(outfile, 'w', encoding='utf-8') as outf:
+        outf.write(readme)
+
  
  def filter_options(readme):
      ret = ''
@@ -21,43 +54,26 @@ def filter_options(readme):
  
          if in_options:
              if line.lstrip().startswith('-'):
-                option, description = re.split(r'\s{2,}', line.lstrip())
-                split_option = option.split(' ')
-
-                if not split_option[-1].startswith('-'):  # metavar
-                    option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
-
-                # Pandoc's definition_lists. See http://pandoc.org/README.html
-                # for more information.
-                ret += '\n%s\n:   %s\n' % (option, description)
-            else:
-                ret += line.lstrip() + '\n'
+                split = re.split(r'\s{2,}', line.lstrip())
+                # Description string may start with `-` as well. If there is
+                # only one piece then it's a description bit not an option.
+                if len(split) > 1:
+                    option, description = split
+                    split_option = option.split(' ')
+
+                    if not split_option[-1].startswith('-'):  # metavar
+                        option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
+
+                    # Pandoc's definition_lists. See http://pandoc.org/README.html
+                    # for more information.
+                    ret += '\n%s\n:   %s\n' % (option, description)
+                    continue
+            ret += line.lstrip() + '\n'
          else:
              ret += line + '\n'
  
      return ret
  
-with io.open(README_FILE, encoding='utf-8') as f:
-    readme = f.read()
-
-PREFIX = '''%YOUTUBE-DL(1)
-
-# NAME
-
-youtube\-dl \- download videos from youtube.com or other video platforms
-
-# SYNOPSIS
-
-**youtube-dl** \[OPTIONS\] URL [URL...]
-
-'''
-readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
-readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
-readme = PREFIX + readme
-
-readme = filter_options(readme)
  
-if sys.version_info < (3, 0):
-    print(readme.encode('utf-8'))
-else:
-    print(readme)
+if __name__ == '__main__':
+    main()
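
Note on the `filter_options` change above: a line that starts with `-` is only treated as an option when it splits into an option part and a description part; a lone piece is a wrapped description that happens to start with `-`. The sketch below isolates that per-line decision. `convert_option_line` is a hypothetical helper, the sample lines are made up in the README's style, and `maxsplit=1` is an addition not present in the diff (it keeps the unpacking safe if a description itself contains a run of spaces).

```python
import re


def convert_option_line(line):
    """Return a Pandoc definition-list entry, or None for non-option lines."""
    stripped = line.lstrip()
    if not stripped.startswith('-'):
        return None
    pieces = re.split(r'\s{2,}', stripped, maxsplit=1)
    if len(pieces) < 2:
        # Only one piece: a description fragment, not an option.
        return None
    option, description = pieces
    split_option = option.split(' ')
    if not split_option[-1].startswith('-'):  # metavar
        option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
    return '\n%s\n:   %s\n' % (option, description)


print(convert_option_line(
    '    -o, --output TEMPLATE            Output filename template'))
print(convert_option_line(
    '                                     -1 for no limit'))
```

The second call returns `None`, which is the case the old code failed on when it unpacked the split result unconditionally.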
diff --git a/devscripts/release.sh b/devscripts/release.sh

index 6718ce39b965ea0cf00f91c2d97034a96beedfa8..4db5def5d8534ef73664fc90d00433d90d363bbc 100755 (executable)
--- a/devscripts/release.sh
+++ b/devscripts/release.sh
@@ -6,7 +6,7 @@
  # * the git config user.signingkey is properly set
  
  # You will need
-# pip install coverage nose rsa
+# pip install coverage nose rsa wheel
  
  # TODO
  # release notes
@@ -15,10 +15,33 @@
  set -e
  
  skip_tests=true
-if [ "$1" = '--run-tests' ]; then
-    skip_tests=false
-    shift
-fi
+gpg_sign_commits=""
+buildserver='localhost:8142'
+
+while true
+do
+case "$1" in
+    --run-tests)
+        skip_tests=false
+        shift
+    ;;
+    --gpg-sign-commits|-S)
+        gpg_sign_commits="-S"
+        shift
+    ;;
+    --buildserver)
+        buildserver="$2"
+        shift 2
+    ;;
+    --*)
+        echo "ERROR: unknown option $1"
+        exit 1
+    ;;
+    *)
+        break
+    ;;
+esac
+done
  
  if [ -z "$1" ]; then echo "ERROR: specify version number like this: $0 1994.09.06"; exit 1; fi
  version="$1"
@@ -33,6 +56,12 @@ if [ ! -z "`git status --porcelain | grep -v CHANGELOG`" ]; then echo 'ERROR: th
  useless_files=$(find youtube_dl -type f -not -name '*.py')
  if [ ! -z "$useless_files" ]; then echo "ERROR: Non-.py files in youtube_dl: $useless_files"; exit 1; fi
  if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit 1; fi
+if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; exit 1; fi
+if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
+if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi
+
+read -p "Is ChangeLog up to date? (y/n) " -n 1
+if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
  
  /bin/echo -e "\n### First of all, testing..."
  make clean
@@ -45,10 +74,13 @@ fi
  /bin/echo -e "\n### Changing version in version.py..."
  sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
  
+/bin/echo -e "\n### Changing version in ChangeLog..."
+sed -i "s/<unreleased>/$version/" ChangeLog
+
  /bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
-make README.md CONTRIBUTING.md ISSUE_TEMPLATE.md supportedsites
-git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
-git commit -m "release $version"
+make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
+git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py ChangeLog
+git commit $gpg_sign_commits -m "release $version"
  
  /bin/echo -e "\n### Now tagging, signing and pushing..."
  git tag -s -m "Release $version" "$version"
@@ -64,7 +96,7 @@ git push origin "$version"
  REV=$(git rev-parse HEAD)
  make youtube-dl youtube-dl.tar.gz
  read -p "VM running? (y/n) " -n 1
-wget "http://localhost:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
+wget "http://$buildserver/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
  mkdir -p "build/$version"
  mv youtube-dl youtube-dl.exe "build/$version"
  mv youtube-dl.tar.gz "build/$version/youtube-dl-$version.tar.gz"
@@ -74,15 +106,16 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
  (cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS)
  (cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
  
-/bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
+/bin/echo -e "\n### Signing and uploading the new binaries to GitHub..."
  for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
-scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
-ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
+
+ROOT=$(pwd)
+python devscripts/create-github-release.py ChangeLog $version "$ROOT/build/$version"
+
  ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
  
  /bin/echo -e "\n### Now switching to gh-pages..."
  git clone --branch gh-pages --single-branch . build/gh-pages
-ROOT=$(pwd)
  (
      set -e
      ORIGIN_URL=$(git config --get remote.origin.url)
@@ -94,7 +127,7 @@ ROOT=$(pwd)
      "$ROOT/devscripts/gh-pages/update-copyright.py"
      "$ROOT/devscripts/gh-pages/update-sites.py"
      git add *.html *.html.in update
-    git commit -m "release $version"
+    git commit $gpg_sign_commits -m "release $version"
      git push "$ROOT" gh-pages
      git push "$ORIGIN_URL" gh-pages
  )
diff --git a/devscripts/show-downloads-statistics.py b/devscripts/show-downloads-statistics.py

new file mode 100644 (file)

index 0000000..e25d284
--- /dev/null
+++ b/devscripts/show-downloads-statistics.py
@@ -0,0 +1,47 @@
+#!/usr/bin/env python
+from __future__ import unicode_literals
+
+import itertools
+import json
+import os
+import re
+import sys
+
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from youtube_dl.compat import (
+    compat_print,
+    compat_urllib_request,
+)
+from youtube_dl.utils import format_bytes
+
+
+def format_size(bytes):
+    return '%s (%d bytes)' % (format_bytes(bytes), bytes)
+
+
+total_bytes = 0
+
+for page in itertools.count(1):
+    releases = json.loads(compat_urllib_request.urlopen(
+        'https://api.github.com/repos/rg3/youtube-dl/releases?page=%s' % page
+    ).read().decode('utf-8'))
+
+    if not releases:
+        break
+
+    for release in releases:
+        compat_print(release['name'])
+        for asset in release['assets']:
+            asset_name = asset['name']
+            total_bytes += asset['download_count'] * asset['size']
+            if all(not re.match(p, asset_name) for p in (
+                    r'^youtube-dl$',
+                    r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
+                    r'^youtube-dl\.exe$')):
+                continue
+            compat_print(
+                ' %s size: %s downloads: %d'
+                % (asset_name, format_size(asset['size']), asset['download_count']))
+
+compat_print('total downloads traffic: %s' % format_size(total_bytes))
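
Note on `show-downloads-statistics.py` above: the running total is updated before the name filter, so it counts every asset (signatures and checksum files included), while only the three main artifacts are printed. The sketch below reproduces that accounting offline on hand-written data. `total_traffic`, `is_main_asset`, and `SAMPLE_RELEASES` are hypothetical; only the `name`, `size`, `download_count`, and `assets` keys come from the script.

```python
import re

# Patterns copied from the script: the three main release artifacts.
MAIN_ASSET_PATTERNS = (
    r'^youtube-dl$',
    r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
    r'^youtube-dl\.exe$',
)


def is_main_asset(asset_name):
    return any(re.match(p, asset_name) for p in MAIN_ASSET_PATTERNS)


def total_traffic(releases):
    # Every asset contributes, matching where the script updates its total.
    return sum(
        asset['download_count'] * asset['size']
        for release in releases
        for asset in release['assets'])


# Made-up data in the shape the script reads from the releases API.
SAMPLE_RELEASES = [
    {'name': 'youtube-dl 2016.11.18', 'assets': [
        {'name': 'youtube-dl', 'size': 1000, 'download_count': 3},
        {'name': 'youtube-dl.sig', 'size': 10, 'download_count': 2},
    ]},
    {'name': 'youtube-dl 2016.11.14', 'assets': [
        {'name': 'youtube-dl-2016.11.14.tar.gz', 'size': 2000,
         'download_count': 1},
    ]},
]

print(total_traffic(SAMPLE_RELEASES))
```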
diff --git a/devscripts/zsh-completion.py b/devscripts/zsh-completion.py

index 04728e8e2ce763ca886853061875c59e4f645921..60aaf76cc3297adc6e80984890e33e4267b95c2b 100755 (executable)
--- a/devscripts/zsh-completion.py
+++ b/devscripts/zsh-completion.py
@@ -44,5 +44,6 @@ def build_completion(opt_parser):
      with open(ZSH_COMPLETION_FILE, "w") as f:
          f.write(template)
  
+
  parser = youtube_dl.parseOpts()[0]
  build_completion(parser)
diff --git a/docs/conf.py b/docs/conf.py

index 594ca61a6bf984d173620a3e95eaca28b22cda5a..0aaf1b8fcf8220301d63250e83cb1587b618388c 100644 (file)
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  #
  # youtube-dl documentation build configuration file, created by
  # sphinx-quickstart on Fri Mar 14 21:05:43 2014.
diff --git a/docs/supportedsites.md b/docs/supportedsites.md

index 00b8c247cd606dc6e2522220a455411e6c3484be..77832504a885a06c1f3a2dd459a2594d27aea470 100644 (file)
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -6,15 +6,23 @@
   - **22tracks:genre**
   - **22tracks:track**
   - **24video**
+ - **3qsdn**: 3Q SDN
   - **3sat**
   - **4tube**
   - **56.com**
   - **5min**
   - **8tracks**
   - **91porn**
+ - **9c9media**
+ - **9c9media:stack**
   - **9gag**
+ - **9now.com.au**
   - **abc.net.au**
- - **Abc7News**
+ - **abc.net.au:iview**
+ - **abcnews**
+ - **abcnews:video**
+ - **abcotvs**: ABC Owned Television Stations
+ - **abcotvs:clips**
   - **AcademicEarth:Course**
   - **acast**
   - **acast:channel**
@@ -25,11 +33,13 @@
   - **AdobeTVVideo**
   - **AdultSwim**
   - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- - **Aftonbladet**
+ - **AfreecaTV**: afreecatv.com
   - **AirMozilla**
   - **AlJazeera**
   - **Allocine**
   - **AlphaPorno**
+ - **AMCNetworks**
+ - **anderetijden**: npo.nl and ntr.nl
   - **AnimeOnDemand**
   - **anitube.se**
   - **AnySex**
@@ -40,8 +50,8 @@
   - **appletrailers:section**
   - **archive.org**: archive.org videos
   - **ARD**
- - **ARD:mediathek**: Saarländischer Rundfunk
   - **ARD:mediathek**
+ - **Arkena**
   - **arte.tv**
   - **arte.tv:+7**
   - **arte.tv:cinema**
@@ -50,13 +60,20 @@
   - **arte.tv:ddc**
   - **arte.tv:embed**
   - **arte.tv:future**
+ - **arte.tv:info**
   - **arte.tv:magazine**
+ - **arte.tv:playlist**
   - **AtresPlayer**
   - **ATTTechChannel**
   - **AudiMedia**
   - **AudioBoom**
   - **audiomack**
   - **audiomack:album**
+ - **auroravid**: AuroraVid
+ - **AWAAN**
+ - **awaan:live**
+ - **awaan:season**
+ - **awaan:video**
   - **Azubu**
   - **AzubuLive**
   - **BaiduVideo**: 百度视频
@@ -67,14 +84,18 @@
   - **bbc**: BBC
   - **bbc.co.uk**: BBC iPlayer
   - **bbc.co.uk:article**: BBC articles
- - **BeatportPro**
+ - **bbc.co.uk:iplayer:playlist**
+ - **bbc.co.uk:playlist**
+ - **Beatport**
   - **Beeg**
   - **BehindKink**
+ - **BellMedia**
   - **Bet**
   - **Bigflix**
   - **Bild**: Bild.de
   - **BiliBili**
   - **BioBioChileTV**
+ - **BIQLE**
   - **BleacherReport**
   - **BleacherReportCMS**
   - **blinkx**
@@ -90,55 +111,73 @@
   - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
   - **BuzzFeed**
   - **BYUtv**
+ - **BYUtvEvent**
   - **Camdemy**
   - **CamdemyFolder**
+ - **CamWithHer**
   - **canalc2.tv**
   - **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
   - **Canvas**
- - **CBC**
- - **CBCPlayer**
+ - **CarambaTV**
+ - **CarambaTVPage**
+ - **CartoonNetwork**
+ - **cbc.ca**
+ - **cbc.ca:player**
+ - **cbc.ca:watch**
+ - **cbc.ca:watch:video**
   - **CBS**
- - **CBSNews**: CBS News
- - **CBSNewsLiveVideo**: CBS News Live Videos
+ - **CBSInteractive**
+ - **CBSLocal**
+ - **cbsnews**: CBS News
+ - **cbsnews:livevideo**: CBS News Live Videos
   - **CBSSports**
+ - **CCTV**
   - **CDA**
   - **CeskaTelevize**
   - **channel9**: Channel 9
+ - **CharlieRose**
   - **Chaturbate**
   - **Chilloutzone**
   - **chirbit**
   - **chirbit:profile**
   - **Cinchcast**
- - **Cinemassacre**
   - **Clipfish**
   - **cliphunter**
+ - **ClipRs**
   - **Clipsyndicate**
+ - **CloserToTruth**
   - **cloudtime**: CloudTime
   - **Cloudy**
   - **Clubic**
   - **Clyp**
   - **cmt.com**
- - **CNET**
+ - **CNBC**
   - **CNN**
   - **CNNArticle**
   - **CNNBlogs**
- - **CollegeHumor**
   - **CollegeRama**
   - **ComCarCoff**
   - **ComedyCentral**
- - **ComedyCentralShows**: The Daily Show / The Colbert Report
+ - **ComedyCentralShortname**
+ - **ComedyCentralTV**
   - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
+ - **Coub**
   - **Cracked**
   - **Crackle**
   - **Criterion**
   - **CrooksAndLiars**
   - **Crunchyroll**
   - **crunchyroll:playlist**
+ - **CSNNE**
   - **CSpan**: C-SPAN
   - **CtsNews**: 華視新聞
+ - **CTVNews**
   - **culturebox.francetvinfo.fr**
   - **CultureUnplugged**
+ - **curiositystream**
+ - **curiositystream:collection**
   - **CWTV**
+ - **DailyMail**
   - **dailymotion**
   - **dailymotion:playlist**
   - **dailymotion:user**
@@ -148,17 +187,15 @@
   - **daum.net:playlist**
   - **daum.net:user**
   - **DBTV**
- - **DCN**
- - **dcn:live**
- - **dcn:season**
- - **dcn:video**
   - **DctpTv**
   - **DeezerPlaylist**
   - **defense.gouv.fr**
   - **democracynow**
   - **DHM**: Filmarchiv - Deutsches Historisches Museum
+ - **DigitallySpeaking**
   - **Digiteka**
   - **Discovery**
+ - **DiscoveryGo**
   - **Dotsub**
   - **DouyuTV**: 斗鱼
   - **DPlay**
@@ -168,7 +205,6 @@
   - **Dropbox**
   - **DrTuber**
   - **DRTV**
- - **Dump**
   - **Dumpert**
   - **dvtv**: http://video.aktualne.cz/
   - **dw**
@@ -189,30 +225,37 @@
   - **EroProfile**
   - **Escapist**
   - **ESPN**
+ - **ESPNArticle**
   - **EsriVideo**
   - **Europa**
   - **EveryonesMixtape**
- - **exfm**: ex.fm
   - **ExpoTV**
   - **ExtremeTube**
+ - **EyedoTV**
   - **facebook**
+ - **FacebookPluginsVideo**
   - **faz.net**
   - **fc2**
+ - **fc2:embed**
   - **Fczenit**
   - **features.aol.com**
   - **fernsehkritik.tv**
   - **Firstpost**
   - **FiveTV**
   - **Flickr**
+ - **Flipagram**
   - **Folketinget**: Folketinget (ft.dk; Danish parliament)
   - **FootyRoom**
+ - **Formula1**
   - **FOX**
+ - **FOX9**
   - **Foxgay**
- - **FoxNews**: Fox News and Fox Business Video
+ - **foxnews**: Fox News and Fox Business Video
+ - **foxnews:article**
+ - **foxnews:insider**
   - **FoxSports**
   - **france2.fr:generation-quoi**
   - **FranceCulture**
- - **FranceCultureEmission**
   - **FranceInter**
   - **francetv**: France 2, 3, 4, 5 and Ô
   - **francetvinfo.fr**
@@ -221,14 +264,14 @@
   - **FreeVideo**
   - **Funimation**
   - **FunnyOrDie**
+ - **Fusion**
+ - **FXNetworks**
   - **GameInformer**
- - **Gamekings**
   - **GameOne**
   - **gameone:playlist**
   - **Gamersyde**
   - **GameSpot**
   - **GameStar**
- - **Gametrailers**
   - **Gazeta**
   - **GDCVault**
   - **generic**: Generic downloader that works on some sites
@@ -238,8 +281,9 @@
   - **Glide**: Glide mobile video messages (glide.me)
   - **Globo**
   - **GloboArticle**
+ - **Go**
   - **GodTube**
- - **GoldenMoustache**
+ - **GodTV**
   - **Golem**
   - **GoogleDrive**
   - **Goshgay**
@@ -247,12 +291,16 @@
   - **Groupon**
   - **Hark**
   - **HBO**
+ - **HBOEpisode**
   - **HearThisAt**
   - **Heise**
   - **HellPorno**
   - **Helsinki**: helsinki.fi
   - **HentaiStigma**
+ - **HGTV**
+ - **hgtv.com:show**
   - **HistoricFilms**
+ - **history:topic**: History.com Topic
   - **hitbox**
   - **hitbox:live**
   - **HornBunny**
@@ -260,6 +308,9 @@
   - **HotStar**
   - **Howcast**
   - **HowStuffWorks**
+ - **HRTi**
+ - **HRTiPlaylist**
+ - **Huajiao**: 花椒直播
   - **HuffPost**: Huffington Post
   - **Hypem**
   - **Iconosquare**
@@ -281,19 +332,23 @@
   - **ivi**: ivi.ru
   - **ivi:compilation**: ivi.ru compilations
   - **ivideon**: Ivideon TV
+ - **Iwara**
   - **Izlesene**
- - **JadoreCettePub**
+ - **Jamendo**
+ - **JamendoAlbum**
   - **JeuxVideo**
   - **Jove**
   - **jpopsuki.tv**
   - **JWPlatform**
   - **Kaltura**
+ - **Kamcord**
   - **KanalPlay**: Kanal 5/9/11 Play
   - **Kankan**
   - **Karaoketv**
   - **KarriereVideos**
   - **keek**
   - **KeezMovies**
+ - **Ketnet**
   - **KhanAcademy**
   - **KickStarter**
   - **KonserthusetPlay**
@@ -307,23 +362,31 @@
   - **kuwo:mv**: 酷我音乐 - MV
   - **kuwo:singer**: 酷我音乐 - 歌手
   - **kuwo:song**: 酷我音乐
- - **la7.tv**
+ - **la7.it**
   - **Laola1Tv**
+ - **LCI**
+ - **Lcp**
+ - **LcpPlay**
   - **Le**: 乐视网
+ - **Learnr**
   - **Lecture2Go**
+ - **LEGO**
   - **Lemonde**
   - **LePlaylist**
   - **LetvCloud**: 乐视云
   - **Libsyn**
+ - **life**: Life.ru
   - **life:embed**
- - **lifenews**: LIFE | NEWS
   - **limelight**
   - **limelight:channel**
   - **limelight:channel_list**
+ - **LiTV**
   - **LiveLeak**
   - **livestream**
   - **livestream:original**
   - **LnkGo**
+ - **loc**: Library of Congress
+ - **LocalNews8**
   - **LoveHomePorn**
   - **lrt.lt**
   - **lynda**: lynda.com videos
@@ -333,42 +396,52 @@
   - **mailru**: Видео@Mail.Ru
   - **MakersChannel**
   - **MakerTV**
- - **Malemotion**
+ - **mangomolo:live**
+ - **mangomolo:video**
   - **MatchTV**
   - **MDR**: MDR.DE and KiKA
   - **media.ccc.de**
+ - **META**
   - **metacafe**
   - **Metacritic**
   - **Mgoon**
+ - **MGTV**: 芒果TV
+ - **MiaoPai**
   - **Minhateca**
   - **MinistryGrid**
   - **Minoto**
   - **miomio.tv**
   - **MiTele**: mitele.es
   - **mixcloud**
+ - **mixcloud:playlist**
+ - **mixcloud:stream**
+ - **mixcloud:user**
   - **MLB**
   - **Mnet**
   - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
   - **Mofosex**
   - **Mojvideo**
   - **Moniker**: allmyvideos.net and vidspot.net
- - **mooshare**: Mooshare.biz
   - **Morningstar**: morningstar.com
   - **Motherless**
   - **Motorsport**: motorsport.com
   - **MovieClips**
   - **MovieFap**
   - **Moviezine**
+ - **MovingImage**
   - **MPORA**
- - **MSNBC**
- - **MTV**
+ - **MSN**
+ - **mtg**: MTG services
+ - **mtv**
   - **mtv.de**
- - **mtviggy.com**
+ - **mtv:video**
   - **mtvservices:embedded**
   - **MuenchenTV**: münchen.tv
   - **MusicPlayOn**
- - **muzu.tv**
+ - **mva**: Microsoft Virtual Academy videos
+ - **mva:course**: Microsoft Virtual Academy courses
   - **Mwave**
+ - **MwaveMeetGreet**
   - **MySpace**
   - **MySpace:album**
   - **MySpass**
@@ -376,11 +449,14 @@
   - **myvideo** (Currently broken)
   - **MyVidster**
   - **n-tv.de**
- - **NationalGeographic**
+ - **natgeo**
+ - **natgeo:episodeguide**
+ - **natgeo:video**
   - **Naver**
   - **NBA**
   - **NBC**
   - **NBCNews**
+ - **NBCOlympics**
   - **NBCSports**
   - **NBCSportsVPlayer**
   - **ndr**: NDR.de - Norddeutscher Rundfunk
@@ -388,7 +464,6 @@
   - **ndr:embed:base**
   - **NDTV**
   - **NerdCubedFeed**
- - **Nerdist**
   - **netease:album**: 网易云音乐 - 专辑
   - **netease:djradio**: 网易云音乐 - 电台
   - **netease:mv**: 网易云音乐 - MV
@@ -401,22 +476,26 @@
   - **Newstube**
   - **NextMedia**: 蘋果日報
   - **NextMediaActionNews**: 蘋果日報 - 動新聞
- - **nextmovie.com**
   - **nfb**: National Film Board of Canada
   - **nfl.com**
+ - **NhkVod**
   - **nhl.com**
   - **nhl.com:news**: NHL news
- - **nhl.com:videocenter**: NHL videocenter category
+ - **nhl.com:videocenter**
+ - **nhl.com:videocenter:category**: NHL videocenter category
   - **nick.com**
+ - **nick.de**
+ - **nicknight**
   - **niconico**: ニコニコ動画
   - **NiconicoPlaylist**
+ - **Nintendo**
   - **njoy**: N-JOY
   - **njoy:embed**
+ - **NobelPrize**
   - **Noco**
   - **Normalboots**
   - **NosVideo**
   - **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz
- - **novamov**: NovaMov
   - **nowness**
   - **nowness:playlist**
   - **nowness:series**
@@ -437,10 +516,14 @@
   - **Nuvid**
   - **NYTimes**
   - **NYTimesArticle**
+ - **NZZ**
   - **ocw.mit.edu**
+ - **OdaTV**
   - **Odnoklassniki**
   - **OktoberfestTV**
   - **on.aol.com**
+ - **onet.tv**
+ - **onet.tv:channel**
   - **OnionStudios**
   - **Ooyala**
   - **OoyalaExternal**
@@ -450,20 +533,21 @@
   - **orf:iptv**: iptv.ORF.at
   - **orf:oe1**: Radio Österreich 1
   - **orf:tvthek**: ORF TVthek
+ - **PandaTV**: 熊猫TV
   - **pandora.tv**: 판도라TV
   - **parliamentlive.tv**: UK parliament videos
   - **Patreon**
   - **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET  (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
   - **pcmag**
- - **Periscope**: Periscope
+ - **People**
+ - **periscope**: Periscope
+ - **periscope:user**: Periscope user videos
   - **PhilharmonieDeParis**: Philharmonie de Paris
   - **phoenix.de**
   - **Photobucket**
   - **Pinkbike**
   - **Pladform**
- - **PlanetaPlay**
   - **play.fm**
- - **played.to**
   - **PlaysTV**
   - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
   - **Playvid**
@@ -473,13 +557,18 @@
   - **plus.google**: Google Plus
   - **pluzz.francetv.fr**
   - **podomatic**
+ - **Pokemon**
+ - **PolskieRadio**
+ - **PolskieRadioCategory**
+ - **PornCom**
   - **PornHd**
- - **PornHub**
+ - **PornHub**: PornHub and Thumbzilla
   - **PornHubPlaylist**
   - **PornHubUserVideos**
   - **Pornotube**
   - **PornoVoisines**
   - **PornoXO**
+ - **PressTV**
   - **PrimeShareTV**
   - **PromptFile**
   - **prosiebensat1**: ProSiebenSat.1 Digital
@@ -490,10 +579,12 @@
   - **qqmusic:playlist**: QQ音乐 - 歌单
   - **qqmusic:singer**: QQ音乐 - 歌手
   - **qqmusic:toplist**: QQ音乐 - 排行榜
- - **QuickVid**
   - **R7**
+ - **R7Article**
   - **radio.de**
   - **radiobremen**
+ - **radiocanada**
+ - **RadioCanadaAudioVideo**
   - **radiofrance**
   - **RadioJavan**
   - **Rai**
@@ -502,13 +593,21 @@
   - **RDS**: RDS.ca
   - **RedTube**
   - **RegioTV**
+ - **RENTV**
+ - **RENTVArticle**
   - **Restudy**
+ - **Reuters**
   - **ReverbNation**
- - **Revision3**
+ - **revision**
+ - **revision3:embed**
   - **RICE**
   - **RingTV**
+ - **RMCDecouverte**
+ - **RockstarGames**
+ - **RoosterTeeth**
   - **RottenTomatoes**
   - **Roxwel**
+ - **Rozhlas**
   - **RTBF**
   - **rte**: Raidió Teilifís Éireann TV
   - **rte:radio**: Raidió Teilifís Éireann radio
@@ -519,7 +618,9 @@
   - **rtve.es:alacarta**: RTVE a la carta
   - **rtve.es:infantil**: RTVE infantil
   - **rtve.es:live**: RTVE.es live streams
+ - **rtve.es:television**
   - **RTVNH**
+ - **Rudo**
   - **RUHD**
   - **RulePorn**
   - **rutube**: Rutube videos
@@ -543,26 +644,28 @@
   - **ScreencastOMatic**
   - **ScreenJunkies**
   - **ScreenwaveMedia**
+ - **Seeker**
   - **SenateISVP**
+ - **SendtoNews**
   - **ServingSys**
   - **Sexu**
- - **SexyKarma**: Sexy Karma and Watch Indian Porn
   - **Shahid**
- - **Shared**: shared.sx and vivo.sx
+ - **Shared**: shared.sx
   - **ShareSix**
   - **Sina**
+ - **SixPlay**
+ - **skynewsarabia:article**
   - **skynewsarabia:video**
- - **skynewsarabia:video**
+ - **SkySports**
   - **Slideshare**
   - **Slutload**
   - **smotri**: Smotri.com
   - **smotri:broadcast**: Smotri.com broadcasts
   - **smotri:community**: Smotri.com community videos
   - **smotri:user**: Smotri.com user videos
- - **SnagFilms**
- - **SnagFilmsEmbed**
   - **Snotr**
   - **Sohu**
+ - **SonyLIV**
   - **soundcloud**
   - **soundcloud:playlist**
   - **soundcloud:search**: Soundcloud search
@@ -586,12 +689,13 @@
   - **SportBoxEmbed**
   - **SportDeutschland**
   - **Sportschau**
+ - **sr:mediathek**: Saarländischer Rundfunk
   - **SRGSSR**
   - **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
- - **SSA**
   - **stanfordoc**: Stanford Open ClassRoom
   - **Steam**
   - **Stitcher**
+ - **Streamable**
   - **streamcloud.eu**
   - **StreamCZ**
   - **StreetVoice**
@@ -601,9 +705,12 @@
   - **SWRMediathek**
   - **Syfy**
   - **SztvHu**
+ - **t-online.de**
   - **Tagesschau**
- - **Tapely**
+ - **tagesschau:player**
   - **Tass**
+ - **TBS**
+ - **TDSLifeway**
   - **teachertube**: teachertube.com videos
   - **teachertube:user:collection**: teachertube.com user and collection videos
   - **TeachingChannel**
@@ -617,20 +724,22 @@
   - **Telecinco**: telecinco.es, cuatro.com and mediaset.es
   - **Telegraaf**
   - **TeleMB**
+ - **TeleQuebec**
   - **TeleTask**
- - **TenPlay**
+ - **Telewebion**
   - **TF1**
+ - **TFO**
   - **TheIntercept**
- - **TheOnion**
+ - **theoperaplatform**
   - **ThePlatform**
   - **ThePlatformFeed**
   - **TheScene**
   - **TheSixtyOne**
   - **TheStar**
+ - **TheWeatherChannel**
   - **ThisAmericanLife**
   - **ThisAV**
- - **THVideo**
- - **THVideoPlaylist**
+ - **ThisOldHouse**
   - **tinypic**: tinypic.com videos
   - **tlc.de**
   - **TMZ**
@@ -638,13 +747,13 @@
   - **TNAFlix**
   - **TNAFlixNetworkEmbed**
   - **toggle**
+ - **Tosh**: Tosh.0
   - **tou.tv**
   - **Toypics**: Toypics user profile
   - **ToypicsUser**: Toypics user profile
   - **TrailerAddict** (Currently broken)
   - **Trilulilu**
- - **trollvids**
- - **TruTube**
+ - **TruTV**
   - **Tube8**
   - **TubiTv**
   - **tudou**
@@ -666,12 +775,13 @@
   - **TVCArticle**
   - **tvigle**: Интернет-телевидение Tvigle.ru
   - **tvland.com**
- - **tvp.pl**
- - **tvp.pl:Series**
- - **TVPlay**: TV3Play and related services
+ - **TVNoe**
+ - **tvp**: Telewizja Polska
+ - **tvp:embed**: Telewizja Polska
+ - **tvp:series**
   - **Tweakers**
- - **twitch:bookmarks**
   - **twitch:chapter**
+ - **twitch:clips**
   - **twitch:past_broadcasts**
   - **twitch:profile**
   - **twitch:stream**
@@ -680,16 +790,21 @@
   - **twitter**
   - **twitter:amplify**
   - **twitter:card**
- - **Ubu**
   - **udemy**
   - **udemy:course**
   - **UDNEmbed**: 聯合影音
   - **Unistra**
+ - **uol.com.br**
+ - **uplynk**
+ - **uplynk:preplay**
   - **Urort**: NRK P3 Urørt
+ - **URPlay**
+ - **USANetwork**
   - **USAToday**
   - **ustream**
   - **ustream:channel**
- - **Ustudio**
+ - **ustudio**
+ - **ustudio:embed**
   - **Varzesh3**
   - **Vbox7**
   - **VeeHD**
@@ -697,10 +812,14 @@
   - **Vessel**
   - **Vesti**: Вести.Ru
   - **Vevo**
+ - **VevoPlaylist**
   - **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
   - **vh1.com**
+ - **Viafree**
   - **Vice**
+ - **Viceland**
   - **ViceShow**
+ - **Vidbit**
   - **Viddler**
   - **video.google:search**: Google Video search
   - **video.mit.edu**
@@ -713,12 +832,15 @@
   - **VideoPremium**
   - **VideoTt**: video.tt - Your True Tube (Currently broken)
   - **videoweed**: VideoWeed
+ - **Vidio**
   - **vidme**
   - **vidme:user**
   - **vidme:user:likes**
   - **Vidzi**
   - **vier**
   - **vier:videos**
+ - **ViewLift**
+ - **ViewLiftEmbed**
   - **Viewster**
   - **Viidea**
   - **viki**
@@ -735,40 +857,49 @@
   - **Vimple**: Vimple - one-click video hosting
   - **Vine**
   - **vine:user**
+ - **Vivo**: vivo.sx
   - **vk**: VK
   - **vk:uservideos**: VK - User's Videos
+ - **vk:wallpost**
   - **vlive**
   - **Vodlocker**
+ - **VODPlatform**
   - **VoiceRepublic**
+ - **VoxMedia**
   - **Vporn**
   - **vpro**: npo.nl and ntr.nl
   - **VRT**
   - **vube**: Vube.com
   - **VuClip**
- - **vulture.com**
+ - **VyboryMos**
+ - **Vzaar**
   - **Walla**
- - **WashingtonPost**
+ - **washingtonpost**
+ - **washingtonpost:article**
   - **wat.tv**
- - **WayOfTheMaster**
+ - **WatchIndianPorn**: Watch Indian Porn
   - **WDR**
   - **wdr:mobile**
- - **WDRMaus**: Sendung mit der Maus
   - **WebOfStories**
   - **WebOfStoriesPlaylist**
- - **Weibo**
   - **WeiqiTV**: WQTV
   - **wholecloud**: WholeCloud
   - **Wimp**
   - **Wistia**
- - **WNL**
+ - **wnl**: npo.nl and ntr.nl
   - **WorldStarHipHop**
   - **wrzuta.pl**
+ - **wrzuta.pl:playlist**
   - **WSJ**: Wall Street Journal
   - **XBef**
   - **XboxClips**
- - **XFileShare**: XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me
+ - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE
   - **XHamster**
   - **XHamsterEmbed**
+ - **xiami:album**: 虾米音乐 - 专辑
+ - **xiami:artist**: 虾米音乐 - 歌手
+ - **xiami:collection**: 虾米音乐 - 精选集
+ - **xiami:song**: 虾米音乐
   - **XMinus**
   - **XNXX**
   - **Xstream**
@@ -787,6 +918,7 @@
   - **Ynet**
   - **YouJizz**
   - **youku**: 优酷
+ - **youku:show**
   - **YouPorn**
   - **YourUpload**
   - **youtube**: YouTube.com
@@ -800,6 +932,7 @@
   - **youtube:search**: YouTube.com searches
   - **youtube:search:date**: YouTube.com searches, newest videos first
   - **youtube:search_url**: YouTube.com search URLs
+ - **youtube:shared**
   - **youtube:show**: YouTube.com (multi-season) shows
   - **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
   - **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
@@ -807,6 +940,4 @@
   - **Zapiks**
   - **ZDF**
   - **ZDFChannel**
- - **zingmp3:album**: mp3.zing.vn albums
- - **zingmp3:song**: mp3.zing.vn songs
- - **ZippCast**
+ - **zingmp3**: mp3.zing.vn
diff --git a/setup.cfg b/setup.cfg
index 5760112d4564bb4fe8389b9c134ef6ef406a81b3..2dc06ffe413f76f4d776fe44780f327a170d7801 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -2,5 +2,5 @@
  universal = True
  
  [flake8]
-exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/make_issue_template.py,setup.py,build,.git
+exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git
  ignore = E402,E501,E731
diff --git a/setup.py b/setup.py
index bfe931f5b42a506ca3cceb1a5ec4acdb6a6a4813..ce6dd1870bc52951d268f96aa4dc68ea6f92e04d 100644
--- a/setup.py
+++ b/setup.py
@@ -1,5 +1,5 @@
  #!/usr/bin/env python
-# -*- coding: utf-8 -*-
+# coding: utf-8
  
  from __future__ import print_function
  
@@ -8,11 +8,12 @@ import warnings
  import sys
  
  try:
-    from setuptools import setup
+    from setuptools import setup, Command
      setuptools_available = True
  except ImportError:
-    from distutils.core import setup
+    from distutils.core import setup, Command
      setuptools_available = False
+from distutils.spawn import spawn
  
  try:
      # This will create an exe that needs Microsoft Visual C++ 2008
@@ -20,25 +21,37 @@ try:
      import py2exe
  except ImportError:
      if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
-        print("Cannot import py2exe", file=sys.stderr)
+        print('Cannot import py2exe', file=sys.stderr)
          exit(1)
  
  py2exe_options = {
-    "bundle_files": 1,
-    "compressed": 1,
-    "optimize": 2,
-    "dist_dir": '.',
-    "dll_excludes": ['w9xpopen.exe', 'crypt32.dll'],
+    'bundle_files': 1,
+    'compressed': 1,
+    'optimize': 2,
+    'dist_dir': '.',
+    'dll_excludes': ['w9xpopen.exe', 'crypt32.dll'],
  }
  
+# Get the version from youtube_dl/version.py without importing the package
+exec(compile(open('youtube_dl/version.py').read(),
+             'youtube_dl/version.py', 'exec'))
+
+DESCRIPTION = 'YouTube video downloader'
+LONG_DESCRIPTION = 'Command-line program to download videos from YouTube.com and other video sites'
+
  py2exe_console = [{
-    "script": "./youtube_dl/__main__.py",
-    "dest_base": "youtube-dl",
+    'script': './youtube_dl/__main__.py',
+    'dest_base': 'youtube-dl',
+    'version': __version__,
+    'description': DESCRIPTION,
+    'comments': LONG_DESCRIPTION,
+    'product_name': 'youtube-dl',
+    'product_version': __version__,
  }]
  
  py2exe_params = {
      'console': py2exe_console,
-    'options': {"py2exe": py2exe_options},
+    'options': {'py2exe': py2exe_options},
      'zipfile': None
  }
  
@@ -70,16 +83,27 @@ else:
      else:
          params['scripts'] = ['bin/youtube-dl']
  
-# Get the version from youtube_dl/version.py without importing the package
-exec(compile(open('youtube_dl/version.py').read(),
-             'youtube_dl/version.py', 'exec'))
+class build_lazy_extractors(Command):
+    description = 'Build the extractor lazy loading module'
+    user_options = []
+
+    def initialize_options(self):
+        pass
+
+    def finalize_options(self):
+        pass
+
+    def run(self):
+        spawn(
+            [sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'],
+            dry_run=self.dry_run,
+        )
  
  setup(
      name='youtube_dl',
      version=__version__,
-    description='YouTube video downloader',
-    long_description='Small command-line program to download videos from'
-    ' YouTube.com and other video sites.',
+    description=DESCRIPTION,
+    long_description=LONG_DESCRIPTION,
      url='https://github.com/rg3/youtube-dl',
      author='Ricardo Garcia',
      author_email='ytdl@yt-dl.org',
@@ -95,17 +119,19 @@ setup(
      # test_requires = ['nosetest'],
  
      classifiers=[
-        "Topic :: Multimedia :: Video",
-        "Development Status :: 5 - Production/Stable",
-        "Environment :: Console",
-        "License :: Public Domain",
-        "Programming Language :: Python :: 2.6",
-        "Programming Language :: Python :: 2.7",
-        "Programming Language :: Python :: 3",
-        "Programming Language :: Python :: 3.2",
-        "Programming Language :: Python :: 3.3",
-        "Programming Language :: Python :: 3.4",
+        'Topic :: Multimedia :: Video',
+        'Development Status :: 5 - Production/Stable',
+        'Environment :: Console',
+        'License :: Public Domain',
+        'Programming Language :: Python :: 2.6',
+        'Programming Language :: Python :: 2.7',
+        'Programming Language :: Python :: 3',
+        'Programming Language :: Python :: 3.2',
+        'Programming Language :: Python :: 3.3',
+        'Programming Language :: Python :: 3.4',
+        'Programming Language :: Python :: 3.5',
      ],
  
+    cmdclass={'build_lazy_extractors': build_lazy_extractors},
      **params
  )
diff --git a/test/helper.py b/test/helper.py
index f2d87821290095c1f9526f50db5d80ab31969d56..dfee217a9b8acb64e426c3ce8fc5c11a9c5a0121 100644
--- a/test/helper.py
+++ b/test/helper.py
@@ -24,8 +24,13 @@ from youtube_dl.utils import (
  def get_params(override=None):
      PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                                     "parameters.json")
+    LOCAL_PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
+                                         "local_parameters.json")
      with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
          parameters = json.load(pf)
+    if os.path.exists(LOCAL_PARAMETERS_FILE):
+        with io.open(LOCAL_PARAMETERS_FILE, encoding='utf-8') as pf:
+            parameters.update(json.load(pf))
      if override:
          parameters.update(override)
      return parameters
@@ -143,6 +148,9 @@ def expect_value(self, got, expected, field):
              expect_value(self, item_got, item_expected, field)
      else:
          if isinstance(expected, compat_str) and expected.startswith('md5:'):
+            self.assertTrue(
+                isinstance(got, compat_str),
+                'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got)))
              got = 'md5:' + md5(got)
          elif isinstance(expected, compat_str) and expected.startswith('mincount:'):
              self.assertTrue(
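The get_params hunk above layers three sources in order: parameters.json, an optional local_parameters.json, and the caller's override. The following is a minimal sketch of that precedence only. The helper name `load_layered_params`, the `demo` function, and the sample keys are assumptions made for illustration; the paths point at a temporary directory, not the real test directory.

```python
import io
import json
import os
import tempfile


def load_layered_params(base_path, local_path, override=None):
    # Hypothetical stand-in for get_params: later sources win over earlier ones.
    with io.open(base_path, encoding='utf-8') as f:
        params = json.load(f)
    # The local file is optional; it is merged only when it exists.
    if os.path.exists(local_path):
        with io.open(local_path, encoding='utf-8') as f:
            params.update(json.load(f))
    # An explicit override has the final say.
    if override:
        params.update(override)
    return params


def demo():
    tmpdir = tempfile.mkdtemp()
    base = os.path.join(tmpdir, 'parameters.json')
    local = os.path.join(tmpdir, 'local_parameters.json')
    with io.open(base, 'w', encoding='utf-8') as f:
        f.write(json.dumps({'quiet': False, 'retries': 10}))
    # No local file yet: only the base parameters apply.
    without_local = load_layered_params(base, local)
    with io.open(local, 'w', encoding='utf-8') as f:
        f.write(json.dumps({'quiet': True}))
    # Local file overrides 'quiet', the explicit override replaces 'retries'.
    with_local = load_layered_params(base, local, {'retries': 3})
    return without_local, with_local
```

Keeping the local file optional means a checkout without it behaves exactly as before, while a developer can adjust settings locally without touching the tracked parameters.json.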
diff --git a/test/test_InfoExtractor.py b/test/test_InfoExtractor.py
index 938466a800122211ab0414d9aa9de831951e2903..437c7270ee6aeaa8eba588badfb3bf26d79ea37d 100644
--- a/test/test_InfoExtractor.py
+++ b/test/test_InfoExtractor.py
@@ -11,6 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
  from test.helper import FakeYDL
  from youtube_dl.extractor.common import InfoExtractor
  from youtube_dl.extractor import YoutubeIE, get_info_extractor
+from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
  
  
  class TestIE(InfoExtractor):
@@ -47,6 +48,9 @@ class TestInfoExtractor(unittest.TestCase):
          self.assertEqual(ie._og_search_property('foobar', html), 'Foo')
          self.assertEqual(ie._og_search_property('test1', html), 'foo > < bar')
          self.assertEqual(ie._og_search_property('test2', html), 'foo >//< bar')
+        self.assertEqual(ie._og_search_property(('test0', 'test1'), html), 'foo > < bar')
+        self.assertRaises(RegexNotFoundError, ie._og_search_property, 'test0', html, None, fatal=True)
+        self.assertRaises(RegexNotFoundError, ie._og_search_property, ('test0', 'test00'), html, None, fatal=True)
  
      def test_html_search_meta(self):
          ie = self.ie
@@ -65,6 +69,21 @@ class TestInfoExtractor(unittest.TestCase):
          self.assertEqual(ie._html_search_meta('d', html), '4')
          self.assertEqual(ie._html_search_meta('e', html), '5')
          self.assertEqual(ie._html_search_meta('f', html), '6')
+        self.assertEqual(ie._html_search_meta(('a', 'b', 'c'), html), '1')
+        self.assertEqual(ie._html_search_meta(('c', 'b', 'a'), html), '3')
+        self.assertEqual(ie._html_search_meta(('z', 'x', 'c'), html), '3')
+        self.assertRaises(RegexNotFoundError, ie._html_search_meta, 'z', html, None, fatal=True)
+        self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True)
+
+    def test_download_json(self):
+        uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')
+        self.assertEqual(self.ie._download_json(uri, None), {'foo': 'blah'})
+        uri = encode_data_uri(b'callback({"foo": "blah"})', 'application/javascript')
+        self.assertEqual(self.ie._download_json(uri, None, transform_source=strip_jsonp), {'foo': 'blah'})
+        uri = encode_data_uri(b'{"foo": invalid}', 'application/json')
+        self.assertRaises(ExtractorError, self.ie._download_json, uri, None)
+        self.assertEqual(self.ie._download_json(uri, None, fatal=False), None)
+
  
  if __name__ == '__main__':
      unittest.main()
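The new test_download_json above feeds JSON to the extractor through a data: URI, so no network access is needed. The idea can be sketched with the Python 3 standard library alone. `make_data_uri` and `fetch_json` are illustrative names, not youtube-dl functions, and the real encode_data_uri may differ in detail.

```python
import base64
import json
from urllib.request import urlopen


def make_data_uri(data, mime_type):
    # Illustrative stand-in: base64-encode the payload into a data: URI.
    encoded = base64.b64encode(data).decode('ascii')
    return 'data:%s;base64,%s' % (mime_type, encoded)


def fetch_json(uri):
    # urllib handles the data: scheme itself, so nothing leaves the process.
    with urlopen(uri) as resp:
        return json.loads(resp.read().decode('utf-8'))


def fetch_json_or_none(uri):
    # Rough analogue of a non-fatal download: malformed JSON yields None.
    try:
        return fetch_json(uri)
    except ValueError:
        return None
```

With this sketch, a well-formed payload round-trips to a dict, while a payload such as `{"foo": invalid}` raises ValueError in `fetch_json` and yields None from `fetch_json_or_none`.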
diff --git a/test/test_YoutubeDL.py b/test/test_YoutubeDL.py
index ca25025e23a1eb1fd82eed39a534072bffe293ac..8bf00bea9818f6b91fa7b91e89fcbbf8a6cc0dd3 100644
--- a/test/test_YoutubeDL.py
+++ b/test/test_YoutubeDL.py
@@ -335,6 +335,40 @@ class TestFormatSelection(unittest.TestCase):
              downloaded = ydl.downloaded_info_dicts[0]
              self.assertEqual(downloaded['format_id'], f1['format_id'])
  
+    def test_audio_only_extractor_format_selection(self):
+        # For extractors with incomplete formats (all formats are audio-only or
+        # video-only) best and worst should fallback to corresponding best/worst
+        # video-only or audio-only formats (as per
+        # https://github.com/rg3/youtube-dl/pull/5556)
+        formats = [
+            {'format_id': 'low', 'ext': 'mp3', 'preference': 1, 'vcodec': 'none', 'url': TEST_URL},
+            {'format_id': 'high', 'ext': 'mp3', 'preference': 2, 'vcodec': 'none', 'url': TEST_URL},
+        ]
+        info_dict = _make_result(formats)
+
+        ydl = YDL({'format': 'best'})
+        ydl.process_ie_result(info_dict.copy())
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'high')
+
+        ydl = YDL({'format': 'worst'})
+        ydl.process_ie_result(info_dict.copy())
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'low')
+
+    def test_format_not_available(self):
+        formats = [
+            {'format_id': 'regular', 'ext': 'mp4', 'height': 360, 'url': TEST_URL},
+            {'format_id': 'video', 'ext': 'mp4', 'height': 720, 'acodec': 'none', 'url': TEST_URL},
+        ]
+        info_dict = _make_result(formats)
+
+        # This must fail since complete video-audio format does not match filter
+        # and extractor does not provide incomplete only formats (i.e. only
+        # video-only or audio-only).
+        ydl = YDL({'format': 'best[height>360]'})
+        self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
+
      def test_invalid_format_specs(self):
          def assert_syntax_error(format_spec):
              ydl = YDL({'format': format_spec})
@@ -571,6 +605,7 @@ class TestYoutubeDL(unittest.TestCase):
              'extractor': 'TEST',
              'duration': 30,
              'filesize': 10 * 1024,
+            'playlist_id': '42',
          }
          second = {
              'id': '2',
@@ -580,6 +615,7 @@ class TestYoutubeDL(unittest.TestCase):
              'duration': 10,
              'description': 'foo',
              'filesize': 5 * 1024,
+            'playlist_id': '43',
          }
          videos = [first, second]
  
@@ -616,6 +652,10 @@ class TestYoutubeDL(unittest.TestCase):
          res = get_videos(f)
          self.assertEqual(res, ['1'])
  
+        f = match_filter_func('playlist_id = 42')
+        res = get_videos(f)
+        self.assertEqual(res, ['1'])
+
      def test_playlist_items_selection(self):
          entries = [{
              'id': compat_str(i),
diff --git a/test/test_aes.py b/test/test_aes.py
index 315a3f5ae6a597662d05f56e97672b4ff93aff10..54078a66d61ad49a05600e9efca48472194f0fa5 100644
--- a/test/test_aes.py
+++ b/test/test_aes.py
@@ -51,5 +51,6 @@ class TestAES(unittest.TestCase):
          decrypted = (aes_decrypt_text(encrypted, password, 32))
          self.assertEqual(decrypted, self.secret_msg)
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_all_urls.py b/test/test_all_urls.py
index f5af184e6e0a79ccc11a9a66c2a9f19434087108..cd1cd4b24708da589ae2b15c79801271cc71166b 100644
--- a/test/test_all_urls.py
+++ b/test/test_all_urls.py
@@ -6,6 +6,7 @@ from __future__ import unicode_literals
  import os
  import sys
  import unittest
+import collections
  sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
  
  
@@ -100,8 +101,6 @@ class TestAllURLsMatching(unittest.TestCase):
          self.assertMatch(':ytsubs', ['youtube:subscriptions'])
          self.assertMatch(':ytsubscriptions', ['youtube:subscriptions'])
          self.assertMatch(':ythistory', ['youtube:history'])
-        self.assertMatch(':thedailyshow', ['ComedyCentralShows'])
-        self.assertMatch(':tds', ['ComedyCentralShows'])
  
      def test_vimeo_matching(self):
          self.assertMatch('https://vimeo.com/channels/tributes', ['vimeo:channel'])
@@ -130,6 +129,15 @@ class TestAllURLsMatching(unittest.TestCase):
              'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html',
              ['Yahoo'])
  
+    def test_no_duplicated_ie_names(self):
+        name_accu = collections.defaultdict(list)
+        for ie in self.ies:
+            name_accu[ie.IE_NAME.lower()].append(type(ie).__name__)
+        for (ie_name, ie_list) in name_accu.items():
+            self.assertEqual(
+                len(ie_list), 1,
+                'Multiple extractors with the same IE_NAME "%s" (%s)' % (ie_name, ', '.join(ie_list)))
+
  
  if __name__ == '__main__':
      unittest.main()
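The test_no_duplicated_ie_names check above groups extractor class names by lower-cased IE_NAME and fails when any group holds more than one class. Below is a small standalone sketch of the same grouping. The function name `find_duplicate_names` and the (class name, IE name) pair input are assumptions made for illustration.

```python
import collections


def find_duplicate_names(pairs):
    # pairs: iterable of (class_name, ie_name) tuples (illustrative input shape).
    accu = collections.defaultdict(list)
    for class_name, ie_name in pairs:
        # Compare case-insensitively, as the test does via IE_NAME.lower().
        accu[ie_name.lower()].append(class_name)
    # Keep only the names claimed by more than one class.
    return dict(
        (name, classes) for name, classes in accu.items() if len(classes) > 1)
```

Returning the offending groups, instead of only a boolean, makes it possible to list every clashing class in the failure message, which is what the test's assertion message does.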
diff --git a/test/test_compat.py b/test/test_compat.py
index cc105807a3faf0c4a686534cacfaf1d6300f7eb8..b574249489a3ded4cf5dcd66e49c123829c2331e 100644
--- a/test/test_compat.py
+++ b/test/test_compat.py
@@ -10,13 +10,14 @@ import unittest
  sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
  
  
-from youtube_dl.utils import get_filesystem_encoding
  from youtube_dl.compat import (
      compat_getenv,
+    compat_setenv,
      compat_etree_fromstring,
      compat_expanduser,
      compat_shlex_split,
      compat_str,
+    compat_struct_unpack,
      compat_urllib_parse_unquote,
      compat_urllib_parse_unquote_plus,
      compat_urllib_parse_urlencode,
@@ -26,19 +27,22 @@ from youtube_dl.compat import (
  class TestCompat(unittest.TestCase):
      def test_compat_getenv(self):
          test_str = 'тест'
-        os.environ['YOUTUBE-DL-TEST'] = (
-            test_str if sys.version_info >= (3, 0)
-            else test_str.encode(get_filesystem_encoding()))
+        compat_setenv('YOUTUBE-DL-TEST', test_str)
          self.assertEqual(compat_getenv('YOUTUBE-DL-TEST'), test_str)
  
+    def test_compat_setenv(self):
+        test_var = 'YOUTUBE-DL-TEST'
+        test_str = 'тест'
+        compat_setenv(test_var, test_str)
+        compat_getenv(test_var)
+        self.assertEqual(compat_getenv(test_var), test_str)
+
      def test_compat_expanduser(self):
          old_home = os.environ.get('HOME')
          test_str = 'C:\Documents and Settings\тест\Application Data'
-        os.environ['HOME'] = (
-            test_str if sys.version_info >= (3, 0)
-            else test_str.encode(get_filesystem_encoding()))
+        compat_setenv('HOME', test_str)
          self.assertEqual(compat_expanduser('~'), test_str)
-        os.environ['HOME'] = old_home
+        compat_setenv('HOME', old_home or '')
  
      def test_all_present(self):
          import youtube_dl.compat
@@ -76,9 +80,15 @@ class TestCompat(unittest.TestCase):
          self.assertEqual(compat_urllib_parse_urlencode({'abc': b'def'}), 'abc=def')
          self.assertEqual(compat_urllib_parse_urlencode({b'abc': 'def'}), 'abc=def')
          self.assertEqual(compat_urllib_parse_urlencode({b'abc': b'def'}), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode([('abc', 'def')]), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode([('abc', b'def')]), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode([(b'abc', 'def')]), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode([(b'abc', b'def')]), 'abc=def')
  
      def test_compat_shlex_split(self):
          self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])
+        self.assertEqual(compat_shlex_split('-option "one\ntwo" \n -flag'), ['-option', 'one\ntwo', '-flag'])
+        self.assertEqual(compat_shlex_split('-val 中文'), ['-val', '中文'])
  
      def test_compat_etree_fromstring(self):
          xml = '''
@@ -95,5 +105,15 @@ class TestCompat(unittest.TestCase):
          self.assertTrue(isinstance(doc.find('chinese').text, compat_str))
          self.assertTrue(isinstance(doc.find('foo/bar').text, compat_str))
  
+    def test_compat_etree_fromstring_doctype(self):
+        xml = '''<?xml version="1.0"?>
+<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">
+<smil xmlns="http://www.w3.org/2001/SMIL20/Language"></smil>'''
+        compat_etree_fromstring(xml)
+
+    def test_struct_unpack(self):
+        self.assertEqual(compat_struct_unpack('!B', b'\x00'), (0,))
+
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_download.py b/test/test_download.py
index a3f1c0644f32b180a2b177e76dbea44854b0983e..4639529897967ebc49883e488f5624a038c70c44 100644
--- a/test/test_download.py
+++ b/test/test_download.py
@@ -60,6 +60,7 @@ def _file_md5(fn):
      with open(fn, 'rb') as f:
          return hashlib.md5(f.read()).hexdigest()
  
+
  defs = gettestcases()
  
  
@@ -217,6 +218,7 @@ def generator(test_case):
  
      return test_template
  
+
  # And add them to TestDownload
  for n, test_case in enumerate(defs):
      test_method = generator(test_case)
diff --git a/test/test_execution.py b/test/test_execution.py
index 620db080e9bd836c7239a93e86e0944b95f793e0..11661bb68148f4eb229b50c37f67dc744491c7df 100644
--- a/test/test_execution.py
+++ b/test/test_execution.py
@@ -39,5 +39,6 @@ class TestExecution(unittest.TestCase):
          _, stderr = p.communicate()
          self.assertFalse(stderr)
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_http.py b/test/test_http.py
index 15e0ad369d57966bef222bf35c422ad9bdb4e755..7a7a3510ffb46e2791153dff5e4157bb21433056 100644
--- a/test/test_http.py
+++ b/test/test_http.py
@@ -16,6 +16,15 @@ import threading
  TEST_DIR = os.path.dirname(os.path.abspath(__file__))
  
  
+def http_server_port(httpd):
+    if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
+        # In Jython SSLSocket is not a subclass of socket.socket
+        sock = httpd.socket.sock
+    else:
+        sock = httpd.socket
+    return sock.getsockname()[1]
+
+
  class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
      def log_message(self, format, *args):
          pass
@@ -31,6 +40,22 @@ class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
              self.send_header('Content-Type', 'video/mp4')
              self.end_headers()
              self.wfile.write(b'\x00\x00\x00\x00\x20\x66\x74[video]')
+        elif self.path == '/302':
+            if sys.version_info[0] == 3:
+                # XXX: Python 3 http server does not allow non-ASCII header values
+                self.send_response(404)
+                self.end_headers()
+                return
+
+            new_url = 'http://localhost:%d/中文.html' % http_server_port(self.server)
+            self.send_response(302)
+            self.send_header(b'Location', new_url.encode('utf-8'))
+            self.end_headers()
+        elif self.path == '/%E4%B8%AD%E6%96%87.html':
+            self.send_response(200)
+            self.send_header('Content-Type', 'text/html; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(b'<html><video src="/vid.mp4" /></html>')
          else:
              assert False
  
@@ -47,18 +72,32 @@ class FakeLogger(object):
  
  
  class TestHTTP(unittest.TestCase):
+    def setUp(self):
+        self.httpd = compat_http_server.HTTPServer(
+            ('localhost', 0), HTTPTestRequestHandler)
+        self.port = http_server_port(self.httpd)
+        self.server_thread = threading.Thread(target=self.httpd.serve_forever)
+        self.server_thread.daemon = True
+        self.server_thread.start()
+
+    def test_unicode_path_redirection(self):
+        # XXX: Python 3 http server does not allow non-ASCII header values
+        if sys.version_info[0] == 3:
+            return
+
+        ydl = YoutubeDL({'logger': FakeLogger()})
+        r = ydl.extract_info('http://localhost:%d/302' % self.port)
+        self.assertEqual(r['entries'][0]['url'], 'http://localhost:%d/vid.mp4' % self.port)
+
+
+class TestHTTPS(unittest.TestCase):
      def setUp(self):
          certfn = os.path.join(TEST_DIR, 'testcert.pem')
          self.httpd = compat_http_server.HTTPServer(
              ('localhost', 0), HTTPTestRequestHandler)
          self.httpd.socket = ssl.wrap_socket(
              self.httpd.socket, certfile=certfn, server_side=True)
-        if os.name == 'java':
-            # In Jython SSLSocket is not a subclass of socket.socket
-            sock = self.httpd.socket.sock
-        else:
-            sock = self.httpd.socket
-        self.port = sock.getsockname()[1]
+        self.port = http_server_port(self.httpd)
          self.server_thread = threading.Thread(target=self.httpd.serve_forever)
          self.server_thread.daemon = True
          self.server_thread.start()
@@ -72,7 +111,7 @@ class TestHTTP(unittest.TestCase):
  
          ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True})
          r = ydl.extract_info('https://localhost:%d/video.html' % self.port)
-        self.assertEqual(r['url'], 'https://localhost:%d/vid.mp4' % self.port)
+        self.assertEqual(r['entries'][0]['url'], 'https://localhost:%d/vid.mp4' % self.port)
  
  
  def _build_proxy_handler(name):
@@ -94,32 +133,32 @@ class TestProxy(unittest.TestCase):
      def setUp(self):
          self.proxy = compat_http_server.HTTPServer(
              ('localhost', 0), _build_proxy_handler('normal'))
-        self.port = self.proxy.socket.getsockname()[1]
+        self.port = http_server_port(self.proxy)
          self.proxy_thread = threading.Thread(target=self.proxy.serve_forever)
          self.proxy_thread.daemon = True
          self.proxy_thread.start()
  
-        self.cn_proxy = compat_http_server.HTTPServer(
-            ('localhost', 0), _build_proxy_handler('cn'))
-        self.cn_port = self.cn_proxy.socket.getsockname()[1]
-        self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever)
-        self.cn_proxy_thread.daemon = True
-        self.cn_proxy_thread.start()
+        self.geo_proxy = compat_http_server.HTTPServer(
+            ('localhost', 0), _build_proxy_handler('geo'))
+        self.geo_port = http_server_port(self.geo_proxy)
+        self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever)
+        self.geo_proxy_thread.daemon = True
+        self.geo_proxy_thread.start()
  
      def test_proxy(self):
-        cn_proxy = 'localhost:{0}'.format(self.cn_port)
+        geo_proxy = 'localhost:{0}'.format(self.geo_port)
          ydl = YoutubeDL({
              'proxy': 'localhost:{0}'.format(self.port),
-            'cn_verification_proxy': cn_proxy,
+            'geo_verification_proxy': geo_proxy,
          })
          url = 'http://foo.com/bar'
          response = ydl.urlopen(url).read().decode('utf-8')
          self.assertEqual(response, 'normal: {0}'.format(url))
  
          req = compat_urllib_request.Request(url)
-        req.add_header('Ytdl-request-proxy', cn_proxy)
+        req.add_header('Ytdl-request-proxy', geo_proxy)
          response = ydl.urlopen(req).read().decode('utf-8')
-        self.assertEqual(response, 'cn: {0}'.format(url))
+        self.assertEqual(response, 'geo: {0}'.format(url))
  
      def test_proxy_with_idn(self):
          ydl = YoutubeDL({
@@ -130,5 +169,6 @@ class TestProxy(unittest.TestCase):
          # b'xn--fiq228c' is '中文'.encode('idna')
          self.assertEqual(response, 'normal: http://xn--fiq228c.tw/')
  
+
  if __name__ == '__main__':
      unittest.main()
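
Reviewer note: the hunks above replace `self.proxy.socket.getsockname()[1]` with a `http_server_port(...)` helper whose body is not shown in this diff. The pattern underneath (bind to port 0, then ask the socket which port the OS picked) can be sketched on its own. This is a minimal Python 3 illustration, not the project's helper; `QuietHandler` and `start_server_on_free_port` are hypothetical names, and `127.0.0.1` is used here instead of `localhost` only to avoid name resolution.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class QuietHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers GET with a fixed prefix plus the path."""

    def log_message(self, format, *args):
        pass  # suppress per-request logging

    def do_GET(self):
        body = ('normal: %s' % self.path).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def start_server_on_free_port(handler_class):
    # Port 0 asks the operating system to choose any free port.
    server = HTTPServer(('127.0.0.1', 0), handler_class)
    # The bound socket reports the port that was actually assigned.
    port = server.socket.getsockname()[1]
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True  # do not keep the interpreter alive for the server
    thread.start()
    return server, port
```

Letting the OS pick the port avoids collisions when several test servers (here the normal and geo proxies) run in the same process.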
diff --git a/test/test_iqiyi_sdk_interpreter.py b/test/test_iqiyi_sdk_interpreter.py

index 9d95cb60618ae4ee122b46884a2eae6233dffbec..789059dbea38026362caea2be08f9d36796a7b1d 100644 (file)
--- a/test/test_iqiyi_sdk_interpreter.py
+++ b/test/test_iqiyi_sdk_interpreter.py
@@ -43,5 +43,6 @@ class TestIqiyiSDKInterpreter(unittest.TestCase):
          ie._login()
          self.assertTrue('unable to log in:' in logger.messages[0])
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_jsinterp.py b/test/test_jsinterp.py

index 63c350b8fa986fc63d70af43a6a0fdcaf5958eed..c24b8ca742acc308ca9c455378564bbac053765d 100644 (file)
--- a/test/test_jsinterp.py
+++ b/test/test_jsinterp.py
@@ -104,6 +104,14 @@ class TestJSInterpreter(unittest.TestCase):
          }''')
          self.assertEqual(jsi.call_function('x'), [20, 20, 30, 40, 50])
  
+    def test_call(self):
+        jsi = JSInterpreter('''
+        function x() { return 2; }
+        function y(a) { return x() + a; }
+        function z() { return y(3); }
+        ''')
+        self.assertEqual(jsi.call_function('z'), 5)
+
  
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_socks.py b/test/test_socks.py

new file mode 100644 (file)

index 0000000..1e68eb0
--- /dev/null
+++ b/test/test_socks.py
@@ -0,0 +1,118 @@
+#!/usr/bin/env python
+# coding: utf-8
+from __future__ import unicode_literals
+
+# Allow direct execution
+import os
+import sys
+import unittest
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+import random
+import subprocess
+
+from test.helper import (
+    FakeYDL,
+    get_params,
+)
+from youtube_dl.compat import (
+    compat_str,
+    compat_urllib_request,
+)
+
+
+class TestMultipleSocks(unittest.TestCase):
+    @staticmethod
+    def _check_params(attrs):
+        params = get_params()
+        for attr in attrs:
+            if attr not in params:
+                print('Missing %s. Skipping.' % attr)
+                return
+        return params
+
+    def test_proxy_http(self):
+        params = self._check_params(['primary_proxy', 'primary_server_ip'])
+        if params is None:
+            return
+        ydl = FakeYDL({
+            'proxy': params['primary_proxy']
+        })
+        self.assertEqual(
+            ydl.urlopen('http://yt-dl.org/ip').read().decode('utf-8'),
+            params['primary_server_ip'])
+
+    def test_proxy_https(self):
+        params = self._check_params(['primary_proxy', 'primary_server_ip'])
+        if params is None:
+            return
+        ydl = FakeYDL({
+            'proxy': params['primary_proxy']
+        })
+        self.assertEqual(
+            ydl.urlopen('https://yt-dl.org/ip').read().decode('utf-8'),
+            params['primary_server_ip'])
+
+    def test_secondary_proxy_http(self):
+        params = self._check_params(['secondary_proxy', 'secondary_server_ip'])
+        if params is None:
+            return
+        ydl = FakeYDL()
+        req = compat_urllib_request.Request('http://yt-dl.org/ip')
+        req.add_header('Ytdl-request-proxy', params['secondary_proxy'])
+        self.assertEqual(
+            ydl.urlopen(req).read().decode('utf-8'),
+            params['secondary_server_ip'])
+
+    def test_secondary_proxy_https(self):
+        params = self._check_params(['secondary_proxy', 'secondary_server_ip'])
+        if params is None:
+            return
+        ydl = FakeYDL()
+        req = compat_urllib_request.Request('https://yt-dl.org/ip')
+        req.add_header('Ytdl-request-proxy', params['secondary_proxy'])
+        self.assertEqual(
+            ydl.urlopen(req).read().decode('utf-8'),
+            params['secondary_server_ip'])
+
+
+class TestSocks(unittest.TestCase):
+    _SKIP_SOCKS_TEST = True
+
+    def setUp(self):
+        if self._SKIP_SOCKS_TEST:
+            return
+
+        self.port = random.randint(20000, 30000)
+        self.server_process = subprocess.Popen([
+            'srelay', '-f', '-i', '127.0.0.1:%d' % self.port],
+            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+
+    def tearDown(self):
+        if self._SKIP_SOCKS_TEST:
+            return
+
+        self.server_process.terminate()
+        self.server_process.communicate()
+
+    def _get_ip(self, protocol):
+        if self._SKIP_SOCKS_TEST:
+            return '127.0.0.1'
+
+        ydl = FakeYDL({
+            'proxy': '%s://127.0.0.1:%d' % (protocol, self.port),
+        })
+        return ydl.urlopen('http://yt-dl.org/ip').read().decode('utf-8')
+
+    def test_socks4(self):
+        self.assertTrue(isinstance(self._get_ip('socks4'), compat_str))
+
+    def test_socks4a(self):
+        self.assertTrue(isinstance(self._get_ip('socks4a'), compat_str))
+
+    def test_socks5(self):
+        self.assertTrue(isinstance(self._get_ip('socks5'), compat_str))
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/test/test_utils.py b/test/test_utils.py

index a35debfe121f5b5b7e084b6eaa1c0e8dd0fd9bea..2e3cd0179db9dd97792fafa695f7e7a043542a38 100644 (file)
--- a/test/test_utils.py
+++ b/test/test_utils.py
@@ -20,6 +20,7 @@ from youtube_dl.utils import (
      args_to_str,
      encode_base_n,
      clean_html,
+    date_from_str,
      DateRange,
      detect_exe_version,
      determine_ext,
@@ -32,14 +33,18 @@ from youtube_dl.utils import (
      ExtractorError,
      find_xpath_attr,
      fix_xml_ampersands,
+    get_element_by_class,
      InAdvancePagedList,
      intlist_to_bytes,
      is_html,
      js_to_json,
      limit_length,
+    mimetype2ext,
+    month_by_name,
      ohdave_rsa_encrypt,
      OnDemandPagedList,
      orderedSet,
+    parse_age_limit,
      parse_duration,
      parse_filesize,
      parse_count,
@@ -49,20 +54,24 @@ from youtube_dl.utils import (
      sanitize_path,
      prepend_extension,
      replace_extension,
+    remove_start,
+    remove_end,
      remove_quotes,
      shell_quote,
      smuggle_url,
      str_to_int,
      strip_jsonp,
-    struct_unpack,
      timeconvert,
      unescapeHTML,
      unified_strdate,
+    unified_timestamp,
      unsmuggle_url,
      uppercase_escape,
      lowercase_escape,
      url_basename,
+    base_url,
      urlencode_postdata,
+    urshift,
      update_url_query,
      version_tuple,
      xpath_with_ns,
@@ -76,6 +85,7 @@ from youtube_dl.utils import (
      cli_option,
      cli_valueless_option,
      cli_bool_option,
+    parse_codecs,
  )
  from youtube_dl.compat import (
      compat_chr,
@@ -138,8 +148,8 @@ class TestUtil(unittest.TestCase):
          self.assertEqual('yes_no', sanitize_filename('yes? no', restricted=True))
          self.assertEqual('this_-_that', sanitize_filename('this: that', restricted=True))
  
-        tests = 'a\xe4b\u4e2d\u56fd\u7684c'
-        self.assertEqual(sanitize_filename(tests, restricted=True), 'a_b_c')
+        tests = 'aäb\u4e2d\u56fd\u7684c'
+        self.assertEqual(sanitize_filename(tests, restricted=True), 'aab_c')
          self.assertTrue(sanitize_filename('\xf6', restricted=True) != '')  # No empty filename
  
          forbidden = '"\0\\/&!: \'\t\n()[]{}$;`^,#'
@@ -154,6 +164,10 @@ class TestUtil(unittest.TestCase):
          self.assertTrue(sanitize_filename('-', restricted=True) != '')
          self.assertTrue(sanitize_filename(':', restricted=True) != '')
  
+        self.assertEqual(sanitize_filename(
+            'ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖŐØŒÙÚÛÜŰÝÞßàáâãäåæçèéêëìíîïðñòóôõöőøœùúûüűýþÿ', restricted=True),
+            'AAAAAAAECEEEEIIIIDNOOOOOOOOEUUUUUYPssaaaaaaaeceeeeiiiionooooooooeuuuuuypy')
+
      def test_sanitize_ids(self):
          self.assertEqual(sanitize_filename('_n_cd26wFpw', is_id=True), '_n_cd26wFpw')
          self.assertEqual(sanitize_filename('_BD_eEpuzXw', is_id=True), '_BD_eEpuzXw')
@@ -211,6 +225,16 @@ class TestUtil(unittest.TestCase):
          self.assertEqual(replace_extension('.abc', 'temp'), '.abc.temp')
          self.assertEqual(replace_extension('.abc.ext', 'temp'), '.abc.temp')
  
+    def test_remove_start(self):
+        self.assertEqual(remove_start(None, 'A - '), None)
+        self.assertEqual(remove_start('A - B', 'A - '), 'B')
+        self.assertEqual(remove_start('B - A', 'A - '), 'B - A')
+
+    def test_remove_end(self):
+        self.assertEqual(remove_end(None, ' - B'), None)
+        self.assertEqual(remove_end('A - B', ' - B'), 'A')
+        self.assertEqual(remove_end('B - A', ' - B'), 'B - A')
+
      def test_remove_quotes(self):
          self.assertEqual(remove_quotes(None), None)
          self.assertEqual(remove_quotes('"'), '"')
@@ -233,6 +257,15 @@ class TestUtil(unittest.TestCase):
          self.assertEqual(unescapeHTML('&#47;'), '/')
          self.assertEqual(unescapeHTML('&eacute;'), 'é')
          self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
+        # HTML5 entities
+        self.assertEqual(unescapeHTML('&period;&apos;'), '.\'')
+
+    def test_date_from_str(self):
+        self.assertEqual(date_from_str('yesterday'), date_from_str('now-1day'))
+        self.assertEqual(date_from_str('now+7day'), date_from_str('now+1week'))
+        self.assertEqual(date_from_str('now+14day'), date_from_str('now+2week'))
+        self.assertEqual(date_from_str('now+365day'), date_from_str('now+1year'))
+        self.assertEqual(date_from_str('now+30day'), date_from_str('now+1month'))
  
      def test_daterange(self):
          _20century = DateRange("19000101", "20000101")
@@ -258,7 +291,30 @@ class TestUtil(unittest.TestCase):
              '20150202')
          self.assertEqual(unified_strdate('Feb 14th 2016 5:45PM'), '20160214')
          self.assertEqual(unified_strdate('25-09-2014'), '20140925')
+        self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
          self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
+        self.assertEqual(unified_strdate('Feb 7, 2016 at 6:35 pm'), '20160207')
+
+    def test_unified_timestamps(self):
+        self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
+        self.assertEqual(unified_timestamp('8/7/2009'), 1247011200)
+        self.assertEqual(unified_timestamp('Dec 14, 2012'), 1355443200)
+        self.assertEqual(unified_timestamp('2012/10/11 01:56:38 +0000'), 1349920598)
+        self.assertEqual(unified_timestamp('1968 12 10'), -33436800)
+        self.assertEqual(unified_timestamp('1968-12-10'), -33436800)
+        self.assertEqual(unified_timestamp('28/01/2014 21:00:00 +0100'), 1390939200)
+        self.assertEqual(
+            unified_timestamp('11/26/2014 11:30:00 AM PST', day_first=False),
+            1417001400)
+        self.assertEqual(
+            unified_timestamp('2/2/2015 6:47:40 PM', day_first=False),
+            1422902860)
+        self.assertEqual(unified_timestamp('Feb 14th 2016 5:45PM'), 1455471900)
+        self.assertEqual(unified_timestamp('25-09-2014'), 1411603200)
+        self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
+        self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
+        self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
+        self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
  
      def test_determine_ext(self):
          self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@@ -358,6 +414,12 @@ class TestUtil(unittest.TestCase):
          self.assertEqual(res_url, url)
          self.assertEqual(res_data, None)
  
+        smug_url = smuggle_url(url, {'a': 'b'})
+        smug_smug_url = smuggle_url(smug_url, {'c': 'd'})
+        res_url, res_data = unsmuggle_url(smug_smug_url)
+        self.assertEqual(res_url, url)
+        self.assertEqual(res_data, {'a': 'b', 'c': 'd'})
+
      def test_shell_quote(self):
          args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')]
          self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""")
@@ -376,6 +438,27 @@ class TestUtil(unittest.TestCase):
              url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
              'trailer.mp4')
  
+    def test_base_url(self):
+        self.assertEqual(base_url('http://foo.de/'), 'http://foo.de/')
+        self.assertEqual(base_url('http://foo.de/bar'), 'http://foo.de/')
+        self.assertEqual(base_url('http://foo.de/bar/'), 'http://foo.de/bar/')
+        self.assertEqual(base_url('http://foo.de/bar/baz'), 'http://foo.de/bar/')
+        self.assertEqual(base_url('http://foo.de/bar/baz?x=z/x/c'), 'http://foo.de/bar/')
+
+    def test_parse_age_limit(self):
+        self.assertEqual(parse_age_limit(None), None)
+        self.assertEqual(parse_age_limit(False), None)
+        self.assertEqual(parse_age_limit('invalid'), None)
+        self.assertEqual(parse_age_limit(0), 0)
+        self.assertEqual(parse_age_limit(18), 18)
+        self.assertEqual(parse_age_limit(21), 21)
+        self.assertEqual(parse_age_limit(22), None)
+        self.assertEqual(parse_age_limit('18'), 18)
+        self.assertEqual(parse_age_limit('18+'), 18)
+        self.assertEqual(parse_age_limit('PG-13'), 13)
+        self.assertEqual(parse_age_limit('TV-14'), 14)
+        self.assertEqual(parse_age_limit('TV-MA'), 17)
+
      def test_parse_duration(self):
          self.assertEqual(parse_duration(None), None)
          self.assertEqual(parse_duration(False), None)
@@ -405,6 +488,7 @@ class TestUtil(unittest.TestCase):
          self.assertEqual(parse_duration('01:02:03:04'), 93784)
          self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
          self.assertEqual(parse_duration('87 Min.'), 5220)
+        self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)
  
      def test_fix_xml_ampersands(self):
          self.assertEqual(
@@ -444,9 +528,6 @@ class TestUtil(unittest.TestCase):
          testPL(5, 2, (2, 99), [2, 3, 4])
          testPL(5, 2, (20, 99), [])
  
-    def test_struct_unpack(self):
-        self.assertEqual(struct_unpack('!B', b'\x00'), (0,))
-
      def test_read_batch_urls(self):
          f = io.StringIO('''\xef\xbb\xbf foo
              bar\r
@@ -556,6 +637,45 @@ class TestUtil(unittest.TestCase):
              limit_length('foo bar baz asd', 12).startswith('foo bar'))
          self.assertTrue('...' in limit_length('foo bar baz asd', 12))
  
+    def test_mimetype2ext(self):
+        self.assertEqual(mimetype2ext(None), None)
+        self.assertEqual(mimetype2ext('video/x-flv'), 'flv')
+        self.assertEqual(mimetype2ext('application/x-mpegURL'), 'm3u8')
+        self.assertEqual(mimetype2ext('text/vtt'), 'vtt')
+        self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
+        self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
+
+    def test_month_by_name(self):
+        self.assertEqual(month_by_name(None), None)
+        self.assertEqual(month_by_name('December', 'en'), 12)
+        self.assertEqual(month_by_name('décembre', 'fr'), 12)
+        self.assertEqual(month_by_name('December'), 12)
+        self.assertEqual(month_by_name('décembre'), None)
+        self.assertEqual(month_by_name('Unknown', 'unknown'), None)
+
+    def test_parse_codecs(self):
+        self.assertEqual(parse_codecs(''), {})
+        self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
+            'vcodec': 'avc1.77.30',
+            'acodec': 'mp4a.40.2',
+        })
+        self.assertEqual(parse_codecs('mp4a.40.2'), {
+            'vcodec': 'none',
+            'acodec': 'mp4a.40.2',
+        })
+        self.assertEqual(parse_codecs('mp4a.40.5,avc1.42001e'), {
+            'vcodec': 'avc1.42001e',
+            'acodec': 'mp4a.40.5',
+        })
+        self.assertEqual(parse_codecs('avc3.640028'), {
+            'vcodec': 'avc3.640028',
+            'acodec': 'none',
+        })
+        self.assertEqual(parse_codecs(', h264,,newcodec,aac'), {
+            'vcodec': 'h264',
+            'acodec': 'aac',
+        })
+
      def test_escape_rfc3986(self):
          reserved = "!*'();:@&=+$,/?#[]"
          unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~'
@@ -608,6 +728,21 @@ class TestUtil(unittest.TestCase):
          json_code = js_to_json(inp)
          self.assertEqual(json.loads(json_code), json.loads(inp))
  
+        inp = '''{
+            0:{src:'skipped', type: 'application/dash+xml'},
+            1:{src:'skipped', type: 'application/vnd.apple.mpegURL'},
+        }'''
+        self.assertEqual(js_to_json(inp), '''{
+            "0":{"src":"skipped", "type": "application/dash+xml"},
+            "1":{"src":"skipped", "type": "application/vnd.apple.mpegURL"}
+        }''')
+
+        inp = '''{"foo":101}'''
+        self.assertEqual(js_to_json(inp), '''{"foo":101}''')
+
+        inp = '''{"duration": "00:01:07"}'''
+        self.assertEqual(js_to_json(inp), '''{"duration": "00:01:07"}''')
+
      def test_js_to_json_edgecases(self):
          on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
          self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@@ -631,6 +766,27 @@ class TestUtil(unittest.TestCase):
          on = js_to_json('{"abc": "def",}')
          self.assertEqual(json.loads(on), {'abc': 'def'})
  
+        on = js_to_json('{ 0: /* " \n */ ",]" , }')
+        self.assertEqual(json.loads(on), {'0': ',]'})
+
+        on = js_to_json(r'["<p>x<\/p>"]')
+        self.assertEqual(json.loads(on), ['<p>x</p>'])
+
+        on = js_to_json(r'["\xaa"]')
+        self.assertEqual(json.loads(on), ['\u00aa'])
+
+        on = js_to_json("['a\\\nb']")
+        self.assertEqual(json.loads(on), ['ab'])
+
+        on = js_to_json('{0xff:0xff}')
+        self.assertEqual(json.loads(on), {'255': 255})
+
+        on = js_to_json('{077:077}')
+        self.assertEqual(json.loads(on), {'63': 63})
+
+        on = js_to_json('{42:42}')
+        self.assertEqual(json.loads(on), {'42': 42})
+
      def test_extract_attributes(self):
          self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
          self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
@@ -692,7 +848,10 @@ class TestUtil(unittest.TestCase):
          self.assertEqual(parse_filesize('2 MiB'), 2097152)
          self.assertEqual(parse_filesize('5 GB'), 5000000000)
          self.assertEqual(parse_filesize('1.2Tb'), 1200000000000)
+        self.assertEqual(parse_filesize('1.2tb'), 1200000000000)
          self.assertEqual(parse_filesize('1,24 KB'), 1240)
+        self.assertEqual(parse_filesize('1,24 kb'), 1240)
+        self.assertEqual(parse_filesize('8.5 megabytes'), 8500000)
  
      def test_parse_count(self):
          self.assertEqual(parse_count(None), None)
@@ -843,6 +1002,7 @@ The first line
          self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128'])
          self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), [])
          self.assertEqual(cli_option({}, '--proxy', 'proxy'), [])
+        self.assertEqual(cli_option({'retries': 10}, '--retries', 'retries'), ['--retries', '10'])
  
      def test_cli_valueless_option(self):
          self.assertEqual(cli_valueless_option(
@@ -903,5 +1063,18 @@ The first line
          self.assertRaises(ValueError, encode_base_n, 0, 70)
          self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)
  
+    def test_urshift(self):
+        self.assertEqual(urshift(3, 1), 1)
+        self.assertEqual(urshift(-3, 1), 2147483646)
+
+    def test_get_element_by_class(self):
+        html = '''
+            <span class="foo bar">nice</span>
+        '''
+
+        self.assertEqual(get_element_by_class('foo', html), 'nice')
+        self.assertEqual(get_element_by_class('no-such-class', html), None)
+
+
  if __name__ == '__main__':
      unittest.main()
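
Reviewer note: the new `test_urshift`, `test_remove_start` and `test_remove_end` cases above pin down behaviour without showing the functions under test. The sketch below is one way to satisfy those assertions; it is written from the test expectations only and is an assumption, not a copy of `youtube_dl.utils`. The `urshift` variant assumes the input fits in a signed 32-bit integer.

```python
def urshift(val, n):
    """Unsigned right shift of a 32-bit value (assumed range: -2**31 .. 2**31 - 1)."""
    if val < 0:
        # Reinterpret the negative number as its unsigned 32-bit equivalent.
        val += 0x100000000
    return val >> n


def remove_start(s, start):
    """Drop `start` from the beginning of `s` if present; pass None through."""
    if s is not None and s.startswith(start):
        return s[len(start):]
    return s


def remove_end(s, end):
    """Drop `end` from the end of `s` if present; pass None through."""
    # The `end` truthiness check avoids s[:-0], which would return ''.
    if s is not None and end and s.endswith(end):
        return s[:-len(end)]
    return s
```

For example, `urshift(-3, 1)` first maps -3 to 4294967293 and then shifts, which gives the 2147483646 expected by the test.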
diff --git a/test/test_verbose_output.py b/test/test_verbose_output.py

new file mode 100644 (file)

index 0000000..c1465fe
--- /dev/null
+++ b/test/test_verbose_output.py
@@ -0,0 +1,71 @@
+#!/usr/bin/env python
+# coding: utf-8
+
+from __future__ import unicode_literals
+
+import unittest
+
+import sys
+import os
+import subprocess
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+rootDir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+
+
+class TestVerboseOutput(unittest.TestCase):
+    def test_private_info_arg(self):
+        outp = subprocess.Popen(
+            [
+                sys.executable, 'youtube_dl/__main__.py', '-v',
+                '--username', 'johnsmith@gmail.com',
+                '--password', 'secret',
+            ], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+        sout, serr = outp.communicate()
+        self.assertTrue(b'--username' in serr)
+        self.assertTrue(b'johnsmith' not in serr)
+        self.assertTrue(b'--password' in serr)
+        self.assertTrue(b'secret' not in serr)
+
+    def test_private_info_shortarg(self):
+        outp = subprocess.Popen(
+            [
+                sys.executable, 'youtube_dl/__main__.py', '-v',
+                '-u', 'johnsmith@gmail.com',
+                '-p', 'secret',
+            ], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+        sout, serr = outp.communicate()
+        self.assertTrue(b'-u' in serr)
+        self.assertTrue(b'johnsmith' not in serr)
+        self.assertTrue(b'-p' in serr)
+        self.assertTrue(b'secret' not in serr)
+
+    def test_private_info_eq(self):
+        outp = subprocess.Popen(
+            [
+                sys.executable, 'youtube_dl/__main__.py', '-v',
+                '--username=johnsmith@gmail.com',
+                '--password=secret',
+            ], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+        sout, serr = outp.communicate()
+        self.assertTrue(b'--username' in serr)
+        self.assertTrue(b'johnsmith' not in serr)
+        self.assertTrue(b'--password' in serr)
+        self.assertTrue(b'secret' not in serr)
+
+    def test_private_info_shortarg_eq(self):
+        outp = subprocess.Popen(
+            [
+                sys.executable, 'youtube_dl/__main__.py', '-v',
+                '-u=johnsmith@gmail.com',
+                '-p=secret',
+            ], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+        sout, serr = outp.communicate()
+        self.assertTrue(b'-u' in serr)
+        self.assertTrue(b'johnsmith' not in serr)
+        self.assertTrue(b'-p' in serr)
+        self.assertTrue(b'secret' not in serr)
+
+
+if __name__ == '__main__':
+    unittest.main()
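
Reviewer note: the four tests in `test_verbose_output.py` check that option names survive in the debug output while the username and password values do not, for both the separate-argument and the `opt=value` spellings. The masking logic itself is not part of this file. A minimal sketch of such a filter follows; `hide_private_args`, `PRIVATE_OPTS` and the `PRIVATE` placeholder are hypothetical names chosen for illustration and may differ from what youtube-dl actually does.

```python
PRIVATE_OPTS = ('-u', '--username', '-p', '--password')


def hide_private_args(argv):
    """Return a copy of argv with values of private options masked."""
    masked = []
    hide_next = False
    for arg in argv:
        if hide_next:
            # Previous argument was a private option: mask its value.
            masked.append('PRIVATE')
            hide_next = False
            continue
        opt, sep, _value = arg.partition('=')
        if sep and opt in PRIVATE_OPTS:
            # Handles the '--password=secret' and '-p=secret' spellings.
            masked.append(opt + '=PRIVATE')
        else:
            masked.append(arg)
            hide_next = arg in PRIVATE_OPTS
    return masked
```

With this sketch, `['-v', '-u=johnsmith@gmail.com', '-p', 'secret']` would be logged as `['-v', '-u=PRIVATE', '-p', 'PRIVATE']`, which keeps the option names visible as the tests require.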
diff --git a/test/test_write_annotations.py b/test/test_write_annotations.py

index 8de08f2d6d3974bd2d28265c323e7ff76d1317a3..41abdfe3b99eaabf562ebabc222fc50fead77631 100644 (file)
--- a/test/test_write_annotations.py
+++ b/test/test_write_annotations.py
@@ -24,6 +24,7 @@ class YoutubeDL(youtube_dl.YoutubeDL):
          super(YoutubeDL, self).__init__(*args, **kwargs)
          self.to_stderr = self.to_screen
  
+
  params = get_params({
      'writeannotations': True,
      'skip_download': True,
@@ -74,5 +75,6 @@ class TestAnnotations(unittest.TestCase):
      def tearDown(self):
          try_rm(ANNOTATIONS_FILE)
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_youtube_lists.py b/test/test_youtube_lists.py

index 47df0f348d862e4ab455058d00321636123db807..7a33dbf88e90f2d901b144759ffa90552787885c 100644 (file)
--- a/test/test_youtube_lists.py
+++ b/test/test_youtube_lists.py
@@ -44,7 +44,7 @@ class TestYoutubeLists(unittest.TestCase):
          ie = YoutubePlaylistIE(dl)
          result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
          entries = result['entries']
-        self.assertTrue(len(entries) >= 20)
+        self.assertTrue(len(entries) >= 50)
          original_video = entries[0]
          self.assertEqual(original_video['id'], 'OQpdSVF_k_w')
  
@@ -66,5 +66,6 @@ class TestYoutubeLists(unittest.TestCase):
          for entry in result['entries']:
              self.assertTrue(entry.get('title'))
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_youtube_signature.py b/test/test_youtube_signature.py

index 060864434fe2ab81839dcde17475e6e9f61db0f2..f0c370eeedc8942abc0b8cd8c10e57b4361d00c2 100644 (file)
--- a/test/test_youtube_signature.py
+++ b/test/test_youtube_signature.py
@@ -114,6 +114,7 @@ def make_tfunc(url, stype, sig_input, expected_sig):
      test_func.__name__ = str('test_signature_' + stype + '_' + test_id)
      setattr(TestSignature, test_func.__name__, test_func)
  
+
  for test_spec in _TESTS:
      make_tfunc(*test_spec)
  
diff --git a/tox.ini b/tox.ini

index 2d71340050bf8f8a971acb3931621f62ded02176..9c4e4a3d1eab285d8def7fce06e7d5ceb108952e 100644 (file)
--- a/tox.ini
+++ b/tox.ini
@@ -9,5 +9,6 @@ passenv = HOME
  defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
      --exclude test_subtitles.py --exclude test_write_annotations.py
      --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
+    --exclude test_socks.py
  commands = nosetests --verbose {posargs:{[testenv]defaultargs}}  # --with-coverage --cover-package=youtube_dl --cover-html
                                                 # test.test_download:TestDownload.test_NowVideo
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py

index d7aa951ff39fc54238c5759ee5a57f2c70d9de0d..53f20ac2cb1bd16398e160db329004b49d6bf424 100755 (executable)
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -1,10 +1,11 @@
  #!/usr/bin/env python
-# -*- coding: utf-8 -*-
+# coding: utf-8
  
  from __future__ import absolute_import, unicode_literals
  
  import collections
  import contextlib
+import copy
  import datetime
  import errno
  import fileinput
@@ -64,6 +65,7 @@ from .utils import (
      PostProcessingError,
      preferredencoding,
      prepend_extension,
+    register_socks_protocols,
      render_table,
      replace_extension,
      SameFileError,
@@ -82,7 +84,7 @@ from .utils import (
      YoutubeDLHandler,
  )
  from .cache import Cache
-from .extractor import get_info_extractor, gen_extractors
+from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
  from .downloader import get_suitable_downloader
  from .downloader.rtmp import rtmpdump_version
  from .postprocessor import (
@@ -129,6 +131,9 @@ class YoutubeDL(object):
      username:          Username for authentication purposes.
      password:          Password for authentication purposes.
      videopassword:     Password for accessing a video.
+    ap_mso:            Adobe Pass multiple-system operator identifier.
+    ap_username:       Multiple-system operator account username.
+    ap_password:       Multiple-system operator account password.
      usenetrc:          Use netrc for authentication instead.
      verbose:           Print additional info to stdout.
      quiet:             Do not print messages to stdout.
@@ -195,8 +200,8 @@ class YoutubeDL(object):
      prefer_insecure:   Use HTTP instead of HTTPS to retrieve information.
                         At the moment, this is only supported by YouTube.
      proxy:             URL of the proxy server to use
-    cn_verification_proxy:  URL of the proxy to use for IP address verification
-                       on Chinese sites. (Experimental)
+    geo_verification_proxy:  URL of the proxy to use for IP address verification
+                       on geo-restricted sites. (Experimental)
      socket_timeout:    Time to wait for unresponsive hosts, in seconds
      bidi_workaround:   Work around buggy terminals without bidirectional text
                         support, using fridibi
@@ -247,7 +252,16 @@ class YoutubeDL(object):
      source_address:    (Experimental) Client-side IP address to bind to.
      call_home:         Boolean, true iff we are allowed to contact the
                         youtube-dl servers for debugging.
-    sleep_interval:    Number of seconds to sleep before each download.
+    sleep_interval:    Number of seconds to sleep before each download when
+                       used alone or a lower bound of a range for randomized
+                       sleep before each download (minimum possible number
+                       of seconds to sleep) when used along with
+                       max_sleep_interval.
+    max_sleep_interval:Upper bound of a range for randomized sleep before each
+                       download (maximum possible number of seconds to sleep).
+                       Must only be used along with sleep_interval.
+                       Actual sleep time will be a random float from range
+                       [sleep_interval; max_sleep_interval].
      listformats:       Print an overview of available video formats and exit.
      list_thumbnails:   Print a table of all thumbnails and exit.
      match_filter:      A function that gets called with the info_dict of
@@ -260,7 +274,9 @@ class YoutubeDL(object):
      The following options determine which downloader is picked:
      external_downloader: Executable of the external downloader to call.
                         None or unset for standard (built-in) downloader.
-    hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv.
+    hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv
+                       if True, otherwise use ffmpeg/avconv if False, otherwise
+                       use downloader suggested by extractor if None.
  
      The following parameters are not used by YoutubeDL itself, they are used by
      the downloader (see youtube_dl/downloader/common.py):
@@ -301,6 +317,11 @@ class YoutubeDL(object):
          self.params.update(params)
          self.cache = Cache(self)
  
+        if self.params.get('cn_verification_proxy') is not None:
+            self.report_warning('--cn-verification-proxy is deprecated. Use --geo-verification-proxy instead.')
+            if self.params.get('geo_verification_proxy') is None:
+                self.params['geo_verification_proxy'] = self.params['cn_verification_proxy']
+
          if params.get('bidi_workaround', False):
              try:
                  import pty
@@ -323,7 +344,7 @@ class YoutubeDL(object):
                          ['fribidi', '-c', 'UTF-8'] + width_args, **sp_kwargs)
                  self._output_channel = os.fdopen(master, 'rb')
              except OSError as ose:
-                if ose.errno == 2:
+                if ose.errno == errno.ENOENT:
                      self.report_warning('Could not find fribidi executable, ignoring --bidi-workaround . Make sure that  fribidi  is an executable file in one of the directories in your $PATH.')
                  else:
                      raise
@@ -359,6 +380,8 @@ class YoutubeDL(object):
          for ph in self.params.get('progress_hooks', []):
              self.add_progress_hook(ph)
  
+        register_socks_protocols()
+
      def warn_if_short_id(self, argv):
          # short YouTube ID starting with dash?
          idxs = [
@@ -378,8 +401,9 @@ class YoutubeDL(object):
      def add_info_extractor(self, ie):
          """Add an InfoExtractor object to the end of the list."""
          self._ies.append(ie)
-        self._ies_instances[ie.ie_key()] = ie
-        ie.set_downloader(self)
+        if not isinstance(ie, type):
+            self._ies_instances[ie.ie_key()] = ie
+            ie.set_downloader(self)
  
      def get_info_extractor(self, ie_key):
          """
@@ -397,7 +421,7 @@ class YoutubeDL(object):
          """
          Add the InfoExtractors returned by gen_extractors to the end of the list
          """
-        for ie in gen_extractors():
+        for ie in gen_extractor_classes():
              self.add_info_extractor(ie)
  
      def add_post_processor(self, pp):
@@ -577,7 +601,7 @@ class YoutubeDL(object):
                  is_id=(k == 'id'))
              template_dict = dict((k, sanitize(k, v))
                                   for k, v in template_dict.items()
-                                 if v is not None)
+                                 if v is not None and not isinstance(v, (list, tuple, dict)))
              template_dict = collections.defaultdict(lambda: 'NA', template_dict)
  
              outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
@@ -661,6 +685,7 @@ class YoutubeDL(object):
              if not ie.suitable(url):
                  continue
  
+            ie = self.get_info_extractor(ie.ie_key())
              if not ie.working():
                  self.report_warning('The program functionality for this site has been marked as broken, '
                                      'and will probably not work.')
@@ -713,6 +738,7 @@ class YoutubeDL(object):
          result_type = ie_result.get('_type', 'video')
  
          if result_type in ('url', 'url_transparent'):
+            ie_result['url'] = sanitize_url(ie_result['url'])
              extract_flat = self.params.get('extract_flat', False)
              if ((extract_flat == 'in_playlist' and 'playlist' in extra_info) or
                      extract_flat is True):
@@ -1038,9 +1064,9 @@ class YoutubeDL(object):
              if isinstance(selector, list):
                  fs = [_build_selector_function(s) for s in selector]
  
-                def selector_function(formats):
+                def selector_function(ctx):
                      for f in fs:
-                        for format in f(formats):
+                        for format in f(ctx):
                              yield format
                  return selector_function
              elif selector.type == GROUP:
@@ -1048,17 +1074,17 @@ class YoutubeDL(object):
              elif selector.type == PICKFIRST:
                  fs = [_build_selector_function(s) for s in selector.selector]
  
-                def selector_function(formats):
+                def selector_function(ctx):
                      for f in fs:
-                        picked_formats = list(f(formats))
+                        picked_formats = list(f(ctx))
                          if picked_formats:
                              return picked_formats
                      return []
              elif selector.type == SINGLE:
                  format_spec = selector.selector
  
-                def selector_function(formats):
-                    formats = list(formats)
+                def selector_function(ctx):
+                    formats = list(ctx['formats'])
                      if not formats:
                          return
                      if format_spec == 'all':
@@ -1071,9 +1097,10 @@ class YoutubeDL(object):
                              if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
                          if audiovideo_formats:
                              yield audiovideo_formats[format_idx]
-                        # for audio only (soundcloud) or video only (imgur) urls, select the best/worst audio format
-                        elif (all(f.get('acodec') != 'none' for f in formats) or
-                              all(f.get('vcodec') != 'none' for f in formats)):
+                        # for extractors with incomplete formats (audio only (soundcloud)
+                        # or video only (imgur)) we will fall back to best/worst
+                        # {video,audio}-only format
+                        elif ctx['incomplete_formats']:
                              yield formats[format_idx]
                      elif format_spec == 'bestaudio':
                          audio_formats = [
@@ -1147,17 +1174,18 @@ class YoutubeDL(object):
                      }
                  video_selector, audio_selector = map(_build_selector_function, selector.selector)
  
-                def selector_function(formats):
-                    formats = list(formats)
-                    for pair in itertools.product(video_selector(formats), audio_selector(formats)):
+                def selector_function(ctx):
+                    for pair in itertools.product(
+                            video_selector(copy.deepcopy(ctx)), audio_selector(copy.deepcopy(ctx))):
                          yield _merge(pair)
  
              filters = [self._build_format_filter(f) for f in selector.filters]
  
-            def final_selector(formats):
+            def final_selector(ctx):
+                ctx_copy = copy.deepcopy(ctx)
                  for _filter in filters:
-                    formats = list(filter(_filter, formats))
-                return selector_function(formats)
+                    ctx_copy['formats'] = list(filter(_filter, ctx_copy['formats']))
+                return selector_function(ctx_copy)
              return final_selector
  
          stream = io.BytesIO(format_spec.encode('utf-8'))
@@ -1215,6 +1243,10 @@ class YoutubeDL(object):
          if 'title' not in info_dict:
              raise ExtractorError('Missing "title" field in extractor result')
  
+        if not isinstance(info_dict['id'], compat_str):
+            self.report_warning('"id" field is not a string - forcing string conversion')
+            info_dict['id'] = compat_str(info_dict['id'])
+
          if 'playlist' not in info_dict:
              # It isn't part of a playlist
              info_dict['playlist'] = None
@@ -1227,8 +1259,10 @@ class YoutubeDL(object):
                  info_dict['thumbnails'] = thumbnails = [{'url': thumbnail}]
          if thumbnails:
              thumbnails.sort(key=lambda t: (
-                t.get('preference'), t.get('width'), t.get('height'),
-                t.get('id'), t.get('url')))
+                t.get('preference') if t.get('preference') is not None else -1,
+                t.get('width') if t.get('width') is not None else -1,
+                t.get('height') if t.get('height') is not None else -1,
+                t.get('id') if t.get('id') is not None else '', t.get('url')))
              for i, t in enumerate(thumbnails):
                  t['url'] = sanitize_url(t['url'])
                  if t.get('width') and t.get('height'):
@@ -1240,7 +1274,10 @@ class YoutubeDL(object):
              self.list_thumbnails(info_dict)
              return
  
-        if thumbnails and 'thumbnail' not in info_dict:
+        thumbnail = info_dict.get('thumbnail')
+        if thumbnail:
+            info_dict['thumbnail'] = sanitize_url(thumbnail)
+        elif thumbnails:
              info_dict['thumbnail'] = thumbnails[-1]['url']
  
          if 'display_id' not in info_dict and 'id' in info_dict:
@@ -1267,7 +1304,7 @@ class YoutubeDL(object):
                  for subtitle_format in subtitle:
                      if subtitle_format.get('url'):
                          subtitle_format['url'] = sanitize_url(subtitle_format['url'])
-                    if 'ext' not in subtitle_format:
+                    if subtitle_format.get('ext') is None:
                          subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
  
          if self.params.get('listsubtitles', False):
@@ -1322,7 +1359,7 @@ class YoutubeDL(object):
                      note=' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '',
                  )
              # Automatically determine file extension if missing
-            if 'ext' not in format:
+            if format.get('ext') is None:
                  format['ext'] = determine_ext(format['url']).lower()
              # Automatically determine protocol if missing (useful for format
              # selection purposes)
@@ -1357,7 +1394,34 @@ class YoutubeDL(object):
              req_format_list.append('best')
              req_format = '/'.join(req_format_list)
          format_selector = self.build_format_selector(req_format)
-        formats_to_download = list(format_selector(formats))
+
+        # While in format selection we may need to have an access to the original
+        # format set in order to calculate some metrics or do some processing.
+        # For now we need to be able to guess whether original formats provided
+        # by extractor are incomplete or not (i.e. whether extractor provides only
+        # video-only or audio-only formats) for proper formats selection for
+        # extractors with such incomplete formats (see
+        # https://github.com/rg3/youtube-dl/pull/5556).
+        # Since formats may be filtered during format selection and may not match
+        # the original formats the results may be incorrect. Thus original formats
+        # or pre-calculated metrics should be passed to format selection routines
+        # as well.
+        # We will pass a context object containing all necessary additional data
+        # instead of just formats.
+        # This fixes incorrect format selection issue (see
+        # https://github.com/rg3/youtube-dl/issues/10083).
+        incomplete_formats = (
+            # All formats are video-only or
+            all(f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats) or
+            # all formats are audio-only
+            all(f.get('vcodec') == 'none' and f.get('acodec') != 'none' for f in formats))
+
+        ctx = {
+            'formats': formats,
+            'incomplete_formats': incomplete_formats,
+        }
+
+        formats_to_download = list(format_selector(ctx))
          if not formats_to_download:
              raise ExtractorError('requested format not available',
                                   expected=True)
@@ -1544,7 +1608,9 @@ class YoutubeDL(object):
                          self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format))
                      else:
                          self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
-                        with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
+                        # Use newline='' to prevent conversion of newline characters
+                        # See https://github.com/rg3/youtube-dl/issues/10268
+                        with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
                              subfile.write(sub_data)
                  except (OSError, IOError):
                      self.report_error('Cannot write subtitles file ' + sub_filename)
@@ -1592,7 +1658,7 @@ class YoutubeDL(object):
                          video_ext, audio_ext = audio.get('ext'), video.get('ext')
                          if video_ext and audio_ext:
                              COMPATIBLE_EXTS = (
-                                ('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v'),
+                                ('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'ismv', 'isma'),
                                  ('webm')
                              )
                              for exts in COMPATIBLE_EXTS:
@@ -1632,7 +1698,7 @@ class YoutubeDL(object):
                      # Just a single file
                      success = dl(filename, info_dict)
              except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-                self.report_error('unable to download video data: %s' % str(err))
+                self.report_error('unable to download video data: %s' % error_to_compat_str(err))
                  return
              except (OSError, IOError) as err:
                  raise UnavailableVideoError(err)
@@ -1954,6 +2020,8 @@ class YoutubeDL(object):
          write_string(encoding_str, encoding=None)
  
          self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
+        if _LAZY_LOADER:
+            self._write_string('[debug] Lazy loading extractors enabled' + '\n')
          try:
              sp = subprocess.Popen(
                  ['git', 'rev-parse', '--short', 'HEAD'],
@@ -2009,6 +2077,7 @@ class YoutubeDL(object):
          if opts_cookiefile is None:
              self.cookiejar = compat_cookiejar.CookieJar()
          else:
+            opts_cookiefile = compat_expanduser(opts_cookiefile)
              self.cookiejar = compat_cookiejar.MozillaCookieJar(
                  opts_cookiefile)
              if os.access(opts_cookiefile, os.R_OK):
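Note on the format-selection hunks in youtube_dl/YoutubeDL.py above: the selectors now receive a context dict instead of a bare format list, so `incomplete_formats` is computed once from the extractor's original formats, before any filter runs. The following is a minimal sketch of why that ordering matters. `is_incomplete` and `select_best` are hypothetical simplifications written for this note, not functions from youtube-dl, and the sample formats are invented.

```python
def is_incomplete(formats):
    """True when every format is video-only, or every format is audio-only."""
    video_only = all(
        f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats)
    audio_only = all(
        f.get('vcodec') == 'none' and f.get('acodec') != 'none' for f in formats)
    return video_only or audio_only


def select_best(ctx):
    """Hypothetical stand-in for the 'best' selector: prefer muxed formats."""
    formats = ctx['formats']
    if not formats:
        return None
    muxed = [f for f in formats
             if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
    if muxed:
        return muxed[-1]
    # Fall back to a video-only or audio-only format only when the
    # extractor never offered anything else.
    if ctx['incomplete_formats']:
        return formats[-1]
    return None


# Invented sample: one audio-only and one video-only format, no muxed format.
formats = [
    {'format_id': 'audio', 'vcodec': 'none', 'acodec': 'mp4a', 'height': None},
    {'format_id': 'video', 'vcodec': 'avc1', 'acodec': 'none', 'height': 720},
]
ctx = {'formats': formats, 'incomplete_formats': is_incomplete(formats)}

# A height filter leaves only the video-only format in the list.
filtered = dict(ctx, formats=[f for f in formats if (f.get('height') or 0) >= 720])

# The removed check looked only at the filtered list, so it would have
# treated the source as video-only and picked that format as "best".
old_check = all(f.get('vcodec') != 'none' for f in filtered['formats'])
new_choice = select_best(filtered)
```

In this sketch `old_check` is true for the filtered list, while `new_choice` is `None` because the flag was computed on the unfiltered formats; the real selector then reports that the requested format is not available.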
diff --git a/youtube_dl/__init__.py b/youtube_dl/__init__.py
index 737f6545d4136401dd3d8ddd691ad52e86894bb0..6850d95e1ff359453571a6ac635d6ffa99ae038f 100644
--- a/youtube_dl/__init__.py
+++ b/youtube_dl/__init__.py
@@ -1,5 +1,5 @@
  #!/usr/bin/env python
-# -*- coding: utf-8 -*-
+# coding: utf-8
  
  from __future__ import unicode_literals
  
@@ -18,7 +18,6 @@ from .options import (
  from .compat import (
      compat_expanduser,
      compat_getpass,
-    compat_print,
      compat_shlex_split,
      workaround_optparse_bug9161,
  )
@@ -35,12 +34,14 @@ from .utils import (
      setproctitle,
      std_headers,
      write_string,
+    render_table,
  )
  from .update import update_self
  from .downloader import (
      FileDownloader,
  )
  from .extractor import gen_extractors, list_extractors
+from .extractor.adobepass import MSO_INFO
  from .YoutubeDL import YoutubeDL
  
  
@@ -67,16 +68,16 @@ def _real_main(argv=None):
      # Custom HTTP headers
      if opts.headers is not None:
          for h in opts.headers:
-            if h.find(':', 1) < 0:
+            if ':' not in h:
                  parser.error('wrong header formatting, it should be key:value, not "%s"' % h)
-            key, value = h.split(':', 2)
+            key, value = h.split(':', 1)
              if opts.verbose:
                  write_string('[debug] Adding header from command line option %s:%s\n' % (key, value))
              std_headers[key] = value
  
      # Dump user agent
      if opts.dump_user_agent:
-        compat_print(std_headers['User-Agent'])
+        write_string(std_headers['User-Agent'] + '\n', out=sys.stdout)
          sys.exit(0)
  
      # Batch file verification
@@ -86,23 +87,24 @@ def _real_main(argv=None):
              if opts.batchfile == '-':
                  batchfd = sys.stdin
              else:
-                batchfd = io.open(opts.batchfile, 'r', encoding='utf-8', errors='ignore')
+                batchfd = io.open(
+                    compat_expanduser(opts.batchfile),
+                    'r', encoding='utf-8', errors='ignore')
              batch_urls = read_batch_urls(batchfd)
              if opts.verbose:
                  write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
          except IOError:
              sys.exit('ERROR: batch file could not be read')
-    all_urls = batch_urls + args
-    all_urls = [url.strip() for url in all_urls]
+    all_urls = batch_urls + [url.strip() for url in args]  # batch_urls are already stripped in read_batch_urls
      _enc = preferredencoding()
      all_urls = [url.decode(_enc, 'ignore') if isinstance(url, bytes) else url for url in all_urls]
  
      if opts.list_extractors:
          for ie in list_extractors(opts.age_limit):
-            compat_print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else ''))
+            write_string(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else '') + '\n', out=sys.stdout)
              matchedUrls = [url for url in all_urls if ie.suitable(url)]
              for mu in matchedUrls:
-                compat_print('  ' + mu)
+                write_string('  ' + mu + '\n', out=sys.stdout)
          sys.exit(0)
      if opts.list_extractor_descriptions:
          for ie in list_extractors(opts.age_limit):
@@ -115,7 +117,11 @@ def _real_main(argv=None):
                  _SEARCHES = ('cute kittens', 'slithering pythons', 'falling cat', 'angry poodle', 'purple fish', 'running tortoise', 'sleeping bunny', 'burping cow')
                  _COUNTS = ('', '5', '10', 'all')
                  desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
-            compat_print(desc)
+            write_string(desc + '\n', out=sys.stdout)
+        sys.exit(0)
+    if opts.ap_list_mso:
+        table = [[mso_id, mso_info['name']] for mso_id, mso_info in MSO_INFO.items()]
+        write_string('Supported TV Providers:\n' + render_table(['mso', 'mso name'], table) + '\n', out=sys.stdout)
          sys.exit(0)
  
      # Conflicting, missing and erroneous options
@@ -123,12 +129,16 @@ def _real_main(argv=None):
          parser.error('using .netrc conflicts with giving username/password')
      if opts.password is not None and opts.username is None:
          parser.error('account username missing\n')
+    if opts.ap_password is not None and opts.ap_username is None:
+        parser.error('TV Provider account username missing\n')
      if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid):
          parser.error('using output template conflicts with using title, video ID or auto number')
      if opts.usetitle and opts.useid:
          parser.error('using title conflicts with using video ID')
      if opts.username is not None and opts.password is None:
          opts.password = compat_getpass('Type account password and press [Return]: ')
+    if opts.ap_username is not None and opts.ap_password is None:
+        opts.ap_password = compat_getpass('Type TV provider account password and press [Return]: ')
      if opts.ratelimit is not None:
          numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
          if numeric_limit is None:
@@ -144,6 +154,18 @@ def _real_main(argv=None):
          if numeric_limit is None:
              parser.error('invalid max_filesize specified')
          opts.max_filesize = numeric_limit
+    if opts.sleep_interval is not None:
+        if opts.sleep_interval < 0:
+            parser.error('sleep interval must be positive or 0')
+    if opts.max_sleep_interval is not None:
+        if opts.max_sleep_interval < 0:
+            parser.error('max sleep interval must be positive or 0')
+        if opts.max_sleep_interval < opts.sleep_interval:
+            parser.error('max sleep interval must be greater than or equal to min sleep interval')
+    else:
+        opts.max_sleep_interval = opts.sleep_interval
+    if opts.ap_mso and opts.ap_mso not in MSO_INFO:
+        parser.error('Unsupported TV Provider, use --ap-list-mso to get a list of supported TV Providers')
  
      def parse_retries(retries):
          if retries in ('inf', 'infinite'):
@@ -243,8 +265,6 @@ def _real_main(argv=None):
          postprocessors.append({
              'key': 'FFmpegEmbedSubtitle',
          })
-    if opts.xattrs:
-        postprocessors.append({'key': 'XAttrMetadata'})
      if opts.embedthumbnail:
          already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
          postprocessors.append({
@@ -253,6 +273,10 @@ def _real_main(argv=None):
          })
          if not already_have_thumbnail:
              opts.writethumbnail = True
+    # XAttrMetadataPP should be run after post-processors that may change file
+    # contents
+    if opts.xattrs:
+        postprocessors.append({'key': 'XAttrMetadata'})
      # Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
      # So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
      if opts.exec_cmd:
@@ -260,12 +284,6 @@ def _real_main(argv=None):
              'key': 'ExecAfterDownload',
              'exec_cmd': opts.exec_cmd,
          })
-    if opts.xattr_set_filesize:
-        try:
-            import xattr
-            xattr  # Confuse flake8
-        except ImportError:
-            parser.error('setting filesize xattr requested but python-xattr is not available')
      external_downloader_args = None
      if opts.external_downloader_args:
          external_downloader_args = compat_shlex_split(opts.external_downloader_args)
@@ -282,6 +300,9 @@ def _real_main(argv=None):
          'password': opts.password,
          'twofactor': opts.twofactor,
          'videopassword': opts.videopassword,
+        'ap_mso': opts.ap_mso,
+        'ap_username': opts.ap_username,
+        'ap_password': opts.ap_password,
          'quiet': (opts.quiet or any_getting or any_printing),
          'no_warnings': opts.no_warnings,
          'forceurl': opts.geturl,
@@ -307,6 +328,7 @@ def _real_main(argv=None):
          'nooverwrites': opts.nooverwrites,
          'retries': opts.retries,
          'fragment_retries': opts.fragment_retries,
+        'skip_unavailable_fragments': opts.skip_unavailable_fragments,
          'buffersize': opts.buffersize,
          'noresizebuffer': opts.noresizebuffer,
          'continuedl': opts.continue_dl,
@@ -369,6 +391,7 @@ def _real_main(argv=None):
          'source_address': opts.source_address,
          'call_home': opts.call_home,
          'sleep_interval': opts.sleep_interval,
+        'max_sleep_interval': opts.max_sleep_interval,
          'external_downloader': opts.external_downloader,
          'list_thumbnails': opts.list_thumbnails,
          'playlist_items': opts.playlist_items,
@@ -381,6 +404,8 @@ def _real_main(argv=None):
          'external_downloader_args': external_downloader_args,
          'postprocessor_args': postprocessor_args,
          'cn_verification_proxy': opts.cn_verification_proxy,
+        'geo_verification_proxy': opts.geo_verification_proxy,
+
      }
  
      with YoutubeDL(ydl_opts) as ydl:
@@ -404,7 +429,7 @@ def _real_main(argv=None):
  
          try:
              if opts.load_info_filename is not None:
-                retcode = ydl.download_with_info_file(opts.load_info_filename)
+                retcode = ydl.download_with_info_file(compat_expanduser(opts.load_info_filename))
              else:
                  retcode = ydl.download(all_urls)
          except MaxDownloadsReached:
@@ -424,4 +449,5 @@ def main(argv=None):
      except KeyboardInterrupt:
          sys.exit('\nERROR: Interrupted by user')
  
+
  __all__ = ['main', 'YoutubeDL', 'gen_extractors', 'list_extractors']
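Note on the custom-header hunk in youtube_dl/__init__.py above: the split limit changes from 2 to 1, which matters when the header value itself contains a colon, as a URL does. A small sketch of the difference follows. `parse_header` is a hypothetical helper written for this note (the diff does the split inline and reports errors through `parser.error`), and the header value is an invented example.

```python
def parse_header(h):
    """Split a 'key:value' header string on the first colon only."""
    if ':' not in h:
        raise ValueError(
            'wrong header formatting, it should be key:value, not "%s"' % h)
    key, value = h.split(':', 1)
    return key, value


# Invented example header whose value contains a colon.
header = 'Referer:http://example.com/watch'
key, value = parse_header(header)

# With the old limit of 2 the string splits into three parts, so
# unpacking into (key, value) would raise ValueError.
old_parts = header.split(':', 2)
```

With a limit of 1 everything after the first colon stays in `value`, so the URL survives intact.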
diff --git a/youtube_dl/aes.py b/youtube_dl/aes.py
index a01c367de4f6cf5e6f9ce4d9b86de4991fa859dc..b8ff4548116403dc5166825250fedad65c20f665 100644
--- a/youtube_dl/aes.py
+++ b/youtube_dl/aes.py
@@ -174,6 +174,7 @@ def aes_decrypt_text(data, password, key_size_bytes):
  
      return plaintext
  
+
  RCON = (0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36)
  SBOX = (0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
          0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
@@ -328,4 +329,5 @@ def inc(data):
              break
      return data
  
+
  __all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_cbc_decrypt', 'aes_decrypt_text']
diff --git a/youtube_dl/compat.py b/youtube_dl/compat.py
index 76b6b0e3838c65c2d5814d0206c18dfe713d6435..83ee7e25747532c61f344aaea921021690669f61 100644
--- a/youtube_dl/compat.py
+++ b/youtube_dl/compat.py
@@ -1,3 +1,4 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
  import binascii
@@ -11,6 +12,7 @@ import re
  import shlex
  import shutil
  import socket
+import struct
  import subprocess
  import sys
  import itertools
@@ -62,6 +64,2244 @@ try:
  except ImportError:  # Python 2
      import htmlentitydefs as compat_html_entities
  
+try:  # Python >= 3.3
+    compat_html_entities_html5 = compat_html_entities.html5
+except AttributeError:
+    # Copied from CPython 3.5.1 html/entities.py
+    compat_html_entities_html5 = {
+        'Aacute': '\xc1',
+        'aacute': '\xe1',
+        'Aacute;': '\xc1',
+        'aacute;': '\xe1',
+        'Abreve;': '\u0102',
+        'abreve;': '\u0103',
+        'ac;': '\u223e',
+        'acd;': '\u223f',
+        'acE;': '\u223e\u0333',
+        'Acirc': '\xc2',
+        'acirc': '\xe2',
+        'Acirc;': '\xc2',
+        'acirc;': '\xe2',
+        'acute': '\xb4',
+        'acute;': '\xb4',
+        'Acy;': '\u0410',
+        'acy;': '\u0430',
+        'AElig': '\xc6',
+        'aelig': '\xe6',
+        'AElig;': '\xc6',
+        'aelig;': '\xe6',
+        'af;': '\u2061',
+        'Afr;': '\U0001d504',
+        'afr;': '\U0001d51e',
+        'Agrave': '\xc0',
+        'agrave': '\xe0',
+        'Agrave;': '\xc0',
+        'agrave;': '\xe0',
+        'alefsym;': '\u2135',
+        'aleph;': '\u2135',
+        'Alpha;': '\u0391',
+        'alpha;': '\u03b1',
+        'Amacr;': '\u0100',
+        'amacr;': '\u0101',
+        'amalg;': '\u2a3f',
+        'AMP': '&',
+        'amp': '&',
+        'AMP;': '&',
+        'amp;': '&',
+        'And;': '\u2a53',
+        'and;': '\u2227',
+        'andand;': '\u2a55',
+        'andd;': '\u2a5c',
+        'andslope;': '\u2a58',
+        'andv;': '\u2a5a',
+        'ang;': '\u2220',
+        'ange;': '\u29a4',
+        'angle;': '\u2220',
+        'angmsd;': '\u2221',
+        'angmsdaa;': '\u29a8',
+        'angmsdab;': '\u29a9',
+        'angmsdac;': '\u29aa',
+        'angmsdad;': '\u29ab',
+        'angmsdae;': '\u29ac',
+        'angmsdaf;': '\u29ad',
+        'angmsdag;': '\u29ae',
+        'angmsdah;': '\u29af',
+        'angrt;': '\u221f',
+        'angrtvb;': '\u22be',
+        'angrtvbd;': '\u299d',
+        'angsph;': '\u2222',
+        'angst;': '\xc5',
+        'angzarr;': '\u237c',
+        'Aogon;': '\u0104',
+        'aogon;': '\u0105',
+        'Aopf;': '\U0001d538',
+        'aopf;': '\U0001d552',
+        'ap;': '\u2248',
+        'apacir;': '\u2a6f',
+        'apE;': '\u2a70',
+        'ape;': '\u224a',
+        'apid;': '\u224b',
+        'apos;': "'",
+        'ApplyFunction;': '\u2061',
+        'approx;': '\u2248',
+        'approxeq;': '\u224a',
+        'Aring': '\xc5',
+        'aring': '\xe5',
+        'Aring;': '\xc5',
+        'aring;': '\xe5',
+        'Ascr;': '\U0001d49c',
+        'ascr;': '\U0001d4b6',
+        'Assign;': '\u2254',
+        'ast;': '*',
+        'asymp;': '\u2248',
+        'asympeq;': '\u224d',
+        'Atilde': '\xc3',
+        'atilde': '\xe3',
+        'Atilde;': '\xc3',
+        'atilde;': '\xe3',
+        'Auml': '\xc4',
+        'auml': '\xe4',
+        'Auml;': '\xc4',
+        'auml;': '\xe4',
+        'awconint;': '\u2233',
+        'awint;': '\u2a11',
+        'backcong;': '\u224c',
+        'backepsilon;': '\u03f6',
+        'backprime;': '\u2035',
+        'backsim;': '\u223d',
+        'backsimeq;': '\u22cd',
+        'Backslash;': '\u2216',
+        'Barv;': '\u2ae7',
+        'barvee;': '\u22bd',
+        'Barwed;': '\u2306',
+        'barwed;': '\u2305',
+        'barwedge;': '\u2305',
+        'bbrk;': '\u23b5',
+        'bbrktbrk;': '\u23b6',
+        'bcong;': '\u224c',
+        'Bcy;': '\u0411',
+        'bcy;': '\u0431',
+        'bdquo;': '\u201e',
+        'becaus;': '\u2235',
+        'Because;': '\u2235',
+        'because;': '\u2235',
+        'bemptyv;': '\u29b0',
+        'bepsi;': '\u03f6',
+        'bernou;': '\u212c',
+        'Bernoullis;': '\u212c',
+        'Beta;': '\u0392',
+        'beta;': '\u03b2',
+        'beth;': '\u2136',
+        'between;': '\u226c',
+        'Bfr;': '\U0001d505',
+        'bfr;': '\U0001d51f',
+        'bigcap;': '\u22c2',
+        'bigcirc;': '\u25ef',
+        'bigcup;': '\u22c3',
+        'bigodot;': '\u2a00',
+        'bigoplus;': '\u2a01',
+        'bigotimes;': '\u2a02',
+        'bigsqcup;': '\u2a06',
+        'bigstar;': '\u2605',
+        'bigtriangledown;': '\u25bd',
+        'bigtriangleup;': '\u25b3',
+        'biguplus;': '\u2a04',
+        'bigvee;': '\u22c1',
+        'bigwedge;': '\u22c0',
+        'bkarow;': '\u290d',
+        'blacklozenge;': '\u29eb',
+        'blacksquare;': '\u25aa',
+        'blacktriangle;': '\u25b4',
+        'blacktriangledown;': '\u25be',
+        'blacktriangleleft;': '\u25c2',
+        'blacktriangleright;': '\u25b8',
+        'blank;': '\u2423',
+        'blk12;': '\u2592',
+        'blk14;': '\u2591',
+        'blk34;': '\u2593',
+        'block;': '\u2588',
+        'bne;': '=\u20e5',
+        'bnequiv;': '\u2261\u20e5',
+        'bNot;': '\u2aed',
+        'bnot;': '\u2310',
+        'Bopf;': '\U0001d539',
+        'bopf;': '\U0001d553',
+        'bot;': '\u22a5',
+        'bottom;': '\u22a5',
+        'bowtie;': '\u22c8',
+        'boxbox;': '\u29c9',
+        'boxDL;': '\u2557',
+        'boxDl;': '\u2556',
+        'boxdL;': '\u2555',
+        'boxdl;': '\u2510',
+        'boxDR;': '\u2554',
+        'boxDr;': '\u2553',
+        'boxdR;': '\u2552',
+        'boxdr;': '\u250c',
+        'boxH;': '\u2550',
+        'boxh;': '\u2500',
+        'boxHD;': '\u2566',
+        'boxHd;': '\u2564',
+        'boxhD;': '\u2565',
+        'boxhd;': '\u252c',
+        'boxHU;': '\u2569',
+        'boxHu;': '\u2567',
+        'boxhU;': '\u2568',
+        'boxhu;': '\u2534',
+        'boxminus;': '\u229f',
+        'boxplus;': '\u229e',
+        'boxtimes;': '\u22a0',
+        'boxUL;': '\u255d',
+        'boxUl;': '\u255c',
+        'boxuL;': '\u255b',
+        'boxul;': '\u2518',
+        'boxUR;': '\u255a',
+        'boxUr;': '\u2559',
+        'boxuR;': '\u2558',
+        'boxur;': '\u2514',
+        'boxV;': '\u2551',
+        'boxv;': '\u2502',
+        'boxVH;': '\u256c',
+        'boxVh;': '\u256b',
+        'boxvH;': '\u256a',
+        'boxvh;': '\u253c',
+        'boxVL;': '\u2563',
+        'boxVl;': '\u2562',
+        'boxvL;': '\u2561',
+        'boxvl;': '\u2524',
+        'boxVR;': '\u2560',
+        'boxVr;': '\u255f',
+        'boxvR;': '\u255e',
+        'boxvr;': '\u251c',
+        'bprime;': '\u2035',
+        'Breve;': '\u02d8',
+        'breve;': '\u02d8',
+        'brvbar': '\xa6',
+        'brvbar;': '\xa6',
+        'Bscr;': '\u212c',
+        'bscr;': '\U0001d4b7',
+        'bsemi;': '\u204f',
+        'bsim;': '\u223d',
+        'bsime;': '\u22cd',
+        'bsol;': '\\',
+        'bsolb;': '\u29c5',
+        'bsolhsub;': '\u27c8',
+        'bull;': '\u2022',
+        'bullet;': '\u2022',
+        'bump;': '\u224e',
+        'bumpE;': '\u2aae',
+        'bumpe;': '\u224f',
+        'Bumpeq;': '\u224e',
+        'bumpeq;': '\u224f',
+        'Cacute;': '\u0106',
+        'cacute;': '\u0107',
+        'Cap;': '\u22d2',
+        'cap;': '\u2229',
+        'capand;': '\u2a44',
+        'capbrcup;': '\u2a49',
+        'capcap;': '\u2a4b',
+        'capcup;': '\u2a47',
+        'capdot;': '\u2a40',
+        'CapitalDifferentialD;': '\u2145',
+        'caps;': '\u2229\ufe00',
+        'caret;': '\u2041',
+        'caron;': '\u02c7',
+        'Cayleys;': '\u212d',
+        'ccaps;': '\u2a4d',
+        'Ccaron;': '\u010c',
+        'ccaron;': '\u010d',
+        'Ccedil': '\xc7',
+        'ccedil': '\xe7',
+        'Ccedil;': '\xc7',
+        'ccedil;': '\xe7',
+        'Ccirc;': '\u0108',
+        'ccirc;': '\u0109',
+        'Cconint;': '\u2230',
+        'ccups;': '\u2a4c',
+        'ccupssm;': '\u2a50',
+        'Cdot;': '\u010a',
+        'cdot;': '\u010b',
+        'cedil': '\xb8',
+        'cedil;': '\xb8',
+        'Cedilla;': '\xb8',
+        'cemptyv;': '\u29b2',
+        'cent': '\xa2',
+        'cent;': '\xa2',
+        'CenterDot;': '\xb7',
+        'centerdot;': '\xb7',
+        'Cfr;': '\u212d',
+        'cfr;': '\U0001d520',
+        'CHcy;': '\u0427',
+        'chcy;': '\u0447',
+        'check;': '\u2713',
+        'checkmark;': '\u2713',
+        'Chi;': '\u03a7',
+        'chi;': '\u03c7',
+        'cir;': '\u25cb',
+        'circ;': '\u02c6',
+        'circeq;': '\u2257',
+        'circlearrowleft;': '\u21ba',
+        'circlearrowright;': '\u21bb',
+        'circledast;': '\u229b',
+        'circledcirc;': '\u229a',
+        'circleddash;': '\u229d',
+        'CircleDot;': '\u2299',
+        'circledR;': '\xae',
+        'circledS;': '\u24c8',
+        'CircleMinus;': '\u2296',
+        'CirclePlus;': '\u2295',
+        'CircleTimes;': '\u2297',
+        'cirE;': '\u29c3',
+        'cire;': '\u2257',
+        'cirfnint;': '\u2a10',
+        'cirmid;': '\u2aef',
+        'cirscir;': '\u29c2',
+        'ClockwiseContourIntegral;': '\u2232',
+        'CloseCurlyDoubleQuote;': '\u201d',
+        'CloseCurlyQuote;': '\u2019',
+        'clubs;': '\u2663',
+        'clubsuit;': '\u2663',
+        'Colon;': '\u2237',
+        'colon;': ':',
+        'Colone;': '\u2a74',
+        'colone;': '\u2254',
+        'coloneq;': '\u2254',
+        'comma;': ',',
+        'commat;': '@',
+        'comp;': '\u2201',
+        'compfn;': '\u2218',
+        'complement;': '\u2201',
+        'complexes;': '\u2102',
+        'cong;': '\u2245',
+        'congdot;': '\u2a6d',
+        'Congruent;': '\u2261',
+        'Conint;': '\u222f',
+        'conint;': '\u222e',
+        'ContourIntegral;': '\u222e',
+        'Copf;': '\u2102',
+        'copf;': '\U0001d554',
+        'coprod;': '\u2210',
+        'Coproduct;': '\u2210',
+        'COPY': '\xa9',
+        'copy': '\xa9',
+        'COPY;': '\xa9',
+        'copy;': '\xa9',
+        'copysr;': '\u2117',
+        'CounterClockwiseContourIntegral;': '\u2233',
+        'crarr;': '\u21b5',
+        'Cross;': '\u2a2f',
+        'cross;': '\u2717',
+        'Cscr;': '\U0001d49e',
+        'cscr;': '\U0001d4b8',
+        'csub;': '\u2acf',
+        'csube;': '\u2ad1',
+        'csup;': '\u2ad0',
+        'csupe;': '\u2ad2',
+        'ctdot;': '\u22ef',
+        'cudarrl;': '\u2938',
+        'cudarrr;': '\u2935',
+        'cuepr;': '\u22de',
+        'cuesc;': '\u22df',
+        'cularr;': '\u21b6',
+        'cularrp;': '\u293d',
+        'Cup;': '\u22d3',
+        'cup;': '\u222a',
+        'cupbrcap;': '\u2a48',
+        'CupCap;': '\u224d',
+        'cupcap;': '\u2a46',
+        'cupcup;': '\u2a4a',
+        'cupdot;': '\u228d',
+        'cupor;': '\u2a45',
+        'cups;': '\u222a\ufe00',
+        'curarr;': '\u21b7',
+        'curarrm;': '\u293c',
+        'curlyeqprec;': '\u22de',
+        'curlyeqsucc;': '\u22df',
+        'curlyvee;': '\u22ce',
+        'curlywedge;': '\u22cf',
+        'curren': '\xa4',
+        'curren;': '\xa4',
+        'curvearrowleft;': '\u21b6',
+        'curvearrowright;': '\u21b7',
+        'cuvee;': '\u22ce',
+        'cuwed;': '\u22cf',
+        'cwconint;': '\u2232',
+        'cwint;': '\u2231',
+        'cylcty;': '\u232d',
+        'Dagger;': '\u2021',
+        'dagger;': '\u2020',
+        'daleth;': '\u2138',
+        'Darr;': '\u21a1',
+        'dArr;': '\u21d3',
+        'darr;': '\u2193',
+        'dash;': '\u2010',
+        'Dashv;': '\u2ae4',
+        'dashv;': '\u22a3',
+        'dbkarow;': '\u290f',
+        'dblac;': '\u02dd',
+        'Dcaron;': '\u010e',
+        'dcaron;': '\u010f',
+        'Dcy;': '\u0414',
+        'dcy;': '\u0434',
+        'DD;': '\u2145',
+        'dd;': '\u2146',
+        'ddagger;': '\u2021',
+        'ddarr;': '\u21ca',
+        'DDotrahd;': '\u2911',
+        'ddotseq;': '\u2a77',
+        'deg': '\xb0',
+        'deg;': '\xb0',
+        'Del;': '\u2207',
+        'Delta;': '\u0394',
+        'delta;': '\u03b4',
+        'demptyv;': '\u29b1',
+        'dfisht;': '\u297f',
+        'Dfr;': '\U0001d507',
+        'dfr;': '\U0001d521',
+        'dHar;': '\u2965',
+        'dharl;': '\u21c3',
+        'dharr;': '\u21c2',
+        'DiacriticalAcute;': '\xb4',
+        'DiacriticalDot;': '\u02d9',
+        'DiacriticalDoubleAcute;': '\u02dd',
+        'DiacriticalGrave;': '`',
+        'DiacriticalTilde;': '\u02dc',
+        'diam;': '\u22c4',
+        'Diamond;': '\u22c4',
+        'diamond;': '\u22c4',
+        'diamondsuit;': '\u2666',
+        'diams;': '\u2666',
+        'die;': '\xa8',
+        'DifferentialD;': '\u2146',
+        'digamma;': '\u03dd',
+        'disin;': '\u22f2',
+        'div;': '\xf7',
+        'divide': '\xf7',
+        'divide;': '\xf7',
+        'divideontimes;': '\u22c7',
+        'divonx;': '\u22c7',
+        'DJcy;': '\u0402',
+        'djcy;': '\u0452',
+        'dlcorn;': '\u231e',
+        'dlcrop;': '\u230d',
+        'dollar;': '$',
+        'Dopf;': '\U0001d53b',
+        'dopf;': '\U0001d555',
+        'Dot;': '\xa8',
+        'dot;': '\u02d9',
+        'DotDot;': '\u20dc',
+        'doteq;': '\u2250',
+        'doteqdot;': '\u2251',
+        'DotEqual;': '\u2250',
+        'dotminus;': '\u2238',
+        'dotplus;': '\u2214',
+        'dotsquare;': '\u22a1',
+        'doublebarwedge;': '\u2306',
+        'DoubleContourIntegral;': '\u222f',
+        'DoubleDot;': '\xa8',
+        'DoubleDownArrow;': '\u21d3',
+        'DoubleLeftArrow;': '\u21d0',
+        'DoubleLeftRightArrow;': '\u21d4',
+        'DoubleLeftTee;': '\u2ae4',
+        'DoubleLongLeftArrow;': '\u27f8',
+        'DoubleLongLeftRightArrow;': '\u27fa',
+        'DoubleLongRightArrow;': '\u27f9',
+        'DoubleRightArrow;': '\u21d2',
+        'DoubleRightTee;': '\u22a8',
+        'DoubleUpArrow;': '\u21d1',
+        'DoubleUpDownArrow;': '\u21d5',
+        'DoubleVerticalBar;': '\u2225',
+        'DownArrow;': '\u2193',
+        'Downarrow;': '\u21d3',
+        'downarrow;': '\u2193',
+        'DownArrowBar;': '\u2913',
+        'DownArrowUpArrow;': '\u21f5',
+        'DownBreve;': '\u0311',
+        'downdownarrows;': '\u21ca',
+        'downharpoonleft;': '\u21c3',
+        'downharpoonright;': '\u21c2',
+        'DownLeftRightVector;': '\u2950',
+        'DownLeftTeeVector;': '\u295e',
+        'DownLeftVector;': '\u21bd',
+        'DownLeftVectorBar;': '\u2956',
+        'DownRightTeeVector;': '\u295f',
+        'DownRightVector;': '\u21c1',
+        'DownRightVectorBar;': '\u2957',
+        'DownTee;': '\u22a4',
+        'DownTeeArrow;': '\u21a7',
+        'drbkarow;': '\u2910',
+        'drcorn;': '\u231f',
+        'drcrop;': '\u230c',
+        'Dscr;': '\U0001d49f',
+        'dscr;': '\U0001d4b9',
+        'DScy;': '\u0405',
+        'dscy;': '\u0455',
+        'dsol;': '\u29f6',
+        'Dstrok;': '\u0110',
+        'dstrok;': '\u0111',
+        'dtdot;': '\u22f1',
+        'dtri;': '\u25bf',
+        'dtrif;': '\u25be',
+        'duarr;': '\u21f5',
+        'duhar;': '\u296f',
+        'dwangle;': '\u29a6',
+        'DZcy;': '\u040f',
+        'dzcy;': '\u045f',
+        'dzigrarr;': '\u27ff',
+        'Eacute': '\xc9',
+        'eacute': '\xe9',
+        'Eacute;': '\xc9',
+        'eacute;': '\xe9',
+        'easter;': '\u2a6e',
+        'Ecaron;': '\u011a',
+        'ecaron;': '\u011b',
+        'ecir;': '\u2256',
+        'Ecirc': '\xca',
+        'ecirc': '\xea',
+        'Ecirc;': '\xca',
+        'ecirc;': '\xea',
+        'ecolon;': '\u2255',
+        'Ecy;': '\u042d',
+        'ecy;': '\u044d',
+        'eDDot;': '\u2a77',
+        'Edot;': '\u0116',
+        'eDot;': '\u2251',
+        'edot;': '\u0117',
+        'ee;': '\u2147',
+        'efDot;': '\u2252',
+        'Efr;': '\U0001d508',
+        'efr;': '\U0001d522',
+        'eg;': '\u2a9a',
+        'Egrave': '\xc8',
+        'egrave': '\xe8',
+        'Egrave;': '\xc8',
+        'egrave;': '\xe8',
+        'egs;': '\u2a96',
+        'egsdot;': '\u2a98',
+        'el;': '\u2a99',
+        'Element;': '\u2208',
+        'elinters;': '\u23e7',
+        'ell;': '\u2113',
+        'els;': '\u2a95',
+        'elsdot;': '\u2a97',
+        'Emacr;': '\u0112',
+        'emacr;': '\u0113',
+        'empty;': '\u2205',
+        'emptyset;': '\u2205',
+        'EmptySmallSquare;': '\u25fb',
+        'emptyv;': '\u2205',
+        'EmptyVerySmallSquare;': '\u25ab',
+        'emsp13;': '\u2004',
+        'emsp14;': '\u2005',
+        'emsp;': '\u2003',
+        'ENG;': '\u014a',
+        'eng;': '\u014b',
+        'ensp;': '\u2002',
+        'Eogon;': '\u0118',
+        'eogon;': '\u0119',
+        'Eopf;': '\U0001d53c',
+        'eopf;': '\U0001d556',
+        'epar;': '\u22d5',
+        'eparsl;': '\u29e3',
+        'eplus;': '\u2a71',
+        'epsi;': '\u03b5',
+        'Epsilon;': '\u0395',
+        'epsilon;': '\u03b5',
+        'epsiv;': '\u03f5',
+        'eqcirc;': '\u2256',
+        'eqcolon;': '\u2255',
+        'eqsim;': '\u2242',
+        'eqslantgtr;': '\u2a96',
+        'eqslantless;': '\u2a95',
+        'Equal;': '\u2a75',
+        'equals;': '=',
+        'EqualTilde;': '\u2242',
+        'equest;': '\u225f',
+        'Equilibrium;': '\u21cc',
+        'equiv;': '\u2261',
+        'equivDD;': '\u2a78',
+        'eqvparsl;': '\u29e5',
+        'erarr;': '\u2971',
+        'erDot;': '\u2253',
+        'Escr;': '\u2130',
+        'escr;': '\u212f',
+        'esdot;': '\u2250',
+        'Esim;': '\u2a73',
+        'esim;': '\u2242',
+        'Eta;': '\u0397',
+        'eta;': '\u03b7',
+        'ETH': '\xd0',
+        'eth': '\xf0',
+        'ETH;': '\xd0',
+        'eth;': '\xf0',
+        'Euml': '\xcb',
+        'euml': '\xeb',
+        'Euml;': '\xcb',
+        'euml;': '\xeb',
+        'euro;': '\u20ac',
+        'excl;': '!',
+        'exist;': '\u2203',
+        'Exists;': '\u2203',
+        'expectation;': '\u2130',
+        'ExponentialE;': '\u2147',
+        'exponentiale;': '\u2147',
+        'fallingdotseq;': '\u2252',
+        'Fcy;': '\u0424',
+        'fcy;': '\u0444',
+        'female;': '\u2640',
+        'ffilig;': '\ufb03',
+        'fflig;': '\ufb00',
+        'ffllig;': '\ufb04',
+        'Ffr;': '\U0001d509',
+        'ffr;': '\U0001d523',
+        'filig;': '\ufb01',
+        'FilledSmallSquare;': '\u25fc',
+        'FilledVerySmallSquare;': '\u25aa',
+        'fjlig;': 'fj',
+        'flat;': '\u266d',
+        'fllig;': '\ufb02',
+        'fltns;': '\u25b1',
+        'fnof;': '\u0192',
+        'Fopf;': '\U0001d53d',
+        'fopf;': '\U0001d557',
+        'ForAll;': '\u2200',
+        'forall;': '\u2200',
+        'fork;': '\u22d4',
+        'forkv;': '\u2ad9',
+        'Fouriertrf;': '\u2131',
+        'fpartint;': '\u2a0d',
+        'frac12': '\xbd',
+        'frac12;': '\xbd',
+        'frac13;': '\u2153',
+        'frac14': '\xbc',
+        'frac14;': '\xbc',
+        'frac15;': '\u2155',
+        'frac16;': '\u2159',
+        'frac18;': '\u215b',
+        'frac23;': '\u2154',
+        'frac25;': '\u2156',
+        'frac34': '\xbe',
+        'frac34;': '\xbe',
+        'frac35;': '\u2157',
+        'frac38;': '\u215c',
+        'frac45;': '\u2158',
+        'frac56;': '\u215a',
+        'frac58;': '\u215d',
+        'frac78;': '\u215e',
+        'frasl;': '\u2044',
+        'frown;': '\u2322',
+        'Fscr;': '\u2131',
+        'fscr;': '\U0001d4bb',
+        'gacute;': '\u01f5',
+        'Gamma;': '\u0393',
+        'gamma;': '\u03b3',
+        'Gammad;': '\u03dc',
+        'gammad;': '\u03dd',
+        'gap;': '\u2a86',
+        'Gbreve;': '\u011e',
+        'gbreve;': '\u011f',
+        'Gcedil;': '\u0122',
+        'Gcirc;': '\u011c',
+        'gcirc;': '\u011d',
+        'Gcy;': '\u0413',
+        'gcy;': '\u0433',
+        'Gdot;': '\u0120',
+        'gdot;': '\u0121',
+        'gE;': '\u2267',
+        'ge;': '\u2265',
+        'gEl;': '\u2a8c',
+        'gel;': '\u22db',
+        'geq;': '\u2265',
+        'geqq;': '\u2267',
+        'geqslant;': '\u2a7e',
+        'ges;': '\u2a7e',
+        'gescc;': '\u2aa9',
+        'gesdot;': '\u2a80',
+        'gesdoto;': '\u2a82',
+        'gesdotol;': '\u2a84',
+        'gesl;': '\u22db\ufe00',
+        'gesles;': '\u2a94',
+        'Gfr;': '\U0001d50a',
+        'gfr;': '\U0001d524',
+        'Gg;': '\u22d9',
+        'gg;': '\u226b',
+        'ggg;': '\u22d9',
+        'gimel;': '\u2137',
+        'GJcy;': '\u0403',
+        'gjcy;': '\u0453',
+        'gl;': '\u2277',
+        'gla;': '\u2aa5',
+        'glE;': '\u2a92',
+        'glj;': '\u2aa4',
+        'gnap;': '\u2a8a',
+        'gnapprox;': '\u2a8a',
+        'gnE;': '\u2269',
+        'gne;': '\u2a88',
+        'gneq;': '\u2a88',
+        'gneqq;': '\u2269',
+        'gnsim;': '\u22e7',
+        'Gopf;': '\U0001d53e',
+        'gopf;': '\U0001d558',
+        'grave;': '`',
+        'GreaterEqual;': '\u2265',
+        'GreaterEqualLess;': '\u22db',
+        'GreaterFullEqual;': '\u2267',
+        'GreaterGreater;': '\u2aa2',
+        'GreaterLess;': '\u2277',
+        'GreaterSlantEqual;': '\u2a7e',
+        'GreaterTilde;': '\u2273',
+        'Gscr;': '\U0001d4a2',
+        'gscr;': '\u210a',
+        'gsim;': '\u2273',
+        'gsime;': '\u2a8e',
+        'gsiml;': '\u2a90',
+        'GT': '>',
+        'gt': '>',
+        'GT;': '>',
+        'Gt;': '\u226b',
+        'gt;': '>',
+        'gtcc;': '\u2aa7',
+        'gtcir;': '\u2a7a',
+        'gtdot;': '\u22d7',
+        'gtlPar;': '\u2995',
+        'gtquest;': '\u2a7c',
+        'gtrapprox;': '\u2a86',
+        'gtrarr;': '\u2978',
+        'gtrdot;': '\u22d7',
+        'gtreqless;': '\u22db',
+        'gtreqqless;': '\u2a8c',
+        'gtrless;': '\u2277',
+        'gtrsim;': '\u2273',
+        'gvertneqq;': '\u2269\ufe00',
+        'gvnE;': '\u2269\ufe00',
+        'Hacek;': '\u02c7',
+        'hairsp;': '\u200a',
+        'half;': '\xbd',
+        'hamilt;': '\u210b',
+        'HARDcy;': '\u042a',
+        'hardcy;': '\u044a',
+        'hArr;': '\u21d4',
+        'harr;': '\u2194',
+        'harrcir;': '\u2948',
+        'harrw;': '\u21ad',
+        'Hat;': '^',
+        'hbar;': '\u210f',
+        'Hcirc;': '\u0124',
+        'hcirc;': '\u0125',
+        'hearts;': '\u2665',
+        'heartsuit;': '\u2665',
+        'hellip;': '\u2026',
+        'hercon;': '\u22b9',
+        'Hfr;': '\u210c',
+        'hfr;': '\U0001d525',
+        'HilbertSpace;': '\u210b',
+        'hksearow;': '\u2925',
+        'hkswarow;': '\u2926',
+        'hoarr;': '\u21ff',
+        'homtht;': '\u223b',
+        'hookleftarrow;': '\u21a9',
+        'hookrightarrow;': '\u21aa',
+        'Hopf;': '\u210d',
+        'hopf;': '\U0001d559',
+        'horbar;': '\u2015',
+        'HorizontalLine;': '\u2500',
+        'Hscr;': '\u210b',
+        'hscr;': '\U0001d4bd',
+        'hslash;': '\u210f',
+        'Hstrok;': '\u0126',
+        'hstrok;': '\u0127',
+        'HumpDownHump;': '\u224e',
+        'HumpEqual;': '\u224f',
+        'hybull;': '\u2043',
+        'hyphen;': '\u2010',
+        'Iacute': '\xcd',
+        'iacute': '\xed',
+        'Iacute;': '\xcd',
+        'iacute;': '\xed',
+        'ic;': '\u2063',
+        'Icirc': '\xce',
+        'icirc': '\xee',
+        'Icirc;': '\xce',
+        'icirc;': '\xee',
+        'Icy;': '\u0418',
+        'icy;': '\u0438',
+        'Idot;': '\u0130',
+        'IEcy;': '\u0415',
+        'iecy;': '\u0435',
+        'iexcl': '\xa1',
+        'iexcl;': '\xa1',
+        'iff;': '\u21d4',
+        'Ifr;': '\u2111',
+        'ifr;': '\U0001d526',
+        'Igrave': '\xcc',
+        'igrave': '\xec',
+        'Igrave;': '\xcc',
+        'igrave;': '\xec',
+        'ii;': '\u2148',
+        'iiiint;': '\u2a0c',
+        'iiint;': '\u222d',
+        'iinfin;': '\u29dc',
+        'iiota;': '\u2129',
+        'IJlig;': '\u0132',
+        'ijlig;': '\u0133',
+        'Im;': '\u2111',
+        'Imacr;': '\u012a',
+        'imacr;': '\u012b',
+        'image;': '\u2111',
+        'ImaginaryI;': '\u2148',
+        'imagline;': '\u2110',
+        'imagpart;': '\u2111',
+        'imath;': '\u0131',
+        'imof;': '\u22b7',
+        'imped;': '\u01b5',
+        'Implies;': '\u21d2',
+        'in;': '\u2208',
+        'incare;': '\u2105',
+        'infin;': '\u221e',
+        'infintie;': '\u29dd',
+        'inodot;': '\u0131',
+        'Int;': '\u222c',
+        'int;': '\u222b',
+        'intcal;': '\u22ba',
+        'integers;': '\u2124',
+        'Integral;': '\u222b',
+        'intercal;': '\u22ba',
+        'Intersection;': '\u22c2',
+        'intlarhk;': '\u2a17',
+        'intprod;': '\u2a3c',
+        'InvisibleComma;': '\u2063',
+        'InvisibleTimes;': '\u2062',
+        'IOcy;': '\u0401',
+        'iocy;': '\u0451',
+        'Iogon;': '\u012e',
+        'iogon;': '\u012f',
+        'Iopf;': '\U0001d540',
+        'iopf;': '\U0001d55a',
+        'Iota;': '\u0399',
+        'iota;': '\u03b9',
+        'iprod;': '\u2a3c',
+        'iquest': '\xbf',
+        'iquest;': '\xbf',
+        'Iscr;': '\u2110',
+        'iscr;': '\U0001d4be',
+        'isin;': '\u2208',
+        'isindot;': '\u22f5',
+        'isinE;': '\u22f9',
+        'isins;': '\u22f4',
+        'isinsv;': '\u22f3',
+        'isinv;': '\u2208',
+        'it;': '\u2062',
+        'Itilde;': '\u0128',
+        'itilde;': '\u0129',
+        'Iukcy;': '\u0406',
+        'iukcy;': '\u0456',
+        'Iuml': '\xcf',
+        'iuml': '\xef',
+        'Iuml;': '\xcf',
+        'iuml;': '\xef',
+        'Jcirc;': '\u0134',
+        'jcirc;': '\u0135',
+        'Jcy;': '\u0419',
+        'jcy;': '\u0439',
+        'Jfr;': '\U0001d50d',
+        'jfr;': '\U0001d527',
+        'jmath;': '\u0237',
+        'Jopf;': '\U0001d541',
+        'jopf;': '\U0001d55b',
+        'Jscr;': '\U0001d4a5',
+        'jscr;': '\U0001d4bf',
+        'Jsercy;': '\u0408',
+        'jsercy;': '\u0458',
+        'Jukcy;': '\u0404',
+        'jukcy;': '\u0454',
+        'Kappa;': '\u039a',
+        'kappa;': '\u03ba',
+        'kappav;': '\u03f0',
+        'Kcedil;': '\u0136',
+        'kcedil;': '\u0137',
+        'Kcy;': '\u041a',
+        'kcy;': '\u043a',
+        'Kfr;': '\U0001d50e',
+        'kfr;': '\U0001d528',
+        'kgreen;': '\u0138',
+        'KHcy;': '\u0425',
+        'khcy;': '\u0445',
+        'KJcy;': '\u040c',
+        'kjcy;': '\u045c',
+        'Kopf;': '\U0001d542',
+        'kopf;': '\U0001d55c',
+        'Kscr;': '\U0001d4a6',
+        'kscr;': '\U0001d4c0',
+        'lAarr;': '\u21da',
+        'Lacute;': '\u0139',
+        'lacute;': '\u013a',
+        'laemptyv;': '\u29b4',
+        'lagran;': '\u2112',
+        'Lambda;': '\u039b',
+        'lambda;': '\u03bb',
+        'Lang;': '\u27ea',
+        'lang;': '\u27e8',
+        'langd;': '\u2991',
+        'langle;': '\u27e8',
+        'lap;': '\u2a85',
+        'Laplacetrf;': '\u2112',
+        'laquo': '\xab',
+        'laquo;': '\xab',
+        'Larr;': '\u219e',
+        'lArr;': '\u21d0',
+        'larr;': '\u2190',
+        'larrb;': '\u21e4',
+        'larrbfs;': '\u291f',
+        'larrfs;': '\u291d',
+        'larrhk;': '\u21a9',
+        'larrlp;': '\u21ab',
+        'larrpl;': '\u2939',
+        'larrsim;': '\u2973',
+        'larrtl;': '\u21a2',
+        'lat;': '\u2aab',
+        'lAtail;': '\u291b',
+        'latail;': '\u2919',
+        'late;': '\u2aad',
+        'lates;': '\u2aad\ufe00',
+        'lBarr;': '\u290e',
+        'lbarr;': '\u290c',
+        'lbbrk;': '\u2772',
+        'lbrace;': '{',
+        'lbrack;': '[',
+        'lbrke;': '\u298b',
+        'lbrksld;': '\u298f',
+        'lbrkslu;': '\u298d',
+        'Lcaron;': '\u013d',
+        'lcaron;': '\u013e',
+        'Lcedil;': '\u013b',
+        'lcedil;': '\u013c',
+        'lceil;': '\u2308',
+        'lcub;': '{',
+        'Lcy;': '\u041b',
+        'lcy;': '\u043b',
+        'ldca;': '\u2936',
+        'ldquo;': '\u201c',
+        'ldquor;': '\u201e',
+        'ldrdhar;': '\u2967',
+        'ldrushar;': '\u294b',
+        'ldsh;': '\u21b2',
+        'lE;': '\u2266',
+        'le;': '\u2264',
+        'LeftAngleBracket;': '\u27e8',
+        'LeftArrow;': '\u2190',
+        'Leftarrow;': '\u21d0',
+        'leftarrow;': '\u2190',
+        'LeftArrowBar;': '\u21e4',
+        'LeftArrowRightArrow;': '\u21c6',
+        'leftarrowtail;': '\u21a2',
+        'LeftCeiling;': '\u2308',
+        'LeftDoubleBracket;': '\u27e6',
+        'LeftDownTeeVector;': '\u2961',
+        'LeftDownVector;': '\u21c3',
+        'LeftDownVectorBar;': '\u2959',
+        'LeftFloor;': '\u230a',
+        'leftharpoondown;': '\u21bd',
+        'leftharpoonup;': '\u21bc',
+        'leftleftarrows;': '\u21c7',
+        'LeftRightArrow;': '\u2194',
+        'Leftrightarrow;': '\u21d4',
+        'leftrightarrow;': '\u2194',
+        'leftrightarrows;': '\u21c6',
+        'leftrightharpoons;': '\u21cb',
+        'leftrightsquigarrow;': '\u21ad',
+        'LeftRightVector;': '\u294e',
+        'LeftTee;': '\u22a3',
+        'LeftTeeArrow;': '\u21a4',
+        'LeftTeeVector;': '\u295a',
+        'leftthreetimes;': '\u22cb',
+        'LeftTriangle;': '\u22b2',
+        'LeftTriangleBar;': '\u29cf',
+        'LeftTriangleEqual;': '\u22b4',
+        'LeftUpDownVector;': '\u2951',
+        'LeftUpTeeVector;': '\u2960',
+        'LeftUpVector;': '\u21bf',
+        'LeftUpVectorBar;': '\u2958',
+        'LeftVector;': '\u21bc',
+        'LeftVectorBar;': '\u2952',
+        'lEg;': '\u2a8b',
+        'leg;': '\u22da',
+        'leq;': '\u2264',
+        'leqq;': '\u2266',
+        'leqslant;': '\u2a7d',
+        'les;': '\u2a7d',
+        'lescc;': '\u2aa8',
+        'lesdot;': '\u2a7f',
+        'lesdoto;': '\u2a81',
+        'lesdotor;': '\u2a83',
+        'lesg;': '\u22da\ufe00',
+        'lesges;': '\u2a93',
+        'lessapprox;': '\u2a85',
+        'lessdot;': '\u22d6',
+        'lesseqgtr;': '\u22da',
+        'lesseqqgtr;': '\u2a8b',
+        'LessEqualGreater;': '\u22da',
+        'LessFullEqual;': '\u2266',
+        'LessGreater;': '\u2276',
+        'lessgtr;': '\u2276',
+        'LessLess;': '\u2aa1',
+        'lesssim;': '\u2272',
+        'LessSlantEqual;': '\u2a7d',
+        'LessTilde;': '\u2272',
+        'lfisht;': '\u297c',
+        'lfloor;': '\u230a',
+        'Lfr;': '\U0001d50f',
+        'lfr;': '\U0001d529',
+        'lg;': '\u2276',
+        'lgE;': '\u2a91',
+        'lHar;': '\u2962',
+        'lhard;': '\u21bd',
+        'lharu;': '\u21bc',
+        'lharul;': '\u296a',
+        'lhblk;': '\u2584',
+        'LJcy;': '\u0409',
+        'ljcy;': '\u0459',
+        'Ll;': '\u22d8',
+        'll;': '\u226a',
+        'llarr;': '\u21c7',
+        'llcorner;': '\u231e',
+        'Lleftarrow;': '\u21da',
+        'llhard;': '\u296b',
+        'lltri;': '\u25fa',
+        'Lmidot;': '\u013f',
+        'lmidot;': '\u0140',
+        'lmoust;': '\u23b0',
+        'lmoustache;': '\u23b0',
+        'lnap;': '\u2a89',
+        'lnapprox;': '\u2a89',
+        'lnE;': '\u2268',
+        'lne;': '\u2a87',
+        'lneq;': '\u2a87',
+        'lneqq;': '\u2268',
+        'lnsim;': '\u22e6',
+        'loang;': '\u27ec',
+        'loarr;': '\u21fd',
+        'lobrk;': '\u27e6',
+        'LongLeftArrow;': '\u27f5',
+        'Longleftarrow;': '\u27f8',
+        'longleftarrow;': '\u27f5',
+        'LongLeftRightArrow;': '\u27f7',
+        'Longleftrightarrow;': '\u27fa',
+        'longleftrightarrow;': '\u27f7',
+        'longmapsto;': '\u27fc',
+        'LongRightArrow;': '\u27f6',
+        'Longrightarrow;': '\u27f9',
+        'longrightarrow;': '\u27f6',
+        'looparrowleft;': '\u21ab',
+        'looparrowright;': '\u21ac',
+        'lopar;': '\u2985',
+        'Lopf;': '\U0001d543',
+        'lopf;': '\U0001d55d',
+        'loplus;': '\u2a2d',
+        'lotimes;': '\u2a34',
+        'lowast;': '\u2217',
+        'lowbar;': '_',
+        'LowerLeftArrow;': '\u2199',
+        'LowerRightArrow;': '\u2198',
+        'loz;': '\u25ca',
+        'lozenge;': '\u25ca',
+        'lozf;': '\u29eb',
+        'lpar;': '(',
+        'lparlt;': '\u2993',
+        'lrarr;': '\u21c6',
+        'lrcorner;': '\u231f',
+        'lrhar;': '\u21cb',
+        'lrhard;': '\u296d',
+        'lrm;': '\u200e',
+        'lrtri;': '\u22bf',
+        'lsaquo;': '\u2039',
+        'Lscr;': '\u2112',
+        'lscr;': '\U0001d4c1',
+        'Lsh;': '\u21b0',
+        'lsh;': '\u21b0',
+        'lsim;': '\u2272',
+        'lsime;': '\u2a8d',
+        'lsimg;': '\u2a8f',
+        'lsqb;': '[',
+        'lsquo;': '\u2018',
+        'lsquor;': '\u201a',
+        'Lstrok;': '\u0141',
+        'lstrok;': '\u0142',
+        'LT': '<',
+        'lt': '<',
+        'LT;': '<',
+        'Lt;': '\u226a',
+        'lt;': '<',
+        'ltcc;': '\u2aa6',
+        'ltcir;': '\u2a79',
+        'ltdot;': '\u22d6',
+        'lthree;': '\u22cb',
+        'ltimes;': '\u22c9',
+        'ltlarr;': '\u2976',
+        'ltquest;': '\u2a7b',
+        'ltri;': '\u25c3',
+        'ltrie;': '\u22b4',
+        'ltrif;': '\u25c2',
+        'ltrPar;': '\u2996',
+        'lurdshar;': '\u294a',
+        'luruhar;': '\u2966',
+        'lvertneqq;': '\u2268\ufe00',
+        'lvnE;': '\u2268\ufe00',
+        'macr': '\xaf',
+        'macr;': '\xaf',
+        'male;': '\u2642',
+        'malt;': '\u2720',
+        'maltese;': '\u2720',
+        'Map;': '\u2905',
+        'map;': '\u21a6',
+        'mapsto;': '\u21a6',
+        'mapstodown;': '\u21a7',
+        'mapstoleft;': '\u21a4',
+        'mapstoup;': '\u21a5',
+        'marker;': '\u25ae',
+        'mcomma;': '\u2a29',
+        'Mcy;': '\u041c',
+        'mcy;': '\u043c',
+        'mdash;': '\u2014',
+        'mDDot;': '\u223a',
+        'measuredangle;': '\u2221',
+        'MediumSpace;': '\u205f',
+        'Mellintrf;': '\u2133',
+        'Mfr;': '\U0001d510',
+        'mfr;': '\U0001d52a',
+        'mho;': '\u2127',
+        'micro': '\xb5',
+        'micro;': '\xb5',
+        'mid;': '\u2223',
+        'midast;': '*',
+        'midcir;': '\u2af0',
+        'middot': '\xb7',
+        'middot;': '\xb7',
+        'minus;': '\u2212',
+        'minusb;': '\u229f',
+        'minusd;': '\u2238',
+        'minusdu;': '\u2a2a',
+        'MinusPlus;': '\u2213',
+        'mlcp;': '\u2adb',
+        'mldr;': '\u2026',
+        'mnplus;': '\u2213',
+        'models;': '\u22a7',
+        'Mopf;': '\U0001d544',
+        'mopf;': '\U0001d55e',
+        'mp;': '\u2213',
+        'Mscr;': '\u2133',
+        'mscr;': '\U0001d4c2',
+        'mstpos;': '\u223e',
+        'Mu;': '\u039c',
+        'mu;': '\u03bc',
+        'multimap;': '\u22b8',
+        'mumap;': '\u22b8',
+        'nabla;': '\u2207',
+        'Nacute;': '\u0143',
+        'nacute;': '\u0144',
+        'nang;': '\u2220\u20d2',
+        'nap;': '\u2249',
+        'napE;': '\u2a70\u0338',
+        'napid;': '\u224b\u0338',
+        'napos;': '\u0149',
+        'napprox;': '\u2249',
+        'natur;': '\u266e',
+        'natural;': '\u266e',
+        'naturals;': '\u2115',
+        'nbsp': '\xa0',
+        'nbsp;': '\xa0',
+        'nbump;': '\u224e\u0338',
+        'nbumpe;': '\u224f\u0338',
+        'ncap;': '\u2a43',
+        'Ncaron;': '\u0147',
+        'ncaron;': '\u0148',
+        'Ncedil;': '\u0145',
+        'ncedil;': '\u0146',
+        'ncong;': '\u2247',
+        'ncongdot;': '\u2a6d\u0338',
+        'ncup;': '\u2a42',
+        'Ncy;': '\u041d',
+        'ncy;': '\u043d',
+        'ndash;': '\u2013',
+        'ne;': '\u2260',
+        'nearhk;': '\u2924',
+        'neArr;': '\u21d7',
+        'nearr;': '\u2197',
+        'nearrow;': '\u2197',
+        'nedot;': '\u2250\u0338',
+        'NegativeMediumSpace;': '\u200b',
+        'NegativeThickSpace;': '\u200b',
+        'NegativeThinSpace;': '\u200b',
+        'NegativeVeryThinSpace;': '\u200b',
+        'nequiv;': '\u2262',
+        'nesear;': '\u2928',
+        'nesim;': '\u2242\u0338',
+        'NestedGreaterGreater;': '\u226b',
+        'NestedLessLess;': '\u226a',
+        'NewLine;': '\n',
+        'nexist;': '\u2204',
+        'nexists;': '\u2204',
+        'Nfr;': '\U0001d511',
+        'nfr;': '\U0001d52b',
+        'ngE;': '\u2267\u0338',
+        'nge;': '\u2271',
+        'ngeq;': '\u2271',
+        'ngeqq;': '\u2267\u0338',
+        'ngeqslant;': '\u2a7e\u0338',
+        'nges;': '\u2a7e\u0338',
+        'nGg;': '\u22d9\u0338',
+        'ngsim;': '\u2275',
+        'nGt;': '\u226b\u20d2',
+        'ngt;': '\u226f',
+        'ngtr;': '\u226f',
+        'nGtv;': '\u226b\u0338',
+        'nhArr;': '\u21ce',
+        'nharr;': '\u21ae',
+        'nhpar;': '\u2af2',
+        'ni;': '\u220b',
+        'nis;': '\u22fc',
+        'nisd;': '\u22fa',
+        'niv;': '\u220b',
+        'NJcy;': '\u040a',
+        'njcy;': '\u045a',
+        'nlArr;': '\u21cd',
+        'nlarr;': '\u219a',
+        'nldr;': '\u2025',
+        'nlE;': '\u2266\u0338',
+        'nle;': '\u2270',
+        'nLeftarrow;': '\u21cd',
+        'nleftarrow;': '\u219a',
+        'nLeftrightarrow;': '\u21ce',
+        'nleftrightarrow;': '\u21ae',
+        'nleq;': '\u2270',
+        'nleqq;': '\u2266\u0338',
+        'nleqslant;': '\u2a7d\u0338',
+        'nles;': '\u2a7d\u0338',
+        'nless;': '\u226e',
+        'nLl;': '\u22d8\u0338',
+        'nlsim;': '\u2274',
+        'nLt;': '\u226a\u20d2',
+        'nlt;': '\u226e',
+        'nltri;': '\u22ea',
+        'nltrie;': '\u22ec',
+        'nLtv;': '\u226a\u0338',
+        'nmid;': '\u2224',
+        'NoBreak;': '\u2060',
+        'NonBreakingSpace;': '\xa0',
+        'Nopf;': '\u2115',
+        'nopf;': '\U0001d55f',
+        'not': '\xac',
+        'Not;': '\u2aec',
+        'not;': '\xac',
+        'NotCongruent;': '\u2262',
+        'NotCupCap;': '\u226d',
+        'NotDoubleVerticalBar;': '\u2226',
+        'NotElement;': '\u2209',
+        'NotEqual;': '\u2260',
+        'NotEqualTilde;': '\u2242\u0338',
+        'NotExists;': '\u2204',
+        'NotGreater;': '\u226f',
+        'NotGreaterEqual;': '\u2271',
+        'NotGreaterFullEqual;': '\u2267\u0338',
+        'NotGreaterGreater;': '\u226b\u0338',
+        'NotGreaterLess;': '\u2279',
+        'NotGreaterSlantEqual;': '\u2a7e\u0338',
+        'NotGreaterTilde;': '\u2275',
+        'NotHumpDownHump;': '\u224e\u0338',
+        'NotHumpEqual;': '\u224f\u0338',
+        'notin;': '\u2209',
+        'notindot;': '\u22f5\u0338',
+        'notinE;': '\u22f9\u0338',
+        'notinva;': '\u2209',
+        'notinvb;': '\u22f7',
+        'notinvc;': '\u22f6',
+        'NotLeftTriangle;': '\u22ea',
+        'NotLeftTriangleBar;': '\u29cf\u0338',
+        'NotLeftTriangleEqual;': '\u22ec',
+        'NotLess;': '\u226e',
+        'NotLessEqual;': '\u2270',
+        'NotLessGreater;': '\u2278',
+        'NotLessLess;': '\u226a\u0338',
+        'NotLessSlantEqual;': '\u2a7d\u0338',
+        'NotLessTilde;': '\u2274',
+        'NotNestedGreaterGreater;': '\u2aa2\u0338',
+        'NotNestedLessLess;': '\u2aa1\u0338',
+        'notni;': '\u220c',
+        'notniva;': '\u220c',
+        'notnivb;': '\u22fe',
+        'notnivc;': '\u22fd',
+        'NotPrecedes;': '\u2280',
+        'NotPrecedesEqual;': '\u2aaf\u0338',
+        'NotPrecedesSlantEqual;': '\u22e0',
+        'NotReverseElement;': '\u220c',
+        'NotRightTriangle;': '\u22eb',
+        'NotRightTriangleBar;': '\u29d0\u0338',
+        'NotRightTriangleEqual;': '\u22ed',
+        'NotSquareSubset;': '\u228f\u0338',
+        'NotSquareSubsetEqual;': '\u22e2',
+        'NotSquareSuperset;': '\u2290\u0338',
+        'NotSquareSupersetEqual;': '\u22e3',
+        'NotSubset;': '\u2282\u20d2',
+        'NotSubsetEqual;': '\u2288',
+        'NotSucceeds;': '\u2281',
+        'NotSucceedsEqual;': '\u2ab0\u0338',
+        'NotSucceedsSlantEqual;': '\u22e1',
+        'NotSucceedsTilde;': '\u227f\u0338',
+        'NotSuperset;': '\u2283\u20d2',
+        'NotSupersetEqual;': '\u2289',
+        'NotTilde;': '\u2241',
+        'NotTildeEqual;': '\u2244',
+        'NotTildeFullEqual;': '\u2247',
+        'NotTildeTilde;': '\u2249',
+        'NotVerticalBar;': '\u2224',
+        'npar;': '\u2226',
+        'nparallel;': '\u2226',
+        'nparsl;': '\u2afd\u20e5',
+        'npart;': '\u2202\u0338',
+        'npolint;': '\u2a14',
+        'npr;': '\u2280',
+        'nprcue;': '\u22e0',
+        'npre;': '\u2aaf\u0338',
+        'nprec;': '\u2280',
+        'npreceq;': '\u2aaf\u0338',
+        'nrArr;': '\u21cf',
+        'nrarr;': '\u219b',
+        'nrarrc;': '\u2933\u0338',
+        'nrarrw;': '\u219d\u0338',
+        'nRightarrow;': '\u21cf',
+        'nrightarrow;': '\u219b',
+        'nrtri;': '\u22eb',
+        'nrtrie;': '\u22ed',
+        'nsc;': '\u2281',
+        'nsccue;': '\u22e1',
+        'nsce;': '\u2ab0\u0338',
+        'Nscr;': '\U0001d4a9',
+        'nscr;': '\U0001d4c3',
+        'nshortmid;': '\u2224',
+        'nshortparallel;': '\u2226',
+        'nsim;': '\u2241',
+        'nsime;': '\u2244',
+        'nsimeq;': '\u2244',
+        'nsmid;': '\u2224',
+        'nspar;': '\u2226',
+        'nsqsube;': '\u22e2',
+        'nsqsupe;': '\u22e3',
+        'nsub;': '\u2284',
+        'nsubE;': '\u2ac5\u0338',
+        'nsube;': '\u2288',
+        'nsubset;': '\u2282\u20d2',
+        'nsubseteq;': '\u2288',
+        'nsubseteqq;': '\u2ac5\u0338',
+        'nsucc;': '\u2281',
+        'nsucceq;': '\u2ab0\u0338',
+        'nsup;': '\u2285',
+        'nsupE;': '\u2ac6\u0338',
+        'nsupe;': '\u2289',
+        'nsupset;': '\u2283\u20d2',
+        'nsupseteq;': '\u2289',
+        'nsupseteqq;': '\u2ac6\u0338',
+        'ntgl;': '\u2279',
+        'Ntilde': '\xd1',
+        'ntilde': '\xf1',
+        'Ntilde;': '\xd1',
+        'ntilde;': '\xf1',
+        'ntlg;': '\u2278',
+        'ntriangleleft;': '\u22ea',
+        'ntrianglelefteq;': '\u22ec',
+        'ntriangleright;': '\u22eb',
+        'ntrianglerighteq;': '\u22ed',
+        'Nu;': '\u039d',
+        'nu;': '\u03bd',
+        'num;': '#',
+        'numero;': '\u2116',
+        'numsp;': '\u2007',
+        'nvap;': '\u224d\u20d2',
+        'nVDash;': '\u22af',
+        'nVdash;': '\u22ae',
+        'nvDash;': '\u22ad',
+        'nvdash;': '\u22ac',
+        'nvge;': '\u2265\u20d2',
+        'nvgt;': '>\u20d2',
+        'nvHarr;': '\u2904',
+        'nvinfin;': '\u29de',
+        'nvlArr;': '\u2902',
+        'nvle;': '\u2264\u20d2',
+        'nvlt;': '<\u20d2',
+        'nvltrie;': '\u22b4\u20d2',
+        'nvrArr;': '\u2903',
+        'nvrtrie;': '\u22b5\u20d2',
+        'nvsim;': '\u223c\u20d2',
+        'nwarhk;': '\u2923',
+        'nwArr;': '\u21d6',
+        'nwarr;': '\u2196',
+        'nwarrow;': '\u2196',
+        'nwnear;': '\u2927',
+        'Oacute': '\xd3',
+        'oacute': '\xf3',
+        'Oacute;': '\xd3',
+        'oacute;': '\xf3',
+        'oast;': '\u229b',
+        'ocir;': '\u229a',
+        'Ocirc': '\xd4',
+        'ocirc': '\xf4',
+        'Ocirc;': '\xd4',
+        'ocirc;': '\xf4',
+        'Ocy;': '\u041e',
+        'ocy;': '\u043e',
+        'odash;': '\u229d',
+        'Odblac;': '\u0150',
+        'odblac;': '\u0151',
+        'odiv;': '\u2a38',
+        'odot;': '\u2299',
+        'odsold;': '\u29bc',
+        'OElig;': '\u0152',
+        'oelig;': '\u0153',
+        'ofcir;': '\u29bf',
+        'Ofr;': '\U0001d512',
+        'ofr;': '\U0001d52c',
+        'ogon;': '\u02db',
+        'Ograve': '\xd2',
+        'ograve': '\xf2',
+        'Ograve;': '\xd2',
+        'ograve;': '\xf2',
+        'ogt;': '\u29c1',
+        'ohbar;': '\u29b5',
+        'ohm;': '\u03a9',
+        'oint;': '\u222e',
+        'olarr;': '\u21ba',
+        'olcir;': '\u29be',
+        'olcross;': '\u29bb',
+        'oline;': '\u203e',
+        'olt;': '\u29c0',
+        'Omacr;': '\u014c',
+        'omacr;': '\u014d',
+        'Omega;': '\u03a9',
+        'omega;': '\u03c9',
+        'Omicron;': '\u039f',
+        'omicron;': '\u03bf',
+        'omid;': '\u29b6',
+        'ominus;': '\u2296',
+        'Oopf;': '\U0001d546',
+        'oopf;': '\U0001d560',
+        'opar;': '\u29b7',
+        'OpenCurlyDoubleQuote;': '\u201c',
+        'OpenCurlyQuote;': '\u2018',
+        'operp;': '\u29b9',
+        'oplus;': '\u2295',
+        'Or;': '\u2a54',
+        'or;': '\u2228',
+        'orarr;': '\u21bb',
+        'ord;': '\u2a5d',
+        'order;': '\u2134',
+        'orderof;': '\u2134',
+        'ordf': '\xaa',
+        'ordf;': '\xaa',
+        'ordm': '\xba',
+        'ordm;': '\xba',
+        'origof;': '\u22b6',
+        'oror;': '\u2a56',
+        'orslope;': '\u2a57',
+        'orv;': '\u2a5b',
+        'oS;': '\u24c8',
+        'Oscr;': '\U0001d4aa',
+        'oscr;': '\u2134',
+        'Oslash': '\xd8',
+        'oslash': '\xf8',
+        'Oslash;': '\xd8',
+        'oslash;': '\xf8',
+        'osol;': '\u2298',
+        'Otilde': '\xd5',
+        'otilde': '\xf5',
+        'Otilde;': '\xd5',
+        'otilde;': '\xf5',
+        'Otimes;': '\u2a37',
+        'otimes;': '\u2297',
+        'otimesas;': '\u2a36',
+        'Ouml': '\xd6',
+        'ouml': '\xf6',
+        'Ouml;': '\xd6',
+        'ouml;': '\xf6',
+        'ovbar;': '\u233d',
+        'OverBar;': '\u203e',
+        'OverBrace;': '\u23de',
+        'OverBracket;': '\u23b4',
+        'OverParenthesis;': '\u23dc',
+        'par;': '\u2225',
+        'para': '\xb6',
+        'para;': '\xb6',
+        'parallel;': '\u2225',
+        'parsim;': '\u2af3',
+        'parsl;': '\u2afd',
+        'part;': '\u2202',
+        'PartialD;': '\u2202',
+        'Pcy;': '\u041f',
+        'pcy;': '\u043f',
+        'percnt;': '%',
+        'period;': '.',
+        'permil;': '\u2030',
+        'perp;': '\u22a5',
+        'pertenk;': '\u2031',
+        'Pfr;': '\U0001d513',
+        'pfr;': '\U0001d52d',
+        'Phi;': '\u03a6',
+        'phi;': '\u03c6',
+        'phiv;': '\u03d5',
+        'phmmat;': '\u2133',
+        'phone;': '\u260e',
+        'Pi;': '\u03a0',
+        'pi;': '\u03c0',
+        'pitchfork;': '\u22d4',
+        'piv;': '\u03d6',
+        'planck;': '\u210f',
+        'planckh;': '\u210e',
+        'plankv;': '\u210f',
+        'plus;': '+',
+        'plusacir;': '\u2a23',
+        'plusb;': '\u229e',
+        'pluscir;': '\u2a22',
+        'plusdo;': '\u2214',
+        'plusdu;': '\u2a25',
+        'pluse;': '\u2a72',
+        'PlusMinus;': '\xb1',
+        'plusmn': '\xb1',
+        'plusmn;': '\xb1',
+        'plussim;': '\u2a26',
+        'plustwo;': '\u2a27',
+        'pm;': '\xb1',
+        'Poincareplane;': '\u210c',
+        'pointint;': '\u2a15',
+        'Popf;': '\u2119',
+        'popf;': '\U0001d561',
+        'pound': '\xa3',
+        'pound;': '\xa3',
+        'Pr;': '\u2abb',
+        'pr;': '\u227a',
+        'prap;': '\u2ab7',
+        'prcue;': '\u227c',
+        'prE;': '\u2ab3',
+        'pre;': '\u2aaf',
+        'prec;': '\u227a',
+        'precapprox;': '\u2ab7',
+        'preccurlyeq;': '\u227c',
+        'Precedes;': '\u227a',
+        'PrecedesEqual;': '\u2aaf',
+        'PrecedesSlantEqual;': '\u227c',
+        'PrecedesTilde;': '\u227e',
+        'preceq;': '\u2aaf',
+        'precnapprox;': '\u2ab9',
+        'precneqq;': '\u2ab5',
+        'precnsim;': '\u22e8',
+        'precsim;': '\u227e',
+        'Prime;': '\u2033',
+        'prime;': '\u2032',
+        'primes;': '\u2119',
+        'prnap;': '\u2ab9',
+        'prnE;': '\u2ab5',
+        'prnsim;': '\u22e8',
+        'prod;': '\u220f',
+        'Product;': '\u220f',
+        'profalar;': '\u232e',
+        'profline;': '\u2312',
+        'profsurf;': '\u2313',
+        'prop;': '\u221d',
+        'Proportion;': '\u2237',
+        'Proportional;': '\u221d',
+        'propto;': '\u221d',
+        'prsim;': '\u227e',
+        'prurel;': '\u22b0',
+        'Pscr;': '\U0001d4ab',
+        'pscr;': '\U0001d4c5',
+        'Psi;': '\u03a8',
+        'psi;': '\u03c8',
+        'puncsp;': '\u2008',
+        'Qfr;': '\U0001d514',
+        'qfr;': '\U0001d52e',
+        'qint;': '\u2a0c',
+        'Qopf;': '\u211a',
+        'qopf;': '\U0001d562',
+        'qprime;': '\u2057',
+        'Qscr;': '\U0001d4ac',
+        'qscr;': '\U0001d4c6',
+        'quaternions;': '\u210d',
+        'quatint;': '\u2a16',
+        'quest;': '?',
+        'questeq;': '\u225f',
+        'QUOT': '"',
+        'quot': '"',
+        'QUOT;': '"',
+        'quot;': '"',
+        'rAarr;': '\u21db',
+        'race;': '\u223d\u0331',
+        'Racute;': '\u0154',
+        'racute;': '\u0155',
+        'radic;': '\u221a',
+        'raemptyv;': '\u29b3',
+        'Rang;': '\u27eb',
+        'rang;': '\u27e9',
+        'rangd;': '\u2992',
+        'range;': '\u29a5',
+        'rangle;': '\u27e9',
+        'raquo': '\xbb',
+        'raquo;': '\xbb',
+        'Rarr;': '\u21a0',
+        'rArr;': '\u21d2',
+        'rarr;': '\u2192',
+        'rarrap;': '\u2975',
+        'rarrb;': '\u21e5',
+        'rarrbfs;': '\u2920',
+        'rarrc;': '\u2933',
+        'rarrfs;': '\u291e',
+        'rarrhk;': '\u21aa',
+        'rarrlp;': '\u21ac',
+        'rarrpl;': '\u2945',
+        'rarrsim;': '\u2974',
+        'Rarrtl;': '\u2916',
+        'rarrtl;': '\u21a3',
+        'rarrw;': '\u219d',
+        'rAtail;': '\u291c',
+        'ratail;': '\u291a',
+        'ratio;': '\u2236',
+        'rationals;': '\u211a',
+        'RBarr;': '\u2910',
+        'rBarr;': '\u290f',
+        'rbarr;': '\u290d',
+        'rbbrk;': '\u2773',
+        'rbrace;': '}',
+        'rbrack;': ']',
+        'rbrke;': '\u298c',
+        'rbrksld;': '\u298e',
+        'rbrkslu;': '\u2990',
+        'Rcaron;': '\u0158',
+        'rcaron;': '\u0159',
+        'Rcedil;': '\u0156',
+        'rcedil;': '\u0157',
+        'rceil;': '\u2309',
+        'rcub;': '}',
+        'Rcy;': '\u0420',
+        'rcy;': '\u0440',
+        'rdca;': '\u2937',
+        'rdldhar;': '\u2969',
+        'rdquo;': '\u201d',
+        'rdquor;': '\u201d',
+        'rdsh;': '\u21b3',
+        'Re;': '\u211c',
+        'real;': '\u211c',
+        'realine;': '\u211b',
+        'realpart;': '\u211c',
+        'reals;': '\u211d',
+        'rect;': '\u25ad',
+        'REG': '\xae',
+        'reg': '\xae',
+        'REG;': '\xae',
+        'reg;': '\xae',
+        'ReverseElement;': '\u220b',
+        'ReverseEquilibrium;': '\u21cb',
+        'ReverseUpEquilibrium;': '\u296f',
+        'rfisht;': '\u297d',
+        'rfloor;': '\u230b',
+        'Rfr;': '\u211c',
+        'rfr;': '\U0001d52f',
+        'rHar;': '\u2964',
+        'rhard;': '\u21c1',
+        'rharu;': '\u21c0',
+        'rharul;': '\u296c',
+        'Rho;': '\u03a1',
+        'rho;': '\u03c1',
+        'rhov;': '\u03f1',
+        'RightAngleBracket;': '\u27e9',
+        'RightArrow;': '\u2192',
+        'Rightarrow;': '\u21d2',
+        'rightarrow;': '\u2192',
+        'RightArrowBar;': '\u21e5',
+        'RightArrowLeftArrow;': '\u21c4',
+        'rightarrowtail;': '\u21a3',
+        'RightCeiling;': '\u2309',
+        'RightDoubleBracket;': '\u27e7',
+        'RightDownTeeVector;': '\u295d',
+        'RightDownVector;': '\u21c2',
+        'RightDownVectorBar;': '\u2955',
+        'RightFloor;': '\u230b',
+        'rightharpoondown;': '\u21c1',
+        'rightharpoonup;': '\u21c0',
+        'rightleftarrows;': '\u21c4',
+        'rightleftharpoons;': '\u21cc',
+        'rightrightarrows;': '\u21c9',
+        'rightsquigarrow;': '\u219d',
+        'RightTee;': '\u22a2',
+        'RightTeeArrow;': '\u21a6',
+        'RightTeeVector;': '\u295b',
+        'rightthreetimes;': '\u22cc',
+        'RightTriangle;': '\u22b3',
+        'RightTriangleBar;': '\u29d0',
+        'RightTriangleEqual;': '\u22b5',
+        'RightUpDownVector;': '\u294f',
+        'RightUpTeeVector;': '\u295c',
+        'RightUpVector;': '\u21be',
+        'RightUpVectorBar;': '\u2954',
+        'RightVector;': '\u21c0',
+        'RightVectorBar;': '\u2953',
+        'ring;': '\u02da',
+        'risingdotseq;': '\u2253',
+        'rlarr;': '\u21c4',
+        'rlhar;': '\u21cc',
+        'rlm;': '\u200f',
+        'rmoust;': '\u23b1',
+        'rmoustache;': '\u23b1',
+        'rnmid;': '\u2aee',
+        'roang;': '\u27ed',
+        'roarr;': '\u21fe',
+        'robrk;': '\u27e7',
+        'ropar;': '\u2986',
+        'Ropf;': '\u211d',
+        'ropf;': '\U0001d563',
+        'roplus;': '\u2a2e',
+        'rotimes;': '\u2a35',
+        'RoundImplies;': '\u2970',
+        'rpar;': ')',
+        'rpargt;': '\u2994',
+        'rppolint;': '\u2a12',
+        'rrarr;': '\u21c9',
+        'Rrightarrow;': '\u21db',
+        'rsaquo;': '\u203a',
+        'Rscr;': '\u211b',
+        'rscr;': '\U0001d4c7',
+        'Rsh;': '\u21b1',
+        'rsh;': '\u21b1',
+        'rsqb;': ']',
+        'rsquo;': '\u2019',
+        'rsquor;': '\u2019',
+        'rthree;': '\u22cc',
+        'rtimes;': '\u22ca',
+        'rtri;': '\u25b9',
+        'rtrie;': '\u22b5',
+        'rtrif;': '\u25b8',
+        'rtriltri;': '\u29ce',
+        'RuleDelayed;': '\u29f4',
+        'ruluhar;': '\u2968',
+        'rx;': '\u211e',
+        'Sacute;': '\u015a',
+        'sacute;': '\u015b',
+        'sbquo;': '\u201a',
+        'Sc;': '\u2abc',
+        'sc;': '\u227b',
+        'scap;': '\u2ab8',
+        'Scaron;': '\u0160',
+        'scaron;': '\u0161',
+        'sccue;': '\u227d',
+        'scE;': '\u2ab4',
+        'sce;': '\u2ab0',
+        'Scedil;': '\u015e',
+        'scedil;': '\u015f',
+        'Scirc;': '\u015c',
+        'scirc;': '\u015d',
+        'scnap;': '\u2aba',
+        'scnE;': '\u2ab6',
+        'scnsim;': '\u22e9',
+        'scpolint;': '\u2a13',
+        'scsim;': '\u227f',
+        'Scy;': '\u0421',
+        'scy;': '\u0441',
+        'sdot;': '\u22c5',
+        'sdotb;': '\u22a1',
+        'sdote;': '\u2a66',
+        'searhk;': '\u2925',
+        'seArr;': '\u21d8',
+        'searr;': '\u2198',
+        'searrow;': '\u2198',
+        'sect': '\xa7',
+        'sect;': '\xa7',
+        'semi;': ';',
+        'seswar;': '\u2929',
+        'setminus;': '\u2216',
+        'setmn;': '\u2216',
+        'sext;': '\u2736',
+        'Sfr;': '\U0001d516',
+        'sfr;': '\U0001d530',
+        'sfrown;': '\u2322',
+        'sharp;': '\u266f',
+        'SHCHcy;': '\u0429',
+        'shchcy;': '\u0449',
+        'SHcy;': '\u0428',
+        'shcy;': '\u0448',
+        'ShortDownArrow;': '\u2193',
+        'ShortLeftArrow;': '\u2190',
+        'shortmid;': '\u2223',
+        'shortparallel;': '\u2225',
+        'ShortRightArrow;': '\u2192',
+        'ShortUpArrow;': '\u2191',
+        'shy': '\xad',
+        'shy;': '\xad',
+        'Sigma;': '\u03a3',
+        'sigma;': '\u03c3',
+        'sigmaf;': '\u03c2',
+        'sigmav;': '\u03c2',
+        'sim;': '\u223c',
+        'simdot;': '\u2a6a',
+        'sime;': '\u2243',
+        'simeq;': '\u2243',
+        'simg;': '\u2a9e',
+        'simgE;': '\u2aa0',
+        'siml;': '\u2a9d',
+        'simlE;': '\u2a9f',
+        'simne;': '\u2246',
+        'simplus;': '\u2a24',
+        'simrarr;': '\u2972',
+        'slarr;': '\u2190',
+        'SmallCircle;': '\u2218',
+        'smallsetminus;': '\u2216',
+        'smashp;': '\u2a33',
+        'smeparsl;': '\u29e4',
+        'smid;': '\u2223',
+        'smile;': '\u2323',
+        'smt;': '\u2aaa',
+        'smte;': '\u2aac',
+        'smtes;': '\u2aac\ufe00',
+        'SOFTcy;': '\u042c',
+        'softcy;': '\u044c',
+        'sol;': '/',
+        'solb;': '\u29c4',
+        'solbar;': '\u233f',
+        'Sopf;': '\U0001d54a',
+        'sopf;': '\U0001d564',
+        'spades;': '\u2660',
+        'spadesuit;': '\u2660',
+        'spar;': '\u2225',
+        'sqcap;': '\u2293',
+        'sqcaps;': '\u2293\ufe00',
+        'sqcup;': '\u2294',
+        'sqcups;': '\u2294\ufe00',
+        'Sqrt;': '\u221a',
+        'sqsub;': '\u228f',
+        'sqsube;': '\u2291',
+        'sqsubset;': '\u228f',
+        'sqsubseteq;': '\u2291',
+        'sqsup;': '\u2290',
+        'sqsupe;': '\u2292',
+        'sqsupset;': '\u2290',
+        'sqsupseteq;': '\u2292',
+        'squ;': '\u25a1',
+        'Square;': '\u25a1',
+        'square;': '\u25a1',
+        'SquareIntersection;': '\u2293',
+        'SquareSubset;': '\u228f',
+        'SquareSubsetEqual;': '\u2291',
+        'SquareSuperset;': '\u2290',
+        'SquareSupersetEqual;': '\u2292',
+        'SquareUnion;': '\u2294',
+        'squarf;': '\u25aa',
+        'squf;': '\u25aa',
+        'srarr;': '\u2192',
+        'Sscr;': '\U0001d4ae',
+        'sscr;': '\U0001d4c8',
+        'ssetmn;': '\u2216',
+        'ssmile;': '\u2323',
+        'sstarf;': '\u22c6',
+        'Star;': '\u22c6',
+        'star;': '\u2606',
+        'starf;': '\u2605',
+        'straightepsilon;': '\u03f5',
+        'straightphi;': '\u03d5',
+        'strns;': '\xaf',
+        'Sub;': '\u22d0',
+        'sub;': '\u2282',
+        'subdot;': '\u2abd',
+        'subE;': '\u2ac5',
+        'sube;': '\u2286',
+        'subedot;': '\u2ac3',
+        'submult;': '\u2ac1',
+        'subnE;': '\u2acb',
+        'subne;': '\u228a',
+        'subplus;': '\u2abf',
+        'subrarr;': '\u2979',
+        'Subset;': '\u22d0',
+        'subset;': '\u2282',
+        'subseteq;': '\u2286',
+        'subseteqq;': '\u2ac5',
+        'SubsetEqual;': '\u2286',
+        'subsetneq;': '\u228a',
+        'subsetneqq;': '\u2acb',
+        'subsim;': '\u2ac7',
+        'subsub;': '\u2ad5',
+        'subsup;': '\u2ad3',
+        'succ;': '\u227b',
+        'succapprox;': '\u2ab8',
+        'succcurlyeq;': '\u227d',
+        'Succeeds;': '\u227b',
+        'SucceedsEqual;': '\u2ab0',
+        'SucceedsSlantEqual;': '\u227d',
+        'SucceedsTilde;': '\u227f',
+        'succeq;': '\u2ab0',
+        'succnapprox;': '\u2aba',
+        'succneqq;': '\u2ab6',
+        'succnsim;': '\u22e9',
+        'succsim;': '\u227f',
+        'SuchThat;': '\u220b',
+        'Sum;': '\u2211',
+        'sum;': '\u2211',
+        'sung;': '\u266a',
+        'sup1': '\xb9',
+        'sup1;': '\xb9',
+        'sup2': '\xb2',
+        'sup2;': '\xb2',
+        'sup3': '\xb3',
+        'sup3;': '\xb3',
+        'Sup;': '\u22d1',
+        'sup;': '\u2283',
+        'supdot;': '\u2abe',
+        'supdsub;': '\u2ad8',
+        'supE;': '\u2ac6',
+        'supe;': '\u2287',
+        'supedot;': '\u2ac4',
+        'Superset;': '\u2283',
+        'SupersetEqual;': '\u2287',
+        'suphsol;': '\u27c9',
+        'suphsub;': '\u2ad7',
+        'suplarr;': '\u297b',
+        'supmult;': '\u2ac2',
+        'supnE;': '\u2acc',
+        'supne;': '\u228b',
+        'supplus;': '\u2ac0',
+        'Supset;': '\u22d1',
+        'supset;': '\u2283',
+        'supseteq;': '\u2287',
+        'supseteqq;': '\u2ac6',
+        'supsetneq;': '\u228b',
+        'supsetneqq;': '\u2acc',
+        'supsim;': '\u2ac8',
+        'supsub;': '\u2ad4',
+        'supsup;': '\u2ad6',
+        'swarhk;': '\u2926',
+        'swArr;': '\u21d9',
+        'swarr;': '\u2199',
+        'swarrow;': '\u2199',
+        'swnwar;': '\u292a',
+        'szlig': '\xdf',
+        'szlig;': '\xdf',
+        'Tab;': '\t',
+        'target;': '\u2316',
+        'Tau;': '\u03a4',
+        'tau;': '\u03c4',
+        'tbrk;': '\u23b4',
+        'Tcaron;': '\u0164',
+        'tcaron;': '\u0165',
+        'Tcedil;': '\u0162',
+        'tcedil;': '\u0163',
+        'Tcy;': '\u0422',
+        'tcy;': '\u0442',
+        'tdot;': '\u20db',
+        'telrec;': '\u2315',
+        'Tfr;': '\U0001d517',
+        'tfr;': '\U0001d531',
+        'there4;': '\u2234',
+        'Therefore;': '\u2234',
+        'therefore;': '\u2234',
+        'Theta;': '\u0398',
+        'theta;': '\u03b8',
+        'thetasym;': '\u03d1',
+        'thetav;': '\u03d1',
+        'thickapprox;': '\u2248',
+        'thicksim;': '\u223c',
+        'ThickSpace;': '\u205f\u200a',
+        'thinsp;': '\u2009',
+        'ThinSpace;': '\u2009',
+        'thkap;': '\u2248',
+        'thksim;': '\u223c',
+        'THORN': '\xde',
+        'thorn': '\xfe',
+        'THORN;': '\xde',
+        'thorn;': '\xfe',
+        'Tilde;': '\u223c',
+        'tilde;': '\u02dc',
+        'TildeEqual;': '\u2243',
+        'TildeFullEqual;': '\u2245',
+        'TildeTilde;': '\u2248',
+        'times': '\xd7',
+        'times;': '\xd7',
+        'timesb;': '\u22a0',
+        'timesbar;': '\u2a31',
+        'timesd;': '\u2a30',
+        'tint;': '\u222d',
+        'toea;': '\u2928',
+        'top;': '\u22a4',
+        'topbot;': '\u2336',
+        'topcir;': '\u2af1',
+        'Topf;': '\U0001d54b',
+        'topf;': '\U0001d565',
+        'topfork;': '\u2ada',
+        'tosa;': '\u2929',
+        'tprime;': '\u2034',
+        'TRADE;': '\u2122',
+        'trade;': '\u2122',
+        'triangle;': '\u25b5',
+        'triangledown;': '\u25bf',
+        'triangleleft;': '\u25c3',
+        'trianglelefteq;': '\u22b4',
+        'triangleq;': '\u225c',
+        'triangleright;': '\u25b9',
+        'trianglerighteq;': '\u22b5',
+        'tridot;': '\u25ec',
+        'trie;': '\u225c',
+        'triminus;': '\u2a3a',
+        'TripleDot;': '\u20db',
+        'triplus;': '\u2a39',
+        'trisb;': '\u29cd',
+        'tritime;': '\u2a3b',
+        'trpezium;': '\u23e2',
+        'Tscr;': '\U0001d4af',
+        'tscr;': '\U0001d4c9',
+        'TScy;': '\u0426',
+        'tscy;': '\u0446',
+        'TSHcy;': '\u040b',
+        'tshcy;': '\u045b',
+        'Tstrok;': '\u0166',
+        'tstrok;': '\u0167',
+        'twixt;': '\u226c',
+        'twoheadleftarrow;': '\u219e',
+        'twoheadrightarrow;': '\u21a0',
+        'Uacute': '\xda',
+        'uacute': '\xfa',
+        'Uacute;': '\xda',
+        'uacute;': '\xfa',
+        'Uarr;': '\u219f',
+        'uArr;': '\u21d1',
+        'uarr;': '\u2191',
+        'Uarrocir;': '\u2949',
+        'Ubrcy;': '\u040e',
+        'ubrcy;': '\u045e',
+        'Ubreve;': '\u016c',
+        'ubreve;': '\u016d',
+        'Ucirc': '\xdb',
+        'ucirc': '\xfb',
+        'Ucirc;': '\xdb',
+        'ucirc;': '\xfb',
+        'Ucy;': '\u0423',
+        'ucy;': '\u0443',
+        'udarr;': '\u21c5',
+        'Udblac;': '\u0170',
+        'udblac;': '\u0171',
+        'udhar;': '\u296e',
+        'ufisht;': '\u297e',
+        'Ufr;': '\U0001d518',
+        'ufr;': '\U0001d532',
+        'Ugrave': '\xd9',
+        'ugrave': '\xf9',
+        'Ugrave;': '\xd9',
+        'ugrave;': '\xf9',
+        'uHar;': '\u2963',
+        'uharl;': '\u21bf',
+        'uharr;': '\u21be',
+        'uhblk;': '\u2580',
+        'ulcorn;': '\u231c',
+        'ulcorner;': '\u231c',
+        'ulcrop;': '\u230f',
+        'ultri;': '\u25f8',
+        'Umacr;': '\u016a',
+        'umacr;': '\u016b',
+        'uml': '\xa8',
+        'uml;': '\xa8',
+        'UnderBar;': '_',
+        'UnderBrace;': '\u23df',
+        'UnderBracket;': '\u23b5',
+        'UnderParenthesis;': '\u23dd',
+        'Union;': '\u22c3',
+        'UnionPlus;': '\u228e',
+        'Uogon;': '\u0172',
+        'uogon;': '\u0173',
+        'Uopf;': '\U0001d54c',
+        'uopf;': '\U0001d566',
+        'UpArrow;': '\u2191',
+        'Uparrow;': '\u21d1',
+        'uparrow;': '\u2191',
+        'UpArrowBar;': '\u2912',
+        'UpArrowDownArrow;': '\u21c5',
+        'UpDownArrow;': '\u2195',
+        'Updownarrow;': '\u21d5',
+        'updownarrow;': '\u2195',
+        'UpEquilibrium;': '\u296e',
+        'upharpoonleft;': '\u21bf',
+        'upharpoonright;': '\u21be',
+        'uplus;': '\u228e',
+        'UpperLeftArrow;': '\u2196',
+        'UpperRightArrow;': '\u2197',
+        'Upsi;': '\u03d2',
+        'upsi;': '\u03c5',
+        'upsih;': '\u03d2',
+        'Upsilon;': '\u03a5',
+        'upsilon;': '\u03c5',
+        'UpTee;': '\u22a5',
+        'UpTeeArrow;': '\u21a5',
+        'upuparrows;': '\u21c8',
+        'urcorn;': '\u231d',
+        'urcorner;': '\u231d',
+        'urcrop;': '\u230e',
+        'Uring;': '\u016e',
+        'uring;': '\u016f',
+        'urtri;': '\u25f9',
+        'Uscr;': '\U0001d4b0',
+        'uscr;': '\U0001d4ca',
+        'utdot;': '\u22f0',
+        'Utilde;': '\u0168',
+        'utilde;': '\u0169',
+        'utri;': '\u25b5',
+        'utrif;': '\u25b4',
+        'uuarr;': '\u21c8',
+        'Uuml': '\xdc',
+        'uuml': '\xfc',
+        'Uuml;': '\xdc',
+        'uuml;': '\xfc',
+        'uwangle;': '\u29a7',
+        'vangrt;': '\u299c',
+        'varepsilon;': '\u03f5',
+        'varkappa;': '\u03f0',
+        'varnothing;': '\u2205',
+        'varphi;': '\u03d5',
+        'varpi;': '\u03d6',
+        'varpropto;': '\u221d',
+        'vArr;': '\u21d5',
+        'varr;': '\u2195',
+        'varrho;': '\u03f1',
+        'varsigma;': '\u03c2',
+        'varsubsetneq;': '\u228a\ufe00',
+        'varsubsetneqq;': '\u2acb\ufe00',
+        'varsupsetneq;': '\u228b\ufe00',
+        'varsupsetneqq;': '\u2acc\ufe00',
+        'vartheta;': '\u03d1',
+        'vartriangleleft;': '\u22b2',
+        'vartriangleright;': '\u22b3',
+        'Vbar;': '\u2aeb',
+        'vBar;': '\u2ae8',
+        'vBarv;': '\u2ae9',
+        'Vcy;': '\u0412',
+        'vcy;': '\u0432',
+        'VDash;': '\u22ab',
+        'Vdash;': '\u22a9',
+        'vDash;': '\u22a8',
+        'vdash;': '\u22a2',
+        'Vdashl;': '\u2ae6',
+        'Vee;': '\u22c1',
+        'vee;': '\u2228',
+        'veebar;': '\u22bb',
+        'veeeq;': '\u225a',
+        'vellip;': '\u22ee',
+        'Verbar;': '\u2016',
+        'verbar;': '|',
+        'Vert;': '\u2016',
+        'vert;': '|',
+        'VerticalBar;': '\u2223',
+        'VerticalLine;': '|',
+        'VerticalSeparator;': '\u2758',
+        'VerticalTilde;': '\u2240',
+        'VeryThinSpace;': '\u200a',
+        'Vfr;': '\U0001d519',
+        'vfr;': '\U0001d533',
+        'vltri;': '\u22b2',
+        'vnsub;': '\u2282\u20d2',
+        'vnsup;': '\u2283\u20d2',
+        'Vopf;': '\U0001d54d',
+        'vopf;': '\U0001d567',
+        'vprop;': '\u221d',
+        'vrtri;': '\u22b3',
+        'Vscr;': '\U0001d4b1',
+        'vscr;': '\U0001d4cb',
+        'vsubnE;': '\u2acb\ufe00',
+        'vsubne;': '\u228a\ufe00',
+        'vsupnE;': '\u2acc\ufe00',
+        'vsupne;': '\u228b\ufe00',
+        'Vvdash;': '\u22aa',
+        'vzigzag;': '\u299a',
+        'Wcirc;': '\u0174',
+        'wcirc;': '\u0175',
+        'wedbar;': '\u2a5f',
+        'Wedge;': '\u22c0',
+        'wedge;': '\u2227',
+        'wedgeq;': '\u2259',
+        'weierp;': '\u2118',
+        'Wfr;': '\U0001d51a',
+        'wfr;': '\U0001d534',
+        'Wopf;': '\U0001d54e',
+        'wopf;': '\U0001d568',
+        'wp;': '\u2118',
+        'wr;': '\u2240',
+        'wreath;': '\u2240',
+        'Wscr;': '\U0001d4b2',
+        'wscr;': '\U0001d4cc',
+        'xcap;': '\u22c2',
+        'xcirc;': '\u25ef',
+        'xcup;': '\u22c3',
+        'xdtri;': '\u25bd',
+        'Xfr;': '\U0001d51b',
+        'xfr;': '\U0001d535',
+        'xhArr;': '\u27fa',
+        'xharr;': '\u27f7',
+        'Xi;': '\u039e',
+        'xi;': '\u03be',
+        'xlArr;': '\u27f8',
+        'xlarr;': '\u27f5',
+        'xmap;': '\u27fc',
+        'xnis;': '\u22fb',
+        'xodot;': '\u2a00',
+        'Xopf;': '\U0001d54f',
+        'xopf;': '\U0001d569',
+        'xoplus;': '\u2a01',
+        'xotime;': '\u2a02',
+        'xrArr;': '\u27f9',
+        'xrarr;': '\u27f6',
+        'Xscr;': '\U0001d4b3',
+        'xscr;': '\U0001d4cd',
+        'xsqcup;': '\u2a06',
+        'xuplus;': '\u2a04',
+        'xutri;': '\u25b3',
+        'xvee;': '\u22c1',
+        'xwedge;': '\u22c0',
+        'Yacute': '\xdd',
+        'yacute': '\xfd',
+        'Yacute;': '\xdd',
+        'yacute;': '\xfd',
+        'YAcy;': '\u042f',
+        'yacy;': '\u044f',
+        'Ycirc;': '\u0176',
+        'ycirc;': '\u0177',
+        'Ycy;': '\u042b',
+        'ycy;': '\u044b',
+        'yen': '\xa5',
+        'yen;': '\xa5',
+        'Yfr;': '\U0001d51c',
+        'yfr;': '\U0001d536',
+        'YIcy;': '\u0407',
+        'yicy;': '\u0457',
+        'Yopf;': '\U0001d550',
+        'yopf;': '\U0001d56a',
+        'Yscr;': '\U0001d4b4',
+        'yscr;': '\U0001d4ce',
+        'YUcy;': '\u042e',
+        'yucy;': '\u044e',
+        'yuml': '\xff',
+        'Yuml;': '\u0178',
+        'yuml;': '\xff',
+        'Zacute;': '\u0179',
+        'zacute;': '\u017a',
+        'Zcaron;': '\u017d',
+        'zcaron;': '\u017e',
+        'Zcy;': '\u0417',
+        'zcy;': '\u0437',
+        'Zdot;': '\u017b',
+        'zdot;': '\u017c',
+        'zeetrf;': '\u2128',
+        'ZeroWidthSpace;': '\u200b',
+        'Zeta;': '\u0396',
+        'zeta;': '\u03b6',
+        'Zfr;': '\u2128',
+        'zfr;': '\U0001d537',
+        'ZHcy;': '\u0416',
+        'zhcy;': '\u0436',
+        'zigrarr;': '\u21dd',
+        'Zopf;': '\u2124',
+        'zopf;': '\U0001d56b',
+        'Zscr;': '\U0001d4b5',
+        'zscr;': '\U0001d4cf',
+        'zwj;': '\u200d',
+        'zwnj;': '\u200c',
+    }
+
  try:
      import http.client as compat_http_client
  except ImportError:  # Python 2
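
Note on the hunk above: the added table maps HTML5 named character references to their replacement text. Legacy names appear both with and without the trailing semicolon (for example `ntilde` and `ntilde;`), and some names expand to two code points (for example `NotLessLess;`). The following is a minimal sketch of consulting such a table, assuming Python 3, where the standard library ships a mapping in the same key format as `html.entities.html5`. The helper name `lookup_entity` is hypothetical and is not part of this commit.

```python
from html.entities import html5  # stdlib table using the same key format


def lookup_entity(name):
    """Return the replacement text for a named reference, or None.

    `name` is the text after '&', including the trailing ';' when the
    input had one (hypothetical helper, for illustration only).
    """
    return html5.get(name)


print(repr(lookup_entity('ntilde')))        # legacy form without ';'
print(repr(lookup_entity('ntilde;')))       # same character, with ';'
print(repr(lookup_entity('NotLessLess;')))  # expands to two code points
print(repr(lookup_entity('nosuchname;')))   # unknown names yield None
```

Returning `None` for unknown names leaves it to the caller to decide whether to keep the original `&name;` text untouched.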
@@ -82,7 +2322,6 @@ try:
  except ImportError:  # Python 2
      from HTMLParser import HTMLParser as compat_HTMLParser
  
-
  try:
      from subprocess import DEVNULL
      compat_subprocess_get_DEVNULL = lambda: DEVNULL
@@ -181,7 +2420,8 @@ except ImportError:  # Python 2
              if isinstance(e, dict):
                  e = encode_dict(e)
              elif isinstance(e, (list, tuple,)):
-                e = encode_list(e)
+                list_e = encode_list(e)
+                e = tuple(list_e) if isinstance(e, tuple) else list_e
              elif isinstance(e, compat_str):
                  e = e.encode(encoding)
              return e
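
Note on the hunk above: the change keeps a tuple a tuple after its elements are encoded, instead of always handing back a list. The following is a simplified standalone sketch of that behaviour, assuming Python 3 with `str` standing in for `compat_str`. The function name `encode_elem_sketch` and its flat structure are illustrative assumptions, not the code in this file.

```python
def encode_elem_sketch(e, encoding='utf-8'):
    """Recursively encode text to bytes while preserving container types."""
    if isinstance(e, dict):
        return dict(
            (encode_elem_sketch(k, encoding), encode_elem_sketch(v, encoding))
            for k, v in e.items())
    if isinstance(e, (list, tuple)):
        encoded = [encode_elem_sketch(el, encoding) for el in e]
        # Mirror the fix: return the same sequence type that came in.
        return tuple(encoded) if isinstance(e, tuple) else encoded
    if isinstance(e, str):
        return e.encode(encoding)
    return e


print(encode_elem_sketch(('key', ['v1', 'v2'])))
print(encode_elem_sketch({'k': ('a', 1)}))
```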
@@ -243,13 +2483,21 @@ try:
  except ImportError:  # Python 2.6
      from xml.parsers.expat import ExpatError as compat_xml_parse_error
  
+
+etree = xml.etree.ElementTree
+
+
+class _TreeBuilder(etree.TreeBuilder):
+    def doctype(self, name, pubid, system):
+        pass
+
+
  if sys.version_info[0] >= 3:
-    compat_etree_fromstring = xml.etree.ElementTree.fromstring
+    def compat_etree_fromstring(text):
+        return etree.XML(text, parser=etree.XMLParser(target=_TreeBuilder()))
  else:
      # python 2.x tries to encode unicode strings with ascii (see the
      # XMLParser._fixtext method)
-    etree = xml.etree.ElementTree
-
      try:
          _etree_iter = etree.Element.iter
      except AttributeError:  # Python <=2.6
@@ -263,7 +2511,7 @@ else:
      # 2.7 source
      def _XML(text, parser=None):
          if not parser:
-            parser = etree.XMLParser(target=etree.TreeBuilder())
+            parser = etree.XMLParser(target=_TreeBuilder())
          parser.feed(text)
          return parser.close()
  
@@ -275,7 +2523,7 @@ else:
          return el
  
      def compat_etree_fromstring(text):
-        doc = _XML(text, parser=etree.XMLParser(target=etree.TreeBuilder(element_factory=_element_factory)))
+        doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
          for el in _etree_iter(doc):
              if el.text is not None and isinstance(el.text, bytes):
                  el.text = el.text.decode('utf-8')
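
Note on the hunk above: both branches now build the tree through `_TreeBuilder`, whose `doctype` method does nothing. The following is a small sketch of the Python 3 branch parsing a document that carries a DOCTYPE declaration. The sample XML and the function name `fromstring_ignoring_doctype` are assumptions made for illustration.

```python
import xml.etree.ElementTree as etree


class _TreeBuilder(etree.TreeBuilder):
    def doctype(self, name, pubid, system):
        # Deliberately ignore the DOCTYPE declaration.
        pass


def fromstring_ignoring_doctype(text):
    # Same construction as the Python 3 branch in the diff.
    return etree.XML(text, parser=etree.XMLParser(target=_TreeBuilder()))


doc = fromstring_ignoring_doctype(
    '<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" '
    '"http://www.w3.org/2001/SMIL20/SMIL20.dtd">'
    '<smil><a b="c">text</a></smil>')
print(doc.tag, doc.find('a').get('b'), doc.find('a').text)
```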
@@ -339,24 +2587,28 @@ except ImportError:  # Python 2
          return parsed_result
  
  try:
-    from shlex import quote as shlex_quote
+    from shlex import quote as compat_shlex_quote
  except ImportError:  # Python < 3.3
-    def shlex_quote(s):
+    def compat_shlex_quote(s):
          if re.match(r'^[-_\w./]+$', s):
              return s
          else:
              return "'" + s.replace("'", "'\"'\"'") + "'"
  
  
-if sys.version_info >= (2, 7, 3):
+try:
+    args = shlex.split('中文')
+    assert (isinstance(args, list) and
+            isinstance(args[0], compat_str) and
+            args[0] == '中文')
      compat_shlex_split = shlex.split
-else:
+except (AssertionError, UnicodeEncodeError):
      # Working around shlex issue with unicode strings on some python 2
      # versions (see http://bugs.python.org/issue1548891)
      def compat_shlex_split(s, comments=False, posix=True):
          if isinstance(s, compat_str):
              s = s.encode('utf-8')
-        return shlex.split(s, comments, posix)
+        return list(map(lambda s: s.decode('utf-8'), shlex.split(s, comments, posix)))
  
  
  def compat_ord(c):
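
Note on the hunk above: the fallback for `compat_shlex_quote` returns "safe" strings unchanged and wraps everything else in single quotes, escaping embedded single quotes as `'"'"'`. The following is a quick round-trip sketch, assuming Python 3, where `shlex.split` already handles non-ASCII text. The name `fallback_shlex_quote` is hypothetical and is used only so the fallback body can be exercised directly.

```python
import re
import shlex


def fallback_shlex_quote(s):
    # Same logic as the "Python < 3.3" fallback shown in the diff.
    if re.match(r'^[-_\w./]+$', s):
        return s
    return "'" + s.replace("'", "'\"'\"'") + "'"


for arg in ['plain-file.mp4', 'two words', "it's", '中文 title']:
    quoted = fallback_shlex_quote(arg)
    # Splitting the quoted form should give back exactly one argument.
    assert shlex.split(quoted) == [arg]
    print(quoted)
```

The round trip through `shlex.split` is the property worth checking here: a quoting helper is only useful if the shell-style parser recovers the original argument.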
@@ -372,6 +2624,9 @@ compat_os_name = os._name if os.name == 'java' else os.name
  if sys.version_info >= (3, 0):
      compat_getenv = os.getenv
      compat_expanduser = os.path.expanduser
+
+    def compat_setenv(key, value, env=os.environ):
+        env[key] = value
  else:
      # Environment variables should be decoded with filesystem encoding.
      # Otherwise it will fail if any non-ASCII characters present (see #3854 #3217 #2918)
@@ -383,6 +2638,12 @@ else:
              env = env.decode(get_filesystem_encoding())
          return env
  
+    def compat_setenv(key, value, env=os.environ):
+        def encode(v):
+            from .utils import get_filesystem_encoding
+            return v.encode(get_filesystem_encoding()) if isinstance(v, compat_str) else v
+        env[encode(key)] = encode(value)
+
      # HACK: The default implementations of os.path.expanduser from cpython do not decode
      # environment variables with filesystem encoding. We will work around this by
      # providing adjusted implementations.
@@ -455,18 +2716,6 @@ else:
          print(s)
  
  
-try:
-    subprocess_check_output = subprocess.check_output
-except AttributeError:
-    def subprocess_check_output(*args, **kwargs):
-        assert 'input' not in kwargs
-        p = subprocess.Popen(*args, stdout=subprocess.PIPE, **kwargs)
-        output, _ = p.communicate()
-        ret = p.poll()
-        if ret:
-            raise subprocess.CalledProcessError(ret, p.args, output=output)
-        return output
-
  if sys.version_info < (3, 0) and sys.platform == 'win32':
      def compat_getpass(prompt, *args, **kwargs):
          if isinstance(prompt, compat_str):
@@ -476,6 +2725,11 @@ if sys.version_info < (3, 0) and sys.platform == 'win32':
  else:
      compat_getpass = getpass.getpass
  
+try:
+    compat_input = raw_input
+except NameError:  # Python 3
+    compat_input = input
+
  # Python < 2.6.5 require kwargs to be bytes
  try:
      def _testfunc(x):
@@ -534,6 +2788,7 @@ def workaround_optparse_bug9161():
              return real_add_option(self, *bargs, **bkwargs)
          optparse.OptionGroup.add_option = _compat_add_option
  
+
  if hasattr(shutil, 'get_terminal_size'):  # Python >= 3.3
      compat_get_terminal_size = shutil.get_terminal_size
  else:
@@ -582,6 +2837,26 @@ if sys.version_info >= (3, 0):
  else:
      from tokenize import generate_tokens as compat_tokenize_tokenize
  
+
+try:
+    struct.pack('!I', 0)
+except TypeError:
+    # In Python 2.6 and 2.7.x < 2.7.7, struct requires a bytes argument
+    # See https://bugs.python.org/issue19099
+    def compat_struct_pack(spec, *args):
+        if isinstance(spec, compat_str):
+            spec = spec.encode('ascii')
+        return struct.pack(spec, *args)
+
+    def compat_struct_unpack(spec, *args):
+        if isinstance(spec, compat_str):
+            spec = spec.encode('ascii')
+        return struct.unpack(spec, *args)
+else:
+    compat_struct_pack = struct.pack
+    compat_struct_unpack = struct.unpack
+
+
  __all__ = [
      'compat_HTMLParser',
      'compat_HTTPError',
@@ -595,17 +2870,23 @@ __all__ = [
      'compat_getenv',
      'compat_getpass',
      'compat_html_entities',
+    'compat_html_entities_html5',
      'compat_http_client',
      'compat_http_server',
+    'compat_input',
      'compat_itertools_count',
      'compat_kwargs',
      'compat_ord',
      'compat_os_name',
      'compat_parse_qs',
      'compat_print',
+    'compat_setenv',
+    'compat_shlex_quote',
      'compat_shlex_split',
      'compat_socket_create_connection',
      'compat_str',
+    'compat_struct_pack',
+    'compat_struct_unpack',
      'compat_subprocess_get_DEVNULL',
      'compat_tokenize_tokenize',
      'compat_urllib_error',
@@ -622,7 +2903,5 @@ __all__ = [
      'compat_urlretrieve',
      'compat_xml_parse_error',
      'compat_xpath',
-    'shlex_quote',
-    'subprocess_check_output',
      'workaround_optparse_bug9161',
  ]
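
Review note (illustrative, not part of the commit): the renamed `compat_shlex_quote` fallback above wraps a string in single quotes and splices each embedded single quote in as `'"'"'`. The sketch below copies that logic into a standalone function to show that the result survives a round trip through `shlex.split`. The names `quote_fallback` and `round_trips` are hypothetical.

```python
import re
import shlex


def quote_fallback(s):
    # Same logic as the pre-3.3 branch of compat_shlex_quote in the hunk above.
    if re.match(r'^[-_\w./]+$', s):
        return s
    return "'" + s.replace("'", "'\"'\"'") + "'"


def round_trips(s):
    # A correctly quoted string must split back into exactly one token.
    return shlex.split(quote_fallback(s)) == [s]


if __name__ == '__main__':
    for sample in ('plain.txt', "it's a test", 'two words'):
        print(repr(sample), '->', quote_fallback(sample), round_trips(sample))
```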
diff --git a/youtube_dl/downloader/__init__.py b/youtube_dl/downloader/__init__.py
index 73b34fdae96262000b290df8031a4a1f6eb5e721..16952e359bc19337dc4f8682061d169ff956e264 100644
--- a/youtube_dl/downloader/__init__.py
+++ b/youtube_dl/downloader/__init__.py
@@ -7,6 +7,7 @@ from .http import HttpFD
  from .rtmp import RtmpFD
  from .dash import DashSegmentsFD
  from .rtsp import RtspFD
+from .ism import IsmFD
  from .external import (
      get_external_downloader,
      FFmpegFD,
@@ -24,6 +25,7 @@ PROTOCOL_MAP = {
      'rtsp': RtspFD,
      'f4m': F4mFD,
      'http_dash_segments': DashSegmentsFD,
+    'ism': IsmFD,
  }
  
  
@@ -41,9 +43,12 @@ def get_suitable_downloader(info_dict, params={}):
          if ed.can_download(info_dict):
              return ed
  
-    if protocol == 'm3u8' and params.get('hls_prefer_native'):
+    if protocol == 'm3u8' and params.get('hls_prefer_native') is True:
          return HlsFD
  
+    if protocol == 'm3u8_native' and params.get('hls_prefer_native') is False:
+        return FFmpegFD
+
      return PROTOCOL_MAP.get(protocol, HttpFD)
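
Review note (illustrative, not part of the commit): with the `is True` / `is False` checks, `hls_prefer_native` becomes a three-state option, and an unset value (`None`) leaves the protocol's default downloader in place. The sketch below models this with string stand-ins for the downloader classes. The default mapping of `m3u8` and `m3u8_native` is an assumption, since the hunk does not show those `PROTOCOL_MAP` entries, and `pick_hls_downloader` is a hypothetical name.

```python
# Assumed defaults; the hunk above does not show these PROTOCOL_MAP entries.
ASSUMED_DEFAULTS = {
    'm3u8': 'FFmpegFD',
    'm3u8_native': 'HlsFD',
}


def pick_hls_downloader(protocol, hls_prefer_native=None):
    # An explicit True forces the native downloader for plain m3u8.
    if protocol == 'm3u8' and hls_prefer_native is True:
        return 'HlsFD'
    # An explicit False forces ffmpeg even for m3u8_native.
    if protocol == 'm3u8_native' and hls_prefer_native is False:
        return 'FFmpegFD'
    # None (option not given) falls through to the per-protocol default.
    return ASSUMED_DEFAULTS.get(protocol, 'HttpFD')
```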
  
  
diff --git a/youtube_dl/downloader/common.py b/youtube_dl/downloader/common.py
index 1dba9f49a8b9b586b8428c6c7d65ca641de02c58..3dc144b4e19f208d4075d6423ce3278b3a614330 100644
--- a/youtube_dl/downloader/common.py
+++ b/youtube_dl/downloader/common.py
@@ -4,6 +4,7 @@ import os
  import re
  import sys
  import time
+import random
  
  from ..compat import compat_os_name
  from ..utils import (
@@ -342,8 +343,10 @@ class FileDownloader(object):
              })
              return True
  
-        sleep_interval = self.params.get('sleep_interval')
-        if sleep_interval:
+        min_sleep_interval = self.params.get('sleep_interval')
+        if min_sleep_interval:
+            max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
+            sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
              self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
              time.sleep(sleep_interval)
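
Review note (illustrative, not part of the commit): the new sleep logic treats `sleep_interval` as a lower bound and draws the actual delay uniformly up to `max_sleep_interval`, which defaults to the lower bound, so the old fixed-delay behaviour is preserved. A minimal standalone sketch follows. `pick_sleep_interval` is a hypothetical helper, and `params` is a plain dict standing in for `self.params`.

```python
import random


def pick_sleep_interval(params):
    min_sleep = params.get('sleep_interval')
    if not min_sleep:
        # Option unset (or zero): no sleep at all.
        return None
    # Without an upper bound the range collapses to a single value.
    max_sleep = params.get('max_sleep_interval', min_sleep)
    return random.uniform(min_sleep, max_sleep)


if __name__ == '__main__':
    print(pick_sleep_interval({'sleep_interval': 2, 'max_sleep_interval': 4}))
```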
  
diff --git a/youtube_dl/downloader/dash.py b/youtube_dl/downloader/dash.py
index 8bbab9dbc596c659db622fe9910d0ae90018a598..8437dde30ca2afe031afb1ff2882ed12ac4b49b5 100644
--- a/youtube_dl/downloader/dash.py
+++ b/youtube_dl/downloader/dash.py
@@ -1,7 +1,6 @@
  from __future__ import unicode_literals
  
  import os
-import re
  
  from .fragment import FragmentFD
  from ..compat import compat_urllib_error
@@ -19,32 +18,32 @@ class DashSegmentsFD(FragmentFD):
      FD_NAME = 'dashsegments'
  
      def real_download(self, filename, info_dict):
-        base_url = info_dict['url']
-        segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls']
-        initialization_url = info_dict.get('initialization_url')
+        segments = info_dict['fragments'][:1] if self.params.get(
+            'test', False) else info_dict['fragments']
  
          ctx = {
              'filename': filename,
-            'total_frags': len(segment_urls) + (1 if initialization_url else 0),
+            'total_frags': len(segments),
          }
  
          self._prepare_and_start_frag_download(ctx)
  
-        def combine_url(base_url, target_url):
-            if re.match(r'^https?://', target_url):
-                return target_url
-            return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
-
          segments_filenames = []
  
          fragment_retries = self.params.get('fragment_retries', 0)
+        skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
  
-        def append_url_to_file(target_url, tmp_filename, segment_name):
+        def process_segment(segment, tmp_filename, num):
+            segment_url = segment['url']
+            segment_name = 'Frag%d' % num
              target_filename = '%s-%s' % (tmp_filename, segment_name)
+            # In DASH, the first segment contains necessary headers to
+            # generate a valid MP4 file, so always abort for the first segment
+            fatal = num == 0 or not skip_unavailable_fragments
              count = 0
              while count <= fragment_retries:
                  try:
-                    success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
+                    success = ctx['dl'].download(target_filename, {'url': segment_url})
                      if not success:
                          return False
                      down, target_sanitized = sanitize_open(target_filename, 'rb')
@@ -52,26 +51,27 @@ class DashSegmentsFD(FragmentFD):
                      down.close()
                      segments_filenames.append(target_sanitized)
                      break
-                except (compat_urllib_error.HTTPError, ) as err:
+                except compat_urllib_error.HTTPError as err:
                      # YouTube may often return 404 HTTP error for a fragment causing the
                      # whole download to fail. However if the same fragment is immediately
                      # retried with the same request data this usually succeeds (1-2 attemps
                      # is usually enough) thus allowing to download the whole file successfully.
-                    # So, we will retry all fragments that fail with 404 HTTP error for now.
-                    if err.code != 404:
-                        raise
-                    # Retry fragment
+                    # To be future-proof we will retry all fragments that fail with any
+                    # HTTP error.
                      count += 1
                      if count <= fragment_retries:
-                        self.report_retry_fragment(segment_name, count, fragment_retries)
+                        self.report_retry_fragment(err, segment_name, count, fragment_retries)
              if count > fragment_retries:
+                if not fatal:
+                    self.report_skip_fragment(segment_name)
+                    return True
                  self.report_error('giving up after %s fragment retries' % fragment_retries)
                  return False
+            return True
  
-        if initialization_url:
-            append_url_to_file(initialization_url, ctx['tmpfilename'], 'Init')
-        for i, segment_url in enumerate(segment_urls):
-            append_url_to_file(segment_url, ctx['tmpfilename'], 'Seg%d' % i)
+        for i, segment in enumerate(segments):
+            if not process_segment(segment, ctx['tmpfilename'], i):
+                return False
  
          self._finish_frag_download(ctx)
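
Review note (illustrative, not part of the commit): `process_segment` retries a failing fragment up to `fragment_retries` times, then either skips it or aborts. Fragment 0 is always fatal because it carries the MP4 headers. The sketch below isolates that control flow. `FragmentError` stands in for `compat_urllib_error.HTTPError`, and `fetch` is a hypothetical callable that replaces the real downloader.

```python
class FragmentError(Exception):
    """Stand-in for the HTTP error raised by the real downloader."""


def download_with_retries(fetch, num, fragment_retries=0, skip_unavailable=True):
    # The first fragment is never skippable, mirroring `fatal` in the diff.
    fatal = num == 0 or not skip_unavailable
    count = 0
    while count <= fragment_retries:
        try:
            return 'ok', fetch()
        except FragmentError:
            # Any error triggers a retry until the budget is used up.
            count += 1
    # Retries exhausted: skip non-fatal fragments, abort otherwise.
    return ('failed' if fatal else 'skipped'), None
```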
  
diff --git a/youtube_dl/downloader/external.py b/youtube_dl/downloader/external.py
index 30277dc205787d226360cfc950e5c08796a5fc03..5d3e5d8d3d748d98ea187e8eca4444c5504e07fb 100644
--- a/youtube_dl/downloader/external.py
+++ b/youtube_dl/downloader/external.py
@@ -6,6 +6,7 @@ import sys
  import re
  
  from .common import FileDownloader
+from ..compat import compat_setenv
  from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS
  from ..utils import (
      cli_option,
@@ -84,7 +85,7 @@ class ExternalFD(FileDownloader):
              cmd, stderr=subprocess.PIPE)
          _, stderr = p.communicate()
          if p.returncode != 0:
-            self.to_stderr(stderr)
+            self.to_stderr(stderr.decode('utf-8', 'replace'))
          return p.returncode
  
  
@@ -95,6 +96,12 @@ class CurlFD(ExternalFD):
          cmd = [self.exe, '--location', '-o', tmpfilename]
          for key, val in info_dict['http_headers'].items():
              cmd += ['--header', '%s: %s' % (key, val)]
+        cmd += self._bool_option('--continue-at', 'continuedl', '-', '0')
+        cmd += self._valueless_option('--silent', 'noprogress')
+        cmd += self._valueless_option('--verbose', 'verbose')
+        cmd += self._option('--limit-rate', 'ratelimit')
+        cmd += self._option('--retry', 'retries')
+        cmd += self._option('--max-filesize', 'max_filesize')
          cmd += self._option('--interface', 'source_address')
          cmd += self._option('--proxy', 'proxy')
          cmd += self._valueless_option('--insecure', 'nocheckcertificate')
@@ -102,6 +109,16 @@ class CurlFD(ExternalFD):
          cmd += ['--', info_dict['url']]
          return cmd
  
+    def _call_downloader(self, tmpfilename, info_dict):
+        cmd = [encodeArgument(a) for a in self._make_cmd(tmpfilename, info_dict)]
+
+        self._debug_cmd(cmd)
+
+        # curl writes the progress to stderr so don't capture it.
+        p = subprocess.Popen(cmd)
+        p.communicate()
+        return p.returncode
+
  
  class AxelFD(ExternalFD):
      AVAILABLE_OPT = '-V'
@@ -198,6 +215,25 @@ class FFmpegFD(ExternalFD):
                  '-headers',
                  ''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())]
  
+        env = None
+        proxy = self.params.get('proxy')
+        if proxy:
+            if not re.match(r'^[\da-zA-Z]+://', proxy):
+                proxy = 'http://%s' % proxy
+
+            if proxy.startswith('socks'):
+                self.report_warning(
+                    '%s does not support SOCKS proxies. Downloading is likely to fail. '
+                    'Consider adding --hls-prefer-native to your command.' % self.get_basename())
+
+            # Since December 2015 ffmpeg supports -http_proxy option (see
+            # http://git.videolan.org/?p=ffmpeg.git;a=commit;h=b4eb1f29ebddd60c41a2eb39f5af701e38e0d3fd)
+            # We could switch to the following code if we are able to detect version properly
+            # args += ['-http_proxy', proxy]
+            env = os.environ.copy()
+            compat_setenv('HTTP_PROXY', proxy, env=env)
+            compat_setenv('http_proxy', proxy, env=env)
+
          protocol = info_dict.get('protocol')
  
          if protocol == 'rtmp':
@@ -224,8 +260,8 @@ class FFmpegFD(ExternalFD):
                  args += ['-rtmp_live', 'live']
  
          args += ['-i', url, '-c', 'copy']
-        if protocol == 'm3u8':
-            if self.params.get('hls_use_mpegts', False):
+        if protocol in ('m3u8', 'm3u8_native'):
+            if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
                  args += ['-f', 'mpegts']
              else:
                  args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
@@ -239,7 +275,7 @@ class FFmpegFD(ExternalFD):
  
          self._debug_cmd(args)
  
-        proc = subprocess.Popen(args, stdin=subprocess.PIPE)
+        proc = subprocess.Popen(args, stdin=subprocess.PIPE, env=env)
          try:
              retval = proc.wait()
          except KeyboardInterrupt:
@@ -257,6 +293,7 @@ class FFmpegFD(ExternalFD):
  class AVconvFD(FFmpegFD):
      pass
  
+
  _BY_NAME = dict(
      (klass.get_basename(), klass)
      for name, klass in globals().items()
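
Review note (illustrative, not part of the commit): `FFmpegFD` now passes the proxy to ffmpeg through a copied environment rather than a command-line flag, after prefixing `http://` when the value has no scheme. A minimal sketch of that normalisation is shown here. `proxy_env` and its `base_env` parameter are hypothetical, and a plain dict assignment replaces `compat_setenv`, which is sufficient on Python 3.

```python
import os
import re


def proxy_env(proxy, base_env=None):
    if not proxy:
        # No proxy configured: the child inherits the parent environment.
        return None
    if not re.match(r'^[\da-zA-Z]+://', proxy):
        # A bare host:port is treated as an HTTP proxy.
        proxy = 'http://%s' % proxy
    # Work on a copy so the parent process environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env['HTTP_PROXY'] = proxy
    env['http_proxy'] = proxy
    return env
```

The returned mapping would be passed as `env=` to `subprocess.Popen`, as the hunk above does.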
diff --git a/youtube_dl/downloader/f4m.py b/youtube_dl/downloader/f4m.py
index 664d87543d07f7c357b803e0a0058034b71276a6..688e086eb0536c55ef184ae68fa09a6ffb41462d 100644
--- a/youtube_dl/downloader/f4m.py
+++ b/youtube_dl/downloader/f4m.py
@@ -12,37 +12,49 @@ from ..compat import (
      compat_urlparse,
      compat_urllib_error,
      compat_urllib_parse_urlparse,
+    compat_struct_pack,
+    compat_struct_unpack,
  )
  from ..utils import (
      encodeFilename,
      fix_xml_ampersands,
      sanitize_open,
-    struct_pack,
-    struct_unpack,
      xpath_text,
  )
  
  
+class DataTruncatedError(Exception):
+    pass
+
+
  class FlvReader(io.BytesIO):
      """
      Reader for Flv files
      The file format is documented in https://www.adobe.com/devnet/f4v.html
      """
  
+    def read_bytes(self, n):
+        data = self.read(n)
+        if len(data) < n:
+            raise DataTruncatedError(
+                'FlvReader error: need %d bytes while only %d bytes got' % (
+                    n, len(data)))
+        return data
+
      # Utility functions for reading numbers and strings
      def read_unsigned_long_long(self):
-        return struct_unpack('!Q', self.read(8))[0]
+        return compat_struct_unpack('!Q', self.read_bytes(8))[0]
  
      def read_unsigned_int(self):
-        return struct_unpack('!I', self.read(4))[0]
+        return compat_struct_unpack('!I', self.read_bytes(4))[0]
  
      def read_unsigned_char(self):
-        return struct_unpack('!B', self.read(1))[0]
+        return compat_struct_unpack('!B', self.read_bytes(1))[0]
  
      def read_string(self):
          res = b''
          while True:
-            char = self.read(1)
+            char = self.read_bytes(1)
              if char == b'\x00':
                  break
              res += char
@@ -53,18 +65,18 @@ class FlvReader(io.BytesIO):
          Read a box and return the info as a tuple: (box_size, box_type, box_data)
          """
          real_size = size = self.read_unsigned_int()
-        box_type = self.read(4)
+        box_type = self.read_bytes(4)
          header_end = 8
          if size == 1:
              real_size = self.read_unsigned_long_long()
              header_end = 16
-        return real_size, box_type, self.read(real_size - header_end)
+        return real_size, box_type, self.read_bytes(real_size - header_end)
  
      def read_asrt(self):
          # version
          self.read_unsigned_char()
          # flags
-        self.read(3)
+        self.read_bytes(3)
          quality_entry_count = self.read_unsigned_char()
          # QualityEntryCount
          for i in range(quality_entry_count):
@@ -85,7 +97,7 @@ class FlvReader(io.BytesIO):
          # version
          self.read_unsigned_char()
          # flags
-        self.read(3)
+        self.read_bytes(3)
          # time scale
          self.read_unsigned_int()
  
@@ -119,7 +131,7 @@ class FlvReader(io.BytesIO):
          # version
          self.read_unsigned_char()
          # flags
-        self.read(3)
+        self.read_bytes(3)
  
          self.read_unsigned_int()  # BootstrapinfoVersion
          # Profile,Live,Update,Reserved
@@ -184,6 +196,11 @@ def build_fragments_list(boot_info):
      first_frag_number = fragment_run_entry_table[0]['first']
      fragments_counter = itertools.count(first_frag_number)
      for segment, fragments_count in segment_run_table['segment_run']:
+        # In some live HDS streams (for example Rai), `fragments_count` is
+        # abnormal and causing out-of-memory errors. It's OK to change the
+        # number of fragments for live streams as they are updated periodically
+        if fragments_count == 4294967295 and boot_info['live']:
+            fragments_count = 2
          for _ in range(fragments_count):
              res.append((segment, next(fragments_counter)))
  
@@ -194,11 +211,11 @@ def build_fragments_list(boot_info):
  
  
  def write_unsigned_int(stream, val):
-    stream.write(struct_pack('!I', val))
+    stream.write(compat_struct_pack('!I', val))
  
  
  def write_unsigned_int_24(stream, val):
-    stream.write(struct_pack('!I', val)[1:])
+    stream.write(compat_struct_pack('!I', val)[1:])
  
  
  def write_flv_header(stream):
@@ -297,7 +314,8 @@ class F4mFD(FragmentFD):
          man_url = info_dict['url']
          requested_bitrate = info_dict.get('tbr')
          self.to_screen('[%s] Downloading f4m manifest' % self.FD_NAME)
-        urlh = self.ydl.urlopen(man_url)
+
+        urlh = self.ydl.urlopen(self._prepare_url(info_dict, man_url))
          man_url = urlh.geturl()
          # Some manifests may be malformed, e.g. prosiebensat1 generated manifests
          # (see https://github.com/rg3/youtube-dl/issues/6215#issuecomment-121704244
@@ -307,7 +325,7 @@ class F4mFD(FragmentFD):
          doc = compat_etree_fromstring(manifest)
          formats = [(int(f.attrib.get('bitrate', -1)), f)
                     for f in self._get_unencrypted_media(doc)]
-        if requested_bitrate is None:
+        if requested_bitrate is None or len(formats) == 1:
              # get the best format
              formats = sorted(formats, key=lambda f: f[0])
              rate, media = formats[-1]
@@ -317,7 +335,11 @@ class F4mFD(FragmentFD):
  
          base_url = compat_urlparse.urljoin(man_url, media.attrib['url'])
          bootstrap_node = doc.find(_add_ns('bootstrapInfo'))
-        boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, base_url)
+        # From Adobe F4M 3.0 spec:
+        # The <baseURL> element SHALL be the base URL for all relative
+        # (HTTP-based) URLs in the manifest. If <baseURL> is not present, said
+        # URLs should be relative to the location of the containing document.
+        boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, man_url)
          live = boot_info['live']
          metadata_node = media.find(_add_ns('metadata'))
          if metadata_node is not None:
@@ -366,7 +388,10 @@ class F4mFD(FragmentFD):
              url_parsed = base_url_parsed._replace(path=base_url_parsed.path + name, query='&'.join(query))
              frag_filename = '%s-%s' % (ctx['tmpfilename'], name)
              try:
-                success = ctx['dl'].download(frag_filename, {'url': url_parsed.geturl()})
+                success = ctx['dl'].download(frag_filename, {
+                    'url': url_parsed.geturl(),
+                    'http_headers': info_dict.get('http_headers'),
+                })
                  if not success:
                      return False
                  (down, frag_sanitized) = sanitize_open(frag_filename, 'rb')
@@ -374,7 +399,17 @@ class F4mFD(FragmentFD):
                  down.close()
                  reader = FlvReader(down_data)
                  while True:
-                    _, box_type, box_data = reader.read_box_info()
+                    try:
+                        _, box_type, box_data = reader.read_box_info()
+                    except DataTruncatedError:
+                        if test:
+                            # In tests, segments may be truncated, and thus
+                            # FlvReader may not be able to parse the whole
+                            # chunk. If so, write the segment as is
+                            # See https://github.com/rg3/youtube-dl/issues/9214
+                            dest_stream.write(down_data)
+                            break
+                        raise
                      if box_type == b'mdat':
                          dest_stream.write(box_data)
                          break
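
Review note (illustrative, not part of the commit): `read_bytes` turns a short read into an explicit `DataTruncatedError` instead of letting `struct.unpack` fail later with a less specific error. The sketch below reproduces the idea on a bare `io.BytesIO` subclass. `StrictReader` and `TruncatedError` are hypothetical names, and plain `struct.unpack` replaces `compat_struct_unpack`.

```python
import io
import struct


class TruncatedError(Exception):
    pass


class StrictReader(io.BytesIO):
    def read_bytes(self, n):
        data = self.read(n)
        if len(data) < n:
            # BytesIO.read() silently returns fewer bytes at end of data.
            raise TruncatedError(
                'need %d bytes while only %d bytes got' % (n, len(data)))
        return data

    def read_unsigned_int(self):
        # Network byte order, 32-bit unsigned, as in FlvReader.
        return struct.unpack('!I', self.read_bytes(4))[0]
```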
diff --git a/youtube_dl/downloader/fragment.py b/youtube_dl/downloader/fragment.py
index ba903ae103a7bc6817378389a34713d9a5550e19..60df627a65dfc589899f009fa5df9ce76a441ae5 100644
--- a/youtube_dl/downloader/fragment.py
+++ b/youtube_dl/downloader/fragment.py
@@ -6,8 +6,10 @@ import time
  from .common import FileDownloader
  from .http import HttpFD
  from ..utils import (
+    error_to_compat_str,
      encodeFilename,
      sanitize_open,
+    sanitized_Request,
  )
  
  
@@ -22,13 +24,23 @@ class FragmentFD(FileDownloader):
  
      Available options:
  
-    fragment_retries:   Number of times to retry a fragment for HTTP error (DASH only)
+    fragment_retries:   Number of times to retry a fragment for HTTP error (DASH
+                        and hlsnative only)
+    skip_unavailable_fragments:
+                        Skip unavailable fragments (DASH and hlsnative only)
      """
  
-    def report_retry_fragment(self, fragment_name, count, retries):
+    def report_retry_fragment(self, err, fragment_name, count, retries):
          self.to_screen(
-            '[download] Got server HTTP error. Retrying fragment %s (attempt %d of %s)...'
-            % (fragment_name, count, self.format_retries(retries)))
+            '[download] Got server HTTP error: %s. Retrying fragment %s (attempt %d of %s)...'
+            % (error_to_compat_str(err), fragment_name, count, self.format_retries(retries)))
+
+    def report_skip_fragment(self, fragment_name):
+        self.to_screen('[download] Skipping fragment %s...' % fragment_name)
+
+    def _prepare_url(self, info_dict, url):
+        headers = info_dict.get('http_headers')
+        return sanitized_Request(url, None, headers) if headers else url
  
      def _prepare_and_start_frag_download(self, ctx):
          self._prepare_frag_download(ctx)
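
Review note (illustrative, not part of the commit): `_prepare_url` wraps the manifest URL in a request object only when the extractor supplied `http_headers`, and otherwise passes the bare URL through. The sketch below shows the same shape with the standard library's `urllib.request.Request` in place of `sanitized_Request`. That substitution and the Python 3 only import are assumptions, and `prepare_url` is a hypothetical name.

```python
from urllib.request import Request


def prepare_url(info_dict, url):
    headers = info_dict.get('http_headers')
    # Request(url, data, headers): data=None keeps this a GET request.
    return Request(url, None, headers) if headers else url


if __name__ == '__main__':
    req = prepare_url(
        {'http_headers': {'Referer': 'http://example.com/'}},
        'http://example.com/manifest.f4m')
    print(req.full_url, req.get_header('Referer'))
```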
diff --git a/youtube_dl/downloader/hls.py b/youtube_dl/downloader/hls.py
index a01dac031aa3b0c012a4262d210d16ef2b10a47a..7373ec05fd0d4a1d983f48668229b21d98977581 100644
--- a/youtube_dl/downloader/hls.py
+++ b/youtube_dl/downloader/hls.py
@@ -2,13 +2,26 @@ from __future__ import unicode_literals
  
  import os.path
  import re
+import binascii
+try:
+    from Crypto.Cipher import AES
+    can_decrypt_frag = True
+except ImportError:
+    can_decrypt_frag = False
  
  from .fragment import FragmentFD
+from .external import FFmpegFD
  
-from ..compat import compat_urlparse
+from ..compat import (
+    compat_urllib_error,
+    compat_urlparse,
+    compat_struct_pack,
+)
  from ..utils import (
      encodeFilename,
      sanitize_open,
+    parse_m3u8_attributes,
+    update_url_query,
  )
  
  
@@ -17,42 +30,140 @@ class HlsFD(FragmentFD):
  
      FD_NAME = 'hlsnative'
  
+    @staticmethod
+    def can_download(manifest, info_dict):
+        UNSUPPORTED_FEATURES = (
+            r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encrypted streams [1]
+            r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files [2]
+
+            # Live streams heuristic does not always work (e.g. geo restricted to Germany
+            # http://hls-geo.daserste.de/i/videoportal/Film/c_620000/622873/format,716451,716457,716450,716458,716459,.mp4.csmil/index_4_av.m3u8?null=0)
+            # r'#EXT-X-MEDIA-SEQUENCE:(?!0$)',  # live streams [3]
+
+            # This heuristic also is not correct since segments may not be appended as well.
+            # Twitch vods of finished streams have EXT-X-PLAYLIST-TYPE:EVENT despite
+            # no segments will definitely be appended to the end of the playlist.
+            # r'#EXT-X-PLAYLIST-TYPE:EVENT',  # media segments may be appended to the end of
+            #                                 # event media playlists [4]
+
+            # 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4
+            # 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
+            # 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
+            # 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
+        )
+        check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
+        check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
+        check_results.append(not info_dict.get('is_live'))
+        return all(check_results)
+
      def real_download(self, filename, info_dict):
          man_url = info_dict['url']
          self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME)
-        manifest = self.ydl.urlopen(man_url).read()
+
+        manifest = self.ydl.urlopen(self._prepare_url(info_dict, man_url)).read()
  
          s = manifest.decode('utf-8', 'ignore')
-        fragment_urls = []
+
+        if not self.can_download(s, info_dict):
+            self.report_warning(
+                'hlsnative has detected features it does not support, '
+                'extraction will be delegated to ffmpeg')
+            fd = FFmpegFD(self.ydl, self.params)
+            for ph in self._progress_hooks:
+                fd.add_progress_hook(ph)
+            return fd.real_download(filename, info_dict)
+
+        total_frags = 0
          for line in s.splitlines():
              line = line.strip()
              if line and not line.startswith('#'):
-                segment_url = (
-                    line
-                    if re.match(r'^https?://', line)
-                    else compat_urlparse.urljoin(man_url, line))
-                fragment_urls.append(segment_url)
-                # We only download the first fragment during the test
-                if self.params.get('test', False):
-                    break
+                total_frags += 1
  
          ctx = {
              'filename': filename,
-            'total_frags': len(fragment_urls),
+            'total_frags': total_frags,
          }
  
          self._prepare_and_start_frag_download(ctx)
  
+        fragment_retries = self.params.get('fragment_retries', 0)
+        skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
+        test = self.params.get('test', False)
+
+        extra_query = None
+        extra_param_to_segment_url = info_dict.get('extra_param_to_segment_url')
+        if extra_param_to_segment_url:
+            extra_query = compat_urlparse.parse_qs(extra_param_to_segment_url)
+        i = 0
+        media_sequence = 0
+        decrypt_info = {'METHOD': 'NONE'}
          frags_filenames = []
-        for i, frag_url in enumerate(fragment_urls):
-            frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
-            success = ctx['dl'].download(frag_filename, {'url': frag_url})
-            if not success:
-                return False
-            down, frag_sanitized = sanitize_open(frag_filename, 'rb')
-            ctx['dest_stream'].write(down.read())
-            down.close()
-            frags_filenames.append(frag_sanitized)
+        for line in s.splitlines():
+            line = line.strip()
+            if line:
+                if not line.startswith('#'):
+                    frag_url = (
+                        line
+                        if re.match(r'^https?://', line)
+                        else compat_urlparse.urljoin(man_url, line))
+                    frag_name = 'Frag%d' % i
+                    frag_filename = '%s-%s' % (ctx['tmpfilename'], frag_name)
+                    if extra_query:
+                        frag_url = update_url_query(frag_url, extra_query)
+                    count = 0
+                    while count <= fragment_retries:
+                        try:
+                            success = ctx['dl'].download(frag_filename, {
+                                'url': frag_url,
+                                'http_headers': info_dict.get('http_headers'),
+                            })
+                            if not success:
+                                return False
+                            down, frag_sanitized = sanitize_open(frag_filename, 'rb')
+                            frag_content = down.read()
+                            down.close()
+                            break
+                        except compat_urllib_error.HTTPError as err:
+                            # Unavailable (possibly temporary) fragments may be served.
+                            # First we try to retry then either skip or abort.
+                            # See https://github.com/rg3/youtube-dl/issues/10165,
+                            # https://github.com/rg3/youtube-dl/issues/10448).
+                            count += 1
+                            if count <= fragment_retries:
+                                self.report_retry_fragment(err, frag_name, count, fragment_retries)
+                    if count > fragment_retries:
+                        if skip_unavailable_fragments:
+                            i += 1
+                            media_sequence += 1
+                            self.report_skip_fragment(frag_name)
+                            continue
+                        self.report_error(
+                            'giving up after %s fragment retries' % fragment_retries)
+                        return False
+                    if decrypt_info['METHOD'] == 'AES-128':
+                        iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
+                        frag_content = AES.new(
+                            decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
+                    ctx['dest_stream'].write(frag_content)
+                    frags_filenames.append(frag_sanitized)
+                    # We only download the first fragment during the test
+                    if test:
+                        break
+                    i += 1
+                    media_sequence += 1
+                elif line.startswith('#EXT-X-KEY'):
+                    decrypt_info = parse_m3u8_attributes(line[11:])
+                    if decrypt_info['METHOD'] == 'AES-128':
+                        if 'IV' in decrypt_info:
+                            decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:].zfill(32))
+                        if not re.match(r'^https?://', decrypt_info['URI']):
+                            decrypt_info['URI'] = compat_urlparse.urljoin(
+                                man_url, decrypt_info['URI'])
+                        if extra_query:
+                            decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_query)
+                        decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
+                elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
+                    media_sequence = int(line[22:])
  
          self._finish_frag_download(ctx)
  
diff --git a/youtube_dl/downloader/http.py b/youtube_dl/downloader/http.py

index f8b69d186ac5ee93c8402f85bc66e7ed59570118..af405b9509572bfd42bb11bd48bec5300d8105b3 100644 (file)
--- a/youtube_dl/downloader/http.py
+++ b/youtube_dl/downloader/http.py
@@ -13,6 +13,9 @@ from ..utils import (
      encodeFilename,
      sanitize_open,
      sanitized_Request,
+    write_xattr,
+    XAttrMetadataError,
+    XAttrUnavailableError,
  )
  
  
@@ -179,9 +182,8 @@ class HttpFD(FileDownloader):
  
                  if self.params.get('xattr_set_filesize', False) and data_len is not None:
                      try:
-                        import xattr
-                        xattr.setxattr(tmpfilename, 'user.ytdl.filesize', str(data_len))
-                    except(OSError, IOError, ImportError) as err:
+                        write_xattr(tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8'))
+                    except (XAttrUnavailableError, XAttrMetadataError) as err:
                          self.report_error('unable to set filesize xattr: %s' % str(err))
  
              try:
diff --git a/youtube_dl/downloader/ism.py b/youtube_dl/downloader/ism.py

new file mode 100644 (file)

index 0000000..93cac5e
--- /dev/null
+++ b/youtube_dl/downloader/ism.py
@@ -0,0 +1,271 @@
+from __future__ import unicode_literals
+
+import os
+import time
+import struct
+import binascii
+import io
+
+from .fragment import FragmentFD
+from ..compat import compat_urllib_error
+from ..utils import (
+    sanitize_open,
+    encodeFilename,
+)
+
+
+u8 = struct.Struct(b'>B')
+u88 = struct.Struct(b'>Bx')
+u16 = struct.Struct(b'>H')
+u1616 = struct.Struct(b'>Hxx')
+u32 = struct.Struct(b'>I')
+u64 = struct.Struct(b'>Q')
+
+s88 = struct.Struct(b'>bx')
+s16 = struct.Struct(b'>h')
+s1616 = struct.Struct(b'>hxx')
+s32 = struct.Struct(b'>i')
+
+unity_matrix = (s32.pack(0x10000) + s32.pack(0) * 3) * 2 + s32.pack(0x40000000)
+
+TRACK_ENABLED = 0x1
+TRACK_IN_MOVIE = 0x2
+TRACK_IN_PREVIEW = 0x4
+
+SELF_CONTAINED = 0x1
+
+
+def box(box_type, payload):
+    return u32.pack(8 + len(payload)) + box_type + payload
+
+
+def full_box(box_type, version, flags, payload):
+    return box(box_type, u8.pack(version) + u32.pack(flags)[1:] + payload)
+
+
+def write_piff_header(stream, params):
+    track_id = params['track_id']
+    fourcc = params['fourcc']
+    duration = params['duration']
+    timescale = params.get('timescale', 10000000)
+    language = params.get('language', 'und')
+    height = params.get('height', 0)
+    width = params.get('width', 0)
+    is_audio = width == 0 and height == 0
+    creation_time = modification_time = int(time.time())
+
+    ftyp_payload = b'isml'  # major brand
+    ftyp_payload += u32.pack(1)  # minor version
+    ftyp_payload += b'piff' + b'iso2'  # compatible brands
+    stream.write(box(b'ftyp', ftyp_payload))  # File Type Box
+
+    mvhd_payload = u64.pack(creation_time)
+    mvhd_payload += u64.pack(modification_time)
+    mvhd_payload += u32.pack(timescale)
+    mvhd_payload += u64.pack(duration)
+    mvhd_payload += s1616.pack(1)  # rate
+    mvhd_payload += s88.pack(1)  # volume
+    mvhd_payload += u16.pack(0)  # reserved
+    mvhd_payload += u32.pack(0) * 2  # reserved
+    mvhd_payload += unity_matrix
+    mvhd_payload += u32.pack(0) * 6  # pre defined
+    mvhd_payload += u32.pack(0xffffffff)  # next track id
+    moov_payload = full_box(b'mvhd', 1, 0, mvhd_payload)  # Movie Header Box
+
+    tkhd_payload = u64.pack(creation_time)
+    tkhd_payload += u64.pack(modification_time)
+    tkhd_payload += u32.pack(track_id)  # track id
+    tkhd_payload += u32.pack(0)  # reserved
+    tkhd_payload += u64.pack(duration)
+    tkhd_payload += u32.pack(0) * 2  # reserved
+    tkhd_payload += s16.pack(0)  # layer
+    tkhd_payload += s16.pack(0)  # alternate group
+    tkhd_payload += s88.pack(1 if is_audio else 0)  # volume
+    tkhd_payload += u16.pack(0)  # reserved
+    tkhd_payload += unity_matrix
+    tkhd_payload += u1616.pack(width)
+    tkhd_payload += u1616.pack(height)
+    trak_payload = full_box(b'tkhd', 1, TRACK_ENABLED | TRACK_IN_MOVIE | TRACK_IN_PREVIEW, tkhd_payload)  # Track Header Box
+
+    mdhd_payload = u64.pack(creation_time)
+    mdhd_payload += u64.pack(modification_time)
+    mdhd_payload += u32.pack(timescale)
+    mdhd_payload += u64.pack(duration)
+    mdhd_payload += u16.pack(((ord(language[0]) - 0x60) << 10) | ((ord(language[1]) - 0x60) << 5) | (ord(language[2]) - 0x60))
+    mdhd_payload += u16.pack(0)  # pre defined
+    mdia_payload = full_box(b'mdhd', 1, 0, mdhd_payload)  # Media Header Box
+
+    hdlr_payload = u32.pack(0)  # pre defined
+    hdlr_payload += b'soun' if is_audio else b'vide'  # handler type
+    hdlr_payload += u32.pack(0) * 3  # reserved
+    hdlr_payload += (b'Sound' if is_audio else b'Video') + b'Handler\0'  # name
+    mdia_payload += full_box(b'hdlr', 0, 0, hdlr_payload)  # Handler Reference Box
+
+    if is_audio:
+        smhd_payload = s88.pack(0)  # balance
+        smhd_payload += u16.pack(0)  # reserved
+        media_header_box = full_box(b'smhd', 0, 0, smhd_payload)  # Sound Media Header
+    else:
+        vmhd_payload = u16.pack(0)  # graphics mode
+        vmhd_payload += u16.pack(0) * 3  # opcolor
+        media_header_box = full_box(b'vmhd', 0, 1, vmhd_payload)  # Video Media Header
+    minf_payload = media_header_box
+
+    dref_payload = u32.pack(1)  # entry count
+    dref_payload += full_box(b'url ', 0, SELF_CONTAINED, b'')  # Data Entry URL Box
+    dinf_payload = full_box(b'dref', 0, 0, dref_payload)  # Data Reference Box
+    minf_payload += box(b'dinf', dinf_payload)  # Data Information Box
+
+    stsd_payload = u32.pack(1)  # entry count
+
+    sample_entry_payload = u8.pack(0) * 6  # reserved
+    sample_entry_payload += u16.pack(1)  # data reference index
+    if is_audio:
+        sample_entry_payload += u32.pack(0) * 2  # reserved
+        sample_entry_payload += u16.pack(params.get('channels', 2))
+        sample_entry_payload += u16.pack(params.get('bits_per_sample', 16))
+        sample_entry_payload += u16.pack(0)  # pre defined
+        sample_entry_payload += u16.pack(0)  # reserved
+        sample_entry_payload += u1616.pack(params['sampling_rate'])
+
+        if fourcc == 'AACL':
+            sample_entry_box = box(b'mp4a', sample_entry_payload)
+    else:
+        sample_entry_payload = sample_entry_payload
+        sample_entry_payload += u16.pack(0)  # pre defined
+        sample_entry_payload += u16.pack(0)  # reserved
+        sample_entry_payload += u32.pack(0) * 3  # pre defined
+        sample_entry_payload += u16.pack(width)
+        sample_entry_payload += u16.pack(height)
+        sample_entry_payload += u1616.pack(0x48)  # horiz resolution 72 dpi
+        sample_entry_payload += u1616.pack(0x48)  # vert resolution 72 dpi
+        sample_entry_payload += u32.pack(0)  # reserved
+        sample_entry_payload += u16.pack(1)  # frame count
+        sample_entry_payload += u8.pack(0) * 32  # compressor name
+        sample_entry_payload += u16.pack(0x18)  # depth
+        sample_entry_payload += s16.pack(-1)  # pre defined
+
+        codec_private_data = binascii.unhexlify(params['codec_private_data'])
+        if fourcc in ('H264', 'AVC1'):
+            sps, pps = codec_private_data.split(u32.pack(1))[1:]
+            avcc_payload = u8.pack(1)  # configuration version
+            avcc_payload += sps[1:4]  # avc profile indication + profile compatibility + avc level indication
+            avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1))  # complete representation (1) + reserved (11111) + length size minus one
+            avcc_payload += u8.pack(1)  # reserved (0) + number of sps (0000001)
+            avcc_payload += u16.pack(len(sps))
+            avcc_payload += sps
+            avcc_payload += u8.pack(1)  # number of pps
+            avcc_payload += u16.pack(len(pps))
+            avcc_payload += pps
+            sample_entry_payload += box(b'avcC', avcc_payload)  # AVC Decoder Configuration Record
+            sample_entry_box = box(b'avc1', sample_entry_payload)  # AVC Simple Entry
+    stsd_payload += sample_entry_box
+
+    stbl_payload = full_box(b'stsd', 0, 0, stsd_payload)  # Sample Description Box
+
+    stts_payload = u32.pack(0)  # entry count
+    stbl_payload += full_box(b'stts', 0, 0, stts_payload)  # Decoding Time to Sample Box
+
+    stsc_payload = u32.pack(0)  # entry count
+    stbl_payload += full_box(b'stsc', 0, 0, stsc_payload)  # Sample To Chunk Box
+
+    stco_payload = u32.pack(0)  # entry count
+    stbl_payload += full_box(b'stco', 0, 0, stco_payload)  # Chunk Offset Box
+
+    minf_payload += box(b'stbl', stbl_payload)  # Sample Table Box
+
+    mdia_payload += box(b'minf', minf_payload)  # Media Information Box
+
+    trak_payload += box(b'mdia', mdia_payload)  # Media Box
+
+    moov_payload += box(b'trak', trak_payload)  # Track Box
+
+    mehd_payload = u64.pack(duration)
+    mvex_payload = full_box(b'mehd', 1, 0, mehd_payload)  # Movie Extends Header Box
+
+    trex_payload = u32.pack(track_id)  # track id
+    trex_payload += u32.pack(1)  # default sample description index
+    trex_payload += u32.pack(0)  # default sample duration
+    trex_payload += u32.pack(0)  # default sample size
+    trex_payload += u32.pack(0)  # default sample flags
+    mvex_payload += full_box(b'trex', 0, 0, trex_payload)  # Track Extends Box
+
+    moov_payload += box(b'mvex', mvex_payload)  # Movie Extends Box
+    stream.write(box(b'moov', moov_payload))  # Movie Box
+
+
+def extract_box_data(data, box_sequence):
+    data_reader = io.BytesIO(data)
+    while True:
+        box_size = u32.unpack(data_reader.read(4))[0]
+        box_type = data_reader.read(4)
+        if box_type == box_sequence[0]:
+            box_data = data_reader.read(box_size - 8)
+            if len(box_sequence) == 1:
+                return box_data
+            return extract_box_data(box_data, box_sequence[1:])
+        data_reader.seek(box_size - 8, 1)
+
+
+class IsmFD(FragmentFD):
+    """
+    Download segments in an ISM manifest
+    """
+
+    FD_NAME = 'ism'
+
+    def real_download(self, filename, info_dict):
+        segments = info_dict['fragments'][:1] if self.params.get(
+            'test', False) else info_dict['fragments']
+
+        ctx = {
+            'filename': filename,
+            'total_frags': len(segments),
+        }
+
+        self._prepare_and_start_frag_download(ctx)
+
+        segments_filenames = []
+
+        fragment_retries = self.params.get('fragment_retries', 0)
+        skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
+
+        track_written = False
+        for i, segment in enumerate(segments):
+            segment_url = segment['url']
+            segment_name = 'Frag%d' % i
+            target_filename = '%s-%s' % (ctx['tmpfilename'], segment_name)
+            count = 0
+            while count <= fragment_retries:
+                try:
+                    success = ctx['dl'].download(target_filename, {'url': segment_url})
+                    if not success:
+                        return False
+                    down, target_sanitized = sanitize_open(target_filename, 'rb')
+                    down_data = down.read()
+                    if not track_written:
+                        tfhd_data = extract_box_data(down_data, [b'moof', b'traf', b'tfhd'])
+                        info_dict['_download_params']['track_id'] = u32.unpack(tfhd_data[4:8])[0]
+                        write_piff_header(ctx['dest_stream'], info_dict['_download_params'])
+                        track_written = True
+                    ctx['dest_stream'].write(down_data)
+                    down.close()
+                    segments_filenames.append(target_sanitized)
+                    break
+                except compat_urllib_error.HTTPError as err:
+                    count += 1
+                    if count <= fragment_retries:
+                        self.report_retry_fragment(err, segment_name, count, fragment_retries)
+            if count > fragment_retries:
+                if skip_unavailable_fragments:
+                    self.report_skip_fragment(segment_name)
+                    continue
+                self.report_error('giving up after %s fragment retries' % fragment_retries)
+                return False
+
+        self._finish_frag_download(ctx)
+
+        for segment_file in segments_filenames:
+            os.remove(encodeFilename(segment_file))
+
+        return True
diff --git a/youtube_dl/downloader/rtsp.py b/youtube_dl/downloader/rtsp.py

index 3eb29526cbc90cb3351c75876698a1b238c07ef8..939358b2a2f00edaca5283d311d89ab220d26966 100644 (file)
--- a/youtube_dl/downloader/rtsp.py
+++ b/youtube_dl/downloader/rtsp.py
@@ -27,6 +27,8 @@ class RtspFD(FileDownloader):
              self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install any.')
              return False
  
+        self._debug_cmd(args)
+
          retval = subprocess.call(args)
          if retval == 0:
              fsize = os.path.getsize(encodeFilename(tmpfilename))
diff --git a/youtube_dl/extractor/__init__.py b/youtube_dl/extractor/__init__.py

index 1e4b078a4aa45e6b5ead9e5ad0956959f04d1a6c..18d8dbcd6672f82776a9bd9f6f4cc63cac91129d 100644 (file)
--- a/youtube_dl/extractor/__init__.py
+++ b/youtube_dl/extractor/__init__.py
@@ -1,1001 +1,33 @@
  from __future__ import unicode_literals
  
-from .abc import ABCIE
-from .abc7news import Abc7NewsIE
-from .academicearth import AcademicEarthCourseIE
-from .acast import (
-    ACastIE,
-    ACastChannelIE,
-)
-from .addanime import AddAnimeIE
-from .adobetv import (
-    AdobeTVIE,
-    AdobeTVShowIE,
-    AdobeTVChannelIE,
-    AdobeTVVideoIE,
-)
-from .adultswim import AdultSwimIE
-from .aenetworks import AENetworksIE
-from .aftonbladet import AftonbladetIE
-from .airmozilla import AirMozillaIE
-from .aljazeera import AlJazeeraIE
-from .alphaporno import AlphaPornoIE
-from .animeondemand import AnimeOnDemandIE
-from .anitube import AnitubeIE
-from .anysex import AnySexIE
-from .aol import (
-    AolIE,
-    AolFeaturesIE,
-)
-from .allocine import AllocineIE
-from .aparat import AparatIE
-from .appleconnect import AppleConnectIE
-from .appletrailers import (
-    AppleTrailersIE,
-    AppleTrailersSectionIE,
-)
-from .archiveorg import ArchiveOrgIE
-from .ard import (
-    ARDIE,
-    ARDMediathekIE,
-    SportschauIE,
-)
-from .arte import (
-    ArteTvIE,
-    ArteTVPlus7IE,
-    ArteTVCreativeIE,
-    ArteTVConcertIE,
-    ArteTVFutureIE,
-    ArteTVCinemaIE,
-    ArteTVDDCIE,
-    ArteTVMagazineIE,
-    ArteTVEmbedIE,
-)
-from .atresplayer import AtresPlayerIE
-from .atttechchannel import ATTTechChannelIE
-from .audimedia import AudiMediaIE
-from .audioboom import AudioBoomIE
-from .audiomack import AudiomackIE, AudiomackAlbumIE
-from .azubu import AzubuIE, AzubuLiveIE
-from .baidu import BaiduVideoIE
-from .bambuser import BambuserIE, BambuserChannelIE
-from .bandcamp import BandcampIE, BandcampAlbumIE
-from .bbc import (
-    BBCCoUkIE,
-    BBCCoUkArticleIE,
-    BBCIE,
-)
-from .beeg import BeegIE
-from .behindkink import BehindKinkIE
-from .beatportpro import BeatportProIE
-from .bet import BetIE
-from .bigflix import BigflixIE
-from .bild import BildIE
-from .bilibili import BiliBiliIE
-from .biobiochiletv import BioBioChileTVIE
-from .bleacherreport import (
-    BleacherReportIE,
-    BleacherReportCMSIE,
-)
-from .blinkx import BlinkxIE
-from .bloomberg import BloombergIE
-from .bokecc import BokeCCIE
-from .bpb import BpbIE
-from .br import BRIE
-from .bravotv import BravoTVIE
-from .breakcom import BreakIE
-from .brightcove import (
-    BrightcoveLegacyIE,
-    BrightcoveNewIE,
-)
-from .buzzfeed import BuzzFeedIE
-from .byutv import BYUtvIE
-from .c56 import C56IE
-from .camdemy import (
-    CamdemyIE,
-    CamdemyFolderIE
-)
-from .canalplus import CanalplusIE
-from .canalc2 import Canalc2IE
-from .canvas import CanvasIE
-from .cbc import (
-    CBCIE,
-    CBCPlayerIE,
-)
-from .cbs import CBSIE
-from .cbsnews import (
-    CBSNewsIE,
-    CBSNewsLiveVideoIE,
-)
-from .cbssports import CBSSportsIE
-from .ccc import CCCIE
-from .cda import CDAIE
-from .ceskatelevize import CeskaTelevizeIE
-from .channel9 import Channel9IE
-from .chaturbate import ChaturbateIE
-from .chilloutzone import ChilloutzoneIE
-from .chirbit import (
-    ChirbitIE,
-    ChirbitProfileIE,
-)
-from .cinchcast import CinchcastIE
-from .cinemassacre import CinemassacreIE
-from .clipfish import ClipfishIE
-from .cliphunter import CliphunterIE
-from .clipsyndicate import ClipsyndicateIE
-from .cloudy import CloudyIE
-from .clubic import ClubicIE
-from .clyp import ClypIE
-from .cmt import CMTIE
-from .cnet import CNETIE
-from .cnn import (
-    CNNIE,
-    CNNBlogsIE,
-    CNNArticleIE,
-)
-from .collegehumor import CollegeHumorIE
-from .collegerama import CollegeRamaIE
-from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
-from .comcarcoff import ComCarCoffIE
-from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
-from .commonprotocols import RtmpIE
-from .condenast import CondeNastIE
-from .cracked import CrackedIE
-from .crackle import CrackleIE
-from .criterion import CriterionIE
-from .crooksandliars import CrooksAndLiarsIE
-from .crunchyroll import (
-    CrunchyrollIE,
-    CrunchyrollShowPlaylistIE
-)
-from .cspan import CSpanIE
-from .ctsnews import CtsNewsIE
-from .cultureunplugged import CultureUnpluggedIE
-from .cwtv import CWTVIE
-from .dailymotion import (
-    DailymotionIE,
-    DailymotionPlaylistIE,
-    DailymotionUserIE,
-    DailymotionCloudIE,
-)
-from .daum import (
-    DaumIE,
-    DaumClipIE,
-    DaumPlaylistIE,
-    DaumUserIE,
-)
-from .dbtv import DBTVIE
-from .dcn import (
-    DCNIE,
-    DCNVideoIE,
-    DCNLiveIE,
-    DCNSeasonIE,
-)
-from .dctp import DctpTvIE
-from .deezer import DeezerPlaylistIE
-from .democracynow import DemocracynowIE
-from .dfb import DFBIE
-from .dhm import DHMIE
-from .dotsub import DotsubIE
-from .douyutv import DouyuTVIE
-from .dplay import DPlayIE
-from .dramafever import (
-    DramaFeverIE,
-    DramaFeverSeriesIE,
-)
-from .dreisat import DreiSatIE
-from .drbonanza import DRBonanzaIE
-from .drtuber import DrTuberIE
-from .drtv import DRTVIE
-from .dvtv import DVTVIE
-from .dump import DumpIE
-from .dumpert import DumpertIE
-from .defense import DefenseGouvFrIE
-from .discovery import DiscoveryIE
-from .dropbox import DropboxIE
-from .dw import (
-    DWIE,
-    DWArticleIE,
-)
-from .eagleplatform import EaglePlatformIE
-from .ebaumsworld import EbaumsWorldIE
-from .echomsk import EchoMskIE
-from .ehow import EHowIE
-from .eighttracks import EightTracksIE
-from .einthusan import EinthusanIE
-from .eitb import EitbIE
-from .ellentv import (
-    EllenTVIE,
-    EllenTVClipsIE,
-)
-from .elpais import ElPaisIE
-from .embedly import EmbedlyIE
-from .engadget import EngadgetIE
-from .eporner import EpornerIE
-from .eroprofile import EroProfileIE
-from .escapist import EscapistIE
-from .espn import ESPNIE
-from .esri import EsriVideoIE
-from .europa import EuropaIE
-from .everyonesmixtape import EveryonesMixtapeIE
-from .exfm import ExfmIE
-from .expotv import ExpoTVIE
-from .extremetube import ExtremeTubeIE
-from .facebook import FacebookIE
-from .faz import FazIE
-from .fc2 import FC2IE
-from .fczenit import FczenitIE
-from .firstpost import FirstpostIE
-from .firsttv import FirstTVIE
-from .fivemin import FiveMinIE
-from .fivetv import FiveTVIE
-from .fktv import FKTVIE
-from .flickr import FlickrIE
-from .folketinget import FolketingetIE
-from .footyroom import FootyRoomIE
-from .fourtube import FourTubeIE
-from .fox import FOXIE
-from .foxgay import FoxgayIE
-from .foxnews import FoxNewsIE
-from .foxsports import FoxSportsIE
-from .franceculture import (
-    FranceCultureIE,
-    FranceCultureEmissionIE,
-)
-from .franceinter import FranceInterIE
-from .francetv import (
-    PluzzIE,
-    FranceTvInfoIE,
-    FranceTVIE,
-    GenerationQuoiIE,
-    CultureboxIE,
-)
-from .freesound import FreesoundIE
-from .freespeech import FreespeechIE
-from .freevideo import FreeVideoIE
-from .funimation import FunimationIE
-from .funnyordie import FunnyOrDieIE
-from .gameinformer import GameInformerIE
-from .gamekings import GamekingsIE
-from .gameone import (
-    GameOneIE,
-    GameOnePlaylistIE,
-)
-from .gamersyde import GamersydeIE
-from .gamespot import GameSpotIE
-from .gamestar import GameStarIE
-from .gametrailers import GametrailersIE
-from .gazeta import GazetaIE
-from .gdcvault import GDCVaultIE
-from .generic import GenericIE
-from .gfycat import GfycatIE
-from .giantbomb import GiantBombIE
-from .giga import GigaIE
-from .glide import GlideIE
-from .globo import (
-    GloboIE,
-    GloboArticleIE,
-)
-from .godtube import GodTubeIE
-from .goldenmoustache import GoldenMoustacheIE
-from .golem import GolemIE
-from .googledrive import GoogleDriveIE
-from .googleplus import GooglePlusIE
-from .googlesearch import GoogleSearchIE
-from .goshgay import GoshgayIE
-from .gputechconf import GPUTechConfIE
-from .groupon import GrouponIE
-from .hark import HarkIE
-from .hbo import HBOIE
-from .hearthisat import HearThisAtIE
-from .heise import HeiseIE
-from .hellporno import HellPornoIE
-from .helsinki import HelsinkiIE
-from .hentaistigma import HentaiStigmaIE
-from .historicfilms import HistoricFilmsIE
-from .hitbox import HitboxIE, HitboxLiveIE
-from .hornbunny import HornBunnyIE
-from .hotnewhiphop import HotNewHipHopIE
-from .hotstar import HotStarIE
-from .howcast import HowcastIE
-from .howstuffworks import HowStuffWorksIE
-from .huffpost import HuffPostIE
-from .hypem import HypemIE
-from .iconosquare import IconosquareIE
-from .ign import (
-    IGNIE,
-    OneUPIE,
-    PCMagIE,
-)
-from .imdb import (
-    ImdbIE,
-    ImdbListIE
-)
-from .imgur import (
-    ImgurIE,
-    ImgurAlbumIE,
-)
-from .ina import InaIE
-from .indavideo import (
-    IndavideoIE,
-    IndavideoEmbedIE,
-)
-from .infoq import InfoQIE
-from .instagram import InstagramIE, InstagramUserIE
-from .internetvideoarchive import InternetVideoArchiveIE
-from .iprima import IPrimaIE
-from .iqiyi import IqiyiIE
-from .ir90tv import Ir90TvIE
-from .ivi import (
-    IviIE,
-    IviCompilationIE
-)
-from .ivideon import IvideonIE
-from .izlesene import IzleseneIE
-from .jadorecettepub import JadoreCettePubIE
-from .jeuxvideo import JeuxVideoIE
-from .jove import JoveIE
-from .jwplatform import JWPlatformIE
-from .jpopsukitv import JpopsukiIE
-from .kaltura import KalturaIE
-from .kanalplay import KanalPlayIE
-from .kankan import KankanIE
-from .karaoketv import KaraoketvIE
-from .karrierevideos import KarriereVideosIE
-from .keezmovies import KeezMoviesIE
-from .khanacademy import KhanAcademyIE
-from .kickstarter import KickStarterIE
-from .keek import KeekIE
-from .konserthusetplay import KonserthusetPlayIE
-from .kontrtube import KontrTubeIE
-from .krasview import KrasViewIE
-from .ku6 import Ku6IE
-from .kusi import KUSIIE
-from .kuwo import (
-    KuwoIE,
-    KuwoAlbumIE,
-    KuwoChartIE,
-    KuwoSingerIE,
-    KuwoCategoryIE,
-    KuwoMvIE,
-)
-from .la7 import LA7IE
-from .laola1tv import Laola1TvIE
-from .lecture2go import Lecture2GoIE
-from .lemonde import LemondeIE
-from .leeco import (
-    LeIE,
-    LePlaylistIE,
-    LetvCloudIE,
-)
-from .libsyn import LibsynIE
-from .lifenews import (
-    LifeNewsIE,
-    LifeEmbedIE,
-)
-from .limelight import (
-    LimelightMediaIE,
-    LimelightChannelIE,
-    LimelightChannelListIE,
-)
-from .liveleak import LiveLeakIE
-from .livestream import (
-    LivestreamIE,
-    LivestreamOriginalIE,
-    LivestreamShortenerIE,
-)
-from .lnkgo import LnkGoIE
-from .lovehomeporn import LoveHomePornIE
-from .lrt import LRTIE
-from .lynda import (
-    LyndaIE,
-    LyndaCourseIE
-)
-from .m6 import M6IE
-from .macgamestore import MacGameStoreIE
-from .mailru import MailRuIE
-from .makerschannel import MakersChannelIE
-from .makertv import MakerTVIE
-from .malemotion import MalemotionIE
-from .matchtv import MatchTVIE
-from .mdr import MDRIE
-from .metacafe import MetacafeIE
-from .metacritic import MetacriticIE
-from .mgoon import MgoonIE
-from .minhateca import MinhatecaIE
-from .ministrygrid import MinistryGridIE
-from .minoto import MinotoIE
-from .miomio import MioMioIE
-from .mit import TechTVMITIE, MITIE, OCWMITIE
-from .mitele import MiTeleIE
-from .mixcloud import MixcloudIE
-from .mlb import MLBIE
-from .mnet import MnetIE
-from .mpora import MporaIE
-from .moevideo import MoeVideoIE
-from .mofosex import MofosexIE
-from .mojvideo import MojvideoIE
-from .moniker import MonikerIE
-from .mooshare import MooshareIE
-from .morningstar import MorningstarIE
-from .motherless import MotherlessIE
-from .motorsport import MotorsportIE
-from .movieclips import MovieClipsIE
-from .moviezine import MoviezineIE
-from .mtv import (
-    MTVIE,
-    MTVServicesEmbeddedIE,
-    MTVIggyIE,
-    MTVDEIE,
-)
-from .muenchentv import MuenchenTVIE
-from .musicplayon import MusicPlayOnIE
-from .muzu import MuzuTVIE
-from .mwave import MwaveIE
-from .myspace import MySpaceIE, MySpaceAlbumIE
-from .myspass import MySpassIE
-from .myvi import MyviIE
-from .myvideo import MyVideoIE
-from .myvidster import MyVidsterIE
-from .nationalgeographic import NationalGeographicIE
-from .naver import NaverIE
-from .nba import NBAIE
-from .nbc import (
-    NBCIE,
-    NBCNewsIE,
-    NBCSportsIE,
-    NBCSportsVPlayerIE,
-    MSNBCIE,
-)
-from .ndr import (
-    NDRIE,
-    NJoyIE,
-    NDREmbedBaseIE,
-    NDREmbedIE,
-    NJoyEmbedIE,
-)
-from .ndtv import NDTVIE
-from .netzkino import NetzkinoIE
-from .nerdcubed import NerdCubedFeedIE
-from .nerdist import NerdistIE
-from .neteasemusic import (
-    NetEaseMusicIE,
-    NetEaseMusicAlbumIE,
-    NetEaseMusicSingerIE,
-    NetEaseMusicListIE,
-    NetEaseMusicMvIE,
-    NetEaseMusicProgramIE,
-    NetEaseMusicDjRadioIE,
-)
-from .newgrounds import NewgroundsIE
-from .newstube import NewstubeIE
-from .nextmedia import (
-    NextMediaIE,
-    NextMediaActionNewsIE,
-    AppleDailyIE,
-)
-from .nextmovie import NextMovieIE
-from .nfb import NFBIE
-from .nfl import NFLIE
-from .nhl import (
-    NHLIE,
-    NHLNewsIE,
-    NHLVideocenterIE,
-)
-from .nick import NickIE
-from .niconico import NiconicoIE, NiconicoPlaylistIE
-from .ninegag import NineGagIE
-from .noco import NocoIE
-from .normalboots import NormalbootsIE
-from .nosvideo import NosVideoIE
-from .nova import NovaIE
-from .novamov import (
-    NovaMovIE,
-    WholeCloudIE,
-    NowVideoIE,
-    VideoWeedIE,
-    CloudTimeIE,
-)
-from .nowness import (
-    NownessIE,
-    NownessPlaylistIE,
-    NownessSeriesIE,
-)
-from .nowtv import (
-    NowTVIE,
-    NowTVListIE,
-)
-from .noz import NozIE
-from .npo import (
-    NPOIE,
-    NPOLiveIE,
-    NPORadioIE,
-    NPORadioFragmentIE,
-    SchoolTVIE,
-    VPROIE,
-    WNLIE
-)
-from .npr import NprIE
-from .nrk import (
-    NRKIE,
-    NRKPlaylistIE,
-    NRKSkoleIE,
-    NRKTVIE,
-)
-from .ntvde import NTVDeIE
-from .ntvru import NTVRuIE
-from .nytimes import (
-    NYTimesIE,
-    NYTimesArticleIE,
-)
-from .nuvid import NuvidIE
-from .odnoklassniki import OdnoklassnikiIE
-from .oktoberfesttv import OktoberfestTVIE
-from .onionstudios import OnionStudiosIE
-from .ooyala import (
-    OoyalaIE,
-    OoyalaExternalIE,
-)
-from .openload import OpenloadIE
-from .ora import OraTVIE
-from .orf import (
-    ORFTVthekIE,
-    ORFOE1IE,
-    ORFFM4IE,
-    ORFIPTVIE,
-)
-from .pandoratv import PandoraTVIE
-from .parliamentliveuk import ParliamentLiveUKIE
-from .patreon import PatreonIE
-from .pbs import PBSIE
-from .periscope import PeriscopeIE
-from .philharmoniedeparis import PhilharmonieDeParisIE
-from .phoenix import PhoenixIE
-from .photobucket import PhotobucketIE
-from .pinkbike import PinkbikeIE
-from .planetaplay import PlanetaPlayIE
-from .pladform import PladformIE
-from .played import PlayedIE
-from .playfm import PlayFMIE
-from .plays import PlaysTVIE
-from .playtvak import PlaytvakIE
-from .playvid import PlayvidIE
-from .playwire import PlaywireIE
-from .pluralsight import (
-    PluralsightIE,
-    PluralsightCourseIE,
-)
-from .podomatic import PodomaticIE
-from .porn91 import Porn91IE
-from .pornhd import PornHdIE
-from .pornhub import (
-    PornHubIE,
-    PornHubPlaylistIE,
-    PornHubUserVideosIE,
-)
-from .pornotube import PornotubeIE
-from .pornovoisines import PornoVoisinesIE
-from .pornoxo import PornoXOIE
-from .primesharetv import PrimeShareTVIE
-from .promptfile import PromptFileIE
-from .prosiebensat1 import ProSiebenSat1IE
-from .puls4 import Puls4IE
-from .pyvideo import PyvideoIE
-from .qqmusic import (
-    QQMusicIE,
-    QQMusicSingerIE,
-    QQMusicAlbumIE,
-    QQMusicToplistIE,
-    QQMusicPlaylistIE,
-)
-from .quickvid import QuickVidIE
-from .r7 import R7IE
-from .radiode import RadioDeIE
-from .radiojavan import RadioJavanIE
-from .radiobremen import RadioBremenIE
-from .radiofrance import RadioFranceIE
-from .rai import (
-    RaiTVIE,
-    RaiIE,
-)
-from .rbmaradio import RBMARadioIE
-from .rds import RDSIE
-from .redtube import RedTubeIE
-from .regiotv import RegioTVIE
-from .restudy import RestudyIE
-from .reverbnation import ReverbNationIE
-from .revision3 import Revision3IE
-from .rice import RICEIE
-from .ringtv import RingTVIE
-from .ro220 import Ro220IE
-from .rottentomatoes import RottenTomatoesIE
-from .roxwel import RoxwelIE
-from .rtbf import RTBFIE
-from .rte import RteIE, RteRadioIE
-from .rtlnl import RtlNlIE
-from .rtl2 import RTL2IE
-from .rtp import RTPIE
-from .rts import RTSIE
-from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE
-from .rtvnh import RTVNHIE
-from .ruhd import RUHDIE
-from .ruleporn import RulePornIE
-from .rutube import (
-    RutubeIE,
-    RutubeChannelIE,
-    RutubeEmbedIE,
-    RutubeMovieIE,
-    RutubePersonIE,
-)
-from .rutv import RUTVIE
-from .ruutu import RuutuIE
-from .sandia import SandiaIE
-from .safari import (
-    SafariIE,
-    SafariApiIE,
-    SafariCourseIE,
-)
-from .sapo import SapoIE
-from .savefrom import SaveFromIE
-from .sbs import SBSIE
-from .scivee import SciVeeIE
-from .screencast import ScreencastIE
-from .screencastomatic import ScreencastOMaticIE
-from .screenjunkies import ScreenJunkiesIE
-from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
-from .senateisvp import SenateISVPIE
-from .servingsys import ServingSysIE
-from .sexu import SexuIE
-from .sexykarma import SexyKarmaIE
-from .shahid import ShahidIE
-from .shared import SharedIE
-from .sharesix import ShareSixIE
-from .sina import SinaIE
-from .skynewsarabia import (
-    SkyNewsArabiaIE,
-    SkyNewsArabiaArticleIE,
-)
-from .slideshare import SlideshareIE
-from .slutload import SlutloadIE
-from .smotri import (
-    SmotriIE,
-    SmotriCommunityIE,
-    SmotriUserIE,
-    SmotriBroadcastIE,
-)
-from .snagfilms import (
-    SnagFilmsIE,
-    SnagFilmsEmbedIE,
-)
-from .snotr import SnotrIE
-from .sohu import SohuIE
-from .soundcloud import (
-    SoundcloudIE,
-    SoundcloudSetIE,
-    SoundcloudUserIE,
-    SoundcloudPlaylistIE,
-    SoundcloudSearchIE
-)
-from .soundgasm import (
-    SoundgasmIE,
-    SoundgasmProfileIE
-)
-from .southpark import (
-    SouthParkIE,
-    SouthParkDeIE,
-    SouthParkDkIE,
-    SouthParkEsIE,
-    SouthParkNlIE
-)
-from .spankbang import SpankBangIE
-from .spankwire import SpankwireIE
-from .spiegel import SpiegelIE, SpiegelArticleIE
-from .spiegeltv import SpiegeltvIE
-from .spike import SpikeIE
-from .stitcher import StitcherIE
-from .sport5 import Sport5IE
-from .sportbox import (
-    SportBoxIE,
-    SportBoxEmbedIE,
-)
-from .sportdeutschland import SportDeutschlandIE
-from .srgssr import (
-    SRGSSRIE,
-    SRGSSRPlayIE,
-)
-from .srmediathek import SRMediathekIE
-from .ssa import SSAIE
-from .stanfordoc import StanfordOpenClassroomIE
-from .steam import SteamIE
-from .streamcloud import StreamcloudIE
-from .streamcz import StreamCZIE
-from .streetvoice import StreetVoiceIE
-from .sunporno import SunPornoIE
-from .svt import (
-    SVTIE,
-    SVTPlayIE,
-)
-from .swrmediathek import SWRMediathekIE
-from .syfy import SyfyIE
-from .sztvhu import SztvHuIE
-from .tagesschau import TagesschauIE
-from .tapely import TapelyIE
-from .tass import TassIE
-from .teachertube import (
-    TeacherTubeIE,
-    TeacherTubeUserIE,
-)
-from .teachingchannel import TeachingChannelIE
-from .teamcoco import TeamcocoIE
-from .techtalks import TechTalksIE
-from .ted import TEDIE
-from .tele13 import Tele13IE
-from .telebruxelles import TeleBruxellesIE
-from .telecinco import TelecincoIE
-from .telegraaf import TelegraafIE
-from .telemb import TeleMBIE
-from .teletask import TeleTaskIE
-from .tenplay import TenPlayIE
-from .testurl import TestURLIE
-from .tf1 import TF1IE
-from .theintercept import TheInterceptIE
-from .theonion import TheOnionIE
-from .theplatform import (
-    ThePlatformIE,
-    ThePlatformFeedIE,
-)
-from .thescene import TheSceneIE
-from .thesixtyone import TheSixtyOneIE
-from .thestar import TheStarIE
-from .thisamericanlife import ThisAmericanLifeIE
-from .thisav import ThisAVIE
-from .tinypic import TinyPicIE
-from .tlc import TlcDeIE
-from .tmz import (
-    TMZIE,
-    TMZArticleIE,
-)
-from .tnaflix import (
-    TNAFlixNetworkEmbedIE,
-    TNAFlixIE,
-    EMPFlixIE,
-    MovieFapIE,
-)
-from .toggle import ToggleIE
-from .thvideo import (
-    THVideoIE,
-    THVideoPlaylistIE
-)
-from .toutv import TouTvIE
-from .toypics import ToypicsUserIE, ToypicsIE
-from .traileraddict import TrailerAddictIE
-from .trilulilu import TriluliluIE
-from .trollvids import TrollvidsIE
-from .trutube import TruTubeIE
-from .tube8 import Tube8IE
-from .tubitv import TubiTvIE
-from .tudou import (
-    TudouIE,
-    TudouPlaylistIE,
-    TudouAlbumIE,
-)
-from .tumblr import TumblrIE
-from .tunein import (
-    TuneInClipIE,
-    TuneInStationIE,
-    TuneInProgramIE,
-    TuneInTopicIE,
-    TuneInShortenerIE,
-)
-from .turbo import TurboIE
-from .tutv import TutvIE
-from .tv2 import (
-    TV2IE,
-    TV2ArticleIE,
-)
-from .tv3 import TV3IE
-from .tv4 import TV4IE
-from .tvc import (
-    TVCIE,
-    TVCArticleIE,
-)
-from .tvigle import TvigleIE
-from .tvland import TVLandIE
-from .tvp import TvpIE, TvpSeriesIE
-from .tvplay import TVPlayIE
-from .tweakers import TweakersIE
-from .twentyfourvideo import TwentyFourVideoIE
-from .twentymin import TwentyMinutenIE
-from .twentytwotracks import (
-    TwentyTwoTracksIE,
-    TwentyTwoTracksGenreIE
-)
-from .twitch import (
-    TwitchVideoIE,
-    TwitchChapterIE,
-    TwitchVodIE,
-    TwitchProfileIE,
-    TwitchPastBroadcastsIE,
-    TwitchBookmarksIE,
-    TwitchStreamIE,
-)
-from .twitter import (
-    TwitterCardIE,
-    TwitterIE,
-    TwitterAmplifyIE,
-)
-from .ubu import UbuIE
-from .udemy import (
-    UdemyIE,
-    UdemyCourseIE
-)
-from .udn import UDNEmbedIE
-from .digiteka import DigitekaIE
-from .unistra import UnistraIE
-from .urort import UrortIE
-from .usatoday import USATodayIE
-from .ustream import UstreamIE, UstreamChannelIE
-from .ustudio import UstudioIE
-from .varzesh3 import Varzesh3IE
-from .vbox7 import Vbox7IE
-from .veehd import VeeHDIE
-from .veoh import VeohIE
-from .vessel import VesselIE
-from .vesti import VestiIE
-from .vevo import VevoIE
-from .vgtv import (
-    BTArticleIE,
-    BTVestlendingenIE,
-    VGTVIE,
-)
-from .vh1 import VH1IE
-from .vice import (
-    ViceIE,
-    ViceShowIE,
-)
-from .viddler import ViddlerIE
-from .videodetective import VideoDetectiveIE
-from .videofyme import VideofyMeIE
-from .videomega import VideoMegaIE
-from .videomore import (
-    VideomoreIE,
-    VideomoreVideoIE,
-    VideomoreSeasonIE,
-)
-from .videopremium import VideoPremiumIE
-from .videott import VideoTtIE
-from .vidme import (
-    VidmeIE,
-    VidmeUserIE,
-    VidmeUserLikesIE,
-)
-from .vidzi import VidziIE
-from .vier import VierIE, VierVideosIE
-from .viewster import ViewsterIE
-from .viidea import ViideaIE
-from .vimeo import (
-    VimeoIE,
-    VimeoAlbumIE,
-    VimeoChannelIE,
-    VimeoGroupsIE,
-    VimeoLikesIE,
-    VimeoOndemandIE,
-    VimeoReviewIE,
-    VimeoUserIE,
-    VimeoWatchLaterIE,
-)
-from .vimple import VimpleIE
-from .vine import (
-    VineIE,
-    VineUserIE,
-)
-from .viki import (
-    VikiIE,
-    VikiChannelIE,
-)
-from .vk import (
-    VKIE,
-    VKUserVideosIE,
-)
-from .vlive import VLiveIE
-from .vodlocker import VodlockerIE
-from .voicerepublic import VoiceRepublicIE
-from .vporn import VpornIE
-from .vrt import VRTIE
-from .vube import VubeIE
-from .vuclip import VuClipIE
-from .vulture import VultureIE
-from .walla import WallaIE
-from .washingtonpost import WashingtonPostIE
-from .wat import WatIE
-from .wayofthemaster import WayOfTheMasterIE
-from .wdr import (
-    WDRIE,
-    WDRMobileIE,
-    WDRMausIE,
-)
-from .webofstories import (
-    WebOfStoriesIE,
-    WebOfStoriesPlaylistIE,
-)
-from .weibo import WeiboIE
-from .weiqitv import WeiqiTVIE
-from .wimp import WimpIE
-from .wistia import WistiaIE
-from .worldstarhiphop import WorldStarHipHopIE
-from .wrzuta import WrzutaIE
-from .wsj import WSJIE
-from .xbef import XBefIE
-from .xboxclips import XboxClipsIE
-from .xfileshare import XFileShareIE
-from .xhamster import (
-    XHamsterIE,
-    XHamsterEmbedIE,
-)
-from .xminus import XMinusIE
-from .xnxx import XNXXIE
-from .xstream import XstreamIE
-from .xtube import XTubeUserIE, XTubeIE
-from .xuite import XuiteIE
-from .xvideos import XVideosIE
-from .xxxymovies import XXXYMoviesIE
-from .yahoo import (
-    YahooIE,
-    YahooSearchIE,
-)
-from .yam import YamIE
-from .yandexmusic import (
-    YandexMusicTrackIE,
-    YandexMusicAlbumIE,
-    YandexMusicPlaylistIE,
-)
-from .yesjapan import YesJapanIE
-from .yinyuetai import YinYueTaiIE
-from .ynet import YnetIE
-from .youjizz import YouJizzIE
-from .youku import YoukuIE
-from .youporn import YouPornIE
-from .yourupload import YourUploadIE
-from .youtube import (
-    YoutubeIE,
-    YoutubeChannelIE,
-    YoutubeFavouritesIE,
-    YoutubeHistoryIE,
-    YoutubeLiveIE,
-    YoutubePlaylistIE,
-    YoutubePlaylistsIE,
-    YoutubeRecommendedIE,
-    YoutubeSearchDateIE,
-    YoutubeSearchIE,
-    YoutubeSearchURLIE,
-    YoutubeShowIE,
-    YoutubeSubscriptionsIE,
-    YoutubeTruncatedIDIE,
-    YoutubeTruncatedURLIE,
-    YoutubeUserIE,
-    YoutubeWatchLaterIE,
-)
-from .zapiks import ZapiksIE
-from .zdf import ZDFIE, ZDFChannelIE
-from .zingmp3 import (
-    ZingMp3SongIE,
-    ZingMp3AlbumIE,
-)
-from .zippcast import ZippCastIE
-
-_ALL_CLASSES = [
-    klass
-    for name, klass in globals().items()
-    if name.endswith('IE') and name != 'GenericIE'
-]
-_ALL_CLASSES.append(GenericIE)
+try:
+    from .lazy_extractors import *
+    from .lazy_extractors import _ALL_CLASSES
+    _LAZY_LOADER = True
+except ImportError:
+    _LAZY_LOADER = False
+    from .extractors import *
+
+    _ALL_CLASSES = [
+        klass
+        for name, klass in globals().items()
+        if name.endswith('IE') and name != 'GenericIE'
+    ]
+    _ALL_CLASSES.append(GenericIE)
+
+
+def gen_extractor_classes():
+    """ Return a list of supported extractors.
+    The order does matter; the first extractor matched is the one handling the URL.
+    """
+    return _ALL_CLASSES
  
  
  def gen_extractors():
      """ Return a list of an instance of every supported extractor.
      The order does matter; the first extractor matched is the one handling the URL.
      """
-    return [klass() for klass in _ALL_CLASSES]
+    return [klass() for klass in gen_extractor_classes()]
  
  
  def list_extractors(age_limit):
diff --git a/youtube_dl/extractor/abc.py b/youtube_dl/extractor/abc.py
index b584277be92b5a86fb9e0ac5d95870444d441174..0247cabf9df8a6c61602085dcabe5f139b53420a 100644
--- a/youtube_dl/extractor/abc.py
+++ b/youtube_dl/extractor/abc.py
@@ -7,12 +7,13 @@ from ..utils import (
      ExtractorError,
      js_to_json,
      int_or_none,
+    parse_iso8601,
  )
  
  
  class ABCIE(InfoExtractor):
      IE_NAME = 'abc.net.au'
-    _VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
  
      _TESTS = [{
          'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',
@@ -93,3 +94,59 @@ class ABCIE(InfoExtractor):
              'description': self._og_search_description(webpage),
              'thumbnail': self._og_search_thumbnail(webpage),
          }
+
+
+class ABCIViewIE(InfoExtractor):
+    IE_NAME = 'abc.net.au:iview'
+    _VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'
+
+    # ABC iview programs are normally available for 14 days only.
+    _TESTS = [{
+        'url': 'http://iview.abc.net.au/programs/diaries-of-a-broken-mind/ZX9735A001S00',
+        'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
+        'info_dict': {
+            'id': 'ZX9735A001S00',
+            'ext': 'mp4',
+            'title': 'Diaries Of A Broken Mind',
+            'description': 'md5:7de3903874b7a1be279fe6b68718fc9e',
+            'upload_date': '20161010',
+            'uploader_id': 'abc2',
+            'timestamp': 1476064920,
+        },
+        'skip': 'Video gone',
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        video_params = self._parse_json(self._search_regex(
+            r'videoParams\s*=\s*({.+?});', webpage, 'video params'), video_id)
+        title = video_params.get('title') or video_params['seriesTitle']
+        stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
+
+        formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
+        self._sort_formats(formats)
+
+        subtitles = {}
+        src_vtt = stream.get('captions', {}).get('src-vtt')
+        if src_vtt:
+            subtitles['en'] = [{
+                'url': src_vtt,
+                'ext': 'vtt',
+            }]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': self._html_search_meta(['og:description', 'twitter:description'], webpage),
+            'thumbnail': self._html_search_meta(['og:image', 'twitter:image:src'], webpage),
+            'duration': int_or_none(video_params.get('eventDuration')),
+            'timestamp': parse_iso8601(video_params.get('pubDate'), ' '),
+            'series': video_params.get('seriesTitle'),
+            'series_id': video_params.get('seriesHouseNumber') or video_id[:7],
+            'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage, default=None)),
+            'episode': self._html_search_meta('episode_title', webpage, default=None),
+            'uploader_id': video_params.get('channel'),
+            'formats': formats,
+            'subtitles': subtitles,
+        }
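Review note: the new `ABCIViewIE` takes the video ID from the URL, reads a `videoParams` object embedded in the page, and picks the playlist entry whose `type` is `program`. A minimal sketch of those steps using only the standard library follows. `SAMPLE_PAGE` is a made-up page fragment, and `json.loads` stands in for the extractor's `_parse_json` helper; both are assumptions of this sketch.

```python
import json
import re

_VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'

# Hypothetical page fragment; the real page layout is not shown in the diff.
SAMPLE_PAGE = (
    '<script>var videoParams = {"seriesTitle": "Diaries Of A Broken Mind", '
    '"playlist": [{"type": "promo"}, '
    '{"type": "program", "hds-unmetered": "http://example.invalid/manifest.f4m"}]};'
    '</script>'
)


def match_id(url):
    mobj = re.match(_VALID_URL, url)
    return mobj.group('id') if mobj else None


video_id = match_id(
    'http://iview.abc.net.au/programs/diaries-of-a-broken-mind/ZX9735A001S00')

# Same pattern as in _real_extract; the non-greedy group stops at the first '};'.
video_params = json.loads(re.search(
    r'videoParams\s*=\s*({.+?});', SAMPLE_PAGE).group(1))

# 'title' is optional, so the series title is the fallback.
title = video_params.get('title') or video_params['seriesTitle']

# First playlist entry of type 'program'.
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')

# Fallback used when 'seriesHouseNumber' is absent: first 7 characters of the ID.
series_id = video_params.get('seriesHouseNumber') or video_id[:7]
```

Note that `next()` without a default raises `StopIteration` when no `program` entry exists, both in this sketch and in the extractor as written.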
diff --git a/youtube_dl/extractor/abc7news.py b/youtube_dl/extractor/abc7news.py
deleted file mode 100644
index c04949c..0000000
--- a/youtube_dl/extractor/abc7news.py
+++ /dev/null
@@ -1,68 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import parse_iso8601
-
-
-class Abc7NewsIE(InfoExtractor):
-    _VALID_URL = r'https?://abc7news\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
-    _TESTS = [
-        {
-            'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
-            'info_dict': {
-                'id': '472581',
-                'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
-                'ext': 'mp4',
-                'title': 'East Bay museum celebrates history of synthesized music',
-                'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
-                'thumbnail': 're:^https?://.*\.jpg$',
-                'timestamp': 1421123075,
-                'upload_date': '20150113',
-                'uploader': 'Jonathan Bloom',
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            'url': 'http://abc7news.com/472581',
-            'only_matching': True,
-        },
-    ]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id') or video_id
-
-        webpage = self._download_webpage(url, display_id)
-
-        m3u8 = self._html_search_meta(
-            'contentURL', webpage, 'm3u8 url', fatal=True)
-
-        formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
-        self._sort_formats(formats)
-
-        title = self._og_search_title(webpage).strip()
-        description = self._og_search_description(webpage).strip()
-        thumbnail = self._og_search_thumbnail(webpage)
-        timestamp = parse_iso8601(self._search_regex(
-            r'<div class="meta">\s*<time class="timeago" datetime="([^"]+)">',
-            webpage, 'upload date', fatal=False))
-        uploader = self._search_regex(
-            r'rel="author">([^<]+)</a>',
-            webpage, 'uploader', default=None)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'timestamp': timestamp,
-            'uploader': uploader,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/abcnews.py b/youtube_dl/extractor/abcnews.py
new file mode 100644
index 0000000..6ae5d9a
--- /dev/null
+++ b/youtube_dl/extractor/abcnews.py
@@ -0,0 +1,135 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import calendar
+import re
+import time
+
+from .amp import AMPIE
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+
+
+class AbcNewsVideoIE(AMPIE):
+    IE_NAME = 'abcnews:video'
+    _VALID_URL = r'https?://abcnews\.go\.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
+
+    _TESTS = [{
+        'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932',
+        'info_dict': {
+            'id': '20411932',
+            'ext': 'mp4',
+            'display_id': 'week-exclusive-irans-foreign-minister-zarif',
+            'title': '\'This Week\' Exclusive: Iran\'s Foreign Minister Zarif',
+            'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. Javad Zarif.',
+            'duration': 180,
+            'thumbnail': r're:^https?://.*\.jpg$',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://abcnews.go.com/2020/video/2020-husband-stands-teacher-jail-student-affairs-26119478',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        display_id = mobj.group('display_id')
+        video_id = mobj.group('id')
+        info_dict = self._extract_feed_info(
+            'http://abcnews.go.com/video/itemfeed?id=%s' % video_id)
+        info_dict.update({
+            'id': video_id,
+            'display_id': display_id,
+        })
+        return info_dict
+
+
+class AbcNewsIE(InfoExtractor):
+    IE_NAME = 'abcnews'
+    _VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
+
+    _TESTS = [{
+        'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
+        'info_dict': {
+            'id': '10498713',
+            'ext': 'flv',
+            'display_id': 'dramatic-video-rare-death-job-america',
+            'title': 'Occupational Hazards',
+            'description': 'Nightline investigates the dangers that lurk at various jobs.',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'upload_date': '20100428',
+            'timestamp': 1272412800,
+        },
+        'add_ie': ['AbcNewsVideo'],
+    }, {
+        'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818',
+        'info_dict': {
+            'id': '39125818',
+            'ext': 'mp4',
+            'display_id': 'justin-timberlake-performs-stop-feeling-eurovision-2016',
+            'title': 'Justin Timberlake Drops Hints For Secret Single',
+            'description': 'Lara Spencer reports the buzziest stories of the day in "GMA" Pop News.',
+            'upload_date': '20160515',
+            'timestamp': 1463329500,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+            # The embedded YouTube video is blocked due to copyright issues
+            'playlist_items': '1',
+        },
+        'add_ie': ['AbcNewsVideo'],
+    }, {
+        'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        display_id = mobj.group('display_id')
+        video_id = mobj.group('id')
+
+        webpage = self._download_webpage(url, video_id)
+        video_url = self._search_regex(
+            r'window\.abcnvideo\.url\s*=\s*"([^"]+)"', webpage, 'video URL')
+        full_video_url = compat_urlparse.urljoin(url, video_url)
+
+        youtube_url = self._html_search_regex(
+            r'<iframe[^>]+src="(https://www\.youtube\.com/embed/[^"]+)"',
+            webpage, 'YouTube URL', default=None)
+
+        timestamp = None
+        date_str = self._html_search_regex(
+            r'<span[^>]+class="timestamp">([^<]+)</span>',
+            webpage, 'timestamp', fatal=False)
+        if date_str:
+            tz_offset = 0
+            if date_str.endswith(' ET'):  # Eastern Time
+                tz_offset = -5
+                date_str = date_str[:-3]
+            date_formats = ['%b. %d, %Y', '%b %d, %Y, %I:%M %p']
+            for date_format in date_formats:
+                try:
+                    timestamp = calendar.timegm(time.strptime(date_str.strip(), date_format))
+                except ValueError:
+                    continue
+            if timestamp is not None:
+                timestamp -= tz_offset * 3600
+
+        entry = {
+            '_type': 'url_transparent',
+            'ie_key': AbcNewsVideoIE.ie_key(),
+            'url': full_video_url,
+            'id': video_id,
+            'display_id': display_id,
+            'timestamp': timestamp,
+        }
+
+        if youtube_url:
+            entries = [entry, self.url_result(youtube_url, 'Youtube')]
+            return self.playlist_result(entries)
+
+        return entry
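Review note: the timestamp logic in `AbcNewsIE._real_extract` tries two `strptime` formats and then shifts the result by a fixed Eastern Time offset. A standalone sketch of that logic follows. The function name `parse_abcnews_timestamp` and the sample input strings are assumptions, chosen so that the results reproduce the `timestamp` values in the two tests above; the actual page markup is not shown in the diff. Unlike the loop in the diff, the sketch returns on the first format that parses.

```python
import calendar
import time

DATE_FORMATS = ['%b. %d, %Y', '%b %d, %Y, %I:%M %p']


def parse_abcnews_timestamp(date_str):
    """Return a UTC epoch timestamp, or None if no format matches."""
    tz_offset = 0
    if date_str.endswith(' ET'):  # Eastern Time, treated as UTC-5
        tz_offset = -5
        date_str = date_str[:-3]
    for date_format in DATE_FORMATS:
        try:
            # timegm() interprets the parsed struct_time as UTC.
            timestamp = calendar.timegm(
                time.strptime(date_str.strip(), date_format))
        except ValueError:
            continue
        # Convert the local wall-clock time to UTC.
        return timestamp - tz_offset * 3600
    return None
```

The fixed offset of -5 hours does not account for daylight saving time, so dates in the summer half of the year end up one hour later than the true UTC time. `%b` and `%p` are also locale-dependent, so the sketch assumes an English or C locale.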
diff --git a/youtube_dl/extractor/abcotvs.py b/youtube_dl/extractor/abcotvs.py
new file mode 100644
index 0000000..054bb05
--- /dev/null
+++ b/youtube_dl/extractor/abcotvs.py
@@ -0,0 +1,112 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+)
+
+
+class ABCOTVSIE(InfoExtractor):
+    IE_NAME = 'abcotvs'
+    IE_DESC = 'ABC Owned Television Stations'
+    _VALID_URL = r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
+    _TESTS = [
+        {
+            'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
+            'info_dict': {
+                'id': '472581',
+                'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
+                'ext': 'mp4',
+                'title': 'East Bay museum celebrates vintage synthesizers',
+                'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
+                'thumbnail': r're:^https?://.*\.jpg$',
+                'timestamp': 1421123075,
+                'upload_date': '20150113',
+                'uploader': 'Jonathan Bloom',
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
+        },
+        {
+            'url': 'http://abc7news.com/472581',
+            'only_matching': True,
+        },
+    ]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id') or video_id
+
+        webpage = self._download_webpage(url, display_id)
+
+        m3u8 = self._html_search_meta(
+            'contentURL', webpage, 'm3u8 url', fatal=True).split('?')[0]
+
+        formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
+        self._sort_formats(formats)
+
+        title = self._og_search_title(webpage).strip()
+        description = self._og_search_description(webpage).strip()
+        thumbnail = self._og_search_thumbnail(webpage)
+        timestamp = parse_iso8601(self._search_regex(
+            r'<div class="meta">\s*<time class="timeago" datetime="([^"]+)">',
+            webpage, 'upload date', fatal=False))
+        uploader = self._search_regex(
+            r'rel="author">([^<]+)</a>',
+            webpage, 'uploader', default=None)
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'timestamp': timestamp,
+            'uploader': uploader,
+            'formats': formats,
+        }
+
+
+class ABCOTVSClipsIE(InfoExtractor):
+    IE_NAME = 'abcotvs:clips'
+    _VALID_URL = r'https?://clips\.abcotvs\.com/(?:[^/]+/)*video/(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://clips.abcotvs.com/kabc/video/214814',
+        'info_dict': {
+            'id': '214814',
+            'ext': 'mp4',
+            'title': 'SpaceX launch pad explosion destroys rocket, satellite',
+            'description': 'md5:9f186e5ad8f490f65409965ee9c7be1b',
+            'upload_date': '20160901',
+            'timestamp': 1472756695,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_json('https://clips.abcotvs.com/vogo/video/getByIds?ids=' + video_id, video_id)['results'][0]
+        title = video_data['title']
+        formats = self._extract_m3u8_formats(
+            video_data['videoURL'].split('?')[0], video_id, 'mp4')
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'thumbnail': video_data.get('thumbnailURL'),
+            'duration': int_or_none(video_data.get('duration')),
+            'timestamp': int_or_none(video_data.get('pubDate')),
+            'formats': formats,
+        }
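Review note: `ABCOTVSIE` replaces the single-site `Abc7NewsIE` with one pattern that covers several station hostnames, and both new extractors drop the query string from the m3u8 URL before extracting formats. A small sketch of how the pattern and the `split('?')[0]` step behave follows. `parse_url` and `strip_query` are hypothetical helper names, and the `6abc.com`, `abc9.com` and `example.invalid` URLs are made-up inputs.

```python
import re

_VALID_URL = (
    r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com'
    r'(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
)


def parse_url(url):
    """Return (video_id, display_id), or None if the URL is not matched."""
    mobj = re.match(_VALID_URL, url)
    if mobj is None:
        return None
    video_id = mobj.group('id')
    # The section/slug part is optional, so fall back to the numeric ID.
    display_id = mobj.group('display_id') or video_id
    return video_id, display_id


def strip_query(m3u8_url):
    # Keep everything before the first '?', as the extractor does.
    return m3u8_url.split('?')[0]
```

With the short form `http://abc7news.com/472581` the optional group does not match, so `display_id` falls back to the video ID.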
diff --git a/youtube_dl/extractor/acast.py b/youtube_dl/extractor/acast.py
index 92eee8119da420a78782d5eea05a2d3564468d3a..94ce88c834f5ce1575b36f839f02fdf43f96e046 100644
--- a/youtube_dl/extractor/acast.py
+++ b/youtube_dl/extractor/acast.py
@@ -2,10 +2,14 @@
  from __future__ import unicode_literals
  
  import re
+import functools
  
  from .common import InfoExtractor
  from ..compat import compat_str
-from ..utils import int_or_none
+from ..utils import (
+    int_or_none,
+    OnDemandPagedList,
+)
  
  
  class ACastIE(InfoExtractor):
@@ -26,13 +30,8 @@ class ACastIE(InfoExtractor):
  
      def _real_extract(self, url):
          channel, display_id = re.match(self._VALID_URL, url).groups()
-
-        embed_page = self._download_webpage(
-            re.sub('(?:www\.)?acast\.com', 'embedcdn.acast.com', url), display_id)
-        cast_data = self._parse_json(self._search_regex(
-            r'window\[\'acast/queries\'\]\s*=\s*([^;]+);', embed_page, 'acast data'),
-            display_id)['GetAcast/%s/%s' % (channel, display_id)]
-
+        cast_data = self._download_json(
+            'https://embed.acast.com/api/acasts/%s/%s' % (channel, display_id), display_id)
          return {
              'id': compat_str(cast_data['id']),
              'display_id': display_id,
@@ -58,15 +57,26 @@ class ACastChannelIE(InfoExtractor):
          'playlist_mincount': 20,
      }
      _API_BASE_URL = 'https://www.acast.com/api/'
+    _PAGE_SIZE = 10
  
      @classmethod
      def suitable(cls, url):
          return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)
  
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        channel_data = self._download_json(self._API_BASE_URL + 'channels/%s' % display_id, display_id)
-        casts = self._download_json(self._API_BASE_URL + 'channels/%s/acasts' % display_id, display_id)
-        entries = [self.url_result('https://www.acast.com/%s/%s' % (display_id, cast['url']), 'ACast') for cast in casts]
+    def _fetch_page(self, channel_slug, page):
+        casts = self._download_json(
+            self._API_BASE_URL + 'channels/%s/acasts?page=%s' % (channel_slug, page),
+            channel_slug, note='Download page %d of channel data' % page)
+        for cast in casts:
+            yield self.url_result(
+                'https://www.acast.com/%s/%s' % (channel_slug, cast['url']),
+                'ACast', cast['id'])
  
-        return self.playlist_result(entries, compat_str(channel_data['id']), channel_data['name'], channel_data.get('description'))
+    def _real_extract(self, url):
+        channel_slug = self._match_id(url)
+        channel_data = self._download_json(
+            self._API_BASE_URL + 'channels/%s' % channel_slug, channel_slug)
+        entries = OnDemandPagedList(functools.partial(
+            self._fetch_page, channel_slug), self._PAGE_SIZE)
+        return self.playlist_result(entries, compat_str(
+            channel_data['id']), channel_data['name'], channel_data.get('description'))
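Review note: `ACastChannelIE` now binds the channel slug to `_fetch_page` with `functools.partial` and hands the result to `OnDemandPagedList`, so pages are requested only as entries are consumed. The sketch below illustrates the idea with a simplified stand-in: `iter_entries` is not the real `OnDemandPagedList`, `FAKE_CASTS` replaces the HTTP API, and the zero-based page numbering is an assumption of the sketch.

```python
import functools
import itertools

PAGE_SIZE = 10

# Hypothetical stand-in for the channel's episodes on the server.
FAKE_CASTS = [{'id': i, 'url': 'episode-%d' % i} for i in range(25)]

# Records which pages were actually requested.
requested_pages = []


def fetch_page(channel_slug, page):
    """Generator yielding the entry URLs of one page (no network access)."""
    requested_pages.append(page)
    start = page * PAGE_SIZE
    for cast in FAKE_CASTS[start:start + PAGE_SIZE]:
        yield 'https://www.acast.com/%s/%s' % (channel_slug, cast['url'])


def iter_entries(page_func, page_size):
    """Simplified lazy pager: fetch the next page only when needed."""
    for page in itertools.count():
        results = list(page_func(page))
        for result in results:
            yield result
        # A short page means there is nothing further to fetch.
        if len(results) < page_size:
            break


# Bind the slug so the pager only has to supply the page number.
page_func = functools.partial(fetch_page, 'example-channel')

first_three = list(itertools.islice(iter_entries(page_func, PAGE_SIZE), 3))
pages_after_first_three = list(requested_pages)

all_entries = list(iter_entries(page_func, PAGE_SIZE))
```

Taking only the first three entries touches a single page, whereas the previous implementation downloaded the complete list of casts up front.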
diff --git a/youtube_dl/extractor/adobepass.py b/youtube_dl/extractor/adobepass.py
new file mode 100644
index 0000000..12eeab2
--- /dev/null
+++ b/youtube_dl/extractor/adobepass.py
@@ -0,0 +1,1472 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+import time
+import xml.etree.ElementTree as etree
+
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    unescapeHTML,
+    urlencode_postdata,
+    unified_timestamp,
+    ExtractorError,
+)
+
+
+MSO_INFO = {
+    'DTV': {
+        'name': 'DIRECTV',
+        'username_field': 'username',
+        'password_field': 'password',
+    },
+    'Rogers': {
+        'name': 'Rogers',
+        'username_field': 'UserName',
+        'password_field': 'UserPassword',
+    },
+    'Comcast_SSO': {
+        'name': 'Comcast XFINITY',
+        'username_field': 'user',
+        'password_field': 'passwd',
+    },
+    'thr030': {
+        'name': '3 Rivers Communications'
+    },
+    'com140': {
+        'name': 'Access Montana'
+    },
+    'acecommunications': {
+        'name': 'AcenTek'
+    },
+    'acm010': {
+        'name': 'Acme Communications'
+    },
+    'ada020': {
+        'name': 'Adams Cable Service'
+    },
+    'alb020': {
+        'name': 'Albany Mutual Telephone'
+    },
+    'algona': {
+        'name': 'Algona Municipal Utilities'
+    },
+    'allwest': {
+        'name': 'All West Communications'
+    },
+    'all025': {
+        'name': 'Allen\'s Communications'
+    },
+    'spl010': {
+        'name': 'Alliance Communications'
+    },
+    'all070': {
+        'name': 'ALLO Communications'
+    },
+    'alpine': {
+        'name': 'Alpine Communications'
+    },
+    'hun015': {
+        'name': 'American Broadband'
+    },
+    'nwc010': {
+        'name': 'American Broadband Missouri'
+    },
+    'com130-02': {
+        'name': 'American Community Networks'
+    },
+    'com130-01': {
+        'name': 'American Warrior Networks'
+    },
+    'tom020': {
+        'name': 'Amherst Telephone/Tomorrow Valley'
+    },
+    'tvc020': {
+        'name': 'Andycable'
+    },
+    'arkwest': {
+        'name': 'Arkwest Communications'
+    },
+    'art030': {
+        'name': 'Arthur Mutual Telephone Company'
+    },
+    'arvig': {
+        'name': 'Arvig'
+    },
+    'nttcash010': {
+        'name': 'Ashland Home Net'
+    },
+    'astound': {
+        'name': 'Astound (now Wave)'
+    },
+    'dix030': {
+        'name': 'ATC Broadband'
+    },
+    'ara010': {
+        'name': 'ATC Communications'
+    },
+    'she030-02': {
+        'name': 'Ayersville Communications'
+    },
+    'baldwin': {
+        'name': 'Baldwin Lightstream'
+    },
+    'bal040': {
+        'name': 'Ballard TV'
+    },
+    'cit025': {
+        'name': 'Bardstown Cable TV'
+    },
+    'bay030': {
+        'name': 'Bay Country Communications'
+    },
+    'tel095': {
+        'name': 'Beaver Creek Cooperative Telephone'
+    },
+    'bea020': {
+        'name': 'Beaver Valley Cable'
+    },
+    'bee010': {
+        'name': 'Bee Line Cable'
+    },
+    'wir030': {
+        'name': 'Beehive Broadband'
+    },
+    'bra020': {
+        'name': 'BELD'
+    },
+    'bel020': {
+        'name': 'Bellevue Municipal Cable'
+    },
+    'vol040-01': {
+        'name': 'Ben Lomand Connect / BLTV'
+    },
+    'bev010': {
+        'name': 'BEVCOMM'
+    },
+    'big020': {
+        'name': 'Big Sandy Broadband'
+    },
+    'ble020': {
+        'name': 'Bledsoe Telephone Cooperative'
+    },
+    'bvt010': {
+        'name': 'Blue Valley Tele-Communications'
+    },
+    'bra050': {
+        'name': 'Brandenburg Telephone Co.'
+    },
+    'bte010': {
+        'name': 'Bristol Tennessee Essential Services'
+    },
+    'annearundel': {
+        'name': 'Broadstripe'
+    },
+    'btc010': {
+        'name': 'BTC Communications'
+    },
+    'btc040': {
+        'name': 'BTC Vision - Nahunta'
+    },
+    'bul010': {
+        'name': 'Bulloch Telephone Cooperative'
+    },
+    'but010': {
+        'name': 'Butler-Bremer Communications'
+    },
+    'tel160-csp': {
+        'name': 'C Spire SNAP'
+    },
+    'csicable': {
+        'name': 'Cable Services Inc.'
+    },
+    'cableamerica': {
+        'name': 'CableAmerica'
+    },
+    'cab038': {
+        'name': 'CableSouth Media 3'
+    },
+    'weh010-camtel': {
+        'name': 'Cam-Tel Company'
+    },
+    'car030': {
+        'name': 'Cameron Communications'
+    },
+    'canbytel': {
+        'name': 'Canby Telcom'
+    },
+    'crt020': {
+        'name': 'CapRock Tv'
+    },
+    'car050': {
+        'name': 'Carnegie Cable'
+    },
+    'cas': {
+        'name': 'CAS Cable'
+    },
+    'casscomm': {
+        'name': 'CASSCOMM'
+    },
+    'mid180-02': {
+        'name': 'Catalina Broadband Solutions'
+    },
+    'cccomm': {
+        'name': 'CC Communications'
+    },
+    'nttccde010': {
+        'name': 'CDE Lightband'
+    },
+    'cfunet': {
+        'name': 'Cedar Falls Utilities'
+    },
+    'dem010-01': {
+        'name': 'Celect-Bloomer Telephone Area'
+    },
+    'dem010-02': {
+        'name': 'Celect-Bruce Telephone Area'
+    },
+    'dem010-03': {
+        'name': 'Celect-Citizens Connected Area'
+    },
+    'dem010-04': {
+        'name': 'Celect-Elmwood/Spring Valley Area'
+    },
+    'dem010-06': {
+        'name': 'Celect-Mosaic Telecom'
+    },
+    'dem010-05': {
+        'name': 'Celect-West WI Telephone Area'
+    },
+    'net010-02': {
+        'name': 'Cellcom/Nsight Telservices'
+    },
+    'cen100': {
+        'name': 'CentraCom'
+    },
+    'nttccst010': {
+        'name': 'Central Scott / CSTV'
+    },
+    'cha035': {
+        'name': 'Chaparral CableVision'
+    },
+    'cha050': {
+        'name': 'Chariton Valley Communication Corporation, Inc.'
+    },
+    'cha060': {
+        'name': 'Chatmoss Cablevision'
+    },
+    'nttcche010': {
+        'name': 'Cherokee Communications'
+    },
+    'che050': {
+        'name': 'Chesapeake Bay Communications'
+    },
+    'cimtel': {
+        'name': 'Cim-Tel Cable, LLC.'
+    },
+    'cit180': {
+        'name': 'Citizens Cablevision - Floyd, VA'
+    },
+    'cit210': {
+        'name': 'Citizens Cablevision, Inc.'
+    },
+    'cit040': {
+        'name': 'Citizens Fiber'
+    },
+    'cit250': {
+        'name': 'Citizens Mutual'
+    },
+    'war040': {
+        'name': 'Citizens Telephone Corporation'
+    },
+    'wat025': {
+        'name': 'City Of Monroe'
+    },
+    'wadsworth': {
+        'name': 'CityLink'
+    },
+    'nor100': {
+        'name': 'CL Tel'
+    },
+    'cla010': {
+        'name': 'Clarence Telephone and Cedar Communications'
+    },
+    'ser060': {
+        'name': 'Clear Choice Communications'
+    },
+    'tac020': {
+        'name': 'Click! Cable TV'
+    },
+    'war020': {
+        'name': 'CLICK1.NET'
+    },
+    'cml010': {
+        'name': 'CML Telephone Cooperative Association'
+    },
+    'cns': {
+        'name': 'CNS'
+    },
+    'com160': {
+        'name': 'Co-Mo Connect'
+    },
+    'coa020': {
+        'name': 'Coast Communications'
+    },
+    'coa030': {
+        'name': 'Coaxial Cable TV'
+    },
+    'mid055': {
+        'name': 'Cobalt TV (Mid-State Community TV)'
+    },
+    'col070': {
+        'name': 'Columbia Power & Water Systems'
+    },
+    'col080': {
+        'name': 'Columbus Telephone'
+    },
+    'nor105': {
+        'name': 'Communications 1 Cablevision, Inc.'
+    },
+    'com150': {
+        'name': 'Community Cable & Broadband'
+    },
+    'com020': {
+        'name': 'Community Communications Company'
+    },
+    'coy010': {
+        'name': 'commZoom'
+    },
+    'com025': {
+        'name': 'Complete Communication Services'
+    },
+    'cat020': {
+        'name': 'Comporium'
+    },
+    'com071': {
+        'name': 'ComSouth Telesys'
+    },
+    'consolidatedcable': {
+        'name': 'Consolidated'
+    },
+    'conwaycorp': {
+        'name': 'Conway Corporation'
+    },
+    'coo050': {
+        'name': 'Coon Valley Telecommunications Inc'
+    },
+    'coo080': {
+        'name': 'Cooperative Telephone Company'
+    },
+    'cpt010': {
+        'name': 'CP-TEL'
+    },
+    'cra010': {
+        'name': 'Craw-Kan Telephone'
+    },
+    'crestview': {
+        'name': 'Crestview Cable Communications'
+    },
+    'cross': {
+        'name': 'Cross TV'
+    },
+    'cro030': {
+        'name': 'Crosslake Communications'
+    },
+    'ctc040': {
+        'name': 'CTC - Brainerd MN'
+    },
+    'phe030': {
+        'name': 'CTV-Beam - East Alabama'
+    },
+    'cun010': {
+        'name': 'Cunningham Telephone & Cable'
+    },
+    'dpc010': {
+        'name': 'D & P Communications'
+    },
+    'dak030': {
+        'name': 'Dakota Central Telecommunications'
+    },
+    'nttcdel010': {
+        'name': 'Delcambre Telephone LLC'
+    },
+    'tel160-del': {
+        'name': 'Delta Telephone Company'
+    },
+    'sal040': {
+        'name': 'DiamondNet'
+    },
+    'ind060-dc': {
+        'name': 'Direct Communications'
+    },
+    'doy010': {
+        'name': 'Doylestown Cable TV'
+    },
+    'dic010': {
+        'name': 'DRN'
+    },
+    'dtc020': {
+        'name': 'DTC'
+    },
+    'dtc010': {
+        'name': 'DTC Cable (Delhi)'
+    },
+    'dum010': {
+        'name': 'Dumont Telephone Company'
+    },
+    'dun010': {
+        'name': 'Dunkerton Telephone Cooperative'
+    },
+    'cci010': {
+        'name': 'Duo County Telecom'
+    },
+    'eagle': {
+        'name': 'Eagle Communications'
+    },
+    'weh010-east': {
+        'name': 'East Arkansas Cable TV'
+    },
+    'eatel': {
+        'name': 'EATEL Video, LLC'
+    },
+    'ell010': {
+        'name': 'ECTA'
+    },
+    'emerytelcom': {
+        'name': 'Emery Telcom Video LLC'
+    },
+    'nor200': {
+        'name': 'Empire Access'
+    },
+    'endeavor': {
+        'name': 'Endeavor Communications'
+    },
+    'sun045': {
+        'name': 'Enhanced Telecommunications Corporation'
+    },
+    'mid030': {
+        'name': 'enTouch'
+    },
+    'epb020': {
+        'name': 'EPB Smartnet'
+    },
+    'jea010': {
+        'name': 'EPlus Broadband'
+    },
+    'com065': {
+        'name': 'ETC'
+    },
+    'ete010': {
+        'name': 'Etex Communications'
+    },
+    'fbc-tele': {
+        'name': 'F&B Communications'
+    },
+    'fal010': {
+        'name': 'Falcon Broadband'
+    },
+    'fam010': {
+        'name': 'FamilyView CableVision'
+    },
+    'far020': {
+        'name': 'Farmers Mutual Telephone Company'
+    },
+    'fay010': {
+        'name': 'Fayetteville Public Utilities'
+    },
+    'sal060': {
+        'name': 'fibrant'
+    },
+    'fid010': {
+        'name': 'Fidelity Communications'
+    },
+    'for030': {
+        'name': 'FJ Communications'
+    },
+    'fli020': {
+        'name': 'Flint River Communications'
+    },
+    'far030': {
+        'name': 'FMT - Jesup'
+    },
+    'foo010': {
+        'name': 'Foothills Communications'
+    },
+    'for080': {
+        'name': 'Forsyth CableNet'
+    },
+    'fbcomm': {
+        'name': 'Frankfort Plant Board'
+    },
+    'tel160-fra': {
+        'name': 'Franklin Telephone Company'
+    },
+    'nttcftc010': {
+        'name': 'FTC'
+    },
+    'fullchannel': {
+        'name': 'Full Channel, Inc.'
+    },
+    'gar040': {
+        'name': 'Gardonville Cooperative Telephone Association'
+    },
+    'gbt010': {
+        'name': 'GBT Communications, Inc.'
+    },
+    'tec010': {
+        'name': 'Genuine Telecom'
+    },
+    'clr010': {
+        'name': 'Giant Communications'
+    },
+    'gla010': {
+        'name': 'Glasgow EPB'
+    },
+    'gle010': {
+        'name': 'Glenwood Telecommunications'
+    },
+    'gra060': {
+        'name': 'GLW Broadband Inc.'
+    },
+    'goldenwest': {
+        'name': 'Golden West Cablevision'
+    },
+    'vis030': {
+        'name': 'Grantsburg Telcom'
+    },
+    'gpcom': {
+        'name': 'Great Plains Communications'
+    },
+    'gri010': {
+        'name': 'Gridley Cable Inc'
+    },
+    'hbc010': {
+        'name': 'H&B Cable Services'
+    },
+    'hae010': {
+        'name': 'Haefele TV Inc.'
+    },
+    'htc010': {
+        'name': 'Halstad Telephone Company'
+    },
+    'har005': {
+        'name': 'Harlan Municipal Utilities'
+    },
+    'har020': {
+        'name': 'Hart Communications'
+    },
+    'ced010': {
+        'name': 'Hartelco TV'
+    },
+    'hea040': {
+        'name': 'Heart of Iowa Communications Cooperative'
+    },
+    'htc020': {
+        'name': 'Hickory Telephone Company'
+    },
+    'nttchig010': {
+        'name': 'Highland Communication Services'
+    },
+    'hig030': {
+        'name': 'Highland Media'
+    },
+    'spc010': {
+        'name': 'Hilliary Communications'
+    },
+    'hin020': {
+        'name': 'Hinton CATV Co.'
+    },
+    'hometel': {
+        'name': 'HomeTel Entertainment, Inc.'
+    },
+    'hoodcanal': {
+        'name': 'Hood Canal Communications'
+    },
+    'weh010-hope': {
+        'name': 'Hope - Prescott Cable TV'
+    },
+    'horizoncable': {
+        'name': 'Horizon Cable TV, Inc.'
+    },
+    'hor040': {
+        'name': 'Horizon Chillicothe Telephone'
+    },
+    'htc030': {
+        'name': 'HTC Communications Co. - IL'
+    },
+    'htccomm': {
+        'name': 'HTC Communications, Inc. - IA'
+    },
+    'wal005': {
+        'name': 'Huxley Communications'
+    },
+    'imon': {
+        'name': 'ImOn Communications'
+    },
+    'ind040': {
+        'name': 'Independence Telecommunications'
+    },
+    'rrc010': {
+        'name': 'Inland Networks'
+    },
+    'stc020': {
+        'name': 'Innovative Cable TV St Croix'
+    },
+    'car100': {
+        'name': 'Innovative Cable TV St Thomas-St John'
+    },
+    'icc010': {
+        'name': 'Inside Connect Cable'
+    },
+    'int100': {
+        'name': 'Integra Telecom'
+    },
+    'int050': {
+        'name': 'Interstate Telecommunications Coop'
+    },
+    'irv010': {
+        'name': 'Irvine Cable'
+    },
+    'k2c010': {
+        'name': 'K2 Communications'
+    },
+    'kal010': {
+        'name': 'Kalida Telephone Company, Inc.'
+    },
+    'kal030': {
+        'name': 'Kalona Cooperative Telephone Company'
+    },
+    'kmt010': {
+        'name': 'KMTelecom'
+    },
+    'kpu010': {
+        'name': 'KPU Telecommunications'
+    },
+    'kuh010': {
+        'name': 'Kuhn Communications, Inc.'
+    },
+    'lak130': {
+        'name': 'Lakeland Communications'
+    },
+    'lan010': {
+        'name': 'Langco'
+    },
+    'lau020': {
+        'name': 'Laurel Highland Total Communications, Inc.'
+    },
+    'leh010': {
+        'name': 'Lehigh Valley Cooperative Telephone'
+    },
+    'bra010': {
+        'name': 'Limestone Cable/Bracken Cable'
+    },
+    'loc020': {
+        'name': 'LISCO'
+    },
+    'lit020': {
+        'name': 'Litestream'
+    },
+    'tel140': {
+        'name': 'LivCom'
+    },
+    'loc010': {
+        'name': 'LocalTel Communications'
+    },
+    'weh010-longview': {
+        'name': 'Longview - Kilgore Cable TV'
+    },
+    'lon030': {
+        'name': 'Lonsdale Video Ventures, LLC'
+    },
+    'lns010': {
+        'name': 'Lost Nation-Elwood Telephone Co.'
+    },
+    'nttclpc010': {
+        'name': 'LPC Connect'
+    },
+    'lumos': {
+        'name': 'Lumos Networks'
+    },
+    'madison': {
+        'name': 'Madison Communications'
+    },
+    'mad030': {
+        'name': 'Madison County Cable Inc.'
+    },
+    'nttcmah010': {
+        'name': 'Mahaska Communication Group'
+    },
+    'mar010': {
+        'name': 'Marne & Elk Horn Telephone Company'
+    },
+    'mcc040': {
+        'name': 'McClure Telephone Co.'
+    },
+    'mctv': {
+        'name': 'MCTV'
+    },
+    'merrimac': {
+        'name': 'Merrimac Communications Ltd.'
+    },
+    'metronet': {
+        'name': 'Metronet'
+    },
+    'mhtc': {
+        'name': 'MHTC'
+    },
+    'midhudson': {
+        'name': 'Mid-Hudson Cable'
+    },
+    'midrivers': {
+        'name': 'Mid-Rivers Communications'
+    },
+    'mid045': {
+        'name': 'Midstate Communications'
+    },
+    'mil080': {
+        'name': 'Milford Communications'
+    },
+    'min030': {
+        'name': 'MINET'
+    },
+    'nttcmin010': {
+        'name': 'Minford TV'
+    },
+    'san040-02': {
+        'name': 'Mitchell Telecom'
+    },
+    'mlg010': {
+        'name': 'MLGC'
+    },
+    'mon060': {
+        'name': 'Mon-Cre TVE'
+    },
+    'mou110': {
+        'name': 'Mountain Telephone'
+    },
+    'mou050': {
+        'name': 'Mountain Village Cable'
+    },
+    'mtacomm': {
+        'name': 'MTA Communications, LLC'
+    },
+    'mtc010': {
+        'name': 'MTC Cable'
+    },
+    'med040': {
+        'name': 'MTC Technologies'
+    },
+    'man060': {
+        'name': 'MTCC'
+    },
+    'mtc030': {
+        'name': 'MTCO Communications'
+    },
+    'mul050': {
+        'name': 'Mulberry Telecommunications'
+    },
+    'mur010': {
+        'name': 'Murray Electric System'
+    },
+    'musfiber': {
+        'name': 'MUS FiberNET'
+    },
+    'mpw': {
+        'name': 'Muscatine Power & Water'
+    },
+    'nttcsli010': {
+        'name': 'myEVTV.com'
+    },
+    'nor115': {
+        'name': 'NCC'
+    },
+    'nor260': {
+        'name': 'NDTC'
+    },
+    'nctc': {
+        'name': 'Nebraska Central Telecom, Inc.'
+    },
+    'nel020': {
+        'name': 'Nelsonville TV Cable'
+    },
+    'nem010': {
+        'name': 'Nemont'
+    },
+    'new075': {
+        'name': 'New Hope Telephone Cooperative'
+    },
+    'nor240': {
+        'name': 'NICP'
+    },
+    'cic010': {
+        'name': 'NineStar Connect'
+    },
+    'nktelco': {
+        'name': 'NKTelco'
+    },
+    'nortex': {
+        'name': 'Nortex Communications'
+    },
+    'nor140': {
+        'name': 'North Central Telephone Cooperative'
+    },
+    'nor030': {
+        'name': 'Northland Communications'
+    },
+    'nor075': {
+        'name': 'Northwest Communications'
+    },
+    'nor125': {
+        'name': 'Norwood Light Broadband'
+    },
+    'net010': {
+        'name': 'Nsight Telservices'
+    },
+    'dur010': {
+        'name': 'Ntec'
+    },
+    'nts010': {
+        'name': 'NTS Communications'
+    },
+    'new045': {
+        'name': 'NU-Telecom'
+    },
+    'nulink': {
+        'name': 'NuLink'
+    },
+    'jam030': {
+        'name': 'NVC'
+    },
+    'far035': {
+        'name': 'OmniTel Communications'
+    },
+    'onesource': {
+        'name': 'OneSource Communications'
+    },
+    'cit230': {
+        'name': 'Opelika Power Services'
+    },
+    'daltonutilities': {
+        'name': 'OptiLink'
+    },
+    'mid140': {
+        'name': 'OPTURA'
+    },
+    'ote010': {
+        'name': 'OTEC Communication Company'
+    },
+    'cci020': {
+        'name': 'Packerland Broadband'
+    },
+    'pan010': {
+        'name': 'Panora Telco/Guthrie Center Communications'
+    },
+    'otter': {
+        'name': 'Park Region Telephone & Otter Tail Telcom'
+    },
+    'mid050': {
+        'name': 'Partner Communications Cooperative'
+    },
+    'fib010': {
+        'name': 'Pathway'
+    },
+    'paulbunyan': {
+        'name': 'Paul Bunyan Communications'
+    },
+    'pem020': {
+        'name': 'Pembroke Telephone Company'
+    },
+    'mck010': {
+        'name': 'Peoples Rural Telephone Cooperative'
+    },
+    'pul010': {
+        'name': 'PES Energize'
+    },
+    'phi010': {
+        'name': 'Philippi Communications System'
+    },
+    'phonoscope': {
+        'name': 'Phonoscope Cable'
+    },
+    'pin070': {
+        'name': 'Pine Belt Communications, Inc.'
+    },
+    'weh010-pine': {
+        'name': 'Pine Bluff Cable TV'
+    },
+    'pin060': {
+        'name': 'Pineland Telephone Cooperative'
+    },
+    'cam010': {
+        'name': 'Pinpoint Communications'
+    },
+    'pio060': {
+        'name': 'Pioneer Broadband'
+    },
+    'pioncomm': {
+        'name': 'Pioneer Communications'
+    },
+    'pioneer': {
+        'name': 'Pioneer DTV'
+    },
+    'pla020': {
+        'name': 'Plant TiftNet, Inc.'
+    },
+    'par010': {
+        'name': 'PLWC'
+    },
+    'pro035': {
+        'name': 'PMT'
+    },
+    'vik011': {
+        'name': 'Polar Cablevision'
+    },
+    'pottawatomie': {
+        'name': 'Pottawatomie Telephone Co.'
+    },
+    'premiercomm': {
+        'name': 'Premier Communications'
+    },
+    'psc010': {
+        'name': 'PSC'
+    },
+    'pan020': {
+        'name': 'PTCI'
+    },
+    'qco010': {
+        'name': 'QCOL'
+    },
+    'qua010': {
+        'name': 'Quality Cablevision'
+    },
+    'rad010': {
+        'name': 'Radcliffe Telephone Company'
+    },
+    'car040': {
+        'name': 'Rainbow Communications'
+    },
+    'rai030': {
+        'name': 'Rainier Connect'
+    },
+    'ral010': {
+        'name': 'Ralls Technologies'
+    },
+    'rct010': {
+        'name': 'RC Technologies'
+    },
+    'red040': {
+        'name': 'Red River Communications'
+    },
+    'ree010': {
+        'name': 'Reedsburg Utility Commission'
+    },
+    'mol010': {
+        'name': 'Reliance Connects- Oregon'
+    },
+    'res020': {
+        'name': 'Reserve Telecommunications'
+    },
+    'weh010-resort': {
+        'name': 'Resort TV Cable'
+    },
+    'rld010': {
+        'name': 'Richland Grant Telephone Cooperative, Inc.'
+    },
+    'riv030': {
+        'name': 'River Valley Telecommunications Coop'
+    },
+    'rockportcable': {
+        'name': 'Rock Port Cablevision'
+    },
+    'rsf010': {
+        'name': 'RS Fiber'
+    },
+    'rtc': {
+        'name': 'RTC Communication Corp'
+    },
+    'res040': {
+        'name': 'RTC-Reservation Telephone Coop.'
+    },
+    'rte010': {
+        'name': 'RTEC Communications'
+    },
+    'stc010': {
+        'name': 'S&T'
+    },
+    'san020': {
+        'name': 'San Bruno Cable TV'
+    },
+    'san040-01': {
+        'name': 'Santel'
+    },
+    'sav010': {
+        'name': 'SCI Broadband-Savage Communications Inc.'
+    },
+    'sco050': {
+        'name': 'Scottsboro Electric Power Board'
+    },
+    'scr010': {
+        'name': 'Scranton Telephone Company'
+    },
+    'selco': {
+        'name': 'SELCO'
+    },
+    'she010': {
+        'name': 'Shentel'
+    },
+    'she030': {
+        'name': 'Sherwood Mutual Telephone Association, Inc.'
+    },
+    'ind060-ssc': {
+        'name': 'Silver Star Communications'
+    },
+    'sjoberg': {
+        'name': 'Sjoberg\'s Inc.'
+    },
+    'sou025': {
+        'name': 'SKT'
+    },
+    'sky050': {
+        'name': 'SkyBest TV'
+    },
+    'nttcsmi010': {
+        'name': 'Smithville Communications'
+    },
+    'woo010': {
+        'name': 'Solarus'
+    },
+    'sou075': {
+        'name': 'South Central Rural Telephone Cooperative'
+    },
+    'sou065': {
+        'name': 'South Holt Cablevision, Inc.'
+    },
+    'sou035': {
+        'name': 'South Slope Cooperative Communications'
+    },
+    'spa020': {
+        'name': 'Spanish Fork Community Network'
+    },
+    'spe010': {
+        'name': 'Spencer Municipal Utilities'
+    },
+    'spi005': {
+        'name': 'Spillway Communications, Inc.'
+    },
+    'srt010': {
+        'name': 'SRT'
+    },
+    'cccsmc010': {
+        'name': 'St. Maarten Cable TV'
+    },
+    'sta025': {
+        'name': 'Star Communications'
+    },
+    'sco020': {
+        'name': 'STE'
+    },
+    'uin010': {
+        'name': 'STRATA Networks'
+    },
+    'sum010': {
+        'name': 'Sumner Cable TV'
+    },
+    'pie010': {
+        'name': 'Surry TV/PCSI TV'
+    },
+    'swa010': {
+        'name': 'Swayzee Communications'
+    },
+    'sweetwater': {
+        'name': 'Sweetwater Cable Television Co'
+    },
+    'weh010-talequah': {
+        'name': 'Tahlequah Cable TV'
+    },
+    'tct': {
+        'name': 'TCT'
+    },
+    'tel050': {
+        'name': 'Tele-Media Company'
+    },
+    'com050': {
+        'name': 'The Community Agency'
+    },
+    'thr020': {
+        'name': 'Three River'
+    },
+    'cab140': {
+        'name': 'Town & Country Technologies'
+    },
+    'tra010': {
+        'name': 'Trans-Video'
+    },
+    'tre010': {
+        'name': 'Trenton TV Cable Company'
+    },
+    'tcc': {
+        'name': 'Tri County Communications Cooperative'
+    },
+    'tri025': {
+        'name': 'TriCounty Telecom'
+    },
+    'tri110': {
+        'name': 'TrioTel Communications, Inc.'
+    },
+    'tro010': {
+        'name': 'Troy Cablevision, Inc.'
+    },
+    'tsc': {
+        'name': 'TSC'
+    },
+    'cit220': {
+        'name': 'Tullahoma Utilities Board'
+    },
+    'tvc030': {
+        'name': 'TV Cable of Rensselaer'
+    },
+    'tvc015': {
+        'name': 'TVC Cable'
+    },
+    'cab180': {
+        'name': 'TVision'
+    },
+    'twi040': {
+        'name': 'Twin Lakes'
+    },
+    'tvtinc': {
+        'name': 'Twin Valley'
+    },
+    'uis010': {
+        'name': 'Union Telephone Company'
+    },
+    'uni110': {
+        'name': 'United Communications - TN'
+    },
+    'uni120': {
+        'name': 'United Services'
+    },
+    'uss020': {
+        'name': 'US Sonet'
+    },
+    'cab060': {
+        'name': 'USA Communications'
+    },
+    'she005': {
+        'name': 'USA Communications/Shellsburg, IA'
+    },
+    'val040': {
+        'name': 'Valley TeleCom Group'
+    },
+    'val025': {
+        'name': 'Valley Telecommunications'
+    },
+    'val030': {
+        'name': 'Valparaiso Broadband'
+    },
+    'cla050': {
+        'name': 'Vast Broadband'
+    },
+    'sul015': {
+        'name': 'Venture Communications Cooperative, Inc.'
+    },
+    'ver025': {
+        'name': 'Vernon Communications Co-op'
+    },
+    'weh010-vicksburg': {
+        'name': 'Vicksburg Video'
+    },
+    'vis070': {
+        'name': 'Vision Communications'
+    },
+    'volcanotel': {
+        'name': 'Volcano Vision, Inc.'
+    },
+    'vol040-02': {
+        'name': 'VolFirst / BLTV'
+    },
+    'ver070': {
+        'name': 'VTel'
+    },
+    'nttcvtx010': {
+        'name': 'VTX1'
+    },
+    'bci010-02': {
+        'name': 'Vyve Broadband'
+    },
+    'wab020': {
+        'name': 'Wabash Mutual Telephone'
+    },
+    'waitsfield': {
+        'name': 'Waitsfield Cable'
+    },
+    'wal010': {
+        'name': 'Walnut Communications'
+    },
+    'wavebroadband': {
+        'name': 'Wave'
+    },
+    'wav030': {
+        'name': 'Waverly Communications Utility'
+    },
+    'wbi010': {
+        'name': 'WBI'
+    },
+    'web020': {
+        'name': 'Webster-Calhoun Cooperative Telephone Association'
+    },
+    'wes005': {
+        'name': 'West Alabama TV Cable'
+    },
+    'carolinata': {
+        'name': 'West Carolina Communications'
+    },
+    'wct010': {
+        'name': 'West Central Telephone Association'
+    },
+    'wes110': {
+        'name': 'West River Cooperative Telephone Company'
+    },
+    'ani030': {
+        'name': 'WesTel Systems'
+    },
+    'westianet': {
+        'name': 'Western Iowa Networks'
+    },
+    'nttcwhi010': {
+        'name': 'Whidbey Telecom'
+    },
+    'weh010-white': {
+        'name': 'White County Cable TV'
+    },
+    'wes130': {
+        'name': 'Wiatel'
+    },
+    'wik010': {
+        'name': 'Wiktel'
+    },
+    'wil070': {
+        'name': 'Wilkes Communications, Inc./RiverStreet Networks'
+    },
+    'wil015': {
+        'name': 'Wilson Communications'
+    },
+    'win010': {
+        'name': 'Windomnet/SMBS'
+    },
+    'win090': {
+        'name': 'Windstream Cable TV'
+    },
+    'wcta': {
+        'name': 'Winnebago Cooperative Telecom Association'
+    },
+    'wtc010': {
+        'name': 'WTC'
+    },
+    'wil040': {
+        'name': 'WTC Communications, Inc.'
+    },
+    'wya010': {
+        'name': 'Wyandotte Cable'
+    },
+    'hin020-02': {
+        'name': 'X-Stream Services'
+    },
+    'xit010': {
+        'name': 'XIT Communications'
+    },
+    'yel010': {
+        'name': 'Yelcot Communications'
+    },
+    'mid180-01': {
+        'name': 'yondoo'
+    },
+    'cou060': {
+        'name': 'Zito Media'
+    },
+}
+
+
+class AdobePassIE(InfoExtractor):
+    _SERVICE_PROVIDER_TEMPLATE = 'https://sp.auth.adobe.com/adobe-services/%s'
+    _USER_AGENT = 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0'
+    _MVPD_CACHE = 'ap-mvpd'
+
+    @staticmethod
+    def _get_mvpd_resource(provider_id, title, guid, rating):
+        channel = etree.Element('channel')
+        channel_title = etree.SubElement(channel, 'title')
+        channel_title.text = provider_id
+        item = etree.SubElement(channel, 'item')
+        resource_title = etree.SubElement(item, 'title')
+        resource_title.text = title
+        resource_guid = etree.SubElement(item, 'guid')
+        resource_guid.text = guid
+        resource_rating = etree.SubElement(item, 'media:rating')
+        resource_rating.attrib = {'scheme': 'urn:v-chip'}
+        resource_rating.text = rating
+        return '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">' + etree.tostring(channel).decode() + '</rss>'
+
+    def _extract_mvpd_auth(self, url, video_id, requestor_id, resource):
+        def xml_text(xml_str, tag):
+            return self._search_regex(
+                '<%s>(.+?)</%s>' % (tag, tag), xml_str, tag)
+
+        def is_expired(token, date_ele):
+            token_expires = unified_timestamp(re.sub(r'[_ ]GMT', '', xml_text(token, date_ele)))
+            return token_expires and token_expires <= int(time.time())
+
+        def post_form(form_page_res, note, data={}):
+            form_page, urlh = form_page_res
+            post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url')
+            if not re.match(r'https?://', post_url):
+                post_url = compat_urlparse.urljoin(urlh.geturl(), post_url)
+            form_data = self._hidden_inputs(form_page)
+            form_data.update(data)
+            return self._download_webpage_handle(
+                post_url, video_id, note, data=urlencode_postdata(form_data), headers={
+                    'Content-Type': 'application/x-www-form-urlencoded',
+                })
+
+        def raise_mvpd_required():
+            raise ExtractorError(
+                'This video is only available for users of participating TV providers. '
+                'Use --ap-mso to specify Adobe Pass Multiple-system operator Identifier '
+                'and --ap-username and --ap-password or --netrc to provide account credentials.', expected=True)
+
+        mvpd_headers = {
+            'ap_42': 'anonymous',
+            'ap_11': 'Linux i686',
+            'ap_z': self._USER_AGENT,
+            'User-Agent': self._USER_AGENT,
+        }
+
+        guid = xml_text(resource, 'guid') if '<' in resource else resource
+        count = 0
+        while count < 2:
+            requestor_info = self._downloader.cache.load(self._MVPD_CACHE, requestor_id) or {}
+            authn_token = requestor_info.get('authn_token')
+            if authn_token and is_expired(authn_token, 'simpleTokenExpires'):
+                authn_token = None
+            if not authn_token:
+                # TODO add support for other TV Providers
+                mso_id = self._downloader.params.get('ap_mso')
+                if not mso_id:
+                    raise_mvpd_required()
+                username, password = self._get_login_info('ap_username', 'ap_password', mso_id)
+                if not username or not password:
+                    raise_mvpd_required()
+                mso_info = MSO_INFO[mso_id]
+
+                provider_redirect_page_res = self._download_webpage_handle(
+                    self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
+                    'Downloading Provider Redirect Page', query={
+                        'noflash': 'true',
+                        'mso_id': mso_id,
+                        'requestor_id': requestor_id,
+                        'no_iframe': 'false',
+                        'domain_name': 'adobe.com',
+                        'redirect_url': url,
+                    })
+
+                if mso_id == 'Comcast_SSO':
+                    # Comcast page flow varies by video site and whether you
+                    # are on Comcast's network.
+                    provider_redirect_page, urlh = provider_redirect_page_res
+                    # Check for Comcast auto login
+                    if 'automatically signing you in' in provider_redirect_page:
+                        oauth_redirect_url = self._html_search_regex(
+                            r'window\.location\s*=\s*[\'"]([^\'"]+)',
+                            provider_redirect_page, 'oauth redirect')
+                        # Just need to process the request. No useful data comes back
+                        self._download_webpage(
+                            oauth_redirect_url, video_id, 'Confirming auto login')
+                    else:
+                        if '<form name="signin"' in provider_redirect_page:
+                            # already have the form, just fill it
+                            provider_login_page_res = provider_redirect_page_res
+                        elif 'http-equiv="refresh"' in provider_redirect_page:
+                            # redirects to the login page
+                            oauth_redirect_url = self._html_search_regex(
+                                r'content="0;\s*url=([^\'"]+)',
+                                provider_redirect_page, 'meta refresh redirect')
+                            provider_login_page_res = self._download_webpage_handle(
+                                oauth_redirect_url,
+                                video_id, 'Downloading Provider Login Page')
+                        else:
+                            provider_login_page_res = post_form(
+                                provider_redirect_page_res, 'Downloading Provider Login Page')
+
+                        mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', {
+                            mso_info.get('username_field', 'username'): username,
+                            mso_info.get('password_field', 'password'): password,
+                        })
+                        mvpd_confirm_page, urlh = mvpd_confirm_page_res
+                        if '<button class="submit" value="Resume">Resume</button>' in mvpd_confirm_page:
+                            post_form(mvpd_confirm_page_res, 'Confirming Login')
+
+                else:
+                    # Normal, non-Comcast flow
+                    provider_login_page_res = post_form(
+                        provider_redirect_page_res, 'Downloading Provider Login Page')
+                    mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', {
+                        mso_info.get('username_field', 'username'): username,
+                        mso_info.get('password_field', 'password'): password,
+                    })
+                    if mso_id != 'Rogers':
+                        post_form(mvpd_confirm_page_res, 'Confirming Login')
+
+                session = self._download_webpage(
+                    self._SERVICE_PROVIDER_TEMPLATE % 'session', video_id,
+                    'Retrieving Session', data=urlencode_postdata({
+                        '_method': 'GET',
+                        'requestor_id': requestor_id,
+                    }), headers=mvpd_headers)
+                if '<pendingLogout' in session:
+                    self._downloader.cache.store(self._MVPD_CACHE, requestor_id, {})
+                    count += 1
+                    continue
+                authn_token = unescapeHTML(xml_text(session, 'authnToken'))
+                requestor_info['authn_token'] = authn_token
+                self._downloader.cache.store(self._MVPD_CACHE, requestor_id, requestor_info)
+
+            authz_token = requestor_info.get(guid)
+            if authz_token and is_expired(authz_token, 'simpleTokenTTL'):
+                authz_token = None
+            if not authz_token:
+                authorize = self._download_webpage(
+                    self._SERVICE_PROVIDER_TEMPLATE % 'authorize', video_id,
+                    'Retrieving Authorization Token', data=urlencode_postdata({
+                        'resource_id': resource,
+                        'requestor_id': requestor_id,
+                        'authentication_token': authn_token,
+                        'mso_id': xml_text(authn_token, 'simpleTokenMsoID'),
+                        'userMeta': '1',
+                    }), headers=mvpd_headers)
+                if '<pendingLogout' in authorize:
+                    self._downloader.cache.store(self._MVPD_CACHE, requestor_id, {})
+                    count += 1
+                    continue
+                authz_token = unescapeHTML(xml_text(authorize, 'authzToken'))
+                requestor_info[guid] = authz_token
+                self._downloader.cache.store(self._MVPD_CACHE, requestor_id, requestor_info)
+
+            mvpd_headers.update({
+                'ap_19': xml_text(authn_token, 'simpleSamlNameID'),
+                'ap_23': xml_text(authn_token, 'simpleSamlSessionIndex'),
+            })
+
+            short_authorize = self._download_webpage(
+                self._SERVICE_PROVIDER_TEMPLATE % 'shortAuthorize',
+                video_id, 'Retrieving Media Token', data=urlencode_postdata({
+                    'authz_token': authz_token,
+                    'requestor_id': requestor_id,
+                    'session_guid': xml_text(authn_token, 'simpleTokenAuthenticationGuid'),
+                    'hashed_guid': 'false',
+                }), headers=mvpd_headers)
+            if '<pendingLogout' in short_authorize:
+                self._downloader.cache.store(self._MVPD_CACHE, requestor_id, {})
+                count += 1
+                continue
+            return short_authorize
diff --git a/youtube_dl/extractor/adobetv.py b/youtube_dl/extractor/adobetv.py
index 8753ee2cf2b5fdaa5810fc8d564f388734a84324..5ae16fa16809b557e74e133a4a7811d396b1c2c2 100644
--- a/youtube_dl/extractor/adobetv.py
+++ b/youtube_dl/extractor/adobetv.py
@@ -156,7 +156,10 @@ class AdobeTVVideoIE(InfoExtractor):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        video_data = self._download_json(url + '?format=json', video_id)
+        webpage = self._download_webpage(url, video_id)
+
+        video_data = self._parse_json(self._search_regex(
+            r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id)
  
          formats = [{
              'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),
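The adobetv.py hunk above swaps the `?format=json` request for scraping a `var bridge = ...;` assignment out of the page. A minimal standalone sketch of that idea follows, using only the standard library. The helper name `extract_bridge_data` and the sample page are hypothetical and are not part of youtube-dl; the regex is the one from the hunk. Because `[^;]+` stops at the first semicolon, a payload containing `;` inside a string would be truncated and fail to parse.

```python
import json
import re


def extract_bridge_data(webpage):
    """Return the JSON value assigned to `var bridge` in the page, or None."""
    # Same pattern as in the hunk: capture everything up to the first ';'.
    m = re.search(r'var\s+bridge\s*=\s*([^;]+);', webpage)
    if m is None:
        return None
    return json.loads(m.group(1))


# Hypothetical page content, for illustration only.
sample_page = (
    '<script>var bridge = {"title": "Demo", '
    '"sources": [{"src": "a.mp4", "height": 720}]};</script>'
)
bridge = extract_bridge_data(sample_page)
```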
diff --git a/youtube_dl/extractor/adultswim.py b/youtube_dl/extractor/adultswim.py
index 8157da2cb63af8a7079fda8c388be3108281a7ad..989505c8232abf53f99d0af594c84e45f8778eb0 100644
--- a/youtube_dl/extractor/adultswim.py
+++ b/youtube_dl/extractor/adultswim.py
@@ -3,16 +3,14 @@ from __future__ import unicode_literals
  
  import re
  
-from .common import InfoExtractor
+from .turner import TurnerBaseIE
  from ..utils import (
-    determine_ext,
      ExtractorError,
-    float_or_none,
-    xpath_text,
+    int_or_none,
  )
  
  
-class AdultSwimIE(InfoExtractor):
+class AdultSwimIE(TurnerBaseIE):
      _VALID_URL = r'https?://(?:www\.)?adultswim\.com/videos/(?P<is_playlist>playlists/)?(?P<show_path>[^/]+)/(?P<episode_path>[^/?#]+)/?'
  
      _TESTS = [{
@@ -83,6 +81,42 @@ class AdultSwimIE(InfoExtractor):
              # m3u8 download
              'skip_download': True,
          }
+    }, {
+        # heroMetadata.trailer
+        'url': 'http://www.adultswim.com/videos/decker/inside-decker-a-new-hero/',
+        'info_dict': {
+            'id': 'I0LQFQkaSUaFp8PnAWHhoQ',
+            'ext': 'mp4',
+            'title': 'Decker - Inside Decker: A New Hero',
+            'description': 'md5:c916df071d425d62d70c86d4399d3ee0',
+            'duration': 249.008,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'expected_warnings': ['Unable to download f4m manifest'],
+    }, {
+        'url': 'http://www.adultswim.com/videos/toonami/friday-october-14th-2016/',
+        'info_dict': {
+            'id': 'eYiLsKVgQ6qTC6agD67Sig',
+            'title': 'Toonami - Friday, October 14th, 2016',
+            'description': 'md5:99892c96ffc85e159a428de85c30acde',
+        },
+        'playlist': [{
+            'md5': '',
+            'info_dict': {
+                'id': 'eYiLsKVgQ6qTC6agD67Sig',
+                'ext': 'mp4',
+                'title': 'Toonami - Friday, October 14th, 2016',
+                'description': 'md5:99892c96ffc85e159a428de85c30acde',
+            },
+        }],
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'expected_warnings': ['Unable to download f4m manifest'],
      }]
  
      @staticmethod
@@ -133,79 +167,58 @@ class AdultSwimIE(InfoExtractor):
              if video_info is None:
                  if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path:
                      video_info = bootstrapped_data['slugged_video']
-                else:
-                    raise ExtractorError('Unable to find video info')
+            if not video_info:
+                video_info = bootstrapped_data.get(
+                    'heroMetadata', {}).get('trailer', {}).get('video')
+            if not video_info:
+                video_info = (bootstrapped_data.get('onlineOriginals') or [None])[0]
+            if not video_info:
+                raise ExtractorError('Unable to find video info')
  
              show = bootstrapped_data['show']
              show_title = show['title']
              stream = video_info.get('stream')
-            clips = [stream] if stream else video_info.get('clips')
-            if not clips:
-                raise ExtractorError(
-                    'This video is only available via cable service provider subscription that'
-                    ' is not currently supported. You may want to use --cookies.'
-                    if video_info.get('auth') is True else 'Unable to find stream or clips',
-                    expected=True)
-            segment_ids = [clip['videoPlaybackID'] for clip in clips]
+            if stream and stream.get('videoPlaybackID'):
+                segment_ids = [stream['videoPlaybackID']]
+            elif video_info.get('clips'):
+                segment_ids = [clip['videoPlaybackID'] for clip in video_info['clips']]
+            elif video_info.get('videoPlaybackID'):
+                segment_ids = [video_info['videoPlaybackID']]
+            elif video_info.get('id'):
+                segment_ids = [video_info['id']]
+            else:
+                if video_info.get('auth') is True:
+                    raise ExtractorError(
+                        'This video is only available via cable service provider subscription that'
+                        ' is not currently supported. You may want to use --cookies.', expected=True)
+                else:
+                    raise ExtractorError('Unable to find stream or clips')
  
          episode_id = video_info['id']
          episode_title = video_info['title']
-        episode_description = video_info['description']
-        episode_duration = video_info.get('duration')
+        episode_description = video_info.get('description')
+        episode_duration = int_or_none(video_info.get('duration'))
+        view_count = int_or_none(video_info.get('views'))
  
          entries = []
          for part_num, segment_id in enumerate(segment_ids):
-            segment_url = 'http://www.adultswim.com/videos/api/v0/assets?id=%s&platform=desktop' % segment_id
-
+            segment_info = self._extract_cvp_info(
+                'http://www.adultswim.com/videos/api/v0/assets?id=%s&platform=desktop' % segment_id,
+                segment_id, {
+                    'secure': {
+                        'media_src': 'http://androidhls-secure.cdn.turner.com/adultswim/big',
+                        'tokenizer_src': 'http://www.adultswim.com/astv/mvpd/processors/services/token_ipadAdobe.do',
+                    },
+                })
              segment_title = '%s - %s' % (show_title, episode_title)
              if len(segment_ids) > 1:
                  segment_title += ' Part %d' % (part_num + 1)
-
-            idoc = self._download_xml(
-                segment_url, segment_title,
-                'Downloading segment information', 'Unable to download segment information')
-
-            segment_duration = float_or_none(
-                xpath_text(idoc, './/trt', 'segment duration').strip())
-
-            formats = []
-            file_els = idoc.findall('.//files/file') or idoc.findall('./files/file')
-
-            unique_urls = []
-            unique_file_els = []
-            for file_el in file_els:
-                media_url = file_el.text
-                if not media_url or determine_ext(media_url) == 'f4m':
-                    continue
-                if file_el.text not in unique_urls:
-                    unique_urls.append(file_el.text)
-                    unique_file_els.append(file_el)
-
-            for file_el in unique_file_els:
-                bitrate = file_el.attrib.get('bitrate')
-                ftype = file_el.attrib.get('type')
-                media_url = file_el.text
-                if determine_ext(media_url) == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        media_url, segment_title, 'mp4', preference=0,
-                        m3u8_id='hls', fatal=False))
-                else:
-                    formats.append({
-                        'format_id': '%s_%s' % (bitrate, ftype),
-                        'url': file_el.text.strip(),
-                        # The bitrate may not be a number (for example: 'iphone')
-                        'tbr': int(bitrate) if bitrate.isdigit() else None,
-                    })
-
-            self._sort_formats(formats)
-
-            entries.append({
+            segment_info.update({
                  'id': segment_id,
                  'title': segment_title,
-                'formats': formats,
-                'duration': segment_duration,
-                'description': episode_description
+                'description': episode_description,
              })
+            entries.append(segment_info)
  
          return {
              '_type': 'playlist',
@@ -214,5 +227,6 @@ class AdultSwimIE(InfoExtractor):
              'entries': entries,
              'title': '%s - %s' % (show_title, episode_title),
              'description': episode_description,
-            'duration': episode_duration
+            'duration': episode_duration,
+            'view_count': view_count,
          }
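The adultswim.py changes above replace the single `stream`/`clips` lookup with a longer fallback chain for finding segment IDs. The sketch below restates that chain as a standalone function so the order of precedence is easier to see. The function name `pick_segment_ids` and the use of `PermissionError` and `LookupError` are assumptions made for illustration; the extractor itself raises `ExtractorError`.

```python
def pick_segment_ids(video_info):
    """Return the list of playback IDs, trying each source in priority order."""
    stream = video_info.get('stream')
    if stream and stream.get('videoPlaybackID'):
        # 1. A single stream object carrying its own playback ID.
        return [stream['videoPlaybackID']]
    if video_info.get('clips'):
        # 2. A multi-part video: one playback ID per clip.
        return [clip['videoPlaybackID'] for clip in video_info['clips']]
    if video_info.get('videoPlaybackID'):
        # 3. A playback ID stored directly on the video record.
        return [video_info['videoPlaybackID']]
    if video_info.get('id'):
        # 4. Last resort: the video's own ID.
        return [video_info['id']]
    if video_info.get('auth') is True:
        # Stand-in for the "cable subscription required" ExtractorError.
        raise PermissionError('video requires provider authentication')
    raise LookupError('Unable to find stream or clips')
```

Note that a `stream` entry without a `videoPlaybackID` no longer short-circuits the lookup; the chain simply moves on to `clips`.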
diff --git a/youtube_dl/extractor/aenetworks.py b/youtube_dl/extractor/aenetworks.py
index 6018ae79a2a114451c79edc34af524898462004d..6adb6d824c00ec733afaf1bbe1b243f7d623b647 100644
--- a/youtube_dl/extractor/aenetworks.py
+++ b/youtube_dl/extractor/aenetworks.py
@@ -1,66 +1,208 @@
  from __future__ import unicode_literals
  
-from .common import InfoExtractor
-from ..utils import smuggle_url
+import re
  
+from .theplatform import ThePlatformIE
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+    unescapeHTML,
+    extract_attributes,
+    get_element_by_attribute,
+)
+from ..compat import (
+    compat_urlparse,
+)
  
-class AENetworksIE(InfoExtractor):
+
+class AENetworksBaseIE(ThePlatformIE):
+    _THEPLATFORM_KEY = 'crazyjava'
+    _THEPLATFORM_SECRET = 's3cr3t'
+
+
+class AENetworksIE(AENetworksBaseIE):
      IE_NAME = 'aenetworks'
      IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
-    _VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
-
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
      _TESTS = [{
-        'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
+        'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
+        'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
          'info_dict': {
-            'id': 'g12m5Gyt3fdR',
+            'id': '22253814',
              'ext': 'mp4',
-            'title': "Bet You Didn't Know: Valentine's Day",
-            'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
+            'title': 'Winter Is Coming',
+            'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
+            'timestamp': 1338306241,
+            'upload_date': '20120529',
+            'uploader': 'AENE-NEW',
          },
          'add_ie': ['ThePlatform'],
-        'expected_warnings': ['JSON-LD'],
      }, {
-        'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
+        'url': 'http://www.history.com/shows/ancient-aliens/season-1',
          'info_dict': {
-            'id': 'eg47EERs_JsZ',
-            'ext': 'mp4',
-            'title': 'Winter Is Coming',
-            'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
+            'id': '71889446852',
          },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
+        'playlist_mincount': 5,
+    }, {
+        'url': 'http://www.mylifetime.com/shows/atlanta-plastic',
+        'info_dict': {
+            'id': 'SERIES4317',
+            'title': 'Atlanta Plastic',
          },
-        'add_ie': ['ThePlatform'],
+        'playlist_mincount': 2,
      }, {
-        'url': 'http://www.aetv.com/shows/duck-dynasty/video/inlawful-entry',
+        'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
          'only_matching': True
      }, {
-        'url': 'http://www.fyi.tv/shows/tiny-house-nation/videos/207-sq-ft-minnesota-prairie-cottage',
+        'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
          'only_matching': True
      }, {
-        'url': 'http://www.mylifetime.com/shows/project-runway-junior/video/season-1/episode-6/superstar-clients',
+        'url': 'http://www.mylifetime.com/shows/project-runway-junior/season-1/episode-6',
+        'only_matching': True
+    }, {
+        'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
          'only_matching': True
      }]
+    _DOMAIN_TO_REQUESTOR_ID = {
+        'history.com': 'HISTORY',
+        'aetv.com': 'AETV',
+        'mylifetime.com': 'LIFETIME',
+        'fyi.tv': 'FYI',
+    }
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
+        domain, show_path, movie_display_id = re.match(self._VALID_URL, url).groups()
+        display_id = show_path or movie_display_id
+        webpage = self._download_webpage(url, display_id)
+        if show_path:
+            url_parts = show_path.split('/')
+            url_parts_len = len(url_parts)
+            if url_parts_len == 1:
+                entries = []
+                for season_url_path in re.findall(r'(?s)<li[^>]+data-href="(/shows/%s/season-\d+)"' % url_parts[0], webpage):
+                    entries.append(self.url_result(
+                        compat_urlparse.urljoin(url, season_url_path), 'AENetworks'))
+                return self.playlist_result(
+                    entries, self._html_search_meta('aetn:SeriesId', webpage),
+                    self._html_search_meta('aetn:SeriesTitle', webpage))
+            elif url_parts_len == 2:
+                entries = []
+                for episode_item in re.findall(r'(?s)<div[^>]+class="[^"]*episode-item[^"]*"[^>]*>', webpage):
+                    episode_attributes = extract_attributes(episode_item)
+                    episode_url = compat_urlparse.urljoin(
+                        url, episode_attributes['data-canonical'])
+                    entries.append(self.url_result(
+                        episode_url, 'AENetworks',
+                        episode_attributes['data-videoid']))
+                return self.playlist_result(
+                    entries, self._html_search_meta('aetn:SeasonId', webpage))
  
-        video_url_re = [
-            r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
-            r"media_url\s*=\s*'([^']+)'"
-        ]
-        video_url = self._search_regex(video_url_re, webpage, 'video url')
-
-        info = self._search_json_ld(webpage, video_id, fatal=False)
+        query = {
+            'mbr': 'true',
+            'assetTypes': 'medium_video_s3'
+        }
+        video_id = self._html_search_meta('aetn:VideoID', webpage)
+        media_url = self._search_regex(
+            r"media_url\s*=\s*'([^']+)'", webpage, 'video url')
+        theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
+            r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
+        info = self._parse_theplatform_metadata(theplatform_metadata)
+        if theplatform_metadata.get('AETN$isBehindWall'):
+            requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
+            resource = self._get_mvpd_resource(
+                requestor_id, theplatform_metadata['title'],
+                theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
+                theplatform_metadata['ratings'][0]['rating'])
+            query['auth'] = self._extract_mvpd_auth(
+                url, video_id, requestor_id, resource)
+        info.update(self._search_json_ld(webpage, video_id, fatal=False))
+        media_url = update_url_query(media_url, query)
+        media_url = self._sign_url(media_url, self._THEPLATFORM_KEY, self._THEPLATFORM_SECRET)
+        formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
+        self._sort_formats(formats)
          info.update({
-            '_type': 'url_transparent',
-            'url': smuggle_url(video_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}}),
+            'id': video_id,
+            'formats': formats,
+            'subtitles': subtitles,
          })
          return info
+
+
+class HistoryTopicIE(AENetworksBaseIE):
+    IE_NAME = 'history:topic'
+    IE_DESC = 'History.com Topic'
+    _VALID_URL = r'https?://(?:www\.)?history\.com/topics/(?:[^/]+/)?(?P<topic_id>[^/]+)(?:/[^/]+(?:/(?P<video_display_id>[^/?#]+))?)?'
+    _TESTS = [{
+        'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
+        'info_dict': {
+            'id': '40700995724',
+            'ext': 'mp4',
+            'title': "Bet You Didn't Know: Valentine's Day",
+            'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
+            'timestamp': 1375819729,
+            'upload_date': '20130806',
+            'uploader': 'AENE-NEW',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'add_ie': ['ThePlatform'],
+    }, {
+        'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/videos',
+        'info_dict':
+        {
+            'id': 'world-war-i-history',
+            'title': 'World War I History',
+        },
+        'playlist_mincount': 24,
+    }, {
+        'url': 'http://www.history.com/topics/world-war-i-history/videos',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.history.com/topics/world-war-i/world-war-i-history',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/speeches',
+        'only_matching': True,
+    }]
+
+    def theplatform_url_result(self, theplatform_url, video_id, query):
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': smuggle_url(
+                update_url_query(theplatform_url, query),
+                {
+                    'sig': {
+                        'key': self._THEPLATFORM_KEY,
+                        'secret': self._THEPLATFORM_SECRET,
+                    },
+                    'force_smil_url': True
+                }),
+            'ie_key': 'ThePlatform',
+        }
+
+    def _real_extract(self, url):
+        topic_id, video_display_id = re.match(self._VALID_URL, url).groups()
+        if video_display_id:
+            webpage = self._download_webpage(url, video_display_id)
+            release_url, video_id = re.search(r"_videoPlayer\.play\('([^']+)'\s*,\s*'[^']+'\s*,\s*'(\d+)'\)", webpage).groups()
+            release_url = unescapeHTML(release_url)
+
+            return self.theplatform_url_result(
+                release_url, video_id, {
+                    'mbr': 'true',
+                    'switch': 'hls'
+                })
+        else:
+            webpage = self._download_webpage(url, topic_id)
+            entries = []
+            for episode_item in re.findall(r'<a.+?data-release-url="[^"]+"[^>]*>', webpage):
+                video_attributes = extract_attributes(episode_item)
+                entries.append(self.theplatform_url_result(
+                    video_attributes['data-release-url'], video_attributes['data-id'], {
+                        'mbr': 'true',
+                        'switch': 'hls'
+                    }))
+            return self.playlist_result(entries, topic_id, get_element_by_attribute('class', 'show-title', webpage))
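The new `AENetworksIE._VALID_URL` above captures a domain plus either a show path of one to three segments or a movie display ID, and `_real_extract` branches on the number of path segments. The following sketch reproduces only that dispatch, as a rough illustration. The function `classify` and its return labels are hypothetical; the regex is copied from the diff.

```python
import re

VALID_URL = (
    r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/'
    r'(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})'
    r'|movies/(?P<movie_display_id>[^/]+)/full-movie)'
)


def classify(url):
    """Label a URL as series, season, episode or movie, or return None."""
    m = re.match(VALID_URL, url)
    if m is None:
        return None
    domain, show_path, movie_display_id = m.groups()
    if movie_display_id:
        return ('movie', domain, movie_display_id)
    # One segment: series page; two: season page; three: single episode.
    depth = len(show_path.split('/'))
    kind = {1: 'series', 2: 'season'}.get(depth, 'episode')
    return (kind, domain, show_path)
```

In the extractor, the series and season cases return playlists of further `AENetworks` URL results, while the episode and movie cases fall through to the ThePlatform metadata lookup.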
diff --git a/youtube_dl/extractor/afreecatv.py b/youtube_dl/extractor/afreecatv.py
new file mode 100644
index 0000000..75b3669
--- /dev/null
+++ b/youtube_dl/extractor/afreecatv.py
@@ -0,0 +1,145 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlparse,
+    compat_urlparse,
+)
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    update_url_query,
+    xpath_element,
+    xpath_text,
+)
+
+
+class AfreecaTVIE(InfoExtractor):
+    IE_DESC = 'afreecatv.com'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            (?:(?:live|afbbs|www)\.)?afreeca(?:tv)?\.com(?::\d+)?
+                            (?:
+                                /app/(?:index|read_ucc_bbs)\.cgi|
+                                /player/[Pp]layer\.(?:swf|html)
+                            )\?.*?\bnTitleNo=|
+                            vod\.afreecatv\.com/PLAYER/STATION/
+                        )
+                        (?P<id>\d+)
+                    '''
+    _TESTS = [{
+        'url': 'http://live.afreecatv.com:8079/app/index.cgi?szType=read_ucc_bbs&szBjId=dailyapril&nStationNo=16711924&nBbsNo=18605867&nTitleNo=36164052&szSkin=',
+        'md5': 'f72c89fe7ecc14c1b5ce506c4996046e',
+        'info_dict': {
+            'id': '36164052',
+            'ext': 'mp4',
+            'title': '데일리 에이프릴 요정들의 시상식!',
+            'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
+            'uploader': 'dailyapril',
+            'uploader_id': 'dailyapril',
+            'upload_date': '20160503',
+        }
+    }, {
+        'url': 'http://afbbs.afreecatv.com:8080/app/read_ucc_bbs.cgi?nStationNo=16711924&nTitleNo=36153164&szBjId=dailyapril&nBbsNo=18605867',
+        'info_dict': {
+            'id': '36153164',
+            'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
+            'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
+            'uploader': 'dailyapril',
+            'uploader_id': 'dailyapril',
+        },
+        'playlist_count': 2,
+        'playlist': [{
+            'md5': 'd8b7c174568da61d774ef0203159bf97',
+            'info_dict': {
+                'id': '36153164_1',
+                'ext': 'mp4',
+                'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
+                'upload_date': '20160502',
+            },
+        }, {
+            'md5': '58f2ce7f6044e34439ab2d50612ab02b',
+            'info_dict': {
+                'id': '36153164_2',
+                'ext': 'mp4',
+                'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
+                'upload_date': '20160502',
+            },
+        }],
+    }, {
+        'url': 'http://www.afreecatv.com/player/Player.swf?szType=szBjId=djleegoon&nStationNo=11273158&nBbsNo=13161095&nTitleNo=36327652',
+        'only_matching': True,
+    }, {
+        'url': 'http://vod.afreecatv.com/PLAYER/STATION/15055030',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def parse_video_key(key):
+        video_key = {}
+        m = re.match(r'^(?P<upload_date>\d{8})_\w+_(?P<part>\d+)$', key)
+        if m:
+            video_key['upload_date'] = m.group('upload_date')
+            video_key['part'] = m.group('part')
+        return video_key
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        parsed_url = compat_urllib_parse_urlparse(url)
+        info_url = compat_urlparse.urlunparse(parsed_url._replace(
+            netloc='afbbs.afreecatv.com:8080',
+            path='/api/video/get_video_info.php'))
+
+        video_xml = self._download_xml(
+            update_url_query(info_url, {'nTitleNo': video_id}), video_id)
+
+        if xpath_element(video_xml, './track/video/file') is None:
+            raise ExtractorError('Specified AfreecaTV video does not exist',
+                                 expected=True)
+
+        title = xpath_text(video_xml, './track/title', 'title')
+        uploader = xpath_text(video_xml, './track/nickname', 'uploader')
+        uploader_id = xpath_text(video_xml, './track/bj_id', 'uploader id')
+        duration = int_or_none(xpath_text(video_xml, './track/duration',
+                                          'duration'))
+        thumbnail = xpath_text(video_xml, './track/titleImage', 'thumbnail')
+
+        entries = []
+        for i, video_file in enumerate(video_xml.findall('./track/video/file')):
+            video_key = self.parse_video_key(video_file.get('key', ''))
+            if not video_key:
+                continue
+            entries.append({
+                'id': '%s_%s' % (video_id, video_key.get('part', i + 1)),
+                'title': title,
+                'upload_date': video_key.get('upload_date'),
+                'duration': int_or_none(video_file.get('duration')),
+                'url': video_file.text,
+            })
+
+        info = {
+            'id': video_id,
+            'title': title,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'duration': duration,
+            'thumbnail': thumbnail,
+        }
+
+        if len(entries) > 1:
+            info['_type'] = 'multi_video'
+            info['entries'] = entries
+        elif len(entries) == 1:
+            info['url'] = entries[0]['url']
+            info['upload_date'] = entries[0].get('upload_date')
+        else:
+            raise ExtractorError(
+                'No files found for the specified AfreecaTV video, either'
+                ' the URL is incorrect or the video has been made private.',
+                expected=True)
+
+        return info
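`AfreecaTVIE.parse_video_key` in the new file above splits a file key into an upload date and a part number, and `_real_extract` skips any file whose key does not match. A standalone copy of the helper is sketched here so its behaviour can be tried in isolation. The sample keys are made up for illustration; the real key format is only inferred from the regex.

```python
import re


def parse_video_key(key):
    """Split '<YYYYMMDD>_<token>_<part>' into its date and part fields."""
    video_key = {}
    m = re.match(r'^(?P<upload_date>\d{8})_\w+_(?P<part>\d+)$', key)
    if m:
        video_key['upload_date'] = m.group('upload_date')
        # The part number stays a string, as in the extractor.
        video_key['part'] = m.group('part')
    return video_key


# Hypothetical keys; an unmatched key yields an empty dict, which the
# extractor treats as "skip this file".
first_part = parse_video_key('20160502_c4f6d2_1')
unmatched = parse_video_key('not-a-key')
```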
diff --git a/youtube_dl/extractor/aftonbladet.py b/youtube_dl/extractor/aftonbladet.py
deleted file mode 100644
index d548592..0000000
--- a/youtube_dl/extractor/aftonbladet.py
+++ /dev/null
@@ -1,64 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import int_or_none
-
-
-class AftonbladetIE(InfoExtractor):
-    _VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
-    _TEST = {
-        'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
-        'info_dict': {
-            'id': '36015',
-            'ext': 'mp4',
-            'title': 'Vulkanutbrott i rymden - nu släpper NASA bilderna',
-            'description': 'Jupiters måne mest aktiv av alla himlakroppar',
-            'timestamp': 1394142732,
-            'upload_date': '20140306',
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        # find internal video meta data
-        meta_url = 'http://aftonbladet-play.drlib.aptoma.no/video/%s.json'
-        player_config = self._parse_json(self._html_search_regex(
-            r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
-        internal_meta_id = player_config['videoId']
-        internal_meta_url = meta_url % internal_meta_id
-        internal_meta_json = self._download_json(
-            internal_meta_url, video_id, 'Downloading video meta data')
-
-        # find internal video formats
-        format_url = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
-        internal_video_id = internal_meta_json['videoId']
-        internal_formats_url = format_url % internal_video_id
-        internal_formats_json = self._download_json(
-            internal_formats_url, video_id, 'Downloading video formats')
-
-        formats = []
-        for fmt in internal_formats_json['formats']['http']['pseudostreaming']['mp4']:
-            p = fmt['paths'][0]
-            formats.append({
-                'url': 'http://%s:%d/%s/%s' % (p['address'], p['port'], p['path'], p['filename']),
-                'ext': 'mp4',
-                'width': int_or_none(fmt.get('width')),
-                'height': int_or_none(fmt.get('height')),
-                'tbr': int_or_none(fmt.get('bitrate')),
-                'protocol': 'http',
-            })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': internal_meta_json['title'],
-            'formats': formats,
-            'thumbnail': internal_meta_json.get('imageUrl'),
-            'description': internal_meta_json.get('shortPreamble'),
-            'timestamp': int_or_none(internal_meta_json.get('timePublished')),
-            'duration': int_or_none(internal_meta_json.get('duration')),
-            'view_count': int_or_none(internal_meta_json.get('views')),
-        }
diff --git a/youtube_dl/extractor/aljazeera.py b/youtube_dl/extractor/aljazeera.py
index b081695d8400c0e24d36e84bd8445efa084ed8b3..388e578d569a27bdfd3a7d597d3ebcd5f31ccb94 100644
--- a/youtube_dl/extractor/aljazeera.py
+++ b/youtube_dl/extractor/aljazeera.py
@@ -4,7 +4,7 @@ from .common import InfoExtractor
  
  
  class AlJazeeraIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
  
      _TEST = {
          'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',
diff --git a/youtube_dl/extractor/allocine.py b/youtube_dl/extractor/allocine.py
index 190bc2cc8730853a23b9025f1849bf234a32e001..517b06def4d2ff690628eece4b1e85e647aea267 100644
--- a/youtube_dl/extractor/allocine.py
+++ b/youtube_dl/extractor/allocine.py
@@ -1,29 +1,26 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
-import re
-import json
-
  from .common import InfoExtractor
-from ..compat import compat_str
  from ..utils import (
+    remove_end,
      qualities,
-    unescapeHTML,
-    xpath_element,
+    url_basename,
  )
  
  
  class AllocineIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?P<typ>article|video|film)/(fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?'
+    _VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?:article|video|film)/(?:fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?'
  
      _TESTS = [{
          'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html',
          'md5': '0c9fcf59a841f65635fa300ac43d8269',
          'info_dict': {
              'id': '19546517',
+            'display_id': '18635087',
              'ext': 'mp4',
              'title': 'Astérix - Le Domaine des Dieux Teaser VF',
-            'description': 'md5:abcd09ce503c6560512c14ebfdb720d2',
+            'description': 'md5:4a754271d9c6f16c72629a8a993ee884',
              'thumbnail': 're:http://.*\.jpg',
          },
      }, {
@@ -31,64 +28,82 @@ class AllocineIE(InfoExtractor):
          'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0',
          'info_dict': {
              'id': '19540403',
+            'display_id': '19540403',
              'ext': 'mp4',
              'title': 'Planes 2 Bande-annonce VF',
              'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
              'thumbnail': 're:http://.*\.jpg',
          },
      }, {
-        'url': 'http://www.allocine.fr/film/fichefilm_gen_cfilm=181290.html',
+        'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html',
          'md5': '101250fb127ef9ca3d73186ff22a47ce',
          'info_dict': {
              'id': '19544709',
+            'display_id': '19544709',
              'ext': 'mp4',
              'title': 'Dragons 2 - Bande annonce finale VF',
-            'description': 'md5:601d15393ac40f249648ef000720e7e3',
+            'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a',
              'thumbnail': 're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.allocine.fr/video/video-19550147/',
-        'only_matching': True,
+        'md5': '3566c0668c0235e2d224fd8edb389f67',
+        'info_dict': {
+            'id': '19550147',
+            'ext': 'mp4',
+            'title': 'Faux Raccord N°123 - Les gaffes de Cliffhanger',
+            'description': 'md5:bc734b83ffa2d8a12188d9eb48bb6354',
+            'thumbnail': 're:http://.*\.jpg',
+        },
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        typ = mobj.group('typ')
-        display_id = mobj.group('id')
+        display_id = self._match_id(url)
  
          webpage = self._download_webpage(url, display_id)
  
-        if typ == 'film':
-            video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
-        else:
-            player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player', default=None)
-            if player:
-                player_data = json.loads(player)
-                video_id = compat_str(player_data['refMedia'])
-            else:
-                model = self._search_regex(r'data-model="([^"]+)">', webpage, 'data model')
-                model_data = self._parse_json(unescapeHTML(model), display_id)
-                video_id = compat_str(model_data['id'])
+        formats = []
+        quality = qualities(['ld', 'md', 'hd'])
  
-        xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
+        model = self._html_search_regex(
+            r'data-model="([^"]+)"', webpage, 'data model', default=None)
+        if model:
+            model_data = self._parse_json(model, display_id)
  
-        video = xpath_element(xml, './/AcVisionVideo').attrib
-        quality = qualities(['ld', 'md', 'hd'])
+            for video_url in model_data['sources'].values():
+                video_id, format_id = url_basename(video_url).split('_')[:2]
+                formats.append({
+                    'format_id': format_id,
+                    'quality': quality(format_id),
+                    'url': video_url,
+                })
  
-        formats = []
-        for k, v in video.items():
-            if re.match(r'.+_path', k):
-                format_id = k.split('_')[0]
+            title = model_data['title']
+        else:
+            video_id = display_id
+            media_data = self._download_json(
+                'http://www.allocine.fr/ws/AcVisiondataV5.ashx?media=%s' % video_id, display_id)
+            for key, value in media_data['video'].items():
+                if not key.endswith('Path'):
+                    continue
+
+                format_id = key[:-len('Path')]
                  formats.append({
                      'format_id': format_id,
                      'quality': quality(format_id),
-                    'url': v,
+                    'url': value,
                  })
+
+            title = remove_end(self._html_search_regex(
+                r'(?s)<title>(.+?)</title>', webpage, 'title'
+            ).strip(), ' - AlloCiné')
+
          self._sort_formats(formats)
  
          return {
              'id': video_id,
-            'title': video['videoTitle'],
+            'display_id': display_id,
+            'title': title,
              'thumbnail': self._og_search_thumbnail(webpage),
              'formats': formats,
              'description': self._og_search_description(webpage),
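Review note: the rewritten `_real_extract` above derives format ids in two ways: from the basename of each source URL (`<video_id>_<format_id>_...`) and from dictionary keys ending in `Path`. The sketch below illustrates both, under stated assumptions: `qualities` is a stand-in written for this note (rank equals list position, unknown ids rank -1), the basename helper is a simplified substitute for `url_basename`, the plain `sorted` call is a simplification of `_sort_formats`, and the sample URLs and dictionary are hypothetical.

```python
def qualities(quality_ids):
    # Stand-in for the utility imported in the diff (assumption):
    # an id ranks by its position in the list, unknown ids rank -1.
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


quality = qualities(['ld', 'md', 'hd'])


def ids_from_source_url(video_url):
    # Simplified basename: last path segment, ignoring query strings.
    basename = video_url.rstrip('/').split('/')[-1]
    video_id, format_id = basename.split('_')[:2]
    return video_id, format_id


def formats_from_video_dict(video):
    formats = []
    for key, value in video.items():
        if not key.endswith('Path'):
            continue  # skip non-URL fields such as titles
        format_id = key[:-len('Path')]
        formats.append({
            'format_id': format_id,
            'quality': quality(format_id),
            'url': value,
        })
    # Simplified ordering, worst to best, by quality rank only.
    return sorted(formats, key=lambda f: f['quality'])


# Hypothetical sample data, shaped like the fields the diff reads.
sample_video = {
    'hdPath': 'http://example.com/19546517_hd_013.mp4',
    'ldPath': 'http://example.com/19546517_ld_013.mp4',
    'videoTitle': 'ignored',
}

print(ids_from_source_url(sample_video['hdPath']))
print([f['format_id'] for f in formats_from_video_dict(sample_video)])
```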
diff --git a/youtube_dl/extractor/amcnetworks.py b/youtube_dl/extractor/amcnetworks.py
new file mode 100644
index 0000000..d2b03b1
--- /dev/null
+++ b/youtube_dl/extractor/amcnetworks.py
@@ -0,0 +1,92 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .theplatform import ThePlatformIE
+from ..utils import (
+    update_url_query,
+    parse_age_limit,
+    int_or_none,
+)
+
+
+class AMCNetworksIE(ThePlatformIE):
+    _VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies/|shows/[^/]+/(?:full-episodes/)?season-\d+/episode-\d+(?:-(?:[^/]+/)?|/))(?P<id>[^/?#]+)'
+    _TESTS = [{
+        'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
+        'md5': '',
+        'info_dict': {
+            'id': 's3MX01Nl4vPH',
+            'ext': 'mp4',
+            'title': 'Maron - Season 4 - Step 1',
+            'description': 'In denial about his current situation, Marc is reluctantly convinced by his friends to enter rehab. Starring Marc Maron and Constance Zimmer.',
+            'age_limit': 17,
+            'upload_date': '20160505',
+            'timestamp': 1462468831,
+            'uploader': 'AMCN',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'skip': 'Requires TV provider accounts',
+    }, {
+        'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.amc.com/shows/preacher/full-episodes/season-01/episode-00/pilot',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.wetv.com/shows/million-dollar-matchmaker/season-01/episode-06-the-dumped-dj-and-shallow-hal',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.ifc.com/movies/chaos',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        query = {
+            'mbr': 'true',
+            'manifest': 'm3u',
+        }
+        media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url')
+        theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
+            r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), display_id)
+        info = self._parse_theplatform_metadata(theplatform_metadata)
+        video_id = theplatform_metadata['pid']
+        title = theplatform_metadata['title']
+        rating = theplatform_metadata['ratings'][0]['rating']
+        auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required')
+        if auth_required == 'true':
+            requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id')
+            resource = self._get_mvpd_resource(requestor_id, title, video_id, rating)
+            query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource)
+        media_url = update_url_query(media_url, query)
+        formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
+        self._sort_formats(formats)
+        info.update({
+            'id': video_id,
+            'subtitles': subtitles,
+            'formats': formats,
+            'age_limit': parse_age_limit(rating),
+        })
+        ns_keys = theplatform_metadata.get('$xmlns', {}).keys()
+        if ns_keys:
+            ns = list(ns_keys)[0]
+            series = theplatform_metadata.get(ns + '$show')
+            season_number = int_or_none(theplatform_metadata.get(ns + '$season'))
+            episode = theplatform_metadata.get(ns + '$episodeTitle')
+            episode_number = int_or_none(theplatform_metadata.get(ns + '$episode'))
+            if season_number:
+                title = 'Season %d - %s' % (season_number, title)
+            if series:
+                title = '%s - %s' % (series, title)
+            info.update({
+                'title': title,
+                'series': series,
+                'season_number': season_number,
+                'episode': episode,
+                'episode_number': episode_number,
+            })
+        return info
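Review note: the title handling at the end of `_real_extract` above prefixes the season first and the series second. A minimal sketch of that ordering follows; `build_title` is a hypothetical helper written for this note, and the inputs are taken from the first test case in the diff.

```python
def build_title(title, series=None, season_number=None):
    # Same order as in the diff: season prefix first, then series prefix.
    if season_number:
        title = 'Season %d - %s' % (season_number, title)
    if series:
        title = '%s - %s' % (series, title)
    return title


print(build_title('Step 1', series='Maron', season_number=4))
print(build_title('Chaos'))
```

The first call reproduces the `title` expected by the `ifc.com` test entry; when neither field is present the title is left unchanged.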
diff --git a/youtube_dl/extractor/amp.py b/youtube_dl/extractor/amp.py
index 138fa08086ee2d4e7c446c09ce85b8726e4ff255..e8e40126baca4bad27f8593dd9bd026f16fad131 100644
--- a/youtube_dl/extractor/amp.py
+++ b/youtube_dl/extractor/amp.py
@@ -5,6 +5,8 @@ from .common import InfoExtractor
  from ..utils import (
      int_or_none,
      parse_iso8601,
+    mimetype2ext,
+    determine_ext,
  )
  
  
@@ -50,21 +52,25 @@ class AMPIE(InfoExtractor):
          if isinstance(media_content, dict):
              media_content = [media_content]
          for media_data in media_content:
-            media = media_data['@attributes']
-            media_type = media['type']
-            if media_type == 'video/f4m':
+            media = media_data.get('@attributes', {})
+            media_url = media.get('url')
+            if not media_url:
+                continue
+            ext = mimetype2ext(media.get('type')) or determine_ext(media_url)
+            if ext == 'f4m':
                  formats.extend(self._extract_f4m_formats(
-                    media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
+                    media_url + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
                      video_id, f4m_id='hds', fatal=False))
-            elif media_type == 'application/x-mpegURL':
+            elif ext == 'm3u8':
                  formats.extend(self._extract_m3u8_formats(
-                    media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False))
+                    media_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
              else:
                  formats.append({
-                    'format_id': media_data['media-category']['@attributes']['label'],
+                    'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
                      'url': media['url'],
                      'tbr': int_or_none(media.get('bitrate')),
                      'filesize': int_or_none(media.get('fileSize')),
+                    'ext': ext,
                  })
  
          self._sort_formats(formats)
diff --git a/youtube_dl/extractor/animeondemand.py b/youtube_dl/extractor/animeondemand.py
index 9b01e38f5fe8b5a80b2635061433cc214fb1b315..9e28f25790bb284b5ba00376292cf33c8ded8cb1 100644
--- a/youtube_dl/extractor/animeondemand.py
+++ b/youtube_dl/extractor/animeondemand.py
@@ -22,6 +22,7 @@ class AnimeOnDemandIE(InfoExtractor):
      _APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
      _NETRC_MACHINE = 'animeondemand'
      _TESTS = [{
+        # jap, OmU
          'url': 'https://www.anime-on-demand.de/anime/161',
          'info_dict': {
              'id': '161',
@@ -30,17 +31,21 @@ class AnimeOnDemandIE(InfoExtractor):
          },
          'playlist_mincount': 4,
      }, {
-        # Film wording is used instead of Episode
+        # Film wording is used instead of Episode, ger/jap, Dub/OmU
          'url': 'https://www.anime-on-demand.de/anime/39',
          'only_matching': True,
      }, {
-        # Episodes without titles
+        # Episodes without titles, jap, OmU
          'url': 'https://www.anime-on-demand.de/anime/162',
          'only_matching': True,
      }, {
          # ger/jap, Dub/OmU, account required
          'url': 'https://www.anime-on-demand.de/anime/169',
          'only_matching': True,
+    }, {
+        # Full length film, non-series, ger/jap, Dub/OmU, account required
+        'url': 'https://www.anime-on-demand.de/anime/185',
+        'only_matching': True,
      }]
  
      def _login(self):
@@ -110,35 +115,12 @@ class AnimeOnDemandIE(InfoExtractor):
  
          entries = []
  
-        for num, episode_html in enumerate(re.findall(
-                r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1):
-            episodebox_title = self._search_regex(
-                (r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
-                 r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
-                episode_html, 'episodebox title', default=None, group='title')
-            if not episodebox_title:
-                continue
-
-            episode_number = int(self._search_regex(
-                r'(?:Episode|Film)\s*(\d+)',
-                episodebox_title, 'episode number', default=num))
-            episode_title = self._search_regex(
-                r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
-                episodebox_title, 'episode title', default=None)
-
-            video_id = 'episode-%d' % episode_number
-
-            common_info = {
-                'id': video_id,
-                'series': anime_title,
-                'episode': episode_title,
-                'episode_number': episode_number,
-            }
-
+        def extract_info(html, video_id, num=None):
+            title, description = [None] * 2
              formats = []
  
              for input_ in re.findall(
-                    r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html):
+                    r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', html):
                  attributes = extract_attributes(input_)
                  playlist_urls = []
                  for playlist_key in ('data-playlist', 'data-otherplaylist'):
@@ -161,7 +143,7 @@ class AnimeOnDemandIE(InfoExtractor):
                          format_id_list.append(lang)
                      if kind:
                          format_id_list.append(kind)
-                    if not format_id_list:
+                    if not format_id_list and num is not None:
                          format_id_list.append(compat_str(num))
                      format_id = '-'.join(format_id_list)
                      format_note = ', '.join(filter(None, (kind, lang_note)))
@@ -215,28 +197,74 @@ class AnimeOnDemandIE(InfoExtractor):
                              })
                          formats.extend(file_formats)
  
-            if formats:
-                self._sort_formats(formats)
+            return {
+                'title': title,
+                'description': description,
+                'formats': formats,
+            }
+
+        def extract_entries(html, video_id, common_info, num=None):
+            info = extract_info(html, video_id, num)
+
+            if info['formats']:
+                self._sort_formats(info['formats'])
                  f = common_info.copy()
-                f.update({
-                    'title': title,
-                    'description': description,
-                    'formats': formats,
-                })
+                f.update(info)
                  entries.append(f)
  
-            # Extract teaser only when full episode is not available
-            if not formats:
+            # Extract teaser/trailer only when full episode is not available
+            if not info['formats']:
                  m = re.search(
-                    r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
-                    episode_html)
+                    r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>(?P<kind>Teaser|Trailer)<',
+                    html)
                  if m:
                      f = common_info.copy()
                      f.update({
-                        'id': '%s-teaser' % f['id'],
+                        'id': '%s-%s' % (f['id'], m.group('kind').lower()),
                          'title': m.group('title'),
                          'url': compat_urlparse.urljoin(url, m.group('href')),
                      })
                      entries.append(f)
  
+        def extract_episodes(html):
+            for num, episode_html in enumerate(re.findall(
+                    r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', html), 1):
+                episodebox_title = self._search_regex(
+                    (r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
+                     r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
+                    episode_html, 'episodebox title', default=None, group='title')
+                if not episodebox_title:
+                    continue
+
+                episode_number = int(self._search_regex(
+                    r'(?:Episode|Film)\s*(\d+)',
+                    episodebox_title, 'episode number', default=num))
+                episode_title = self._search_regex(
+                    r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
+                    episodebox_title, 'episode title', default=None)
+
+                video_id = 'episode-%d' % episode_number
+
+                common_info = {
+                    'id': video_id,
+                    'series': anime_title,
+                    'episode': episode_title,
+                    'episode_number': episode_number,
+                }
+
+                extract_entries(episode_html, video_id, common_info)
+
+        def extract_film(html, video_id):
+            common_info = {
+                'id': anime_id,
+                'title': anime_title,
+                'description': anime_description,
+            }
+            extract_entries(html, video_id, common_info)
+
+        extract_episodes(webpage)
+
+        if not entries:
+            extract_film(webpage, anime_id)
+
          return self.playlist_result(entries, anime_id, anime_title, anime_description)
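Review note: the fallback in `extract_entries` above now accepts either a `Teaser` or a `Trailer` link and uses the matched word as the id suffix. The sketch below exercises the regex from the diff on a hypothetical HTML fragment; `teaser_entry` and the sample markup are assumptions made for illustration and are not taken from the site.

```python
import re

# Pattern copied from the added line in the diff.
TEASER_RE = (
    r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+'
    r'href=(["\'])(?P<href>.+?)\3[^>]*>(?P<kind>Teaser|Trailer)<'
)


def teaser_entry(html, base_id):
    """Build a minimal entry from a teaser/trailer link, or return None."""
    m = re.search(TEASER_RE, html)
    if not m:
        return None
    return {
        'id': '%s-%s' % (base_id, m.group('kind').lower()),
        'title': m.group('title'),
        'href': m.group('href'),
    }


# Hypothetical markup shaped to fit the pattern.
sample_html = ('<a data-dialog-header="Sample Film" class="btn" '
               'href="/videos/1">Trailer</a>')

print(teaser_entry(sample_html, '185'))
```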
diff --git a/youtube_dl/extractor/anvato.py b/youtube_dl/extractor/anvato.py
new file mode 100644
index 0000000..623f44d
--- /dev/null
+++ b/youtube_dl/extractor/anvato.py
@@ -0,0 +1,230 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import base64
+import hashlib
+import json
+import random
+import time
+
+from .common import InfoExtractor
+from ..aes import aes_encrypt
+from ..compat import compat_str
+from ..utils import (
+    bytes_to_intlist,
+    determine_ext,
+    intlist_to_bytes,
+    int_or_none,
+    strip_jsonp,
+)
+
+
+def md5_text(s):
+    if not isinstance(s, compat_str):
+        s = compat_str(s)
+    return hashlib.md5(s.encode('utf-8')).hexdigest()
+
+
+class AnvatoIE(InfoExtractor):
+    # Copied from anvplayer.min.js
+    _ANVACK_TABLE = {
+        'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ',
+        'nbcu_nbcd_desktop_web_qa_1a6f01bdd0dc45a439043b694c8a031d': 'eSxJUbA2UUKBTXryyQ2d6NuM8oEqaPySvaPzfKNA',
+        'nbcu_nbcd_desktop_web_acc_eb2ff240a5d4ae9a63d4c297c32716b6c523a129': '89JR3RtUGbvKuuJIiKOMK0SoarLb5MUx8v89RcbP',
+        'nbcu_nbcd_watchvod_web_prod_e61107507180976724ec8e8319fe24ba5b4b60e1': 'Uc7dFt7MJ9GsBWB5T7iPvLaMSOt8BBxv4hAXk5vv',
+        'nbcu_nbcd_watchvod_web_qa_42afedba88a36203db5a4c09a5ba29d045302232': 'T12oDYVFP2IaFvxkmYMy5dKxswpLHtGZa4ZAXEi7',
+        'nbcu_nbcd_watchvod_web_acc_9193214448e2e636b0ffb78abacfd9c4f937c6ca': 'MmobcxUxMedUpohNWwXaOnMjlbiyTOBLL6d46ZpR',
+        'nbcu_local_monitor_web_acc_f998ad54eaf26acd8ee033eb36f39a7b791c6335': 'QvfIoPYrwsjUCcASiw3AIkVtQob2LtJHfidp9iWg',
+        'nbcu_cable_monitor_web_acc_a413759603e8bedfcd3c61b14767796e17834077': 'uwVPJLShvJWSs6sWEIuVem7MTF8A4IknMMzIlFto',
+        'nbcu_nbcd_mcpstage_web_qa_4c43a8f6e95a88dbb40276c0630ba9f693a63a4e': 'PxVYZVwjhgd5TeoPRxL3whssb5OUPnM3zyAzq8GY',
+        'nbcu_comcast_comcast_web_prod_074080762ad4ce956b26b43fb22abf153443a8c4': 'afnaRZfDyg1Z3WZHdupKfy6xrbAG2MHqe3VfuSwh',
+        'nbcu_comcast_comcast_web_qa_706103bb93ead3ef70b1de12a0e95e3c4481ade0': 'DcjsVbX9b3uoPlhdriIiovgFQZVxpISZwz0cx1ZK',
+        'nbcu_comcast_comcastcable_web_prod_669f04817536743563d7331c9293e59fbdbe3d07': '0RwMN2cWy10qhAhOscq3eK7aEe0wqnKt3vJ0WS4D',
+        'nbcu_comcast_comcastcable_web_qa_3d9d2d66219094127f0f6b09cc3c7bb076e3e1ca': '2r8G9DEya7PCqBceKZgrn2XkXgASjwLMuaFE1Aad',
+        'hearst_hearst_demo_web_stage_960726dfef3337059a01a78816e43b29ec04dfc7': 'cuZBPXTR6kSdoTCVXwk5KGA8rk3NrgGn4H6e9Dsp',
+        'anvato_mcpqa_demo_web_stage_18b55e00db5a13faa8d03ae6e41f6f5bcb15b922': 'IOaaLQ8ymqVyem14QuAvE5SndQynTcH5CrLkU2Ih',
+        'anvato_nextmedia_demo_web_stage_9787d56a02ff6b9f43e9a2b0920d8ca88beb5818': 'Pqu9zVzI1ApiIzbVA3VkGBEQHvdKSUuKpD6s2uaR',
+        'anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a': 'du1ccmn7RxzgizwbWU7hyUaGodNlJn7HtXI0WgXW',
+        'anvato_scripps_app_web_stage_360797e00fe2826be142155c4618cc52fce6c26c': '2PMrQ0BRoqCWl7nzphj0GouIMEh2mZYivAT0S1Su',
+        'fs2go_fs2go_go_all_prod_21934911ccfafc03a075894ead2260d11e2ddd24': 'RcuHlKikW2IJw6HvVoEkqq2UsuEJlbEl11pWXs4Q',
+        'fs2go_fs2go_go_web_prod_ead4b0eec7460c1a07783808db21b49cf1f2f9a7': '4K0HTT2u1zkQA2MaGaZmkLa1BthGSBdr7jllrhk5',
+        'fs2go_fs2go_go_web_stage_407585454a4400355d4391691c67f361': 'ftnc37VKRJBmHfoGGi3kT05bHyeJzilEzhKJCyl3',
+        'fs2go_fs2go_go_android_stage_44b714db6f8477f29afcba15a41e1d30': 'CtxpPvVpo6AbZGomYUhkKs7juHZwNml9b9J0J2gI',
+        'anvato_cbslocal_app_web_prod_547f3e49241ef0e5d30c79b2efbca5d92c698f67': 'Pw0XX5KBDsyRnPS0R2JrSrXftsy8Jnz5pAjaYC8s',
+        'anvato_cbslocal_app_web_stage_547a5f096594cd3e00620c6f825cad1096d28c80': '37OBUhX2uwNyKhhrNzSSNHSRPZpApC3trdqDBpuz',
+        'fs2go_att_att_web_prod_1042dddd089a05438b6a08f972941176f699ffd8': 'JLcF20JwYvpv6uAGcLWIaV12jKwaL1R8us4b6Zkg',
+        'fs2go_att_att_web_stage_807c5001955fc114a3331fe027ddc76e': 'gbu1oO1y0JiOFh4SUipt86P288JHpyjSqolrrT1x',
+        'fs2go_fs2go_tudor_web_prod_a7dd8e5a7cdc830cae55eae6f3e9fee5ee49eb9b': 'ipcp87VCEZXPPe868j3orLqzc03oTy7DXsGkAXXH',
+        'anvato_mhz_app_web_prod_b808218b30de7fdf60340cbd9831512bc1bf6d37': 'Stlm5Gs6BEhJLRTZHcNquyzxGqr23EuFmE5DCgjX',
+        'fs2go_charter_charter_web_stage_c2c6e5a68375a1bf00fff213d3ff8f61a835a54c': 'Lz4hbJp1fwL6jlcz4M2PMzghM4jp4aAmybtT5dPc',
+        'fs2go_charter_charter_web_prod_ebfe3b10f1af215a7321cd3d629e0b81dfa6fa8c': 'vUJsK345A1bVmyYDRhZX0lqFIgVXuqhmuyp1EtPK',
+        'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b': 'GDKq1ixvX3MoBNdU5IOYmYa2DTUXYOozPjrCJnW7',
+        'anvato_epfox_app_web_stage_a3c2ce60f8f83ef374a88b68ee73a950f8ab87ce': '2jz2NH4BsXMaDsoJ5qkHMbcczAfIReo2eFYuVC1C',
+        'fs2go_verizon_verizon_web_stage_08e6df0354a4803f1b1f2428b5a9a382e8dbcd62': 'rKTVapNaAcmnUbGL4ZcuOoY4SE7VmZSQsblPFr7e',
+        'fs2go_verizon_verizon_web_prod_f909564cb606eff1f731b5e22e0928676732c445': 'qLSUuHerM3u9eNPzaHyUK52obai5MvE4XDJfqYe1',
+        'fs2go_foxcom_synd_web_stage_f7b9091f00ea25a4fdaaae77fca5b54cdc7e7043': '96VKF2vLd24fFiDfwPFpzM5llFN4TiIGAlodE0Re',
+        'fs2go_foxcom_synd_web_prod_0f2cdd64d87e4ab6a1d54aada0ff7a7c8387a064': 'agiPjbXEyEZUkbuhcnmVPhe9NNVbDjCFq2xkcx51',
+        'anvato_own_app_web_stage_1214ade5d28422c4dae9d03c1243aba0563c4dba': 'mzhamNac3swG4WsJAiUTacnGIODi6SWeVWk5D7ho',
+        'anvato_own_app_web_prod_944e162ed927ec3e9ed13eb68ed2f1008ee7565e': '9TSxh6G2TXOLBoYm9ro3LdNjjvnXpKb8UR8KoIP9',
+        'anvato_scripps_app_ftv_prod_a10a10468edd5afb16fb48171c03b956176afad1': 'COJ2i2UIPK7xZqIWswxe7FaVBOVgRkP1F6O6qGoH',
+        'anvato_scripps_app_ftv_stage_77d3ad2bdb021ec37ca2e35eb09acd396a974c9a': 'Q7nnopNLe2PPfGLOTYBqxSaRpl209IhqaEuDZi1F',
+        'anvato_univision_app_web_stage_551236ef07a0e17718c3995c35586b5ed8cb5031': 'D92PoLS6UitwxDRA191HUGT9OYcOjV6mPMa5wNyo',
+        'anvato_univision_app_web_prod_039a5c0a6009e637ae8ac906718a79911e0e65e1': '5mVS5u4SQjtw6NGw2uhMbKEIONIiLqRKck5RwQLR',
+        'nbcu_cnbc_springfield_ios_prod_670207fae43d6e9a94c351688851a2ce': 'M7fqCCIP9lW53oJbHs19OlJlpDrVyc2OL8gNeuTa',
+        'nbcu_cnbc_springfieldvod_ios_prod_7a5f04b1ceceb0e9c9e2264a44aa236e08e034c2': 'Yia6QbJahW0S7K1I0drksimhZb4UFq92xLBmmMvk',
+        'anvato_cox_app_web_prod_ce45cda237969f93e7130f50ee8bb6280c1484ab': 'cc0miZexpFtdoqZGvdhfXsLy7FXjRAOgb9V0f5fZ',
+        'anvato_cox_app_web_stage_c23dbe016a8e9d8c7101d10172b92434f6088bf9': 'yivU3MYHd2eDZcOfmLbINVtqxyecKTOp8OjOuoGJ',
+        'anvato_chnzero_app_web_stage_b1164d1352b579e792e542fddf13ee34c0eeb46b': 'A76QkXMmVH8lTCfU15xva1mZnSVcqeY4Xb22Kp7m',
+        'anvato_chnzero_app_web_prod_253d358928dc08ec161eda2389d53707288a730c': 'OA5QI3ZWZZkdtUEDqh28AH8GedsF6FqzJI32596b',
+        'anvato_discovery_vodpoc_web_stage_9fa7077b5e8af1f8355f65d4fb8d2e0e9d54e2b7': 'q3oT191tTQ5g3JCP67PkjLASI9s16DuWZ6fYmry3',
+        'anvato_discovery_vodpoc_web_prod_688614983167a1af6cdf6d76343fda10a65223c1': 'qRvRQCTVHd0VVOHsMvvfidyWmlYVrTbjby7WqIuK',
+        'nbcu_cnbc_springfieldvod_ftv_stage_826040aad1925a46ac5dfb4b3c5143e648c6a30d': 'JQaSb5a8Tz0PT4ti329DNmzDO30TnngTHmvX8Vua',
+        'nbcu_cnbc_springfield_ftv_stage_826040aad1925a46ac5dfb4b3c5143e648c6a30d': 'JQaSb5a8Tz0PT4ti329DNmzDO30TnngTHmvX8Vua',
+        'nbcu_nbcd_capture_web_stage_4dd9d585bfb984ebf856dee35db027b2465cc4ae': '0j1Ov4Vopyi2HpBZJYdL2m8ERJVGYh3nNpzPiO8F',
+        'nbcu_nbcd_watch3_android_prod_7712ca5fcf1c22f19ec1870a9650f9c37db22dcf': '3LN2UB3rPUAMu7ZriWkHky9vpLMXYha8JbSnxBlx',
+        'nbcu_nbcd_watchvod3_android_prod_0910a3a4692d57c0b5ff4316075bc5d096be45b9': 'mJagcQ2II30vUOAauOXne7ERwbf5S9nlB3IP17lQ',
+        'anvato_scripps_app_atv_prod_790deda22e16e71e83df58f880cd389908a45d52': 'CB6trI1mpoDIM5o54DNTsji90NDBQPZ4z4RqBNSH',
+        'nbcu_nbcd_watchv4_android_prod_ff67cef9cb409158c6f8c3533edddadd0b750507': 'j8CHQCUWjlYERj4NFRmUYOND85QNbHViH09UwuKm',
+        'nbcu_nbcd_watchvodv4_android_prod_a814d781609989dea6a629d50ae4c7ad8cc8e907': 'rkVnUXxdA9rawVLUlDQtMue9Y4Q7lFEaIotcUhjt',
+        'rvVKpA50qlOPLFxMjrCGf5pdkdQDm7qn': '1J7ZkY5Qz5lMLi93QOH9IveE7EYB3rLl',
+        'nbcu_dtv_local_web_prod_b266cf49defe255fd4426a97e27c09e513e9f82f': 'HuLnJDqzLa4saCzYMJ79zDRSQpEduw1TzjMNQu2b',
+        'nbcu_att_local_web_prod_4cef038b2d969a6b7d700a56a599040b6a619f67': 'Q0Em5VDc2KpydUrVwzWRXAwoNBulWUxCq2faK0AV',
+        'nbcu_dish_local_web_prod_c56dcaf2da2e9157a4266c82a78195f1dd570f6b': 'bC1LWmRz9ayj2AlzizeJ1HuhTfIaJGsDBnZNgoRg',
+        'nbcu_verizon_local_web_prod_88bebd2ce006d4ed980de8133496f9a74cb9b3e1': 'wzhDKJZpgvUSS1EQvpCQP8Q59qVzcPixqDGJefSk',
+        'nbcu_charter_local_web_prod_9ad90f7fc4023643bb718f0fe0fd5beea2382a50': 'PyNbxNhEWLzy1ZvWEQelRuIQY88Eub7xbSVRMdfT',
+        'nbcu_suddenlink_local_web_prod_20fb711725cac224baa1c1cb0b1c324d25e97178': '0Rph41lPXZbb3fqeXtHjjbxfSrNbtZp1Ygq7Jypa',
+        'nbcu_wow_local_web_prod_652d9ce4f552d9c2e7b5b1ed37b8cb48155174ad': 'qayIBZ70w1dItm2zS42AptXnxW15mkjRrwnBjMPv',
+        'nbcu_centurylink_local_web_prod_2034402b029bf3e837ad46814d9e4b1d1345ccd5': 'StePcPMkjsX51PcizLdLRMzxMEl5k2FlsMLUNV4k',
+        'nbcu_atlanticbrd_local_web_prod_8d5f5ecbf7f7b2f5e6d908dd75d90ae3565f682e': 'NtYLb4TFUS0pRs3XTkyO5sbVGYjVf17bVbjaGscI',
+        'nbcu_nbcd_watchvod_web_dev_08bc05699be47c4f31d5080263a8cfadc16d0f7c': 'hwxi2dgDoSWgfmVVXOYZm14uuvku4QfopstXckhr',
+        'anvato_nextmedia_app_web_prod_a4fa8c7204aa65e71044b57aaf63711980cfe5a0': 'tQN1oGPYY1nM85rJYePWGcIb92TG0gSqoVpQTWOw',
+        'anvato_mcp_lin_web_prod_4c36fbfd4d8d8ecae6488656e21ac6d1ac972749': 'GUXNf5ZDX2jFUpu4WT2Go4DJ5nhUCzpnwDRRUx1K',
+        'anvato_mcp_univision_web_prod_37fe34850c99a3b5cdb71dab10a417dd5cdecafa': 'bLDYF8JqfG42b7bwKEgQiU9E2LTIAtnKzSgYpFUH',
+        'anvato_mcp_fs2go_web_prod_c7b90a93e171469cdca00a931211a2f556370d0a': 'icgGoYGipQMMSEvhplZX1pwbN69srwKYWksz3xWK',
+        'anvato_mcp_sps_web_prod_54bdc90dd6ba21710e9f7074338365bba28da336': 'fA2iQdI7RDpynqzQYIpXALVS83NTPr8LLFK4LFsu',
+        'anvato_mcp_anv_web_prod_791407490f4c1ef2a4bcb21103e0cb1bcb3352b3': 'rMOUZqe9lwcGq2mNgG3EDusm6lKgsUnczoOX3mbg',
+        'anvato_mcp_gray_web_prod_4c10f067c393ed8fc453d3930f8ab2b159973900': 'rMOUZqe9lwcGq2mNgG3EDusm6lKgsUnczoOX3mbg',
+        'anvato_mcp_hearst_web_prod_5356c3de0fc7c90a3727b4863ca7fec3a4524a99': 'P3uXJ0fXXditBPCGkfvlnVScpPEfKmc64Zv7ZgbK',
+        'anvato_mcp_cbs_web_prod_02f26581ff80e5bda7aad28226a8d369037f2cbe': 'mGPvo5ZA5SgjOFAPEPXv7AnOpFUICX8hvFQVz69n',
+        'anvato_mcp_telemundo_web_prod_c5278d51ad46fda4b6ca3d0ea44a7846a054f582': 'qyT6PXXLjVNCrHaRVj0ugAhalNRS7Ee9BP7LUokD',
+        'nbcu_nbcd_watchvodv4_web_stage_4108362fba2d4ede21f262fea3c4162cbafd66c7': 'DhaU5lj0W2gEdcSSsnxURq8t7KIWtJfD966crVDk',
+        'anvato_scripps_app_ios_prod_409c41960c60b308db43c3cc1da79cab9f1c3d93': 'WPxj5GraLTkYCyj3M7RozLqIycjrXOEcDGFMIJPn',
+        'EZqvRyKBJLrgpClDPDF8I7Xpdp40Vx73': '4OxGd2dEakylntVKjKF0UK9PDPYB6A9W',
+        'M2v78QkpleXm9hPp9jUXI63x5vA6BogR': 'ka6K32k7ZALmpINkjJUGUo0OE42Md1BQ',
+        'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ'
+    }
+
+    _AUTH_KEY = b'\x31\xc2\x42\x84\x9e\x73\xa0\xce'
+
+    def __init__(self, *args, **kwargs):
+        super(AnvatoIE, self).__init__(*args, **kwargs)
+        self.__server_time = None
+
+    def _server_time(self, access_key, video_id):
+        if self.__server_time is not None:
+            return self.__server_time
+
+        self.__server_time = int(self._download_json(
+            self._api_prefix(access_key) + 'server_time?anvack=' + access_key, video_id,
+            note='Fetching server time')['server_time'])
+
+        return self.__server_time
+
+    def _api_prefix(self, access_key):
+        return 'https://tkx2-%s.anvato.net/rest/v2/' % ('prod' if 'prod' in access_key else 'stage')
+
+    def _get_video_json(self, access_key, video_id):
+        # See et() in anvplayer.min.js, which is an alias of getVideoJSON()
+        video_data_url = self._api_prefix(access_key) + 'mcp/video/%s?anvack=%s' % (video_id, access_key)
+        server_time = self._server_time(access_key, video_id)
+        input_data = '%d~%s~%s' % (server_time, md5_text(video_data_url), md5_text(server_time))
+
+        auth_secret = intlist_to_bytes(aes_encrypt(
+            bytes_to_intlist(input_data[:64]), bytes_to_intlist(self._AUTH_KEY)))
+
+        video_data_url += '&X-Anvato-Adst-Auth=' + base64.b64encode(auth_secret).decode('ascii')
+        anvrid = md5_text(time.time() * 1000 * random.random())[:30]
+        payload = {
+            'api': {
+                'anvrid': anvrid,
+                'anvstk': md5_text('%s|%s|%d|%s' % (
+                    access_key, anvrid, server_time, self._ANVACK_TABLE[access_key])),
+                'anvts': server_time,
+            },
+        }
+
+        return self._download_json(
+            video_data_url, video_id, transform_source=strip_jsonp,
+            data=json.dumps(payload).encode('utf-8'))
+
+    def _get_anvato_videos(self, access_key, video_id):
+        video_data = self._get_video_json(access_key, video_id)
+
+        formats = []
+        for published_url in video_data['published_urls']:
+            video_url = published_url['embed_url']
+            media_format = published_url.get('format')
+            ext = determine_ext(video_url)
+
+            if ext == 'smil' or media_format == 'smil':
+                formats.extend(self._extract_smil_formats(video_url, video_id))
+                continue
+
+            tbr = int_or_none(published_url.get('kbps'))
+            a_format = {
+                'url': video_url,
+                'format_id': ('-'.join(filter(None, ['http', published_url.get('cdn_name')]))).lower(),
+                'tbr': tbr if tbr != 0 else None,
+            }
+
+            if ext == 'm3u8' or media_format in ('m3u8', 'm3u8-variant'):
+                # Not using _extract_m3u8_formats here as individual media
+                # playlists are also included in published_urls.
+                if tbr is None:
+                    formats.append(self._m3u8_meta_format(video_url, ext='mp4', m3u8_id='hls'))
+                    continue
+                else:
+                    a_format.update({
+                        'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])),
+                        'ext': 'mp4',
+                    })
+            elif ext == 'mp3' or media_format == 'mp3':
+                a_format['vcodec'] = 'none'
+            else:
+                a_format.update({
+                    'width': int_or_none(published_url.get('width')),
+                    'height': int_or_none(published_url.get('height')),
+                })
+            formats.append(a_format)
+
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for caption in video_data.get('captions', []):
+            a_caption = {
+                'url': caption['url'],
+                'ext': 'tt' if caption.get('format') == 'SMPTE-TT' else None
+            }
+            subtitles.setdefault(caption['language'], []).append(a_caption)
+
+        return {
+            'id': video_id,
+            'formats': formats,
+            'title': video_data.get('def_title'),
+            'description': video_data.get('def_description'),
+            'tags': video_data.get('def_tags', '').split(','),
+            'categories': video_data.get('categories'),
+            'thumbnail': video_data.get('thumbnail'),
+            'timestamp': int_or_none(video_data.get(
+                'ts_published') or video_data.get('ts_added')),
+            'uploader': video_data.get('mcp_id'),
+            'duration': int_or_none(video_data.get('duration')),
+            'subtitles': subtitles,
+        }
+
+    def _extract_anvato_videos(self, webpage, video_id):
+        anvplayer_data = self._parse_json(self._html_search_regex(
+            r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
+            'Anvato player data'), video_id)
+        return self._get_anvato_videos(
+            anvplayer_data['accessKey'], anvplayer_data['video'])
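Note on the anvato.py hunk above: the request payload carries a token (anvstk) that is the MD5 of the access key, a random request id, the server time and the per-key secret from _ANVACK_TABLE, joined with "|". The following is a minimal stand-alone sketch of that step only. The function name build_api_payload and the sample key, secret and request id are hypothetical, and md5_text here is a local stand-in written with hashlib rather than the helper imported by the extractor.

```python
import hashlib


def md5_text(value):
    # Local stand-in for the md5_text helper used in the diff:
    # stringify the value, then return its hex MD5 digest.
    return hashlib.md5(str(value).encode('utf-8')).hexdigest()


def build_api_payload(access_key, secret, server_time, anvrid):
    # anvstk = MD5("access_key|anvrid|server_time|secret"), as in the hunk.
    token = md5_text('%s|%s|%d|%s' % (access_key, anvrid, server_time, secret))
    return {
        'api': {
            'anvrid': anvrid,
            'anvstk': token,
            'anvts': server_time,
        },
    }


# Hypothetical inputs, for illustration only.
payload = build_api_payload('demo_prod_key', 'demo_secret', 1479571511, 'a' * 30)
```

In the extractor the request id is derived from the current time and a random factor, so the token differs on every request; the fixed id above only keeps the sketch deterministic.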
diff --git a/youtube_dl/extractor/aol.py b/youtube_dl/extractor/aol.py
index 95a99c6b0d567c52b477a1964d9c055d0b0a6b8a..2cdee33200232dc69c1755213fc2f8298c6c8fa3 100644
--- a/youtube_dl/extractor/aol.py
+++ b/youtube_dl/extractor/aol.py
@@ -1,26 +1,113 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+)
  
  
  class AolIE(InfoExtractor):
      IE_NAME = 'on.aol.com'
-    _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)'
+    _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)(?P<id>[^/?#&]+)'
  
      _TESTS = [{
+        # video with 5min ID
          'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
          'md5': '18ef68f48740e86ae94b98da815eec42',
          'info_dict': {
              'id': '518167793',
              'ext': 'mp4',
              'title': 'U.S. Official Warns Of \'Largest Ever\' IRS Phone Scam',
+            'description': 'A major phone scam has cost thousands of taxpayers more than $1 million, with less than a month until income tax returns are due to the IRS.',
+            'timestamp': 1395405060,
+            'upload_date': '20140321',
+            'uploader': 'Newsy Studio',
          },
-        'add_ie': ['FiveMin'],
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        # video with vidible ID
+        'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183',
+        'info_dict': {
+            'id': '5707d6b8e4b090497b04f706',
+            'ext': 'mp4',
+            'title': 'Netflix is Raising Rates',
+            'description': 'Netflix is rewarding millions of it’s long-standing members with an increase in cost. Veuer’s Carly Figueroa has more.',
+            'upload_date': '20160408',
+            'timestamp': 1460123280,
+            'uploader': 'Veuer',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        'url': 'http://on.aol.com/partners/abc-551438d309eab105804dbfe8/sneak-peek-was-haley-really-framed-570eaebee4b0448640a5c944',
+        'only_matching': True,
+    }, {
+        'url': 'http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7?context=SH:SHW518173474:PL4327:1460619712763',
+        'only_matching': True,
+    }, {
+        'url': 'http://on.aol.com/video/519442220',
+        'only_matching': True,
+    }, {
+        'url': 'aol-video:5707d6b8e4b090497b04f706',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        return self.url_result('5min:%s' % video_id)
+
+        response = self._download_json(
+            'https://feedapi.b2c.on.aol.com/v1.0/app/videos/aolon/%s/details' % video_id,
+            video_id)['response']
+        if response['statusText'] != 'Ok':
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, response['statusText']), expected=True)
+
+        video_data = response['data']
+        formats = []
+        m3u8_url = video_data.get('videoMasterPlaylist')
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+        for rendition in video_data.get('renditions', []):
+            video_url = rendition.get('url')
+            if not video_url:
+                continue
+            ext = rendition.get('format')
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+            else:
+                f = {
+                    'url': video_url,
+                    'format_id': rendition.get('quality'),
+                }
+                mobj = re.search(r'(\d+)x(\d+)', video_url)
+                if mobj:
+                    f.update({
+                        'width': int(mobj.group(1)),
+                        'height': int(mobj.group(2)),
+                    })
+                formats.append(f)
+        self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
+
+        return {
+            'id': video_id,
+            'title': video_data['title'],
+            'duration': int_or_none(video_data.get('duration')),
+            'timestamp': int_or_none(video_data.get('publishDate')),
+            'view_count': int_or_none(video_data.get('views')),
+            'description': video_data.get('description'),
+            'uploader': video_data.get('videoOwner'),
+            'formats': formats,
+        }
  
  
  class AolFeaturesIE(InfoExtractor):
@@ -36,6 +123,10 @@ class AolFeaturesIE(InfoExtractor):
              'title': 'What To Watch - February 17, 2016',
          },
          'add_ie': ['FiveMin'],
+        'params': {
+            # encrypted m3u8 download
+            'skip_download': True,
+        },
      }]
  
      def _real_extract(self, url):
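Note on the aol.py change above: the new _VALID_URL accepts both numeric 5min IDs and hexadecimal vidible IDs, with or without a slug in front of the ID. The sketch below copies the pattern from the diff and runs it against URLs taken from the _TESTS list. The helper name match_id is hypothetical and only approximates what the extractor's _match_id does.

```python
import re

# Pattern copied from the new _VALID_URL in the diff.
VALID_URL = (
    r'(?:aol-video:|https?://on\.aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)'
    r'(?P<id>[^/?#&]+)')


def match_id(url):
    # Hypothetical helper: return the captured id, or None if no match.
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


# The optional slug group is greedy, so it consumes everything up to the
# last hyphen and the id is whatever follows it.
five_min_id = match_id(
    'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793'
    '?icid=OnHomepageC2Wide_MustSee_Img')
vidible_id = match_id('aol-video:5707d6b8e4b090497b04f706')
bare_id = match_id('http://on.aol.com/video/519442220')
```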
diff --git a/youtube_dl/extractor/aparat.py b/youtube_dl/extractor/aparat.py
index 63429780e8abf528165daf7e50a6317bce9a6c7d..025e29aa46fe5db97c323fa95d947470f1f2023a 100644
--- a/youtube_dl/extractor/aparat.py
+++ b/youtube_dl/extractor/aparat.py
@@ -1,8 +1,6 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  from ..utils import (
      ExtractorError,
@@ -15,7 +13,7 @@ class AparatIE(InfoExtractor):
  
      _TEST = {
          'url': 'http://www.aparat.com/v/wP8On',
-        'md5': '6714e0af7e0d875c5a39c4dc4ab46ad1',
+        'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
          'info_dict': {
              'id': 'wP8On',
              'ext': 'mp4',
@@ -31,13 +29,13 @@ class AparatIE(InfoExtractor):
          # Note: There is an easier-to-parse configuration at
          # http://www.aparat.com/video/video/config/videohash/%video_id
          # but the URL in there does not work
-        embed_url = ('http://www.aparat.com/video/video/embed/videohash/' +
-                     video_id + '/vt/frame')
+        embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id
          webpage = self._download_webpage(embed_url, video_id)
  
-        video_urls = [video_url.replace('\\/', '/') for video_url in re.findall(
-            r'(?:fileList\[[0-9]+\]\s*=|"file"\s*:)\s*"([^"]+)"', webpage)]
-        for i, video_url in enumerate(video_urls):
+        file_list = self._parse_json(self._search_regex(
+            r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id)
+        for i, item in enumerate(file_list[0]):
+            video_url = item['file']
              req = HEADRequest(video_url)
              res = self._request_webpage(
                  req, video_id, note='Testing video URL %d' % i, errnote=False)
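Note on the aparat.py change above: instead of scraping individual "file" strings, the extractor now pulls the whole JSON argument of JSON.parse(...) out of the embed page and iterates the first inner list. A minimal sketch of that parsing step follows, using the regular expression from the diff with the standard json module in place of _parse_json. The page fragment is a made-up example, and real embed pages may differ.

```python
import json
import re

# Hypothetical fragment of an embed page; the real markup is an assumption.
webpage = r'''var fileList = JSON.parse('[[{"file": "http:\/\/example.com\/v.mp4", "label": "360p"}]]');'''

# Same pattern as in the diff: capture the single-quoted JSON argument.
raw = re.search(
    r"fileList\s*=\s*JSON\.parse\('([^']+)'\)", webpage).group(1)

# JSON permits the "\/" escape, so no manual replace('\\/', '/') is needed.
file_list = json.loads(raw)
video_urls = [item['file'] for item in file_list[0]]
```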
diff --git a/youtube_dl/extractor/appletrailers.py b/youtube_dl/extractor/appletrailers.py
index be40f85b487057b4cb319dba102cec76519880a5..a6801f3d4860414c286277c92bd994e16212cffd 100644
--- a/youtube_dl/extractor/appletrailers.py
+++ b/youtube_dl/extractor/appletrailers.py
@@ -7,6 +7,8 @@ from .common import InfoExtractor
  from ..compat import compat_urlparse
  from ..utils import (
      int_or_none,
+    parse_duration,
+    unified_strdate,
  )
  
  
@@ -16,7 +18,8 @@ class AppleTrailersIE(InfoExtractor):
      _TESTS = [{
          'url': 'http://trailers.apple.com/trailers/wb/manofsteel/',
          'info_dict': {
-            'id': 'manofsteel',
+            'id': '5111',
+            'title': 'Man of Steel',
          },
          'playlist': [
              {
@@ -70,6 +73,15 @@ class AppleTrailersIE(InfoExtractor):
              'id': 'blackthorn',
          },
          'playlist_mincount': 2,
+        'expected_warnings': ['Unable to download JSON metadata'],
+    }, {
+        # json data only available from http://trailers.apple.com/trailers/feeds/data/15881.json
+        'url': 'http://trailers.apple.com/trailers/fox/kungfupanda3/',
+        'info_dict': {
+            'id': '15881',
+            'title': 'Kung Fu Panda 3',
+        },
+        'playlist_mincount': 4,
      }, {
          'url': 'http://trailers.apple.com/ca/metropole/autrui/',
          'only_matching': True,
@@ -85,6 +97,45 @@ class AppleTrailersIE(InfoExtractor):
          movie = mobj.group('movie')
          uploader_id = mobj.group('company')
  
+        webpage = self._download_webpage(url, movie)
+        film_id = self._search_regex(r"FilmId\s*=\s*'(\d+)'", webpage, 'film id')
+        film_data = self._download_json(
+            'http://trailers.apple.com/trailers/feeds/data/%s.json' % film_id,
+            film_id, fatal=False)
+
+        if film_data:
+            entries = []
+            for clip in film_data.get('clips', []):
+                clip_title = clip['title']
+
+                formats = []
+                for version, version_data in clip.get('versions', {}).items():
+                    for size, size_data in version_data.get('sizes', {}).items():
+                        src = size_data.get('src')
+                        if not src:
+                            continue
+                        formats.append({
+                            'format_id': '%s-%s' % (version, size),
+                            'url': re.sub(r'_(\d+p.mov)', r'_h\1', src),
+                            'width': int_or_none(size_data.get('width')),
+                            'height': int_or_none(size_data.get('height')),
+                            'language': version[:2],
+                        })
+                self._sort_formats(formats)
+
+                entries.append({
+                    'id': movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', clip_title).lower(),
+                    'formats': formats,
+                    'title': clip_title,
+                    'thumbnail': clip.get('screen') or clip.get('thumb'),
+                    'duration': parse_duration(clip.get('runtime') or clip.get('faded')),
+                    'upload_date': unified_strdate(clip.get('posted')),
+                    'uploader_id': uploader_id,
+                })
+
+            page_data = film_data.get('page', {})
+            return self.playlist_result(entries, film_id, page_data.get('movie_title'))
+
          playlist_url = compat_urlparse.urljoin(url, 'includes/playlists/itunes.inc')
  
          def fix_html(s):
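Note on the appletrailers.py change above: each clip in the JSON feed lists versions, each version lists sizes, and the src URL of every size is rewritten from "_720p.mov" to "_h720p.mov" before it is used as a format URL. The sketch below reproduces that nested loop outside the extractor. The function name build_formats and the sample clip are hypothetical, and the feed layout is assumed only as far as the diff shows it.

```python
import re


def build_formats(clip):
    # Walk versions -> sizes and collect one format per size that has a src.
    formats = []
    for version, version_data in clip.get('versions', {}).items():
        for size, size_data in version_data.get('sizes', {}).items():
            src = size_data.get('src')
            if not src:
                continue
            formats.append({
                'format_id': '%s-%s' % (version, size),
                # Same substitution as in the diff: insert "h" before the size.
                'url': re.sub(r'_(\d+p.mov)', r'_h\1', src),
                'language': version[:2],
            })
    return formats


# Made-up clip data for illustration.
clip = {
    'versions': {
        'enus': {
            'sizes': {
                'hd720': {'src': 'http://example.com/film-tlr1_720p.mov'},
                'sd': {},
            },
        },
    },
}
formats = build_formats(clip)
```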
diff --git a/youtube_dl/extractor/archiveorg.py b/youtube_dl/extractor/archiveorg.py
index 8feb7cb7456ec4db8d6a8f28b411a58cb5ac47a1..486dff82d00a44a13384e3b7d8ff1b0189da8451 100644
--- a/youtube_dl/extractor/archiveorg.py
+++ b/youtube_dl/extractor/archiveorg.py
@@ -1,67 +1,65 @@
  from __future__ import unicode_literals
  
-from .common import InfoExtractor
-from ..utils import unified_strdate
+from .jwplatform import JWPlatformBaseIE
+from ..utils import (
+    unified_strdate,
+    clean_html,
+)
  
  
-class ArchiveOrgIE(InfoExtractor):
+class ArchiveOrgIE(JWPlatformBaseIE):
      IE_NAME = 'archive.org'
      IE_DESC = 'archive.org videos'
-    _VALID_URL = r'https?://(?:www\.)?archive\.org/details/(?P<id>[^?/]+)(?:[?].*)?$'
+    _VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'
      _TESTS = [{
          'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
          'md5': '8af1d4cf447933ed3c7f4871162602db',
          'info_dict': {
              'id': 'XD300-23_68HighlightsAResearchCntAugHumanIntellect',
-            'ext': 'ogv',
+            'ext': 'ogg',
              'title': '1968 Demo - FJCC Conference Presentation Reel #1',
-            'description': 'md5:1780b464abaca9991d8968c877bb53ed',
+            'description': 'md5:da45c349df039f1cc8075268eb1b5c25',
              'upload_date': '19681210',
              'uploader': 'SRI International'
          }
      }, {
          'url': 'https://archive.org/details/Cops1922',
-        'md5': '18f2a19e6d89af8425671da1cf3d4e04',
+        'md5': 'bc73c8ab3838b5a8fc6c6651fa7b58ba',
          'info_dict': {
              'id': 'Cops1922',
-            'ext': 'ogv',
+            'ext': 'mp4',
              'title': 'Buster Keaton\'s "Cops" (1922)',
-            'description': 'md5:70f72ee70882f713d4578725461ffcc3',
+            'description': 'md5:b4544662605877edd99df22f9620d858',
          }
+    }, {
+        'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
+        webpage = self._download_webpage(
+            'http://archive.org/embed/' + video_id, video_id)
+        jwplayer_playlist = self._parse_json(self._search_regex(
+            r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\);",
+            webpage, 'jwplayer playlist'), video_id)
+        info = self._parse_jwplayer_data(
+            {'playlist': jwplayer_playlist}, video_id, base_url=url)
  
-        json_url = url + ('&' if '?' in url else '?') + 'output=json'
-        data = self._download_json(json_url, video_id)
-
-        def get_optional(data_dict, field):
-            return data_dict['metadata'].get(field, [None])[0]
-
-        title = get_optional(data, 'title')
-        description = get_optional(data, 'description')
-        uploader = get_optional(data, 'creator')
-        upload_date = unified_strdate(get_optional(data, 'date'))
+        def get_optional(metadata, field):
+            return metadata.get(field, [None])[0]
  
-        formats = [
-            {
-                'format': fdata['format'],
-                'url': 'http://' + data['server'] + data['dir'] + fn,
-                'file_size': int(fdata['size']),
-            }
-            for fn, fdata in data['files'].items()
-            if 'Video' in fdata['format']]
-
-        self._sort_formats(formats)
-
-        return {
-            '_type': 'video',
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-            'description': description,
-            'uploader': uploader,
-            'upload_date': upload_date,
-            'thumbnail': data.get('misc', {}).get('image'),
-        }
+        metadata = self._download_json(
+            'http://archive.org/details/' + video_id, video_id, query={
+                'output': 'json',
+            })['metadata']
+        info.update({
+            'title': get_optional(metadata, 'title') or info.get('title'),
+            'description': clean_html(get_optional(metadata, 'description')),
+        })
+        if info.get('_type') != 'playlist':
+            info.update({
+                'uploader': get_optional(metadata, 'creator'),
+                'upload_date': unified_strdate(get_optional(metadata, 'date')),
+            })
+        return info
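Note on the archiveorg.py change above: the metadata values returned by the details endpoint are lists, which is why get_optional indexes the first element and falls back to [None] for missing fields. A small sketch of that helper and of the title fallback follows. The metadata and info dictionaries are invented sample data, not real API output.

```python
def get_optional(metadata, field):
    # Each metadata value is a list; a missing field yields None.
    return metadata.get(field, [None])[0]


# Invented sample data for illustration.
metadata = {'title': ['Cops (1922)'], 'creator': ['Buster Keaton']}
info = {'title': 'Fallback title from the player playlist'}

title = get_optional(metadata, 'title') or info.get('title')
uploader = get_optional(metadata, 'creator')
upload_date = get_optional(metadata, 'date')  # field absent in the sample
```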
diff --git a/youtube_dl/extractor/ard.py b/youtube_dl/extractor/ard.py
index 9fb84911a0b81fd42de2c9bd410cdaf2dd4813a6..35f3656f11d7579a1f67cd0ac6e9c06a37c44917 100644
--- a/youtube_dl/extractor/ard.py
+++ b/youtube_dl/extractor/ard.py
@@ -8,19 +8,19 @@ from .generic import GenericIE
  from ..utils import (
      determine_ext,
      ExtractorError,
-    get_element_by_attribute,
      qualities,
      int_or_none,
      parse_duration,
      unified_strdate,
      xpath_text,
+    update_url_query,
  )
  from ..compat import compat_etree_fromstring
  
  
  class ARDMediathekIE(InfoExtractor):
      IE_NAME = 'ARD:mediathek'
-    _VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
+    _VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
  
      _TESTS = [{
          'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114',
@@ -35,6 +35,7 @@ class ARDMediathekIE(InfoExtractor):
              # m3u8 download
              'skip_download': True,
          },
+        'skip': 'HTTP Error 404: Not Found',
      }, {
          'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
          'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
@@ -45,6 +46,7 @@ class ARDMediathekIE(InfoExtractor):
              'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
              'duration': 5252,
          },
+        'skip': 'HTTP Error 404: Not Found',
      }, {
          # audio
          'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
@@ -56,9 +58,22 @@ class ARDMediathekIE(InfoExtractor):
              'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
              'duration': 3240,
          },
+        'skip': 'HTTP Error 404: Not Found',
      }, {
          'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
          'only_matching': True,
+    }, {
+        # audio
+        'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
+        'md5': '4e8f00631aac0395fee17368ac0e9867',
+        'info_dict': {
+            'id': '30796318',
+            'ext': 'mp3',
+            'title': 'Vor dem Fest',
+            'description': 'md5:c0c1c8048514deaed2a73b3a60eecacb',
+            'duration': 3287,
+        },
+        'skip': 'Video is no longer available',
      }]
  
      def _extract_media_info(self, media_info_url, webpage, video_id):
@@ -83,7 +98,7 @@ class ARDMediathekIE(InfoExtractor):
          subtitle_url = media_info.get('_subtitleUrl')
          if subtitle_url:
              subtitles['de'] = [{
-                'ext': 'srt',
+                'ext': 'ttml',
                  'url': subtitle_url,
              }]
  
@@ -114,11 +129,14 @@ class ARDMediathekIE(InfoExtractor):
                          continue
                      if ext == 'f4m':
                          formats.extend(self._extract_f4m_formats(
-                            stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
-                            video_id, preference=-1, f4m_id='hds', fatal=False))
+                            update_url_query(stream_url, {
+                                'hdcore': '3.1.1',
+                                'plugin': 'aasp-3.1.1.69.124'
+                            }),
+                            video_id, f4m_id='hds', fatal=False))
                      elif ext == 'm3u8':
                          formats.extend(self._extract_m3u8_formats(
-                            stream_url, video_id, 'mp4', preference=1, m3u8_id='hls', fatal=False))
+                            stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
                      else:
                          if server and server.startswith('rtmp'):
                              f = {
@@ -156,11 +174,15 @@ class ARDMediathekIE(InfoExtractor):
  
          webpage = self._download_webpage(url, video_id)
  
-        if '>Der gewünschte Beitrag ist nicht mehr verfügbar.<' in webpage:
-            raise ExtractorError('Video %s is no longer available' % video_id, expected=True)
+        ERRORS = (
+            ('>Leider liegt eine Störung vor.', 'Video %s is unavailable'),
+            ('>Der gewünschte Beitrag ist nicht mehr verfügbar.<',
+             'Video %s is no longer available'),
+        )
  
-        if 'Diese Sendung ist für Jugendliche unter 12 Jahren nicht geeignet. Der Clip ist deshalb nur von 20 bis 6 Uhr verfügbar.' in webpage:
-            raise ExtractorError('This program is only suitable for those aged 12 and older. Video %s is therefore only available between 20 pm and 6 am.' % video_id, expected=True)
+        for pattern, message in ERRORS:
+            if pattern in webpage:
+                raise ExtractorError(message % video_id, expected=True)
  
          if re.search(r'[\?&]rss($|[=&])', url):
              doc = compat_etree_fromstring(webpage.encode('utf-8'))
@@ -220,7 +242,7 @@ class ARDMediathekIE(InfoExtractor):
  
  
  class ARDIE(InfoExtractor):
-    _VALID_URL = '(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
+    _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
      _TEST = {
          'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
          'md5': 'd216c3a86493f9322545e045ddc3eb35',
@@ -232,7 +254,8 @@ class ARDIE(InfoExtractor):
              'title': 'Die Story im Ersten: Mission unter falscher Flagge',
              'upload_date': '20140804',
              'thumbnail': 're:^https?://.*\.jpg$',
-        }
+        },
+        'skip': 'HTTP Error 404: Not Found',
      }
  
      def _real_extract(self, url):
@@ -274,41 +297,3 @@ class ARDIE(InfoExtractor):
              'upload_date': upload_date,
              'thumbnail': thumbnail,
          }
-
-
-class SportschauIE(ARDMediathekIE):
-    IE_NAME = 'Sportschau'
-    _VALID_URL = r'(?P<baseurl>https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video(?P<id>[^/#?]+))\.html'
-    _TESTS = [{
-        'url': 'http://www.sportschau.de/tourdefrance/videoseppeltkokainhatnichtsmitklassischemdopingzutun100.html',
-        'info_dict': {
-            'id': 'seppeltkokainhatnichtsmitklassischemdopingzutun100',
-            'ext': 'mp4',
-            'title': 'Seppelt: "Kokain hat nichts mit klassischem Doping zu tun"',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 'Der ARD-Doping Experte Hajo Seppelt gibt seine Einschätzung zum ersten Dopingfall der diesjährigen Tour de France um den Italiener Luca Paolini ab.',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        },
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        base_url = mobj.group('baseurl')
-
-        webpage = self._download_webpage(url, video_id)
-        title = get_element_by_attribute('class', 'headline', webpage)
-        description = self._html_search_meta('description', webpage, 'description')
-
-        info = self._extract_media_info(
-            base_url + '-mc_defaultQuality-h.json', webpage, video_id)
-
-        info.update({
-            'title': title,
-            'description': description,
-        })
-
-        return info
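Note on the ard.py change above: the two hard-coded "if ... in webpage" checks are replaced by a table of (marker, message) pairs that is scanned in order. The sketch below shows the same table-driven check in isolation. ExtractorError is a local stand-in class rather than the one from youtube-dl's utils, and check_errors is a hypothetical name.

```python
class ExtractorError(Exception):
    """Local stand-in for youtube-dl's ExtractorError."""


# Marker strings and messages copied from the diff.
ERRORS = (
    ('>Leider liegt eine Störung vor.', 'Video %s is unavailable'),
    ('>Der gewünschte Beitrag ist nicht mehr verfügbar.<',
     'Video %s is no longer available'),
)


def check_errors(webpage, video_id):
    # Raise for the first marker found in the page; otherwise do nothing.
    for pattern, message in ERRORS:
        if pattern in webpage:
            raise ExtractorError(message % video_id)


try:
    check_errors('<p>Leider liegt eine Störung vor.</p>', '29582122')
    raised = None
except ExtractorError as e:
    raised = str(e)
```

Adding another known error page then only needs a new tuple in ERRORS, not another branch.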
diff --git a/youtube_dl/extractor/arkena.py b/youtube_dl/extractor/arkena.py
new file mode 100644
index 0000000..d45cae3
--- /dev/null
+++ b/youtube_dl/extractor/arkena.py
@@ -0,0 +1,115 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    float_or_none,
+    int_or_none,
+    mimetype2ext,
+    parse_iso8601,
+    strip_jsonp,
+)
+
+
+class ArkenaIE(InfoExtractor):
+    _VALID_URL = r'https?://play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)'
+    _TESTS = [{
+        'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411',
+        'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
+        'info_dict': {
+            'id': 'b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe',
+            'ext': 'mp4',
+            'title': 'Big Buck Bunny',
+            'description': 'Royalty free test video',
+            'timestamp': 1432816365,
+            'upload_date': '20150528',
+            'is_live': False,
+        },
+    }, {
+        'url': 'https://play.arkena.com/config/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411/?callbackMethod=jQuery1111023664739129262213_1469227693893',
+        'only_matching': True,
+    }, {
+        'url': 'http://play.arkena.com/config/avp/v1/player/media/327336/darkmatter/131064/?callbackMethod=jQuery1111002221189684892677_1469227595972',
+        'only_matching': True,
+    }, {
+        'url': 'http://play.arkena.com/embed/avp/v1/player/media/327336/darkmatter/131064/',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_url(webpage):
+        # See https://support.arkena.com/display/PLAY/Ways+to+embed+your+video
+        mobj = re.search(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//play\.arkena\.com/embed/avp/.+?)\1',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        account_id = mobj.group('account_id')
+
+        playlist = self._download_json(
+            'https://play.arkena.com/config/avp/v2/player/media/%s/0/%s/?callbackMethod=_'
+            % (video_id, account_id),
+            video_id, transform_source=strip_jsonp)['Playlist'][0]
+
+        media_info = playlist['MediaInfo']
+        title = media_info['Title']
+        media_files = playlist['MediaFiles']
+
+        is_live = False
+        formats = []
+        for kind_case, kind_formats in media_files.items():
+            kind = kind_case.lower()
+            for f in kind_formats:
+                f_url = f.get('Url')
+                if not f_url:
+                    continue
+                is_live = f.get('Live') == 'true'
+                exts = (mimetype2ext(f.get('Type')), determine_ext(f_url, None))
+                if kind == 'm3u8' or 'm3u8' in exts:
+                    formats.extend(self._extract_m3u8_formats(
+                        f_url, video_id, 'mp4',
+                        entry_protocol='m3u8' if is_live else 'm3u8_native',
+                        m3u8_id=kind, fatal=False, live=is_live))
+                elif kind == 'flash' or 'f4m' in exts:
+                    formats.extend(self._extract_f4m_formats(
+                        f_url, video_id, f4m_id=kind, fatal=False))
+                elif kind == 'dash' or 'mpd' in exts:
+                    formats.extend(self._extract_mpd_formats(
+                        f_url, video_id, mpd_id=kind, fatal=False))
+                elif kind == 'silverlight':
+                    # TODO: process when ism is supported (see
+                    # https://github.com/rg3/youtube-dl/issues/8118)
+                    continue
+                else:
+                    tbr = float_or_none(f.get('Bitrate'), 1000)
+                    formats.append({
+                        'url': f_url,
+                        'format_id': '%s-%d' % (kind, tbr) if tbr else kind,
+                        'tbr': tbr,
+                    })
+        self._sort_formats(formats)
+
+        description = media_info.get('Description')
+        video_id = media_info.get('VideoId') or video_id
+        timestamp = parse_iso8601(media_info.get('PublishDate'))
+        thumbnails = [{
+            'url': thumbnail['Url'],
+            'width': int_or_none(thumbnail.get('Size')),
+        } for thumbnail in (media_info.get('Poster') or []) if thumbnail.get('Url')]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'timestamp': timestamp,
+            'is_live': is_live,
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
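Note on the new arkena.py extractor above: for plain HTTP media files, the bitrate is scaled from bits per second to kbit/s, and the format id is suffixed with it only when a bitrate is present. A minimal sketch of that fallback branch follows. float_or_none here is a simplified local stand-in for the youtube-dl utility of the same name, and http_format and the sample entries are hypothetical.

```python
def float_or_none(value, scale=1):
    # Simplified stand-in: convert to float and divide by scale,
    # returning None when the value is missing or not numeric.
    try:
        return float(value) / scale
    except (TypeError, ValueError):
        return None


def http_format(kind, f):
    # Mirrors the final else-branch of the format loop in the diff.
    tbr = float_or_none(f.get('Bitrate'), 1000)
    return {
        'url': f['Url'],
        'format_id': '%s-%d' % (kind, tbr) if tbr else kind,
        'tbr': tbr,
    }


# Made-up media file entries for illustration.
with_bitrate = http_format('mp4', {'Url': 'http://example.com/a.mp4', 'Bitrate': 1500000})
without_bitrate = http_format('mp4', {'Url': 'http://example.com/b.mp4'})
```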
diff --git a/youtube_dl/extractor/arte.py b/youtube_dl/extractor/arte.py
index ae0f27dcbe059c0d469eaeca243ef59400ff68d6..69a23e88c5b08738a3ce66cce47215fe58e4fcc0 100644
--- a/youtube_dl/extractor/arte.py
+++ b/youtube_dl/extractor/arte.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -61,10 +61,7 @@ class ArteTvIE(InfoExtractor):
          }
  
  
-class ArteTVPlus7IE(InfoExtractor):
-    IE_NAME = 'arte.tv:+7'
-    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&+])'
-
+class ArteTVBaseIE(InfoExtractor):
      @classmethod
      def _extract_url_info(cls, url):
          mobj = re.match(cls._VALID_URL, url)
@@ -78,60 +75,6 @@ class ArteTVPlus7IE(InfoExtractor):
              video_id = mobj.group('id')
          return video_id, lang
  
-    def _real_extract(self, url):
-        video_id, lang = self._extract_url_info(url)
-        webpage = self._download_webpage(url, video_id)
-        return self._extract_from_webpage(webpage, video_id, lang)
-
-    def _extract_from_webpage(self, webpage, video_id, lang):
-        patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
-        ids = (video_id, '')
-        # some pages contain multiple videos (like
-        # http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
-        # so we first try to look for json URLs that contain the video id from
-        # the 'vid' parameter.
-        patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
-        json_url = self._html_search_regex(
-            patterns, webpage, 'json vp url', default=None)
-        if not json_url:
-            def find_iframe_url(webpage, default=NO_DEFAULT):
-                return self._html_search_regex(
-                    r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
-                    webpage, 'iframe url', group='url', default=default)
-
-            iframe_url = find_iframe_url(webpage, None)
-            if not iframe_url:
-                embed_url = self._html_search_regex(
-                    r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
-                if embed_url:
-                    player = self._download_json(
-                        embed_url, video_id, 'Downloading player page')
-                    iframe_url = find_iframe_url(player['html'])
-            # en and es URLs produce react-based pages with different layout (e.g.
-            # http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
-            if not iframe_url:
-                program = self._search_regex(
-                    r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
-                    webpage, 'program', default=None)
-                if program:
-                    embed_html = self._parse_json(program, video_id)
-                    if embed_html:
-                        iframe_url = find_iframe_url(embed_html['embed_html'])
-            if iframe_url:
-                json_url = compat_parse_qs(
-                    compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
-        if json_url:
-            title = self._search_regex(
-                r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
-                webpage, 'title', default=None, group='title')
-            return self._extract_from_json_url(json_url, video_id, lang, title=title)
-        # Different kind of embed URL (e.g.
-        # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
-        embed_url = self._search_regex(
-            r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
-            webpage, 'embed url', group='url')
-        return self.url_result(embed_url)
-
      def _extract_from_json_url(self, json_url, video_id, lang, title=None):
          info = self._download_json(json_url, video_id)
          player_info = info['videoJsonPlayer']
@@ -161,24 +104,53 @@ class ArteTVPlus7IE(InfoExtractor):
              'es': 'E[ESP]',
          }
  
+        langcode = LANGS.get(lang, lang)
+
          formats = []
          for format_id, format_dict in player_info['VSR'].items():
              f = dict(format_dict)
              versionCode = f.get('versionCode')
-            langcode = LANGS.get(lang, lang)
-            lang_rexs = [r'VO?%s-' % re.escape(langcode), r'VO?.-ST%s$' % re.escape(langcode)]
-            lang_pref = None
-            if versionCode:
-                matched_lang_rexs = [r for r in lang_rexs if re.match(r, versionCode)]
-                lang_pref = -10 if not matched_lang_rexs else 10 * len(matched_lang_rexs)
-            source_pref = 0
-            if versionCode is not None:
-                # The original version with subtitles has lower relevance
-                if re.match(r'VO-ST(F|A|E)', versionCode):
-                    source_pref -= 10
-                # The version with sourds/mal subtitles has also lower relevance
-                elif re.match(r'VO?(F|A|E)-STM\1', versionCode):
-                    source_pref -= 9
+            l = re.escape(langcode)
+
+            # Language preference from most to least priority
+            # Reference: section 5.6.3 of
+            # http://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-05.pdf
+            PREFERENCES = (
+                # original version in requested language, without subtitles
+                r'VO{0}$'.format(l),
+                # original version in requested language, with partial subtitles in requested language
+                r'VO{0}-ST{0}$'.format(l),
+                # original version in requested language, with subtitles for the deaf and hard-of-hearing in requested language
+                r'VO{0}-STM{0}$'.format(l),
+                # non-original (dubbed) version in requested language, without subtitles
+                r'V{0}$'.format(l),
+                # non-original (dubbed) version in requested language, with partial subtitles in requested language
+                r'V{0}-ST{0}$'.format(l),
+                # non-original (dubbed) version in requested language, with subtitles for the deaf and hard-of-hearing in requested language
+                r'V{0}-STM{0}$'.format(l),
+                # original version in requested language, with partial subtitles in different language
+                r'VO{0}-ST(?!{0}).+?$'.format(l),
+                # original version in requested language, with subtitles for the deaf and hard-of-hearing in different language
+                r'VO{0}-STM(?!{0}).+?$'.format(l),
+                # original version in different language, with partial subtitles in requested language
+                r'VO(?:(?!{0}).+?)?-ST{0}$'.format(l),
+                # original version in different language, with subtitles for the deaf and hard-of-hearing in requested language
+                r'VO(?:(?!{0}).+?)?-STM{0}$'.format(l),
+                # original version in different language, without subtitles
+                r'VO(?:(?!{0}))?$'.format(l),
+                # original version in different language, with partial subtitles in different language
+                r'VO(?:(?!{0}).+?)?-ST(?!{0}).+?$'.format(l),
+                # original version in different language, with subtitles for the deaf and hard-of-hearing in different language
+                r'VO(?:(?!{0}).+?)?-STM(?!{0}).+?$'.format(l),
+            )
+
+            for pref, p in enumerate(PREFERENCES):
+                if re.match(p, versionCode):
+                    lang_pref = len(PREFERENCES) - pref
+                    break
+            else:
+                lang_pref = -1
+
              format = {
                  'format_id': format_id,
                  'preference': -10 if f.get('videoFormat') == 'M3U8' else None,
@@ -188,7 +160,6 @@ class ArteTVPlus7IE(InfoExtractor):
                  'height': int_or_none(f.get('height')),
                  'tbr': int_or_none(f.get('bitrate')),
                  'quality': qfunc(f.get('quality')),
-                'source_preference': source_pref,
              }
  
              if f.get('mediaType') == 'rtmp':
@@ -207,28 +178,112 @@ class ArteTVPlus7IE(InfoExtractor):
          return info_dict
  
  
+class ArteTVPlus7IE(ArteTVBaseIE):
+    IE_NAME = 'arte.tv:+7'
+    _VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/[^/]+/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
+        'only_matching': True,
+    }, {
+        'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        video_id, lang = self._extract_url_info(url)
+        webpage = self._download_webpage(url, video_id)
+        return self._extract_from_webpage(webpage, video_id, lang)
+
+    def _extract_from_webpage(self, webpage, video_id, lang):
+        patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
+        ids = (video_id, '')
+        # some pages contain multiple videos (like
+        # http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
+        # so we first try to look for json URLs that contain the video id from
+        # the 'vid' parameter.
+        patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
+        json_url = self._html_search_regex(
+            patterns, webpage, 'json vp url', default=None)
+        if not json_url:
+            def find_iframe_url(webpage, default=NO_DEFAULT):
+                return self._html_search_regex(
+                    r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
+                    webpage, 'iframe url', group='url', default=default)
+
+            iframe_url = find_iframe_url(webpage, None)
+            if not iframe_url:
+                embed_url = self._html_search_regex(
+                    r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
+                if embed_url:
+                    player = self._download_json(
+                        embed_url, video_id, 'Downloading player page')
+                    iframe_url = find_iframe_url(player['html'])
+            # en and es URLs produce react-based pages with different layout (e.g.
+            # http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
+            if not iframe_url:
+                program = self._search_regex(
+                    r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
+                    webpage, 'program', default=None)
+                if program:
+                    embed_html = self._parse_json(program, video_id)
+                    if embed_html:
+                        iframe_url = find_iframe_url(embed_html['embed_html'])
+            if iframe_url:
+                json_url = compat_parse_qs(
+                    compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
+        if json_url:
+            title = self._search_regex(
+                r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
+                webpage, 'title', default=None, group='title')
+            return self._extract_from_json_url(json_url, video_id, lang, title=title)
+        # Different kind of embed URL (e.g.
+        # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
+        entries = [
+            self.url_result(url)
+            for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
+        return self.playlist_result(entries)
+
+
  # It also uses the arte_vp_url url from the webpage to extract the information
  class ArteTVCreativeIE(ArteTVPlus7IE):
      IE_NAME = 'arte.tv:creative'
-    _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:magazine?/)?(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
  
      _TESTS = [{
-        'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
+        'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
          'info_dict': {
-            'id': '72176',
+            'id': '057405-001-A',
              'ext': 'mp4',
-            'title': 'Folge 2 - Corporate Design',
-            'upload_date': '20131004',
+            'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
+            'upload_date': '20150716',
          },
      }, {
          'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
+        'playlist_count': 11,
+        'add_ie': ['Youtube'],
+    }, {
+        'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
+        'only_matching': True,
+    }]
+
+
+class ArteTVInfoIE(ArteTVPlus7IE):
+    IE_NAME = 'arte.tv:info'
+    _VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
          'info_dict': {
-            'id': '160676',
+            'id': '067528-000-A',
              'ext': 'mp4',
-            'title': 'Monty Python live (mostly)',
-            'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
-            'upload_date': '20140805',
-        }
+            'title': 'Service civique, un cache misère ?',
+            'upload_date': '20160403',
+        },
      }]
  
  
@@ -254,6 +309,8 @@ class ArteTVDDCIE(ArteTVPlus7IE):
      IE_NAME = 'arte.tv:ddc'
      _VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'
  
+    _TESTS = []
+
      def _real_extract(self, url):
          video_id, lang = self._extract_url_info(url)
          if lang == 'folge':
@@ -272,7 +329,7 @@ class ArteTVConcertIE(ArteTVPlus7IE):
      IE_NAME = 'arte.tv:concert'
      _VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
          'md5': '9ea035b7bd69696b67aa2ccaaa218161',
          'info_dict': {
@@ -282,24 +339,23 @@ class ArteTVConcertIE(ArteTVPlus7IE):
              'upload_date': '20140128',
              'description': 'md5:486eb08f991552ade77439fe6d82c305',
          },
-    }
+    }]
  
  
  class ArteTVCinemaIE(ArteTVPlus7IE):
      IE_NAME = 'arte.tv:cinema'
      _VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
  
-    _TEST = {
-        'url': 'http://cinema.arte.tv/de/node/38291',
-        'md5': '6b275511a5107c60bacbeeda368c3aa1',
+    _TESTS = [{
+        'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
+        'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
          'info_dict': {
-            'id': '055876-000_PWA12025-D',
+            'id': '062494-000-A',
              'ext': 'mp4',
-            'title': 'Tod auf dem Nil',
-            'upload_date': '20160122',
-            'description': 'md5:7f749bbb77d800ef2be11d54529b96bc',
+            'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
+            'upload_date': '20150807',
          },
-    }
+    }]
  
  
  class ArteTVMagazineIE(ArteTVPlus7IE):
@@ -337,16 +393,65 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
      IE_NAME = 'arte.tv:embed'
      _VALID_URL = r'''(?x)
          http://www\.arte\.tv
-        /playerv2/embed\.php\?json_url=
+        /(?:playerv2/embed|arte_vp/index)\.php\?json_url=
          (?P<json_url>
              http://arte\.tv/papi/tvguide/videos/stream/player/
              (?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*
          )
      '''
  
+    _TESTS = []
+
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          video_id = mobj.group('id')
          lang = mobj.group('lang')
          json_url = mobj.group('json_url')
          return self._extract_from_json_url(json_url, video_id, lang)
+
+
+class TheOperaPlatformIE(ArteTVPlus7IE):
+    IE_NAME = 'theoperaplatform'
+    _VALID_URL = r'https?://(?:www\.)?theoperaplatform\.eu/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'http://www.theoperaplatform.eu/de/opera/verdi-otello',
+        'md5': '970655901fa2e82e04c00b955e9afe7b',
+        'info_dict': {
+            'id': '060338-009-A',
+            'ext': 'mp4',
+            'title': 'Verdi - OTELLO',
+            'upload_date': '20160927',
+        },
+    }]
+
+
+class ArteTVPlaylistIE(ArteTVBaseIE):
+    IE_NAME = 'arte.tv:playlist'
+    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'
+
+    _TESTS = [{
+        'url': 'http://www.arte.tv/guide/de/plus7/?country=DE#collection/PL-013263/ARTETV',
+        'info_dict': {
+            'id': 'PL-013263',
+            'title': 'Areva & Uramin',
+            'description': 'md5:a1dc0312ce357c262259139cfd48c9bf',
+        },
+        'playlist_mincount': 6,
+    }, {
+        'url': 'http://www.arte.tv/guide/de/playlists?country=DE#collection/PL-013190/ARTETV',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        playlist_id, lang = self._extract_url_info(url)
+        collection = self._download_json(
+            'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
+            % (lang, playlist_id), playlist_id)
+        title = collection.get('title')
+        description = collection.get('shortDescription') or collection.get('teaserText')
+        entries = [
+            self._extract_from_json_url(
+                video['jsonUrl'], video.get('programId') or playlist_id, lang)
+            for video in collection['videos'] if video.get('jsonUrl')]
+        return self.playlist_result(entries, playlist_id, title, description)
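Review note on the `PREFERENCES` ranking added to `_extract_from_json_url` in arte.py: the patterns are tried in order and the first match sets `lang_pref`, so earlier entries win. The following is a minimal standalone sketch of that idea, not the extractor's code. The function name `language_preference` is hypothetical, the pattern list is a shortened subset of the tuple in the diff, and the guard for a missing version code is an assumption (the diff passes `versionCode` straight to `re.match`).

```python
import re


def language_preference(version_code, langcode):
    """Return a higher number for a better language match, -1 for no match."""
    l = re.escape(langcode)
    # Shortened, illustrative subset of PREFERENCES, most preferred first.
    preferences = (
        r'VO{0}$'.format(l),                      # original version, no subtitles
        r'VO{0}-ST{0}$'.format(l),                # original version, partial subtitles
        r'V{0}$'.format(l),                       # dubbed version, no subtitles
        r'VO(?:(?!{0}).+?)?-ST{0}$'.format(l),    # other original, subtitles in requested language
    )
    # Assumption: treat a missing version code as "no preference".
    if not version_code:
        return -1
    for pref, pattern in enumerate(preferences):
        if re.match(pattern, version_code):
            # The first matching pattern decides the score.
            return len(preferences) - pref
    return -1
```

For example, with `langcode='F'`, `'VOF'` ranks above `'VOF-STF'`, which ranks above `'VF'`, and an unrelated code such as `'VA'` falls through to -1.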
diff --git a/youtube_dl/extractor/audioboom.py b/youtube_dl/extractor/audioboom.py
index 2ec2d7092aca0468d122d9ab658f1ef848da7f90..d7d1c6306443b77dd7161b3c07480ad16c14ffa5 100644
--- a/youtube_dl/extractor/audioboom.py
+++ b/youtube_dl/extractor/audioboom.py
@@ -6,8 +6,8 @@ from ..utils import float_or_none
  
  
  class AudioBoomIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?audioboom\.com/boos/(?P<id>[0-9]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?audioboom\.com/(?:boos|posts)/(?P<id>[0-9]+)'
+    _TESTS = [{
          'url': 'https://audioboom.com/boos/4279833-3-09-2016-czaban-hour-3?t=0',
          'md5': '63a8d73a055c6ed0f1e51921a10a5a76',
          'info_dict': {
@@ -19,7 +19,10 @@ class AudioBoomIE(InfoExtractor):
              'uploader': 'Steve Czaban',
              'uploader_url': 're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
          }
-    }
+    }, {
+        'url': 'https://audioboom.com/posts/4279833-3-09-2016-czaban-hour-3?t=0',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
diff --git a/youtube_dl/extractor/audiomack.py b/youtube_dl/extractor/audiomack.py
index 3eed91279fd7b6ead45bf3ee01486eab53b91d6a..f3bd4d4447f559a8bd924f7d796a1a9faf24b9d3 100644
--- a/youtube_dl/extractor/audiomack.py
+++ b/youtube_dl/extractor/audiomack.py
@@ -6,6 +6,7 @@ import time
  
  from .common import InfoExtractor
  from .soundcloud import SoundcloudIE
+from ..compat import compat_str
  from ..utils import (
      ExtractorError,
      url_basename,
@@ -30,14 +31,14 @@ class AudiomackIE(InfoExtractor):
          # audiomack wrapper around soundcloud song
          {
              'add_ie': ['Soundcloud'],
-            'url': 'http://www.audiomack.com/song/xclusiveszone/take-kare',
+            'url': 'http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle',
              'info_dict': {
-                'id': '172419696',
+                'id': '258901379',
                  'ext': 'mp3',
-                'description': 'md5:1fc3272ed7a635cce5be1568c2822997',
-                'title': 'Young Thug ft Lil Wayne - Take Kare',
-                'uploader': 'Young Thug World',
-                'upload_date': '20141016',
+                'description': 'mamba day freestyle for the legend Kobe Bryant ',
+                'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
+                'uploader': 'ILOVEMAKONNEN',
+                'upload_date': '20160414',
              }
          },
      ]
@@ -136,7 +137,7 @@ class AudiomackAlbumIE(InfoExtractor):
                          result[resultkey] = api_response[apikey]
                  song_id = url_basename(api_response['url']).rpartition('.')[0]
                  result['entries'].append({
-                    'id': api_response.get('id', song_id),
+                    'id': compat_str(api_response.get('id', song_id)),
                      'uploader': api_response.get('artist'),
                      'title': api_response.get('title', song_id),
                      'url': api_response['url'],
diff --git a/youtube_dl/extractor/awaan.py b/youtube_dl/extractor/awaan.py
new file mode 100644
index 0000000..a2603bb
--- /dev/null
+++ b/youtube_dl/extractor/awaan.py
@@ -0,0 +1,185 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+import base64
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlencode,
+    compat_str,
+)
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+    smuggle_url,
+    unsmuggle_url,
+    urlencode_postdata,
+)
+
+
+class AWAANIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
+
+    def _real_extract(self, url):
+        show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
+        if video_id and int(video_id) > 0:
+            return self.url_result(
+                'http://awaan.ae/media/%s' % video_id, 'AWAANVideo')
+        elif season_id and int(season_id) > 0:
+            return self.url_result(smuggle_url(
+                'http://awaan.ae/program/season/%s' % season_id,
+                {'show_id': show_id}), 'AWAANSeason')
+        else:
+            return self.url_result(
+                'http://awaan.ae/program/%s' % show_id, 'AWAANSeason')
+
+
+class AWAANBaseIE(InfoExtractor):
+    def _parse_video_data(self, video_data, video_id, is_live):
+        title = video_data.get('title_en') or video_data['title_ar']
+        img = video_data.get('img')
+
+        return {
+            'id': video_id,
+            'title': self._live_title(title) if is_live else title,
+            'description': video_data.get('description_en') or video_data.get('description_ar'),
+            'thumbnail': 'http://admin.mangomolo.com/analytics/%s' % img if img else None,
+            'duration': int_or_none(video_data.get('duration')),
+            'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
+            'is_live': is_live,
+        }
+
+
+class AWAANVideoIE(AWAANBaseIE):
+    IE_NAME = 'awaan:video'
+    _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?(?:video(?:/[^/]+)?|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
+        'md5': '5f61c33bfc7794315c671a62d43116aa',
+        'info_dict':
+        {
+            'id': '17375',
+            'ext': 'mp4',
+            'title': 'رحلة العمر : الحلقة 1',
+            'description': 'md5:0156e935d870acb8ef0a66d24070c6d6',
+            'duration': 2041,
+            'timestamp': 1227504126,
+            'upload_date': '20081124',
+            'uploader_id': '71',
+        },
+    }, {
+        'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video_data = self._download_json(
+            'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
+            video_id, headers={'Origin': 'http://awaan.ae'})
+        info = self._parse_video_data(video_data, video_id, False)
+
+        embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' + compat_urllib_parse_urlencode({
+            'id': video_data['id'],
+            'user_id': video_data['user_id'],
+            'signature': video_data['signature'],
+            'countries': 'Q0M=',
+            'filter': 'DENY',
+        })
+        info.update({
+            '_type': 'url_transparent',
+            'url': embed_url,
+            'ie_key': 'MangomoloVideo',
+        })
+        return info
+
+
+class AWAANLiveIE(AWAANBaseIE):
+    IE_NAME = 'awaan:live'
+    _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?live/(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://awaan.ae/live/6/dubai-tv',
+        'info_dict': {
+            'id': '6',
+            'ext': 'mp4',
+            'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+            'upload_date': '20150107',
+            'timestamp': 1420588800,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        channel_id = self._match_id(url)
+
+        channel_data = self._download_json(
+            'http://admin.mangomolo.com/analytics/index.php/plus/getchanneldetails?channel_id=%s' % channel_id,
+            channel_id, headers={'Origin': 'http://awaan.ae'})
+        info = self._parse_video_data(channel_data, channel_id, True)
+
+        embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' + compat_urllib_parse_urlencode({
+            'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
+            'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
+            'signature': channel_data['signature'],
+            'countries': 'Q0M=',
+            'filter': 'DENY',
+        })
+        info.update({
+            '_type': 'url_transparent',
+            'url': embed_url,
+            'ie_key': 'MangomoloLive',
+        })
+        return info
+
+
+class AWAANSeasonIE(InfoExtractor):
+    IE_NAME = 'awaan:season'
+    _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
+    _TEST = {
+        'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
+        'info_dict':
+        {
+            'id': '7910',
+            'title': 'محاضرات الشيخ الشعراوي',
+        },
+        'playlist_mincount': 27,
+    }
+
+    def _real_extract(self, url):
+        url, smuggled_data = unsmuggle_url(url, {})
+        show_id, season_id = re.match(self._VALID_URL, url).groups()
+
+        data = {}
+        if season_id:
+            data['season'] = season_id
+            show_id = smuggled_data.get('show_id')
+            if show_id is None:
+                season = self._download_json(
+                    'http://admin.mangomolo.com/analytics/index.php/plus/season_info?id=%s' % season_id,
+                    season_id, headers={'Origin': 'http://awaan.ae'})
+                show_id = season['id']
+        data['show_id'] = show_id
+        show = self._download_json(
+            'http://admin.mangomolo.com/analytics/index.php/plus/show',
+            show_id, data=urlencode_postdata(data), headers={
+                'Origin': 'http://awaan.ae',
+                'Content-Type': 'application/x-www-form-urlencoded'
+            })
+        if not season_id:
+            season_id = show['default_season']
+        for season in show['seasons']:
+            if season['id'] == season_id:
+                title = season.get('title_en') or season['title_ar']
+
+                entries = []
+                for video in show['videos']:
+                    video_id = compat_str(video['id'])
+                    entries.append(self.url_result(
+                        'http://awaan.ae/media/%s' % video_id, 'AWAANVideo', video_id))
+
+                return self.playlist_result(entries, season_id, title)
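Review note on `AWAANLiveIE._real_extract` in awaan.py: the embed URL carries the user and channel ids as base64 text inside a URL-encoded query string. The sketch below shows that construction with the Python 3 standard library only. `build_live_embed_url` is a hypothetical helper, and `urllib.parse.urlencode` stands in here for `compat_urllib_parse_urlencode`; the argument values in the usage line are made up.

```python
import base64
from urllib.parse import urlencode


def build_live_embed_url(user_id, channel_id, signature):
    """Build the live embed URL from string ids, mirroring the diff's parameters."""
    def b64(value):
        # str -> bytes -> base64 bytes -> str
        return base64.b64encode(value.encode()).decode()

    query = urlencode({
        'id': b64(user_id),
        'channelid': b64(channel_id),
        'signature': signature,
        'countries': 'Q0M=',
        'filter': 'DENY',
    })
    return ('http://admin.mangomolo.com/analytics/index.php/customers/embed/index?'
            + query)


url = build_live_embed_url('71', '6', 'abc')
```

Note that `urlencode` percent-encodes the `=` padding of the base64 values, so the ids appear as `NzE%3D` and `Ng%3D%3D` in the resulting URL.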
diff --git a/youtube_dl/extractor/azubu.py b/youtube_dl/extractor/azubu.py
index efa624de1cbfddb741a7f8114059165d4e099095..72e1bd59d28fcd4bceaa6c1453fe80d65e9ccc96 100644
--- a/youtube_dl/extractor/azubu.py
+++ b/youtube_dl/extractor/azubu.py
@@ -46,6 +46,7 @@ class AzubuIE(InfoExtractor):
                  'uploader_id': 272749,
                  'view_count': int,
              },
+            'skip': 'Channel offline',
          },
      ]
  
@@ -56,22 +57,26 @@ class AzubuIE(InfoExtractor):
              'http://www.azubu.tv/api/video/%s' % video_id, video_id)['data']
  
          title = data['title'].strip()
-        description = data['description']
-        thumbnail = data['thumbnail']
-        view_count = data['view_count']
-        uploader = data['user']['username']
-        uploader_id = data['user']['id']
+        description = data.get('description')
+        thumbnail = data.get('thumbnail')
+        view_count = data.get('view_count')
+        user = data.get('user', {})
+        uploader = user.get('username')
+        uploader_id = user.get('id')
  
          stream_params = json.loads(data['stream_params'])
  
-        timestamp = float_or_none(stream_params['creationDate'], 1000)
-        duration = float_or_none(stream_params['length'], 1000)
+        timestamp = float_or_none(stream_params.get('creationDate'), 1000)
+        duration = float_or_none(stream_params.get('length'), 1000)
  
          renditions = stream_params.get('renditions') or []
          video = stream_params.get('FLVFullLength') or stream_params.get('videoFullLength')
          if video:
              renditions.append(video)
  
+        if not renditions and not user.get('channel', {}).get('is_live', True):
+            raise ExtractorError('%s said: channel is offline.' % self.IE_NAME, expected=True)
+
          formats = [{
              'url': fmt['url'],
              'width': fmt['frameWidth'],
@@ -98,7 +103,7 @@ class AzubuIE(InfoExtractor):
  
  
  class AzubuLiveIE(InfoExtractor):
-    _VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$'
+    _VALID_URL = r'https?://(?:www\.)?azubu\.tv/(?P<id>[^/]+)$'
  
      _TEST = {
          'url': 'http://www.azubu.tv/MarsTVMDLen',
diff --git a/youtube_dl/extractor/bandcamp.py b/youtube_dl/extractor/bandcamp.py
index c1ef8051d3074a6551941bf140f88eee4ed8a124..88c590e98388d5f6058dd71ffb97f4f0254f0c5b 100644
--- a/youtube_dl/extractor/bandcamp.py
+++ b/youtube_dl/extractor/bandcamp.py
@@ -1,7 +1,9 @@
  from __future__ import unicode_literals
  
  import json
+import random
  import re
+import time
  
  from .common import InfoExtractor
  from ..compat import (
@@ -12,6 +14,9 @@ from ..utils import (
      ExtractorError,
      float_or_none,
      int_or_none,
+    parse_filesize,
+    unescapeHTML,
+    update_url_query,
  )
  
  
@@ -29,7 +34,7 @@ class BandcampIE(InfoExtractor):
          '_skip': 'There is a limit of 200 free downloads / month for the test song'
      }, {
          'url': 'http://benprunty.bandcamp.com/track/lanius-battle',
-        'md5': '2b68e5851514c20efdff2afc5603b8b4',
+        'md5': '73d0b3171568232574e45652f8720b5c',
          'info_dict': {
              'id': '2650410135',
              'ext': 'mp3',
@@ -48,6 +53,10 @@ class BandcampIE(InfoExtractor):
              if m_trackinfo:
                  json_code = m_trackinfo.group(1)
                  data = json.loads(json_code)[0]
+                track_id = compat_str(data['id'])
+
+                if not data.get('file'):
+                    raise ExtractorError('Not streamable', video_id=track_id, expected=True)
  
                  formats = []
                  for format_id, format_url in data['file'].items():
@@ -64,7 +73,7 @@ class BandcampIE(InfoExtractor):
                  self._sort_formats(formats)
  
                  return {
-                    'id': compat_str(data['id']),
+                    'id': track_id,
                      'title': data['title'],
                      'formats': formats,
                      'duration': float_or_none(data.get('duration')),
@@ -77,35 +86,68 @@ class BandcampIE(InfoExtractor):
              r'(?ms)var TralbumData = .*?[{,]\s*id: (?P<id>\d+),?$',
              webpage, 'video id')
  
-        download_webpage = self._download_webpage(download_link, video_id, 'Downloading free downloads page')
-        # We get the dictionary of the track from some javascript code
-        all_info = self._parse_json(self._search_regex(
-            r'(?sm)items: (.*?),$', download_webpage, 'items'), video_id)
-        info = all_info[0]
-        # We pick mp3-320 for now, until format selection can be easily implemented.
-        mp3_info = info['downloads']['mp3-320']
-        # If we try to use this url it says the link has expired
-        initial_url = mp3_info['url']
-        m_url = re.match(
-            r'(?P<server>http://(.*?)\.bandcamp\.com)/download/track\?enc=mp3-320&fsig=(?P<fsig>.*?)&id=(?P<id>.*?)&ts=(?P<ts>.*)$',
-            initial_url)
-        # We build the url we will use to get the final track url
-        # This url is build in Bandcamp in the script download_bunde_*.js
-        request_url = '%s/statdownload/track?enc=mp3-320&fsig=%s&id=%s&ts=%s&.rand=665028774616&.vrs=1' % (m_url.group('server'), m_url.group('fsig'), video_id, m_url.group('ts'))
-        final_url_webpage = self._download_webpage(request_url, video_id, 'Requesting download url')
-        # If we could correctly generate the .rand field the url would be
-        # in the "download_url" key
-        final_url = self._proto_relative_url(self._search_regex(
-            r'"retry_url":"(.+?)"', final_url_webpage, 'final video URL'), 'http:')
+        download_webpage = self._download_webpage(
+            download_link, video_id, 'Downloading free downloads page')
+
+        blob = self._parse_json(
+            self._search_regex(
+                r'data-blob=(["\'])(?P<blob>{.+?})\1', download_webpage,
+                'blob', group='blob'),
+            video_id, transform_source=unescapeHTML)
+
+        info = blob['digital_items'][0]
+
+        downloads = info['downloads']
+        track = info['title']
+
+        artist = info.get('artist')
+        title = '%s - %s' % (artist, track) if artist else track
+
+        download_formats = {}
+        for f in blob['download_formats']:
+            name, ext = f.get('name'), f.get('file_extension')
+            if all(isinstance(x, compat_str) for x in (name, ext)):
+                download_formats[name] = ext.strip('.')
+
+        formats = []
+        for format_id, f in downloads.items():
+            format_url = f.get('url')
+            if not format_url:
+                continue
+            # Stat URL generation algorithm is reverse engineered from
+            # download_*_bundle_*.js
+            stat_url = update_url_query(
+                format_url.replace('/download/', '/statdownload/'), {
+                    '.rand': int(time.time() * 1000 * random.random()),
+                })
+            format_id = f.get('encoding_name') or format_id
+            stat = self._download_json(
+                stat_url, video_id, 'Downloading %s JSON' % format_id,
+                transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1],
+                fatal=False)
+            if not stat:
+                continue
+            retry_url = stat.get('retry_url')
+            if not isinstance(retry_url, compat_str):
+                continue
+            formats.append({
+                'url': self._proto_relative_url(retry_url, 'http:'),
+                'ext': download_formats.get(format_id),
+                'format_id': format_id,
+                'format_note': f.get('description'),
+                'filesize': parse_filesize(f.get('size_mb')),
+                'vcodec': 'none',
+            })
+        self._sort_formats(formats)
  
          return {
              'id': video_id,
-            'title': info['title'],
-            'ext': 'mp3',
-            'vcodec': 'none',
-            'url': final_url,
+            'title': title,
              'thumbnail': info.get('thumb_url'),
              'uploader': info.get('artist'),
+            'artist': artist,
+            'track': track,
+            'formats': formats,
          }
  
  
@@ -158,6 +200,15 @@ class BandcampAlbumIE(InfoExtractor):
              'uploader_id': 'dotscale',
          },
          'playlist_mincount': 7,
+    }, {
+        # with escaped quote in title
+        'url': 'https://jstrecords.bandcamp.com/album/entropy-ep',
+        'info_dict': {
+            'title': '"Entropy" EP',
+            'uploader_id': 'jstrecords',
+            'id': 'entropy-ep',
+        },
+        'playlist_mincount': 3,
      }]
  
      def _real_extract(self, url):
@@ -172,8 +223,11 @@ class BandcampAlbumIE(InfoExtractor):
          entries = [
              self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
              for t_path in tracks_paths]
-        title = self._search_regex(
-            r'album_title\s*:\s*"(.*?)"', webpage, 'title', fatal=False)
+        title = self._html_search_regex(
+            r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"',
+            webpage, 'title', fatal=False)
+        if title:
+            title = title.replace(r'\"', '"')
          return {
              '_type': 'playlist',
              'uploader_id': uploader_id,
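Note on the bandcamp.py changes above: the new free-download path rewrites each `/download/` URL to `/statdownload/`, appends a `.rand` query parameter, and then cuts the JSON object out of the response text before parsing it. The following is a minimal standalone sketch of those two steps, not the extractor's code. It assumes only the standard library (`urllib.parse` in place of youtube-dl's `update_url_query`), and the names `build_stat_url` and `strip_js_wrapper` are hypothetical helpers introduced for illustration. The `now` and `rnd` parameters exist only to make the result reproducible.

```python
import json
import random
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_stat_url(format_url, now=None, rnd=None):
    """Hypothetical helper: turn a /download/ URL into a /statdownload/ URL
    carrying a pseudo-random '.rand' query parameter."""
    now = time.time() if now is None else now
    rnd = random.random() if rnd is None else rnd
    parts = urlsplit(format_url.replace('/download/', '/statdownload/'))
    # Keep the existing query parameters and add '.rand' at the end.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query['.rand'] = str(int(now * 1000 * rnd))
    return urlunsplit(parts._replace(query=urlencode(query)))


def strip_js_wrapper(text):
    """Hypothetical helper: keep only the outermost {...} span of a response
    that wraps a JSON object in other text."""
    return text[text.index('{'):text.rindex('}') + 1]


if __name__ == '__main__':
    print(build_stat_url(
        'http://example.bandcamp.com/download/track?enc=mp3-320&id=1'))
    print(json.loads(strip_js_wrapper('cb({"retry_url": "//x/y"});')))
```

In the diff, a missing or non-string `retry_url` simply skips that format, so a failure for one encoding does not abort the whole extraction.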
diff --git a/youtube_dl/extractor/bbc.py b/youtube_dl/extractor/bbc.py

index dedf721bde2724cb7f5a06186584069b3f5221a1..b17916137ec51808e8c0c869142d37bf083c90e0 100644
--- a/youtube_dl/extractor/bbc.py
+++ b/youtube_dl/extractor/bbc.py
@@ -2,19 +2,23 @@
  from __future__ import unicode_literals
  
  import re
+import itertools
  
  from .common import InfoExtractor
  from ..utils import (
+    dict_get,
      ExtractorError,
      float_or_none,
      int_or_none,
      parse_duration,
      parse_iso8601,
+    try_get,
      unescapeHTML,
  )
  from ..compat import (
      compat_etree_fromstring,
      compat_HTTPError,
+    compat_urlparse,
  )
  
  
@@ -31,7 +35,7 @@ class BBCCoUkIE(InfoExtractor):
                              music/clips[/#]|
                              radio/player/
                          )
-                        (?P<id>%s)
+                        (?P<id>%s)(?!/(?:episodes|broadcasts|clips))
                      ''' % _ID_REGEX
  
      _MEDIASELECTOR_URLS = [
@@ -192,6 +196,7 @@ class BBCCoUkIE(InfoExtractor):
                  # rtmp download
                  'skip_download': True,
              },
+            'skip': 'Now it\'s really geo-restricted',
          }, {
              # compact player (https://github.com/rg3/youtube-dl/issues/8147)
              'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
@@ -228,51 +233,6 @@ class BBCCoUkIE(InfoExtractor):
          asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
          return [ref.get('href') for ref in asx.findall('./Entry/ref')]
  
-    def _extract_connection(self, connection, programme_id):
-        formats = []
-        kind = connection.get('kind')
-        protocol = connection.get('protocol')
-        supplier = connection.get('supplier')
-        if protocol == 'http':
-            href = connection.get('href')
-            transfer_format = connection.get('transferFormat')
-            # ASX playlist
-            if supplier == 'asx':
-                for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
-                    formats.append({
-                        'url': ref,
-                        'format_id': 'ref%s_%s' % (i, supplier),
-                    })
-            # Skip DASH until supported
-            elif transfer_format == 'dash':
-                pass
-            elif transfer_format == 'hls':
-                formats.extend(self._extract_m3u8_formats(
-                    href, programme_id, ext='mp4', entry_protocol='m3u8_native',
-                    m3u8_id=supplier, fatal=False))
-            # Direct link
-            else:
-                formats.append({
-                    'url': href,
-                    'format_id': supplier or kind or protocol,
-                })
-        elif protocol == 'rtmp':
-            application = connection.get('application', 'ondemand')
-            auth_string = connection.get('authString')
-            identifier = connection.get('identifier')
-            server = connection.get('server')
-            formats.append({
-                'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
-                'play_path': identifier,
-                'app': '%s?%s' % (application, auth_string),
-                'page_url': 'http://www.bbc.co.uk',
-                'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
-                'rtmp_live': False,
-                'ext': 'flv',
-                'format_id': supplier,
-            })
-        return formats
-
      def _extract_items(self, playlist):
          return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
  
@@ -293,45 +253,6 @@ class BBCCoUkIE(InfoExtractor):
      def _extract_connections(self, media):
          return self._findall_ns(media, './{%s}connection')
  
-    def _extract_video(self, media, programme_id):
-        formats = []
-        vbr = int_or_none(media.get('bitrate'))
-        vcodec = media.get('encoding')
-        service = media.get('service')
-        width = int_or_none(media.get('width'))
-        height = int_or_none(media.get('height'))
-        file_size = int_or_none(media.get('media_file_size'))
-        for connection in self._extract_connections(media):
-            conn_formats = self._extract_connection(connection, programme_id)
-            for format in conn_formats:
-                format.update({
-                    'width': width,
-                    'height': height,
-                    'vbr': vbr,
-                    'vcodec': vcodec,
-                    'filesize': file_size,
-                })
-                if service:
-                    format['format_id'] = '%s_%s' % (service, format['format_id'])
-            formats.extend(conn_formats)
-        return formats
-
-    def _extract_audio(self, media, programme_id):
-        formats = []
-        abr = int_or_none(media.get('bitrate'))
-        acodec = media.get('encoding')
-        service = media.get('service')
-        for connection in self._extract_connections(media):
-            conn_formats = self._extract_connection(connection, programme_id)
-            for format in conn_formats:
-                format.update({
-                    'format_id': '%s_%s' % (service, format['format_id']),
-                    'abr': abr,
-                    'acodec': acodec,
-                })
-            formats.extend(conn_formats)
-        return formats
-
      def _get_subtitles(self, media, programme_id):
          subtitles = {}
          for connection in self._extract_connections(media):
@@ -377,13 +298,87 @@ class BBCCoUkIE(InfoExtractor):
      def _process_media_selector(self, media_selection, programme_id):
          formats = []
          subtitles = None
+        urls = []
  
          for media in self._extract_medias(media_selection):
              kind = media.get('kind')
-            if kind == 'audio':
-                formats.extend(self._extract_audio(media, programme_id))
-            elif kind == 'video':
-                formats.extend(self._extract_video(media, programme_id))
+            if kind in ('video', 'audio'):
+                bitrate = int_or_none(media.get('bitrate'))
+                encoding = media.get('encoding')
+                service = media.get('service')
+                width = int_or_none(media.get('width'))
+                height = int_or_none(media.get('height'))
+                file_size = int_or_none(media.get('media_file_size'))
+                for connection in self._extract_connections(media):
+                    href = connection.get('href')
+                    if href in urls:
+                        continue
+                    if href:
+                        urls.append(href)
+                    conn_kind = connection.get('kind')
+                    protocol = connection.get('protocol')
+                    supplier = connection.get('supplier')
+                    transfer_format = connection.get('transferFormat')
+                    format_id = supplier or conn_kind or protocol
+                    if service:
+                        format_id = '%s_%s' % (service, format_id)
+                    # ASX playlist
+                    if supplier == 'asx':
+                        for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
+                            formats.append({
+                                'url': ref,
+                                'format_id': 'ref%s_%s' % (i, format_id),
+                            })
+                    elif transfer_format == 'dash':
+                        formats.extend(self._extract_mpd_formats(
+                            href, programme_id, mpd_id=format_id, fatal=False))
+                    elif transfer_format == 'hls':
+                        formats.extend(self._extract_m3u8_formats(
+                            href, programme_id, ext='mp4', entry_protocol='m3u8_native',
+                            m3u8_id=format_id, fatal=False))
+                    elif transfer_format == 'hds':
+                        formats.extend(self._extract_f4m_formats(
+                            href, programme_id, f4m_id=format_id, fatal=False))
+                    else:
+                        if not service and not supplier and bitrate:
+                            format_id += '-%d' % bitrate
+                        fmt = {
+                            'format_id': format_id,
+                            'filesize': file_size,
+                        }
+                        if kind == 'video':
+                            fmt.update({
+                                'width': width,
+                                'height': height,
+                                'vbr': bitrate,
+                                'vcodec': encoding,
+                            })
+                        else:
+                            fmt.update({
+                                'abr': bitrate,
+                                'acodec': encoding,
+                                'vcodec': 'none',
+                            })
+                        if protocol == 'http':
+                            # Direct link
+                            fmt.update({
+                                'url': href,
+                            })
+                        elif protocol == 'rtmp':
+                            application = connection.get('application', 'ondemand')
+                            auth_string = connection.get('authString')
+                            identifier = connection.get('identifier')
+                            server = connection.get('server')
+                            fmt.update({
+                                'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
+                                'play_path': identifier,
+                                'app': '%s?%s' % (application, auth_string),
+                                'page_url': 'http://www.bbc.co.uk',
+                                'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
+                                'rtmp_live': False,
+                                'ext': 'flv',
+                            })
+                        formats.append(fmt)
              elif kind == 'captions':
                  subtitles = self.extract_subtitles(media, programme_id)
          return formats, subtitles
@@ -588,6 +583,7 @@ class BBCIE(BBCCoUkIE):
              'id': '150615_telabyad_kentin_cogu',
              'ext': 'mp4',
              'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
+            'description': 'md5:33a4805a855c9baf7115fcbde57e7025',
              'timestamp': 1434397334,
              'upload_date': '20150615',
          },
@@ -601,6 +597,7 @@ class BBCIE(BBCCoUkIE):
              'id': '150619_video_honduras_militares_hospitales_corrupcion_aw',
              'ext': 'mp4',
              'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción',
+            'description': 'md5:1525f17448c4ee262b64b8f0c9ce66c8',
              'timestamp': 1434713142,
              'upload_date': '20150619',
          },
@@ -650,6 +647,23 @@ class BBCIE(BBCCoUkIE):
              # rtmp download
              'skip_download': True,
          }
+    }, {
+        # single video embedded with Morph
+        'url': 'http://www.bbc.co.uk/sport/live/olympics/36895975',
+        'info_dict': {
+            'id': 'p041vhd0',
+            'ext': 'mp4',
+            'title': "Nigeria v Japan - Men's First Round",
+            'description': 'Live coverage of the first round from Group B at the Amazonia Arena.',
+            'duration': 7980,
+            'uploader': 'BBC Sport',
+            'uploader_id': 'bbc_sport',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'skip': 'Georestricted to UK',
      }, {
          # single video with playlist.sxml URL in playlist param
          'url': 'http://www.bbc.com/sport/0/football/33653409',
@@ -670,6 +684,7 @@ class BBCIE(BBCCoUkIE):
          'info_dict': {
              'id': '34475836',
              'title': 'Jurgen Klopp: Furious football from a witty and winning coach',
+            'description': 'Fast-paced football, wit, wisdom and a ready smile - why Liverpool fans should come to love new boss Jurgen Klopp.',
          },
          'playlist_count': 3,
      }, {
@@ -696,7 +711,9 @@ class BBCIE(BBCCoUkIE):
  
      @classmethod
      def suitable(cls, url):
-        return False if BBCCoUkIE.suitable(url) or BBCCoUkArticleIE.suitable(url) else super(BBCIE, cls).suitable(url)
+        EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
+        return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
+                else super(BBCIE, cls).suitable(url))
  
      def _extract_from_media_meta(self, media_meta, video_id):
          # Direct links to media in media metadata (e.g.
@@ -744,7 +761,7 @@ class BBCIE(BBCCoUkIE):
  
          webpage = self._download_webpage(url, playlist_id)
  
-        json_ld_info = self._search_json_ld(webpage, playlist_id, default=None)
+        json_ld_info = self._search_json_ld(webpage, playlist_id, default={})
          timestamp = json_ld_info.get('timestamp')
  
          playlist_title = json_ld_info.get('title')
@@ -813,8 +830,29 @@ class BBCIE(BBCCoUkIE):
                          # http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani)
                          playlist = data_playable.get('otherSettings', {}).get('playlist', {})
                          if playlist:
-                            entries.append(self._extract_from_playlist_sxml(
-                                playlist.get('progressiveDownloadUrl'), playlist_id, timestamp))
+                            entry = None
+                            for key in ('streaming', 'progressiveDownload'):
+                                playlist_url = playlist.get('%sUrl' % key)
+                                if not playlist_url:
+                                    continue
+                                try:
+                                    info = self._extract_from_playlist_sxml(
+                                        playlist_url, playlist_id, timestamp)
+                                    if not entry:
+                                        entry = info
+                                    else:
+                                        entry['title'] = info['title']
+                                        entry['formats'].extend(info['formats'])
+                                except Exception as e:
+                                    # One playlist URL may fail with a 500 error while
+                                    # the other one works fine (e.g.
+                                    # http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
+                                    if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
+                                        continue
+                                    raise
+                            if entry:
+                                self._sort_formats(entry['formats'])
+                                entries.append(entry)
  
          if entries:
              return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
@@ -847,6 +885,50 @@ class BBCIE(BBCCoUkIE):
                  'subtitles': subtitles,
              }
  
+        # Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
+        # Several setPayload calls may be present, but the video
+        # always seems to be related to the first one
+        morph_payload = self._parse_json(
+            self._search_regex(
+                r'Morph\.setPayload\([^,]+,\s*({.+?})\);',
+                webpage, 'morph payload', default='{}'),
+            playlist_id, fatal=False)
+        if morph_payload:
+            components = try_get(morph_payload, lambda x: x['body']['components'], list) or []
+            for component in components:
+                if not isinstance(component, dict):
+                    continue
+                lead_media = try_get(component, lambda x: x['props']['leadMedia'], dict)
+                if not lead_media:
+                    continue
+                identifiers = lead_media.get('identifiers')
+                if not identifiers or not isinstance(identifiers, dict):
+                    continue
+                programme_id = identifiers.get('vpid') or identifiers.get('playablePid')
+                if not programme_id:
+                    continue
+                title = lead_media.get('title') or self._og_search_title(webpage)
+                formats, subtitles = self._download_media_selector(programme_id)
+                self._sort_formats(formats)
+                description = lead_media.get('summary')
+                uploader = lead_media.get('masterBrand')
+                uploader_id = lead_media.get('mid')
+                duration = None
+                duration_d = lead_media.get('duration')
+                if isinstance(duration_d, dict):
+                    duration = parse_duration(dict_get(
+                        duration_d, ('rawDuration', 'formattedDuration', 'spokenDuration')))
+                return {
+                    'id': programme_id,
+                    'title': title,
+                    'description': description,
+                    'duration': duration,
+                    'uploader': uploader,
+                    'uploader_id': uploader_id,
+                    'formats': formats,
+                    'subtitles': subtitles,
+                }
+
          def extract_all(pattern):
              return list(filter(None, map(
                  lambda s: self._parse_json(s, playlist_id, fatal=False),
@@ -864,7 +946,7 @@ class BBCIE(BBCCoUkIE):
              r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
          if entries:
              return self.playlist_result(
-                [self.url_result(entry, 'BBCCoUk') for entry in entries],
+                [self.url_result(entry_, 'BBCCoUk') for entry_ in entries],
                  playlist_id, playlist_title, playlist_description)
  
          # Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)
@@ -946,7 +1028,7 @@ class BBCIE(BBCCoUkIE):
  
  
  class BBCCoUkArticleIE(InfoExtractor):
-    _VALID_URL = r'https?://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
      IE_NAME = 'bbc.co.uk:article'
      IE_DESC = 'BBC articles'
  
@@ -973,3 +1055,116 @@ class BBCCoUkArticleIE(InfoExtractor):
              r'<div[^>]+typeof="Clip"[^>]+resource="([^"]+)"', webpage)]
  
          return self.playlist_result(entries, playlist_id, title, description)
+
+
+class BBCCoUkPlaylistBaseIE(InfoExtractor):
+    def _entries(self, webpage, url, playlist_id):
+        single_page = 'page' in compat_urlparse.parse_qs(
+            compat_urlparse.urlparse(url).query)
+        for page_num in itertools.count(2):
+            for video_id in re.findall(
+                    self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage):
+                yield self.url_result(
+                    self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
+            if single_page:
+                return
+            next_page = self._search_regex(
+                r'<li[^>]+class=(["\'])pagination_+next\1[^>]*><a[^>]+href=(["\'])(?P<url>(?:(?!\2).)+)\2',
+                webpage, 'next page url', default=None, group='url')
+            if not next_page:
+                break
+            webpage = self._download_webpage(
+                compat_urlparse.urljoin(url, next_page), playlist_id,
+                'Downloading page %d' % page_num, page_num)
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        title, description = self._extract_title_and_description(webpage)
+
+        return self.playlist_result(
+            self._entries(webpage, url, playlist_id),
+            playlist_id, title, description)
+
+
+class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
+    IE_NAME = 'bbc.co.uk:iplayer:playlist'
+    _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
+    _URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
+    _VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
+    _TESTS = [{
+        'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
+        'info_dict': {
+            'id': 'b05rcz9v',
+            'title': 'The Disappearance',
+            'description': 'French thriller serial about a missing teenager.',
+        },
+        'playlist_mincount': 6,
+        'skip': 'This programme is not currently available on BBC iPlayer',
+    }, {
+        # Available for over a year unlike 30 days for most other programmes
+        'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
+        'info_dict': {
+            'id': 'p02tcc32',
+            'title': 'Bohemian Icons',
+            'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
+        },
+        'playlist_mincount': 10,
+    }]
+
+    def _extract_title_and_description(self, webpage):
+        title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
+        description = self._search_regex(
+            r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
+            webpage, 'description', fatal=False, group='value')
+        return title, description
+
+
+class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
+    IE_NAME = 'bbc.co.uk:playlist'
+    _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/(?P<id>%s)/(?:episodes|broadcasts|clips)' % BBCCoUkIE._ID_REGEX
+    _URL_TEMPLATE = 'http://www.bbc.co.uk/programmes/%s'
+    _VIDEO_ID_TEMPLATE = r'data-pid=["\'](%s)'
+    _TESTS = [{
+        'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
+        'info_dict': {
+            'id': 'b05rcz9v',
+            'title': 'The Disappearance - Clips - BBC Four',
+            'description': 'French thriller serial about a missing teenager.',
+        },
+        'playlist_mincount': 7,
+    }, {
+        # multipage playlist, explicit page
+        'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips?page=1',
+        'info_dict': {
+            'id': 'b00mfl7n',
+            'title': 'Frozen Planet - Clips - BBC One',
+            'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
+        },
+        'playlist_mincount': 24,
+    }, {
+        # multipage playlist, all pages
+        'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips',
+        'info_dict': {
+            'id': 'b00mfl7n',
+            'title': 'Frozen Planet - Clips - BBC One',
+            'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
+        },
+        'playlist_mincount': 142,
+    }, {
+        'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.bbc.co.uk/programmes/b055jkys/episodes/player',
+        'only_matching': True,
+    }]
+
+    def _extract_title_and_description(self, webpage):
+        title = self._og_search_title(webpage, fatal=False)
+        description = self._og_search_description(webpage)
+        return title, description
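Note on `BBCCoUkPlaylistBaseIE._entries` above: it yields the IDs found on the current page, stops immediately if the URL carries an explicit `page` parameter, and otherwise follows the "next" pagination link until none is left. The loop can be sketched outside the extractor framework as follows. This is an illustration under stated assumptions: `iter_video_ids` and the injected `fetch` callable are hypothetical, and `ID_RE` uses a guessed ID pattern because `BBCCoUkIE._ID_REGEX` is not shown in this part of the diff.

```python
import itertools
import re
from urllib.parse import parse_qs, urljoin, urlparse

# Pagination link pattern taken from the diff above.
NEXT_RE = re.compile(
    r'<li[^>]+class=(["\'])pagination_+next\1[^>]*>'
    r'<a[^>]+href=(["\'])(?P<url>(?:(?!\2).)+)\2')
# Assumed ID shape; the real _ID_REGEX is defined elsewhere in bbc.py.
ID_RE = re.compile(r'data-pid=["\']([pb][0-9a-z]{7})')


def iter_video_ids(url, first_page, fetch):
    """Yield video IDs page by page; fetch(page_url, page_num) returns HTML."""
    single_page = 'page' in parse_qs(urlparse(url).query)
    webpage = first_page
    for page_num in itertools.count(2):
        for video_id in ID_RE.findall(webpage):
            yield video_id
        if single_page:
            return
        match = NEXT_RE.search(webpage)
        if not match:
            return
        webpage = fetch(urljoin(url, match.group('url')), page_num)


if __name__ == '__main__':
    page1 = ('<div data-pid="b0000001"></div>'
             '<li class="pagination__next"><a href="?page=2">Next</a></li>')
    pages = {'http://www.bbc.co.uk/programmes/b00mfl7n/clips?page=2':
             '<div data-pid="b0000003"></div>'}
    print(list(iter_video_ids(
        'http://www.bbc.co.uk/programmes/b00mfl7n/clips', page1,
        lambda page_url, page_num: pages[page_url])))
```

Starting the counter at 2 matches the diff: the first page is already downloaded by `_real_extract`, so the counter only labels the follow-up requests.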
diff --git a/youtube_dl/extractor/beatport.py b/youtube_dl/extractor/beatport.py

new file mode 100644

index 0000000..e607094
--- /dev/null
+++ b/youtube_dl/extractor/beatport.py
@@ -0,0 +1,103 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import int_or_none
+
+
+class BeatportIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.|pro\.)?beatport\.com/track/(?P<display_id>[^/]+)/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'https://beatport.com/track/synesthesia-original-mix/5379371',
+        'md5': 'b3c34d8639a2f6a7f734382358478887',
+        'info_dict': {
+            'id': '5379371',
+            'display_id': 'synesthesia-original-mix',
+            'ext': 'mp4',
+            'title': 'Froxic - Synesthesia (Original Mix)',
+        },
+    }, {
+        'url': 'https://beatport.com/track/love-and-war-original-mix/3756896',
+        'md5': 'e44c3025dfa38c6577fbaeb43da43514',
+        'info_dict': {
+            'id': '3756896',
+            'display_id': 'love-and-war-original-mix',
+            'ext': 'mp3',
+            'title': 'Wolfgang Gartner - Love & War (Original Mix)',
+        },
+    }, {
+        'url': 'https://beatport.com/track/birds-original-mix/4991738',
+        'md5': 'a1fd8e8046de3950fd039304c186c05f',
+        'info_dict': {
+            'id': '4991738',
+            'display_id': 'birds-original-mix',
+            'ext': 'mp4',
+            'title': "Tos, Middle Milk, Mumblin' Johnsson - Birds (Original Mix)",
+        }
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        track_id = mobj.group('id')
+        display_id = mobj.group('display_id')
+
+        webpage = self._download_webpage(url, display_id)
+
+        playables = self._parse_json(
+            self._search_regex(
+                r'window\.Playables\s*=\s*({.+?});', webpage,
+                'playables info', flags=re.DOTALL),
+            track_id)
+
+        track = next(t for t in playables['tracks'] if t['id'] == int(track_id))
+
+        title = ', '.join((a['name'] for a in track['artists'])) + ' - ' + track['name']
+        if track['mix']:
+            title += ' (' + track['mix'] + ')'
+
+        formats = []
+        for ext, info in track['preview'].items():
+            if not info['url']:
+                continue
+            fmt = {
+                'url': info['url'],
+                'ext': ext,
+                'format_id': ext,
+                'vcodec': 'none',
+            }
+            if ext == 'mp3':
+                fmt['preference'] = 0
+                fmt['acodec'] = 'mp3'
+                fmt['abr'] = 96
+                fmt['asr'] = 44100
+            elif ext == 'mp4':
+                fmt['preference'] = 1
+                fmt['acodec'] = 'aac'
+                fmt['abr'] = 96
+                fmt['asr'] = 44100
+            formats.append(fmt)
+        self._sort_formats(formats)
+
+        images = []
+        for name, info in track['images'].items():
+            image_url = info.get('url')
+            if name == 'dynamic' or not image_url:
+                continue
+            image = {
+                'id': name,
+                'url': image_url,
+                'height': int_or_none(info.get('height')),
+                'width': int_or_none(info.get('width')),
+            }
+            images.append(image)
+
+        return {
+            'id': compat_str(track.get('id') or track_id),
+            'display_id': track.get('slug') or display_id,
+            'title': title,
+            'formats': formats,
+            'thumbnails': images,
+        }
diff --git a/youtube_dl/extractor/beatportpro.py b/youtube_dl/extractor/beatportpro.py
deleted file mode 100644
index 3c7775d..0000000
--- a/youtube_dl/extractor/beatportpro.py
+++ /dev/null
@@ -1,103 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import int_or_none
-
-
-class BeatportProIE(InfoExtractor):
-    _VALID_URL = r'https?://pro\.beatport\.com/track/(?P<display_id>[^/]+)/(?P<id>[0-9]+)'
-    _TESTS = [{
-        'url': 'https://pro.beatport.com/track/synesthesia-original-mix/5379371',
-        'md5': 'b3c34d8639a2f6a7f734382358478887',
-        'info_dict': {
-            'id': '5379371',
-            'display_id': 'synesthesia-original-mix',
-            'ext': 'mp4',
-            'title': 'Froxic - Synesthesia (Original Mix)',
-        },
-    }, {
-        'url': 'https://pro.beatport.com/track/love-and-war-original-mix/3756896',
-        'md5': 'e44c3025dfa38c6577fbaeb43da43514',
-        'info_dict': {
-            'id': '3756896',
-            'display_id': 'love-and-war-original-mix',
-            'ext': 'mp3',
-            'title': 'Wolfgang Gartner - Love & War (Original Mix)',
-        },
-    }, {
-        'url': 'https://pro.beatport.com/track/birds-original-mix/4991738',
-        'md5': 'a1fd8e8046de3950fd039304c186c05f',
-        'info_dict': {
-            'id': '4991738',
-            'display_id': 'birds-original-mix',
-            'ext': 'mp4',
-            'title': "Tos, Middle Milk, Mumblin' Johnsson - Birds (Original Mix)",
-        }
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        track_id = mobj.group('id')
-        display_id = mobj.group('display_id')
-
-        webpage = self._download_webpage(url, display_id)
-
-        playables = self._parse_json(
-            self._search_regex(
-                r'window\.Playables\s*=\s*({.+?});', webpage,
-                'playables info', flags=re.DOTALL),
-            track_id)
-
-        track = next(t for t in playables['tracks'] if t['id'] == int(track_id))
-
-        title = ', '.join((a['name'] for a in track['artists'])) + ' - ' + track['name']
-        if track['mix']:
-            title += ' (' + track['mix'] + ')'
-
-        formats = []
-        for ext, info in track['preview'].items():
-            if not info['url']:
-                continue
-            fmt = {
-                'url': info['url'],
-                'ext': ext,
-                'format_id': ext,
-                'vcodec': 'none',
-            }
-            if ext == 'mp3':
-                fmt['preference'] = 0
-                fmt['acodec'] = 'mp3'
-                fmt['abr'] = 96
-                fmt['asr'] = 44100
-            elif ext == 'mp4':
-                fmt['preference'] = 1
-                fmt['acodec'] = 'aac'
-                fmt['abr'] = 96
-                fmt['asr'] = 44100
-            formats.append(fmt)
-        self._sort_formats(formats)
-
-        images = []
-        for name, info in track['images'].items():
-            image_url = info.get('url')
-            if name == 'dynamic' or not image_url:
-                continue
-            image = {
-                'id': name,
-                'url': image_url,
-                'height': int_or_none(info.get('height')),
-                'width': int_or_none(info.get('width')),
-            }
-            images.append(image)
-
-        return {
-            'id': compat_str(track.get('id')) or track_id,
-            'display_id': track.get('slug') or display_id,
-            'title': title,
-            'formats': formats,
-            'thumbnails': images,
-        }
diff --git a/youtube_dl/extractor/beeg.py b/youtube_dl/extractor/beeg.py
index 34c2a756fba11f81516e87e095ef1e02d5e65417..b0b7914d89777fcba136a12562f771bf4f2af4d6 100644
--- a/youtube_dl/extractor/beeg.py
+++ b/youtube_dl/extractor/beeg.py
@@ -33,8 +33,33 @@ class BeegIE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
+        webpage = self._download_webpage(url, video_id)
+
+        cpl_url = self._search_regex(
+            r'<script[^>]+src=(["\'])(?P<url>(?:https?:)?//static\.beeg\.com/cpl/\d+\.js.*?)\1',
+            webpage, 'cpl', default=None, group='url')
+
+        beeg_version, beeg_salt = [None] * 2
+
+        if cpl_url:
+            cpl = self._download_webpage(
+                self._proto_relative_url(cpl_url), video_id,
+                'Downloading cpl JS', fatal=False)
+            if cpl:
+                beeg_version = int_or_none(self._search_regex(
+                    r'beeg_version\s*=\s*(\d+)', cpl,
+                    'beeg version', default=None)) or self._search_regex(
+                    r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
+                beeg_salt = self._search_regex(
+                    r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt',
+                    default=None, group='beeg_salt')
+
+        beeg_version = beeg_version or '2000'
+        beeg_salt = beeg_salt or 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H'
+
          video = self._download_json(
-            'https://api.beeg.com/api/v5/video/%s' % video_id, video_id)
+            'https://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id),
+            video_id)
  
          def split(o, e):
              def cut(s, x):
@@ -50,8 +75,8 @@ class BeegIE(InfoExtractor):
              return n
  
          def decrypt_key(key):
-            # Reverse engineered from http://static.beeg.com/cpl/1105.js
-            a = '5ShMcIQlssOd7zChAIOlmeTZDaUxULbJRnywYaiB'
+            # Reverse engineered from http://static.beeg.com/cpl/1738.js
+            a = beeg_salt
              e = compat_urllib_parse_unquote(key)
              o = ''.join([
                  compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)
@@ -101,5 +126,5 @@ class BeegIE(InfoExtractor):
              'duration': duration,
              'tags': tags,
              'formats': formats,
-            'age_limit': 18,
+            'age_limit': self._rta_search(webpage),
          }
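
Note on the beeg.py hunk above: the version and salt are resolved through a chain of fallbacks (cpl script body, then the cpl URL, then hard-coded defaults). The following is a minimal standalone sketch of that chain using only the standard library. The function name `parse_cpl`, its string return types, and the `(\d+)` pattern for the version are assumptions made for illustration; they are not taken from the extractor.

```python
import re

# Defaults copied from the hunk above.
DEFAULT_VERSION = '2000'
DEFAULT_SALT = 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H'


def parse_cpl(cpl_url, cpl_js):
    """Hypothetical helper: return (version, salt) as strings."""
    version = None
    salt = None
    if cpl_js:
        # First choice: values declared inside the cpl script itself.
        m = re.search(r'beeg_version\s*=\s*(\d+)', cpl_js)
        if m:
            version = m.group(1)
        m = re.search(r'beeg_salt\s*=\s*(["\'])(?P<salt>.+?)\1', cpl_js)
        if m:
            salt = m.group('salt')
    if version is None and cpl_url:
        # Second choice: the number embedded in the script file name.
        m = re.search(r'/(\d+)\.js', cpl_url)
        if m:
            version = m.group(1)
    # Last resort: the hard-coded defaults.
    return version or DEFAULT_VERSION, salt or DEFAULT_SALT
```

With this shape, a missing or unparsable cpl script degrades to the defaults instead of failing the extraction, which appears to be the intent of the `default=None` and `fatal=False` arguments in the hunk.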
diff --git a/youtube_dl/extractor/bellmedia.py b/youtube_dl/extractor/bellmedia.py
new file mode 100644
index 0000000..32326ed
--- /dev/null
+++ b/youtube_dl/extractor/bellmedia.py
@@ -0,0 +1,75 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class BellMediaIE(InfoExtractor):
+    _VALID_URL = r'''(?x)https?://(?:www\.)?
+        (?P<domain>
+            (?:
+                ctv|
+                tsn|
+                bnn|
+                thecomedynetwork|
+                discovery|
+                discoveryvelocity|
+                sciencechannel|
+                investigationdiscovery|
+                animalplanet|
+                bravo|
+                mtv|
+                space
+            )\.ca|
+            much\.com
+        )/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'''
+    _TESTS = [{
+        'url': 'http://www.ctv.ca/video/player?vid=706966',
+        'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
+        'info_dict': {
+            'id': '706966',
+            'ext': 'mp4',
+            'title': 'Larry Day and Richard Jutras on the TIFF red carpet of \'Stonewall\'',
+            'description': 'etalk catches up with Larry Day and Richard Jutras on the TIFF red carpet of "Stonewall”.',
+            'upload_date': '20150919',
+            'timestamp': 1442624700,
+        },
+        'expected_warnings': ['HTTP Error 404'],
+    }, {
+        'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.tsn.ca/video/expectations-high-for-milos-raonic-at-us-open~939549',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.bnn.ca/video/berman-s-call-part-two-viewer-questions~939654',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.ctv.ca/YourMorning/Video/S1E6-Monday-August-29-2016-vid938009',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.much.com/shows/atmidnight/episode948007/tuesday-september-13-2016',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
+        'only_matching': True,
+    }]
+    _DOMAINS = {
+        'thecomedynetwork': 'comedy',
+        'discoveryvelocity': 'discvel',
+        'sciencechannel': 'discsci',
+        'investigationdiscovery': 'invdisc',
+        'animalplanet': 'aniplan',
+    }
+
+    def _real_extract(self, url):
+        domain, video_id = re.match(self._VALID_URL, url).groups()
+        domain = domain.split('.')[0]
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': '9c9media:%s_web:%s' % (self._DOMAINS.get(domain, domain), video_id),
+            'ie_key': 'NineCNineMedia',
+        }
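
Note on bellmedia.py above: `_real_extract` does no downloading of its own. It only rewrites the matched domain and ID into a `9c9media:` URL that is handed to another extractor. A minimal sketch of that mapping follows, assuming a hypothetical free function `ninecninemedia_url` in place of the method; the table is copied from `_DOMAINS` in the diff.

```python
# Copied from _DOMAINS in the diff above.
DOMAINS = {
    'thecomedynetwork': 'comedy',
    'discoveryvelocity': 'discvel',
    'sciencechannel': 'discsci',
    'investigationdiscovery': 'invdisc',
    'animalplanet': 'aniplan',
}


def ninecninemedia_url(domain, video_id):
    """Hypothetical helper: build the delegated URL for a matched domain."""
    # 'thecomedynetwork.ca' -> 'thecomedynetwork', 'much.com' -> 'much'
    short = domain.split('.')[0]
    # Domains without an entry in the table are used unchanged.
    return '9c9media:%s_web:%s' % (DOMAINS.get(short, short), video_id)
```

For example, `ninecninemedia_url('thecomedynetwork.ca', '923582')` yields `9c9media:comedy_web:923582`, while `ctv.ca` passes through as `ctv_web`.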
diff --git a/youtube_dl/extractor/bet.py b/youtube_dl/extractor/bet.py
index 986245bf0568e8aaaaab8b8a32eeedca866b21cc..1f8ef030380c5fb548d14cc8e944c8dad1fca900 100644
--- a/youtube_dl/extractor/bet.py
+++ b/youtube_dl/extractor/bet.py
@@ -1,31 +1,26 @@
  from __future__ import unicode_literals
  
-from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
-from ..utils import (
-    xpath_text,
-    xpath_with_ns,
-    int_or_none,
-    parse_iso8601,
-)
+from .mtv import MTVServicesInfoExtractor
+from ..utils import unified_strdate
  
  
-class BetIE(InfoExtractor):
+class BetIE(MTVServicesInfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?bet\.com/(?:[^/]+/)+(?P<id>.+?)\.html'
      _TESTS = [
          {
              'url': 'http://www.bet.com/news/politics/2014/12/08/in-bet-exclusive-obama-talks-race-and-racism.html',
              'info_dict': {
-                'id': 'news/national/2014/a-conversation-with-president-obama',
+                'id': '07e96bd3-8850-3051-b856-271b457f0ab8',
                  'display_id': 'in-bet-exclusive-obama-talks-race-and-racism',
                  'ext': 'flv',
                  'title': 'A Conversation With President Obama',
-                'description': 'md5:699d0652a350cf3e491cd15cc745b5da',
+                'description': 'President Obama urges persistence in confronting racism and bias.',
                  'duration': 1534,
-                'timestamp': 1418075340,
                  'upload_date': '20141208',
-                'uploader': 'admin',
                  'thumbnail': 're:(?i)^https?://.*\.jpg$',
+                'subtitles': {
+                    'en': 'mincount:2',
+                }
              },
              'params': {
                  # rtmp download
@@ -35,16 +30,17 @@ class BetIE(InfoExtractor):
          {
              'url': 'http://www.bet.com/video/news/national/2014/justice-for-ferguson-a-community-reacts.html',
              'info_dict': {
-                'id': 'news/national/2014/justice-for-ferguson-a-community-reacts',
+                'id': '9f516bf1-7543-39c4-8076-dd441b459ba9',
                  'display_id': 'justice-for-ferguson-a-community-reacts',
                  'ext': 'flv',
                  'title': 'Justice for Ferguson: A Community Reacts',
                  'description': 'A BET News special.',
                  'duration': 1696,
-                'timestamp': 1416942360,
                  'upload_date': '20141125',
-                'uploader': 'admin',
                  'thumbnail': 're:(?i)^https?://.*\.jpg$',
+                'subtitles': {
+                    'en': 'mincount:2',
+                }
              },
              'params': {
                  # rtmp download
@@ -53,57 +49,32 @@ class BetIE(InfoExtractor):
          }
      ]
  
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-
-        media_url = compat_urllib_parse_unquote(self._search_regex(
-            [r'mediaURL\s*:\s*"([^"]+)"', r"var\s+mrssMediaUrl\s*=\s*'([^']+)'"],
-            webpage, 'media URL'))
-
-        video_id = self._search_regex(
-            r'/video/(.*)/_jcr_content/', media_url, 'video id')
-
-        mrss = self._download_xml(media_url, display_id)
-
-        item = mrss.find('./channel/item')
+    _FEED_URL = 'http://feeds.mtvnservices.com/od/feed/bet-mrss-player'
  
-        NS_MAP = {
-            'dc': 'http://purl.org/dc/elements/1.1/',
-            'media': 'http://search.yahoo.com/mrss/',
-            'ka': 'http://kickapps.com/karss',
+    def _get_feed_query(self, uri):
+        return {
+            'uuid': uri,
          }
  
-        title = xpath_text(item, './title', 'title')
-        description = xpath_text(
-            item, './description', 'description', fatal=False)
+    def _extract_mgid(self, webpage):
+        return self._search_regex(r'data-uri="([^"]+)', webpage, 'mgid')
  
-        timestamp = parse_iso8601(xpath_text(
-            item, xpath_with_ns('./dc:date', NS_MAP),
-            'upload date', fatal=False))
-        uploader = xpath_text(
-            item, xpath_with_ns('./dc:creator', NS_MAP),
-            'uploader', fatal=False)
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
  
-        media_content = item.find(
-            xpath_with_ns('./media:content', NS_MAP))
-        duration = int_or_none(media_content.get('duration'))
-        smil_url = media_content.get('url')
+        webpage = self._download_webpage(url, display_id)
+        mgid = self._extract_mgid(webpage)
+        videos_info = self._get_videos_info(mgid)
  
-        thumbnail = media_content.find(
-            xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
+        info_dict = videos_info['entries'][0]
  
-        formats = self._extract_smil_formats(smil_url, display_id)
-        self._sort_formats(formats)
+        upload_date = unified_strdate(self._html_search_meta('date', webpage))
+        description = self._html_search_meta('description', webpage)
  
-        return {
-            'id': video_id,
+        info_dict.update({
              'display_id': display_id,
-            'title': title,
              'description': description,
-            'thumbnail': thumbnail,
-            'timestamp': timestamp,
-            'uploader': uploader,
-            'duration': duration,
-            'formats': formats,
-        }
+            'upload_date': upload_date,
+        })
+
+        return info_dict
diff --git a/youtube_dl/extractor/bigflix.py b/youtube_dl/extractor/bigflix.py
index 33762ad93eae6f7b204cd00c11058f60c621bf50..b4ce767af6735321ab08769e4d2c87b716b93e65 100644
--- a/youtube_dl/extractor/bigflix.py
+++ b/youtube_dl/extractor/bigflix.py
@@ -11,22 +11,13 @@ from ..compat import compat_urllib_parse_unquote
  class BigflixIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)'
      _TESTS = [{
-        'url': 'http://www.bigflix.com/Hindi-movies/Action-movies/Singham-Returns/16537',
-        'md5': 'ec76aa9b1129e2e5b301a474e54fab74',
-        'info_dict': {
-            'id': '16537',
-            'ext': 'mp4',
-            'title': 'Singham Returns',
-            'description': 'md5:3d2ba5815f14911d5cc6a501ae0cf65d',
-        }
-    }, {
          # 2 formats
          'url': 'http://www.bigflix.com/Tamil-movies/Drama-movies/Madarasapatinam/16070',
          'info_dict': {
              'id': '16070',
              'ext': 'mp4',
              'title': 'Madarasapatinam',
-            'description': 'md5:63b9b8ed79189c6f0418c26d9a3452ca',
+            'description': 'md5:9f0470b26a4ba8e824c823b5d95c2f6b',
              'formats': 'mincount:2',
          },
          'params': {
diff --git a/youtube_dl/extractor/bilibili.py b/youtube_dl/extractor/bilibili.py
index 8baff2041bb380d0204895cbbc6c64b16be94993..2d174e6f9a81da7412cd58ac316c7b5924dcde78 100644
--- a/youtube_dl/extractor/bilibili.py
+++ b/youtube_dl/extractor/bilibili.py
@@ -1,110 +1,125 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import hashlib
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import compat_parse_qs
  from ..utils import (
      int_or_none,
-    unescapeHTML,
-    ExtractorError,
-    xpath_text,
+    float_or_none,
+    unified_timestamp,
+    urlencode_postdata,
  )
  
  
  class BiliBiliIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?'
+    _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)'
  
-    _TESTS = [{
+    _TEST = {
          'url': 'http://www.bilibili.tv/video/av1074402/',
-        'md5': '2c301e4dab317596e837c3e7633e7d86',
+        'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
          'info_dict': {
-            'id': '1554319',
-            'ext': 'flv',
+            'id': '1074402',
+            'ext': 'mp4',
              'title': '【金坷垃】金泡沫',
-            'duration': 308313,
+            'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
+            'duration': 308.315,
+            'timestamp': 1398012660,
              'upload_date': '20140420',
              'thumbnail': 're:^https?://.+\.jpg',
-            'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
-            'timestamp': 1397983878,
              'uploader': '菊子桑',
+            'uploader_id': '156160',
          },
-    }, {
-        'url': 'http://www.bilibili.com/video/av1041170/',
-        'info_dict': {
-            'id': '1041170',
-            'title': '【BD1080P】刀语【诸神&异域】',
-            'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦！~',
-            'uploader': '枫叶逝去',
-            'timestamp': 1396501299,
-        },
-        'playlist_count': 9,
-    }]
+    }
  
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        page_num = mobj.group('page_num') or '1'
+    _APP_KEY = '6f90a59ac58a4123'
+    _BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326'
  
-        view_data = self._download_json(
-            'http://api.bilibili.com/view?type=json&appkey=8e9fc618fbd41e28&id=%s&page=%s' % (video_id, page_num),
-            video_id)
-        if 'error' in view_data:
-            raise ExtractorError('%s said: %s' % (self.IE_NAME, view_data['error']), expected=True)
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
  
-        cid = view_data['cid']
-        title = unescapeHTML(view_data['title'])
+        if 'anime/v' not in url:
+            cid = compat_parse_qs(self._search_regex(
+                [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
+                 r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
+                webpage, 'player parameters'))['cid'][0]
+        else:
+            js = self._download_json(
+                'http://bangumi.bilibili.com/web_api/get_source', video_id,
+                data=urlencode_postdata({'episode_id': video_id}),
+                headers={'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
+            cid = js['result']['cid']
  
-        doc = self._download_xml(
-            'http://interface.bilibili.com/v_cdn_play?appkey=8e9fc618fbd41e28&cid=%s' % cid,
-            cid,
-            'Downloading page %s/%s' % (page_num, view_data['pages'])
-        )
+        payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
+        sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
  
-        if xpath_text(doc, './result') == 'error':
-            raise ExtractorError('%s said: %s' % (self.IE_NAME, xpath_text(doc, './message')), expected=True)
+        video_info = self._download_json(
+            'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
+            video_id, note='Downloading video info page')
  
          entries = []
  
-        for durl in doc.findall('./durl'):
-            size = xpath_text(durl, ['./filesize', './size'])
+        for idx, durl in enumerate(video_info['durl']):
              formats = [{
-                'url': durl.find('./url').text,
-                'filesize': int_or_none(size),
-                'ext': 'flv',
+                'url': durl['url'],
+                'filesize': int_or_none(durl['size']),
              }]
-            backup_urls = durl.find('./backup_url')
-            if backup_urls is not None:
-                for backup_url in backup_urls.findall('./url'):
-                    formats.append({'url': backup_url.text})
-            formats.reverse()
+            for backup_url in durl.get('backup_url', []):
+                formats.append({
+                    'url': backup_url,
+                    # backup URLs have lower priorities
+                    'preference': -2 if 'hd.mp4' in backup_url else -3,
+                })
+
+            self._sort_formats(formats)
  
              entries.append({
-                'id': '%s_part%s' % (cid, xpath_text(durl, './order')),
-                'title': title,
-                'duration': int_or_none(xpath_text(durl, './length'), 1000),
+                'id': '%s_part%s' % (video_id, idx),
+                'duration': float_or_none(durl.get('length'), 1000),
                  'formats': formats,
              })
  
+        title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
+        description = self._html_search_meta('description', webpage)
+        timestamp = unified_timestamp(self._html_search_regex(
+            r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False))
+        thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)
+
+        # TODO 'view_count' requires deobfuscating JavaScript
          info = {
-            'id': compat_str(cid),
+            'id': video_id,
              'title': title,
-            'description': view_data.get('description'),
-            'thumbnail': view_data.get('pic'),
-            'uploader': view_data.get('author'),
-            'timestamp': int_or_none(view_data.get('created')),
-            'view_count': int_or_none(view_data.get('play')),
-            'duration': int_or_none(xpath_text(doc, './timelength')),
+            'description': description,
+            'timestamp': timestamp,
+            'thumbnail': thumbnail,
+            'duration': float_or_none(video_info.get('timelength'), scale=1000),
          }
  
+        uploader_mobj = re.search(
+            r'<a[^>]+href="https?://space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"',
+            webpage)
+        if uploader_mobj:
+            info.update({
+                'uploader': uploader_mobj.group('name'),
+                'uploader_id': uploader_mobj.group('id'),
+            })
+
+        for entry in entries:
+            entry.update(info)
+
          if len(entries) == 1:
-            entries[0].update(info)
              return entries[0]
          else:
-            info.update({
+            for idx, entry in enumerate(entries):
+                entry['id'] = '%s_part%d' % (video_id, (idx + 1))
+
+            return {
                  '_type': 'multi_video',
                  'id': video_id,
+                'title': title,
+                'description': description,
                  'entries': entries,
-            })
-            return info
+            }
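
Note on bilibili.py above: the new playurl request is signed by appending a secret to the query string and taking the MD5 hex digest of the result. The sketch below isolates that step. The function name `signed_playurl` and its parameters are illustrative assumptions; the payload layout and endpoint are copied from the diff. The payload there lists its parameters alphabetically, but the diff does not say whether the server requires that order.

```python
import hashlib


def signed_playurl(cid, app_key, secret):
    """Hypothetical helper: build a signed playurl request URL."""
    # Query string exactly as assembled in the diff.
    payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (app_key, cid)
    # Signature: MD5 over the query string with the secret appended.
    sign = hashlib.md5((payload + secret).encode('utf-8')).hexdigest()
    return 'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign)
```

In the extractor the two inputs would be `_APP_KEY` and `_BILIBILI_KEY`; keeping them as parameters here makes the signing step easy to check in isolation.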
diff --git a/youtube_dl/extractor/biobiochiletv.py b/youtube_dl/extractor/biobiochiletv.py
index 1332281337b2b5d64d0b89eb7d257a11ebffee71..7608c0a085b3c656277b03f61d19c1f60ea8d4f1 100644
--- a/youtube_dl/extractor/biobiochiletv.py
+++ b/youtube_dl/extractor/biobiochiletv.py
@@ -2,11 +2,15 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import remove_end
+from ..utils import (
+    ExtractorError,
+    remove_end,
+)
+from .rudo import RudoIE
  
  
  class BioBioChileTVIE(InfoExtractor):
-    _VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
+    _VALID_URL = r'https?://(?:tv|www)\.biobiochile\.cl/(?:notas|noticias)/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
  
      _TESTS = [{
          'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
@@ -18,6 +22,7 @@ class BioBioChileTVIE(InfoExtractor):
              'thumbnail': 're:^https?://.*\.jpg$',
              'uploader': 'Fernando Atria',
          },
+        'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
      }, {
          # different uploader layout
          'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
@@ -32,6 +37,16 @@ class BioBioChileTVIE(InfoExtractor):
          'params': {
              'skip_download': True,
          },
+        'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
+    }, {
+        'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
+        'info_dict': {
+            'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
+            'ext': 'mp4',
+            'uploader': '(none)',
+            'upload_date': '20160708',
+            'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
+        },
      }, {
          'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
          'only_matching': True,
@@ -45,42 +60,22 @@ class BioBioChileTVIE(InfoExtractor):
  
          webpage = self._download_webpage(url, video_id)
  
-        title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
+        rudo_url = RudoIE._extract_url(webpage)
+        if not rudo_url:
+            raise ExtractorError('No videos found')
  
-        file_url = self._search_regex(
-            r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
-            webpage, 'file url', group='url')
-
-        base_url = self._search_regex(
-            r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
-            'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
-            group='url')
-
-        formats = self._extract_m3u8_formats(
-            '%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
-            entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
-        f = {
-            'url': '%s%s' % (base_url, file_url),
-            'format_id': 'http',
-            'protocol': 'http',
-            'preference': 1,
-        }
-        if formats:
-            f_copy = formats[-1].copy()
-            f_copy.update(f)
-            f = f_copy
-        formats.append(f)
-        self._sort_formats(formats)
+        title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
  
          thumbnail = self._og_search_thumbnail(webpage)
          uploader = self._html_search_regex(
-            r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>',
+            r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
              webpage, 'uploader', fatal=False)
  
          return {
+            '_type': 'url_transparent',
+            'url': rudo_url,
              'id': video_id,
              'title': title,
              'thumbnail': thumbnail,
              'uploader': uploader,
-            'formats': formats,
          }
diff --git a/youtube_dl/extractor/biqle.py b/youtube_dl/extractor/biqle.py
new file mode 100644
index 0000000..beaebfd
--- /dev/null
+++ b/youtube_dl/extractor/biqle.py
@@ -0,0 +1,40 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class BIQLEIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?biqle\.(?:com|org|ru)/watch/(?P<id>-?\d+_\d+)'
+    _TESTS = [{
+        'url': 'http://www.biqle.ru/watch/847655_160197695',
+        'md5': 'ad5f746a874ccded7b8f211aeea96637',
+        'info_dict': {
+            'id': '160197695',
+            'ext': 'mp4',
+            'title': 'Foo Fighters - The Pretender (Live at Wembley Stadium)',
+            'uploader': 'Andrey Rogozin',
+            'upload_date': '20110605',
+        }
+    }, {
+        'url': 'https://biqle.org/watch/-44781847_168547604',
+        'md5': '7f24e72af1db0edf7c1aaba513174f97',
+        'info_dict': {
+            'id': '168547604',
+            'ext': 'mp4',
+            'title': 'Ребенок в шоке от автоматической мойки',
+            'uploader': 'Dmitry Kotov',
+        },
+        'skip': ' This video was marked as adult.  Embedding adult videos on external sites is prohibited.',
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        embed_url = self._proto_relative_url(self._search_regex(
+            r'<iframe.+?src="((?:http:)?//daxab\.com/[^"]+)".*?></iframe>', webpage, 'embed url'))
+
+        return {
+            '_type': 'url_transparent',
+            'url': embed_url,
+        }
diff --git a/youtube_dl/extractor/bloomberg.py b/youtube_dl/extractor/bloomberg.py
index 13343bc258532b37bf912f0648e317103b5f428d..2a8cd64b99d2551da9777aaa259d356d8cad51ed 100644
--- a/youtube_dl/extractor/bloomberg.py
+++ b/youtube_dl/extractor/bloomberg.py
@@ -1,3 +1,4 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -17,6 +18,21 @@ class BloombergIE(InfoExtractor):
              'title': 'Shah\'s Presentation on Foreign-Exchange Strategies',
              'description': 'md5:a8ba0302912d03d246979735c17d2761',
          },
+        'params': {
+            'format': 'best[format_id^=hds]',
+        },
+    }, {
+        # video ID in BPlayer(...)
+        'url': 'http://www.bloomberg.com/features/2016-hello-world-new-zealand/',
+        'info_dict': {
+            'id': '938c7e72-3f25-4ddb-8b85-a9be731baa74',
+            'ext': 'flv',
+            'title': 'Meet the Real-Life Tech Wizards of Middle Earth',
+            'description': 'Hello World, Episode 1: New Zealand’s freaky AI babies, robot exoskeletons, and a virtual you.',
+        },
+        'params': {
+            'format': 'best[format_id^=hds]',
+        },
      }, {
          'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
          'only_matching': True,
@@ -30,7 +46,11 @@ class BloombergIE(InfoExtractor):
          webpage = self._download_webpage(url, name)
          video_id = self._search_regex(
              r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
-            webpage, 'id', group='url')
+            webpage, 'id', group='url', default=None)
+        if not video_id:
+            bplayer_data = self._parse_json(self._search_regex(
+                r'BPlayer\(null,\s*({[^;]+})\);', webpage, 'id'), name)
+            video_id = bplayer_data['id']
          title = re.sub(': Video$', '', self._og_search_title(webpage))
  
          embed_info = self._download_json(
diff --git a/youtube_dl/extractor/bpb.py b/youtube_dl/extractor/bpb.py
index 6ad45a1e6a30bac2450743de3f0d12a2c9f2b89d..9661ade4f312e5c5e1068a42d3a693dd936fd1d6 100644
--- a/youtube_dl/extractor/bpb.py
+++ b/youtube_dl/extractor/bpb.py
@@ -12,7 +12,7 @@ from ..utils import (
  
  class BpbIE(InfoExtractor):
      IE_DESC = 'Bundeszentrale für politische Bildung'
-    _VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
+    _VALID_URL = r'https?://(?:www\.)?bpb\.de/mediathek/(?P<id>[0-9]+)/'
  
      _TEST = {
          'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',
diff --git a/youtube_dl/extractor/br.py b/youtube_dl/extractor/br.py
index 11cf498515ba572f8ef8c7f20d5620bf50289827..ff0aa11b19a7736017992d76f13a0ba5509f2f8e 100644
--- a/youtube_dl/extractor/br.py
+++ b/youtube_dl/extractor/br.py
@@ -29,7 +29,8 @@ class BRIE(InfoExtractor):
                  'duration': 180,
                  'uploader': 'Reinhard Weber',
                  'upload_date': '20150422',
-            }
+            },
+            'skip': '404 not found',
          },
          {
              'url': 'http://www.br.de/nachrichten/oberbayern/inhalt/muenchner-polizeipraesident-schreiber-gestorben-100.html',
@@ -40,7 +41,8 @@ class BRIE(InfoExtractor):
                  'title': 'Manfred Schreiber ist tot',
                  'description': 'md5:b454d867f2a9fc524ebe88c3f5092d97',
                  'duration': 26,
-            }
+            },
+            'skip': '404 not found',
          },
          {
              'url': 'https://www.br-klassik.de/audio/peeping-tom-premierenkritik-dance-festival-muenchen-100.html',
@@ -51,7 +53,8 @@ class BRIE(InfoExtractor):
                  'title': 'Kurzweilig und sehr bewegend',
                  'description': 'md5:0351996e3283d64adeb38ede91fac54e',
                  'duration': 296,
-            }
+            },
+            'skip': '404 not found',
          },
          {
              'url': 'http://www.br.de/radio/bayern1/service/team/videos/team-video-erdelt100.html',
diff --git a/youtube_dl/extractor/bravotv.py b/youtube_dl/extractor/bravotv.py
index 34d451f385547b80c29a81ca1f2466e6dc84b5f1..a25d500e478b90c983cd676249fdf6bc676c68fa 100644
--- a/youtube_dl/extractor/bravotv.py
+++ b/youtube_dl/extractor/bravotv.py
@@ -1,28 +1,74 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-from .common import InfoExtractor
-from ..utils import smuggle_url
+from .adobepass import AdobePassIE
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+    int_or_none,
+)
  
  
-class BravoTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+videos/(?P<id>[^/?]+)'
-    _TEST = {
+class BravoTVIE(AdobePassIE):
+    _VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
+    _TESTS = [{
          'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale',
-        'md5': 'd60cdf68904e854fac669bd26cccf801',
+        'md5': '9086d0b7ef0ea2aabc4781d75f4e5863',
          'info_dict': {
-            'id': 'LitrBdX64qLn',
+            'id': 'zHyk1_HU_mPy',
              'ext': 'mp4',
-            'title': 'Last Chance Kitchen Returns',
-            'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13',
+            'title': 'LCK Ep 12: Fishy Finale',
+            'description': 'S13/E12: Two eliminated chefs have just 12 minutes to cook up a delicious fish dish.',
+            'uploader': 'NBCU-BRAV',
+            'upload_date': '20160302',
+            'timestamp': 1456945320,
          }
-    }
+    }, {
+        'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        account_pid = self._search_regex(r'"account_pid"\s*:\s*"([^"]+)"', webpage, 'account pid')
-        release_pid = self._search_regex(r'"release_pid"\s*:\s*"([^"]+)"', webpage, 'release pid')
-        return self.url_result(smuggle_url(
-            'http://link.theplatform.com/s/%s/%s?mbr=true&switch=progressive' % (account_pid, release_pid),
-            {'force_smil_url': True}), 'ThePlatform', release_pid)
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        settings = self._parse_json(self._search_regex(
+            r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
+            display_id)
+        info = {}
+        query = {
+            'mbr': 'true',
+        }
+        account_pid, release_pid = [None] * 2
+        tve = settings.get('sharedTVE')
+        if tve:
+            query['manifest'] = 'm3u'
+            account_pid = 'HNK2IC'
+            release_pid = tve['release_pid']
+            if tve.get('entitlement') == 'auth':
+                adobe_pass = settings.get('adobePass', {})
+                resource = self._get_mvpd_resource(
+                    adobe_pass.get('adobePassResourceId', 'bravo'),
+                    tve['title'], release_pid, tve.get('rating'))
+                query['auth'] = self._extract_mvpd_auth(
+                    url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource)
+        else:
+            shared_playlist = settings['shared_playlist']
+            account_pid = shared_playlist['account_pid']
+            metadata = shared_playlist['video_metadata'][shared_playlist['default_clip']]
+            release_pid = metadata['release_pid']
+            info.update({
+                'title': metadata['title'],
+                'description': metadata.get('description'),
+                'season_number': int_or_none(metadata.get('season_num')),
+                'episode_number': int_or_none(metadata.get('episode_num')),
+            })
+            query['switch'] = 'progressive'
+        info.update({
+            '_type': 'url_transparent',
+            'id': release_pid,
+            'url': smuggle_url(update_url_query(
+                'http://link.theplatform.com/s/%s/%s' % (account_pid, release_pid),
+                query), {'force_smil_url': True}),
+            'ie_key': 'ThePlatform',
+        })
+        return info
diff --git a/youtube_dl/extractor/brightcove.py b/youtube_dl/extractor/brightcove.py
index c9e43a2751f1f94a622895cd7dfbdabfa3d0355f..945cf19e8bce0f1f9576d26abc455c9795a250d3 100644
--- a/youtube_dl/extractor/brightcove.py
+++ b/youtube_dl/extractor/brightcove.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -26,6 +26,8 @@ from ..utils import (
      unescapeHTML,
      unsmuggle_url,
      update_url_query,
+    clean_html,
+    mimetype2ext,
  )
  
  
@@ -46,6 +48,9 @@ class BrightcoveLegacyIE(InfoExtractor):
                  'title': 'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
                  'uploader': '8TV',
                  'description': 'md5:a950cc4285c43e44d763d036710cd9cd',
+                'timestamp': 1368213670,
+                'upload_date': '20130510',
+                'uploader_id': '1589608506001',
              }
          },
          {
@@ -57,6 +62,9 @@ class BrightcoveLegacyIE(InfoExtractor):
                  'title': 'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
                  'description': 'John Rose speaks at the JVM Language Summit, August 1, 2012.',
                  'uploader': 'Oracle',
+                'timestamp': 1344975024,
+                'upload_date': '20120814',
+                'uploader_id': '1460825906',
              },
          },
          {
@@ -68,6 +76,9 @@ class BrightcoveLegacyIE(InfoExtractor):
                  'title': 'This Bracelet Acts as a Personal Thermostat',
                  'description': 'md5:547b78c64f4112766ccf4e151c20b6a0',
                  'uploader': 'Mashable',
+                'timestamp': 1382041798,
+                'upload_date': '20131017',
+                'uploader_id': '1130468786001',
              },
          },
          {
@@ -81,22 +92,26 @@ class BrightcoveLegacyIE(InfoExtractor):
                  'description': 'md5:363109c02998fee92ec02211bd8000df',
                  'uploader': 'National Ballet of Canada',
              },
+            'skip': 'Video gone',
          },
          {
              # test flv videos served by akamaihd.net
              # From http://www.redbull.com/en/bike/stories/1331655643987/replay-uci-dh-world-cup-2014-from-fort-william
-            'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3ABC2996102916001&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
+            'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3Aevent-stream-356&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
              # The md5 checksum changes on each download
              'info_dict': {
-                'id': '2996102916001',
+                'id': '3750436379001',
                  'ext': 'flv',
                  'title': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
-                'uploader': 'Red Bull TV',
+                'uploader': 'RBTV Old (do not use)',
                  'description': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
+                'timestamp': 1409122195,
+                'upload_date': '20140827',
+                'uploader_id': '710858724001',
              },
          },
          {
-            # playlist test
+            # playlist with 'videoList'
              # from http://support.brightcove.com/en/video-cloud/docs/playlist-support-single-video-players
              'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL',
              'info_dict': {
@@ -105,7 +120,22 @@ class BrightcoveLegacyIE(InfoExtractor):
              },
              'playlist_mincount': 7,
          },
+        {
+            # playlist with 'playlistTab' (https://github.com/rg3/youtube-dl/issues/9965)
+            'url': 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=AQ%7E%7E,AAABXlLMdok%7E,NJ4EoMlZ4rZdx9eU1rkMVd8EaYPBBUlg',
+            'info_dict': {
+                'id': '1522758701001',
+                'title': 'Lesson 08',
+            },
+            'playlist_mincount': 10,
+        },
      ]
+    FLV_VCODECS = {
+        1: 'SORENSON',
+        2: 'ON2',
+        3: 'H264',
+        4: 'VP8',
+    }
  
      @classmethod
      def _build_brighcove_url(cls, object_str):
@@ -280,21 +310,32 @@ class BrightcoveLegacyIE(InfoExtractor):
              info_url, player_key, 'Downloading playlist information')
  
          json_data = json.loads(playlist_info)
-        if 'videoList' not in json_data:
+        if 'videoList' in json_data:
+            playlist_info = json_data['videoList']
+            playlist_dto = playlist_info['mediaCollectionDTO']
+        elif 'playlistTabs' in json_data:
+            playlist_info = json_data['playlistTabs']
+            playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
+        else:
              raise ExtractorError('Empty playlist')
-        playlist_info = json_data['videoList']
-        videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
+
+        videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']]
  
          return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
-                                    playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
+                                    playlist_title=playlist_dto['displayName'])
  
      def _extract_video_info(self, video_info):
+        video_id = compat_str(video_info['id'])
+        publisher_id = video_info.get('publisherId')
          info = {
-            'id': compat_str(video_info['id']),
+            'id': video_id,
              'title': video_info['displayName'].strip(),
              'description': video_info.get('shortDescription'),
              'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
              'uploader': video_info.get('publisherName'),
+            'uploader_id': compat_str(publisher_id) if publisher_id else None,
+            'duration': float_or_none(video_info.get('length'), 1000),
+            'timestamp': int_or_none(video_info.get('creationDate'), 1000),
          }
  
          renditions = video_info.get('renditions', []) + video_info.get('IOSRenditions', [])
@@ -309,7 +350,8 @@ class BrightcoveLegacyIE(InfoExtractor):
                      url_comp = compat_urllib_parse_urlparse(url)
                      if url_comp.path.endswith('.m3u8'):
                          formats.extend(
-                            self._extract_m3u8_formats(url, info['id'], 'mp4'))
+                            self._extract_m3u8_formats(
+                                url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
                          continue
                      elif 'akamaihd.net' in url_comp.netloc:
                          # This type of renditions are served through
@@ -318,21 +360,32 @@ class BrightcoveLegacyIE(InfoExtractor):
                          ext = 'flv'
                  if ext is None:
                      ext = determine_ext(url)
-                size = rend.get('size')
+                tbr = int_or_none(rend.get('encodingRate'), 1000)
                  a_format = {
+                    'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
                      'url': url,
                      'ext': ext,
-                    'height': rend.get('frameHeight'),
-                    'width': rend.get('frameWidth'),
-                    'filesize': size if size != 0 else None,
+                    'filesize': int_or_none(rend.get('size')) or None,
+                    'tbr': tbr,
                  }
+                if rend.get('audioOnly'):
+                    a_format.update({
+                        'vcodec': 'none',
+                    })
+                else:
+                    a_format.update({
+                        'height': int_or_none(rend.get('frameHeight')),
+                        'width': int_or_none(rend.get('frameWidth')),
+                        'vcodec': rend.get('videoCodec'),
+                    })
  
                  # m3u8 manifests with remote == false are media playlists
                  # Not calling _extract_m3u8_formats here to save network traffic
                  if ext == 'm3u8':
                      a_format.update({
+                        'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
                          'ext': 'mp4',
-                        'protocol': 'm3u8',
+                        'protocol': 'm3u8_native',
                      })
  
                  formats.append(a_format)
@@ -341,6 +394,8 @@ class BrightcoveLegacyIE(InfoExtractor):
          elif video_info.get('FLVFullLengthURL') is not None:
              info.update({
                  'url': video_info['FLVFullLengthURL'],
+                'vcodec': self.FLV_VCODECS.get(video_info.get('FLVFullCodec')),
+                'filesize': int_or_none(video_info.get('FLVFullSize')),
              })
  
          if self._downloader.params.get('include_ads', False):
@@ -360,7 +415,7 @@ class BrightcoveLegacyIE(InfoExtractor):
                      return ad_info
  
          if 'url' not in info and not info.get('formats'):
-            raise ExtractorError('Unable to extract video url for %s' % info['id'])
+            raise ExtractorError('Unable to extract video url for %s' % video_id)
          return info
  
  
@@ -396,6 +451,7 @@ class BrightcoveNewIE(InfoExtractor):
              'formats': 'mincount:41',
          },
          'params': {
+            # m3u8 download
              'skip_download': True,
          }
      }, {
@@ -406,6 +462,10 @@ class BrightcoveNewIE(InfoExtractor):
          # non numeric ref: prefixed video id
          'url': 'http://players.brightcove.net/710858724001/default_default/index.html?videoId=ref:event-stream-356',
          'only_matching': True,
+    }, {
+        # unavailable video without message but with error_code
+        'url': 'http://players.brightcove.net/1305187701/c832abfb-641b-44eb-9da0-2fe76786505f_default/index.html?videoId=4377407326001',
+        'only_matching': True,
      }]
  
      @staticmethod
@@ -439,7 +499,7 @@ class BrightcoveNewIE(InfoExtractor):
                      </video>.*?
                      <script[^>]+
                          src=["\'](?:https?:)?//players\.brightcove\.net/
-                        (\d+)/([\da-f-]+)_([^/]+)/index(?:\.min)?\.js
+                        (\d+)/([^/]+)_([^/]+)/index(?:\.min)?\.js
                  ''', webpage):
              entries.append(
                  'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
@@ -476,23 +536,26 @@ class BrightcoveNewIE(InfoExtractor):
              })
          except ExtractorError as e:
              if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
-                json_data = self._parse_json(e.cause.read().decode(), video_id)
-                raise ExtractorError(json_data[0]['message'], expected=True)
+                json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
+                raise ExtractorError(
+                    json_data.get('message') or json_data['error_code'], expected=True)
              raise
  
-        title = json_data['name']
+        title = json_data['name'].strip()
  
          formats = []
          for source in json_data.get('sources', []):
              container = source.get('container')
-            source_type = source.get('type')
+            ext = mimetype2ext(source.get('type'))
              src = source.get('src')
-            if source_type == 'application/x-mpegURL' or container == 'M2TS':
+            if ext == 'ism':
+                continue
+            elif ext == 'm3u8' or container == 'M2TS':
                  if not src:
                      continue
                  formats.extend(self._extract_m3u8_formats(
-                    src, video_id, 'mp4', m3u8_id='hls', fatal=False))
-            elif source_type == 'application/dash+xml':
+                    src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+            elif ext == 'mpd':
                  if not src:
                      continue
                  formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False))
@@ -508,7 +571,7 @@ class BrightcoveNewIE(InfoExtractor):
                      'tbr': tbr,
                      'filesize': int_or_none(source.get('size')),
                      'container': container,
-                    'ext': container.lower(),
+                    'ext': ext or container.lower(),
                  }
                  if width == 0 and height == 0:
                      f.update({
@@ -533,7 +596,7 @@ class BrightcoveNewIE(InfoExtractor):
                      f.update({
                          'url': src or streaming_src,
                          'format_id': build_format_id('http' if src else 'http-streaming'),
-                        'preference': 2 if src else 1,
+                        'source_preference': 0 if src else -1,
                      })
                  else:
                      f.update({
@@ -542,22 +605,37 @@ class BrightcoveNewIE(InfoExtractor):
                          'format_id': build_format_id('rtmp'),
                      })
                  formats.append(f)
+
+        errors = json_data.get('errors')
+        if not formats and errors:
+            error = errors[0]
+            raise ExtractorError(
+                error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
+
          self._sort_formats(formats)
  
-        description = json_data.get('description')
-        thumbnail = json_data.get('thumbnail')
-        timestamp = parse_iso8601(json_data.get('published_at'))
+        subtitles = {}
+        for text_track in json_data.get('text_tracks', []):
+            if text_track.get('src'):
+                subtitles.setdefault(text_track.get('srclang'), []).append({
+                    'url': text_track['src'],
+                })
+
+        is_live = False
          duration = float_or_none(json_data.get('duration'), 1000)
-        tags = json_data.get('tags', [])
+        if duration and duration < 0:
+            is_live = True
  
          return {
              'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
+            'title': self._live_title(title) if is_live else title,
+            'description': clean_html(json_data.get('description')),
+            'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
              'duration': duration,
-            'timestamp': timestamp,
+            'timestamp': parse_iso8601(json_data.get('published_at')),
              'uploader_id': account_id,
              'formats': formats,
-            'tags': tags,
+            'subtitles': subtitles,
+            'tags': json_data.get('tags', []),
+            'is_live': is_live,
          }
diff --git a/youtube_dl/extractor/buzzfeed.py b/youtube_dl/extractor/buzzfeed.py
index df503ecc0f50283f0cc77a867353912a47eee5dd..75fa92d7cfc0204f4539e10c762585b2537abbb6 100644
--- a/youtube_dl/extractor/buzzfeed.py
+++ b/youtube_dl/extractor/buzzfeed.py
@@ -5,6 +5,7 @@ import json
  import re
  
  from .common import InfoExtractor
+from .facebook import FacebookIE
  
  
  class BuzzFeedIE(InfoExtractor):
@@ -20,11 +21,11 @@ class BuzzFeedIE(InfoExtractor):
              'info_dict': {
                  'id': 'aVCR29aE_OQ',
                  'ext': 'mp4',
+                'title': 'Angry Ram destroys a punching bag..',
+                'description': 'md5:c59533190ef23fd4458a5e8c8c872345',
                  'upload_date': '20141024',
                  'uploader_id': 'Buddhanz1',
-                'description': 'He likes to stay in shape with his heavy bag, he wont stop until its on the ground\n\nFollow Angry Ram on Facebook for regular updates -\nhttps://www.facebook.com/pages/Angry-Ram/1436897249899558?ref=hl',
-                'uploader': 'Buddhanz',
-                'title': 'Angry Ram destroys a punching bag',
+                'uploader': 'Angry Ram',
              }
          }]
      }, {
@@ -41,13 +42,30 @@ class BuzzFeedIE(InfoExtractor):
              'info_dict': {
                  'id': 'mVmBL8B-In0',
                  'ext': 'mp4',
+                'title': 're:Munchkin the Teddy Bear gets her exercise',
+                'description': 'md5:28faab95cda6e361bcff06ec12fc21d8',
                  'upload_date': '20141124',
                  'uploader_id': 'CindysMunchkin',
-                'description': 're:© 2014 Munchkin the',
                  'uploader': 're:^Munchkin the',
-                'title': 're:Munchkin the Teddy Bear gets her exercise',
              },
          }]
+    }, {
+        'url': 'http://www.buzzfeed.com/craigsilverman/the-most-adorable-crash-landing-ever#.eq7pX0BAmK',
+        'info_dict': {
+            'id': 'the-most-adorable-crash-landing-ever',
+            'title': 'Watch This Baby Goose Make The Most Adorable Crash Landing',
+            'description': 'This gosling knows how to stick a landing.',
+        },
+        'playlist': [{
+            'md5': '763ca415512f91ca62e4621086900a23',
+            'info_dict': {
+                'id': '971793786185728',
+                'ext': 'mp4',
+                'title': 'We set up crash pads so that the goslings on our roof would have a safe landi...',
+                'uploader': 'Calgary Outdoor Centre-University of Calgary',
+            },
+        }],
+        'add_ie': ['Facebook'],
      }]
  
      def _real_extract(self, url):
@@ -66,6 +84,10 @@ class BuzzFeedIE(InfoExtractor):
                  continue
              entries.append(self.url_result(video['url']))
  
+        facebook_url = FacebookIE._extract_url(webpage)
+        if facebook_url:
+            entries.append(self.url_result(facebook_url))
+
          return {
              '_type': 'playlist',
              'id': playlist_id,
diff --git a/youtube_dl/extractor/byutv.py b/youtube_dl/extractor/byutv.py
index dda98059e9041c651de5a211fccb2c106b11bb75..4be175d7039dd845f7c961af552bc1153b73598e 100644
--- a/youtube_dl/extractor/byutv.py
+++ b/youtube_dl/extractor/byutv.py
@@ -1,6 +1,5 @@
  from __future__ import unicode_literals
  
-import json
  import re
  
  from .common import InfoExtractor
@@ -8,42 +7,87 @@ from ..utils import ExtractorError
  
  
  class BYUtvIE(InfoExtractor):
-    _VALID_URL = r'^https?://(?:www\.)?byutv.org/watch/[0-9a-f-]+/(?P<video_id>[^/?#]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/(?!event/)(?P<id>[0-9a-f-]+)(?:/(?P<display_id>[^/?#&]+))?'
+    _TESTS = [{
          'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5',
          'info_dict': {
-            'id': 'studio-c-season-5-episode-5',
+            'id': '6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
+            'display_id': 'studio-c-season-5-episode-5',
              'ext': 'mp4',
-            'description': 'md5:e07269172baff037f8e8bf9956bc9747',
              'title': 'Season 5 Episode 5',
+            'description': 'md5:e07269172baff037f8e8bf9956bc9747',
              'thumbnail': 're:^https?://.*\.jpg$',
              'duration': 1486.486,
          },
          'params': {
              'skip_download': True,
-        }
-    }
+        },
+        'add_ie': ['Ooyala'],
+    }, {
+        'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('video_id')
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id') or video_id
  
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(url, display_id)
          episode_code = self._search_regex(
              r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
-        episode_json = re.sub(
-            r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', episode_code)
-        ep = json.loads(episode_json)
-
-        if ep['providerType'] == 'Ooyala':
-            return {
-                '_type': 'url_transparent',
-                'ie_key': 'Ooyala',
-                'url': 'ooyala:%s' % ep['providerId'],
-                'id': video_id,
-                'title': ep['title'],
-                'description': ep.get('description'),
-                'thumbnail': ep.get('imageThumbnail'),
-            }
-        else:
+
+        ep = self._parse_json(
+            episode_code, display_id, transform_source=lambda s:
+            re.sub(r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', s))
+
+        if ep['providerType'] != 'Ooyala':
              raise ExtractorError('Unsupported provider %s' % ep['provider'])
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'Ooyala',
+            'url': 'ooyala:%s' % ep['providerId'],
+            'id': video_id,
+            'display_id': display_id,
+            'title': ep['title'],
+            'description': ep.get('description'),
+            'thumbnail': ep.get('imageThumbnail'),
+        }
+
+
+class BYUtvEventIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/event/(?P<id>[0-9a-f-]+)'
+    _TEST = {
+        'url': 'http://www.byutv.org/watch/event/29941b9b-8bf6-48d2-aebf-7a87add9e34b',
+        'info_dict': {
+            'id': '29941b9b-8bf6-48d2-aebf-7a87add9e34b',
+            'ext': 'mp4',
+            'title': 'Toledo vs. BYU (9/30/16)',
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': ['Ooyala'],
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        ooyala_id = self._search_regex(
+            r'providerId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
+            webpage, 'ooyala id', group='id')
+
+        title = self._search_regex(
+            r'class=["\']description["\'][^>]*>\s*<h1>([^<]+)</h1>', webpage,
+            'title').strip()
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'Ooyala',
+            'url': 'ooyala:%s' % ooyala_id,
+            'id': video_id,
+            'title': title,
+        }
diff --git a/youtube_dl/extractor/camdemy.py b/youtube_dl/extractor/camdemy.py
index 6ffbeabd371fd6f80a9ead1d23762f760a13ba2f..d4e6fbdce029b8267450b9d50d3b41556a47664d 100644
--- a/youtube_dl/extractor/camdemy.py
+++ b/youtube_dl/extractor/camdemy.py
@@ -1,7 +1,6 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import datetime
  import re
  
  from .common import InfoExtractor
@@ -10,8 +9,10 @@ from ..compat import (
      compat_urlparse,
  )
  from ..utils import (
-    parse_iso8601,
+    clean_html,
+    parse_duration,
      str_to_int,
+    unified_strdate,
  )
  
  
@@ -26,14 +27,14 @@ class CamdemyIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Ch1-1 Introduction, Signals (02-23-2012)',
              'thumbnail': 're:^https?://.*\.jpg$',
-            'description': '',
              'creator': 'ss11spring',
+            'duration': 1591,
              'upload_date': '20130114',
-            'timestamp': 1358154556,
              'view_count': int,
          }
      }, {
          # With non-empty description
+        # webpage returns "No permission or not login"
          'url': 'http://www.camdemy.com/media/13885',
          'md5': '4576a3bb2581f86c61044822adbd1249',
          'info_dict': {
@@ -41,70 +42,77 @@ class CamdemyIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'EverCam + Camdemy QuickStart',
              'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 'md5:050b62f71ed62928f8a35f1a41e186c9',
+            'description': 'md5:2a9f989c2b153a2342acee579c6e7db6',
              'creator': 'evercam',
-            'upload_date': '20140620',
-            'timestamp': 1403271569,
+            'duration': 318,
          }
      }, {
-        # External source
+        # External source (YouTube)
          'url': 'http://www.camdemy.com/media/14842',
-        'md5': '50e1c3c3aa233d3d7b7daa2fa10b1cf7',
          'info_dict': {
              'id': '2vsYQzNIsJo',
              'ext': 'mp4',
+            'title': 'Excel 2013 Tutorial - How to add Password Protection',
+            'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
              'upload_date': '20130211',
              'uploader': 'Hun Kim',
-            'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
              'uploader_id': 'hunkimtutorials',
-            'title': 'Excel 2013 Tutorial - How to add Password Protection',
-        }
+        },
+        'params': {
+            'skip_download': True,
+        },
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        page = self._download_webpage(url, video_id)
+
+        webpage = self._download_webpage(url, video_id)
  
          src_from = self._html_search_regex(
-            r"<div class='srcFrom'>Source: <a title='([^']+)'", page,
-            'external source', default=None)
+            r"class=['\"]srcFrom['\"][^>]*>Sources?(?:\s+from)?\s*:\s*<a[^>]+(?:href|title)=(['\"])(?P<url>(?:(?!\1).)+)\1",
+            webpage, 'external source', default=None, group='url')
          if src_from:
              return self.url_result(src_from)
  
          oembed_obj = self._download_json(
              'http://www.camdemy.com/oembed/?format=json&url=' + url, video_id)
  
+        title = oembed_obj['title']
          thumb_url = oembed_obj['thumbnail_url']
          video_folder = compat_urlparse.urljoin(thumb_url, 'video/')
          file_list_doc = self._download_xml(
              compat_urlparse.urljoin(video_folder, 'fileList.xml'),
-            video_id, 'Filelist XML')
+            video_id, 'Downloading filelist XML')
          file_name = file_list_doc.find('./video/item/fileName').text
          video_url = compat_urlparse.urljoin(video_folder, file_name)
  
-        timestamp = parse_iso8601(self._html_search_regex(
-            r"<div class='title'>Posted\s*:</div>\s*<div class='value'>([^<>]+)<",
-            page, 'creation time', fatal=False),
-            delimiter=' ', timezone=datetime.timedelta(hours=8))
-        view_count = str_to_int(self._html_search_regex(
-            r"<div class='title'>Views\s*:</div>\s*<div class='value'>([^<>]+)<",
-            page, 'view count', fatal=False))
+        # Some URLs return "No permission or not login" in a webpage despite being
+        # freely available via oembed JSON URL (e.g. http://www.camdemy.com/media/13885)
+        upload_date = unified_strdate(self._search_regex(
+            r'>published on ([^<]+)<', webpage,
+            'upload date', default=None))
+        view_count = str_to_int(self._search_regex(
+            r'role=["\']viewCnt["\'][^>]*>([\d,.]+) views',
+            webpage, 'view count', default=None))
+        description = self._html_search_meta(
+            'description', webpage, default=None) or clean_html(
+            oembed_obj.get('description'))
  
          return {
              'id': video_id,
              'url': video_url,
-            'title': oembed_obj['title'],
+            'title': title,
              'thumbnail': thumb_url,
-            'description': self._html_search_meta('description', page),
-            'creator': oembed_obj['author_name'],
-            'duration': oembed_obj['duration'],
-            'timestamp': timestamp,
+            'description': description,
+            'creator': oembed_obj.get('author_name'),
+            'duration': parse_duration(oembed_obj.get('duration')),
+            'upload_date': upload_date,
              'view_count': view_count,
          }
  
  
  class CamdemyFolderIE(InfoExtractor):
-    _VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?camdemy\.com/folder/(?P<id>\d+)'
      _TESTS = [{
          # links with trailing slash
          'url': 'http://www.camdemy.com/folder/450',
diff --git a/youtube_dl/extractor/camwithher.py b/youtube_dl/extractor/camwithher.py
new file mode 100644
index 0000000..afbc5ea
--- /dev/null
+++ b/youtube_dl/extractor/camwithher.py
@@ -0,0 +1,87 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    unified_strdate,
+)
+
+
+class CamWithHerIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?camwithher\.tv/view_video\.php\?.*\bviewkey=(?P<id>\w+)'
+
+    _TESTS = [{
+        'url': 'http://camwithher.tv/view_video.php?viewkey=6e9a24e2c0e842e1f177&page=&viewtype=&category=',
+        'info_dict': {
+            'id': '5644',
+            'ext': 'flv',
+            'title': 'Periscope Tease',
+            'description': 'In the clouds teasing on periscope to my favorite song',
+            'duration': 240,
+            'view_count': int,
+            'comment_count': int,
+            'uploader': 'MileenaK',
+            'upload_date': '20160322',
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        'url': 'http://camwithher.tv/view_video.php?viewkey=6dfd8b7c97531a459937',
+        'only_matching': True,
+    }, {
+        'url': 'http://camwithher.tv/view_video.php?page=&viewkey=6e9a24e2c0e842e1f177&viewtype=&category=',
+        'only_matching': True,
+    }, {
+        'url': 'http://camwithher.tv/view_video.php?viewkey=b6c3b5bea9515d1a1fc4&page=&viewtype=&category=mv',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        flv_id = self._html_search_regex(
+            r'<a[^>]+href=["\']/download/\?v=(\d+)', webpage, 'video id')
+
+        # Video URL construction algorithm is reverse-engineered from cwhplayer.swf
+        rtmp_url = 'rtmp://camwithher.tv/clipshare/%s' % (
+            ('mp4:%s.mp4' % flv_id) if int(flv_id) > 2010 else flv_id)
+
+        title = self._html_search_regex(
+            r'<div[^>]+style="float:left"[^>]*>\s*<h2>(.+?)</h2>', webpage, 'title')
+        description = self._html_search_regex(
+            r'>Description:</span>(.+?)</div>', webpage, 'description', default=None)
+
+        runtime = self._search_regex(
+            r'Runtime\s*:\s*(.+?) \|', webpage, 'duration', default=None)
+        if runtime:
+            runtime = re.sub(r'[\s-]', '', runtime)
+        duration = parse_duration(runtime)
+        view_count = int_or_none(self._search_regex(
+            r'Views\s*:\s*(\d+)', webpage, 'view count', default=None))
+        comment_count = int_or_none(self._search_regex(
+            r'Comments\s*:\s*(\d+)', webpage, 'comment count', default=None))
+
+        uploader = self._search_regex(
+            r'Added by\s*:\s*<a[^>]+>([^<]+)</a>', webpage, 'uploader', default=None)
+        upload_date = unified_strdate(self._search_regex(
+            r'Added on\s*:\s*([\d-]+)', webpage, 'upload date', default=None))
+
+        return {
+            'id': flv_id,
+            'url': rtmp_url,
+            'ext': 'flv',
+            'no_resume': True,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'view_count': view_count,
+            'comment_count': comment_count,
+            'uploader': uploader,
+            'upload_date': upload_date,
+        }
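Reviewer note on camwithher.py: the RTMP URL rule above can be tried in isolation. The following is a minimal sketch, not the extractor itself; the helper name `build_rtmp_url` is hypothetical, and the 2010 threshold is copied from the diff, where it is described as reverse-engineered from cwhplayer.swf.

```python
def build_rtmp_url(flv_id):
    """Build the RTMP URL the way the diff does (hypothetical helper name).

    Ids above 2010 use an 'mp4:<id>.mp4' play path; older ids are used bare.
    """
    play_path = ('mp4:%s.mp4' % flv_id) if int(flv_id) > 2010 else flv_id
    return 'rtmp://camwithher.tv/clipshare/%s' % play_path


if __name__ == '__main__':
    # '5644' is the id from the extractor's own test case.
    print(build_rtmp_url('5644'))
```

Note that `flv_id` stays a string throughout; only the comparison converts it to an integer.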
diff --git a/youtube_dl/extractor/canalplus.py b/youtube_dl/extractor/canalplus.py
index 25b2d4efe5d54e1c3264f906a3105ad05dd2ca3f..1c3c41d26619ec2fa347c4a75093b2a1cf7003a2 100644
--- a/youtube_dl/extractor/canalplus.py
+++ b/youtube_dl/extractor/canalplus.py
@@ -1,86 +1,112 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_urllib_parse_urlparse
  from ..utils import (
+    dict_get,
      ExtractorError,
      HEADRequest,
-    unified_strdate,
-    url_basename,
-    qualities,
      int_or_none,
+    qualities,
+    remove_end,
+    unified_strdate,
  )
  
  
  class CanalplusIE(InfoExtractor):
      IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
-    _VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv|itele\.fr)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
+    _VALID_URL = r'''(?x)
+                        https?://
+                            (?:
+                                (?:
+                                    (?:(?:www|m)\.)?canalplus\.fr|
+                                    (?:www\.)?piwiplus\.fr|
+                                    (?:www\.)?d8\.tv|
+                                    (?:www\.)?c8\.fr|
+                                    (?:www\.)?d17\.tv|
+                                    (?:www\.)?itele\.fr
+                                )/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
+                                player\.canalplus\.fr/#/(?P<id>\d+)
+                            )
+
+                    '''
      _VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
      _SITE_ID_MAP = {
-        'canalplus.fr': 'cplus',
-        'piwiplus.fr': 'teletoon',
-        'd8.tv': 'd8',
-        'itele.fr': 'itele',
+        'canalplus': 'cplus',
+        'piwiplus': 'teletoon',
+        'd8': 'd8',
+        'c8': 'd8',
+        'd17': 'd17',
+        'itele': 'itele',
      }
  
      _TESTS = [{
-        'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1263092',
-        'md5': '12164a6f14ff6df8bd628e8ba9b10b78',
+        'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
          'info_dict': {
-            'id': '1263092',
+            'id': '1405510',
+            'display_id': 'pid1830-c-zapping',
              'ext': 'mp4',
-            'title': 'Le Zapping - 13/05/15',
-            'description': 'md5:09738c0d06be4b5d06a0940edb0da73f',
-            'upload_date': '20150513',
+            'title': 'Zapping - 02/07/2016',
+            'description': 'Le meilleur de toutes les chaînes, tous les jours',
+            'upload_date': '20160702',
          },
      }, {
          'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190',
          'info_dict': {
              'id': '1108190',
-            'ext': 'flv',
-            'title': 'Le labyrinthe - Boing super ranger',
+            'display_id': 'pid1405-le-labyrinthe-boing-super-ranger',
+            'ext': 'mp4',
+            'title': 'BOING SUPER RANGER - Ep : Le labyrinthe',
              'description': 'md5:4cea7a37153be42c1ba2c1d3064376ff',
              'upload_date': '20140724',
          },
          'skip': 'Only works from France',
      }, {
-        'url': 'http://www.d8.tv/d8-docs-mags/pid6589-d8-campagne-intime.html',
+        'url': 'http://www.c8.fr/c8-divertissement/ms-touche-pas-a-mon-poste/pid6318-videos-integrales.html',
+        'md5': '4b47b12b4ee43002626b97fad8fb1de5',
          'info_dict': {
-            'id': '966289',
-            'ext': 'flv',
-            'title': 'Campagne intime - Documentaire exceptionnel',
-            'description': 'md5:d2643b799fb190846ae09c61e59a859f',
-            'upload_date': '20131108',
+            'id': '1420213',
+            'display_id': 'pid6318-videos-integrales',
+            'ext': 'mp4',
+            'title': 'TPMP ! Même le matin - Les 35H de Baba - 14/10/2016',
+            'description': 'md5:f96736c1b0ffaa96fd5b9e60ad871799',
+            'upload_date': '20161014',
          },
-        'skip': 'videos get deleted after a while',
+        'skip': 'Only works from France',
      }, {
-        'url': 'http://www.itele.fr/france/video/aubervilliers-un-lycee-en-colere-111559',
-        'md5': '38b8f7934def74f0d6f3ba6c036a5f82',
+        'url': 'http://www.itele.fr/chroniques/invite-michael-darmon/rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
          'info_dict': {
-            'id': '1213714',
+            'id': '1420176',
+            'display_id': 'rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
              'ext': 'mp4',
-            'title': 'Aubervilliers : un lycée en colère - Le 11/02/2015 à 06h45',
-            'description': 'md5:8216206ec53426ea6321321f3b3c16db',
-            'upload_date': '20150211',
+            'title': 'L\'invité de Michaël Darmon du 14/10/2016 - ',
+            'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
+            'upload_date': '20161014',
          },
+    }, {
+        'url': 'http://m.canalplus.fr/?vid=1398231',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.groupdict().get('id')
  
-        site_id = self._SITE_ID_MAP[mobj.group('site') or 'canal']
+        site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
  
          # Beware, some subclasses do not define an id group
-        display_id = url_basename(mobj.group('path'))
+        display_id = remove_end(dict_get(mobj.groupdict(), ('display_id', 'id', 'vid')), '.html')
  
-        if video_id is None:
-            webpage = self._download_webpage(url, display_id)
-            video_id = self._search_regex(
-                [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)', r'id=["\']canal_video_player(?P<id>\d+)'],
-                webpage, 'video id', group='id')
+        webpage = self._download_webpage(url, display_id)
+        video_id = self._search_regex(
+            [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
+             r'id=["\']canal_video_player(?P<id>\d+)'],
+            webpage, 'video id', group='id')
  
          info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
          video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
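Reviewer note on canalplus.py: the new `site_id` lookup takes the second-to-last label of the host name instead of a regex group. A minimal sketch of that lookup follows, assuming Python 3's `urllib.parse.urlparse` in place of `compat_urllib_parse_urlparse`; the function name `site_id_for` is hypothetical, and the map is copied from the diff.

```python
from urllib.parse import urlparse

# Copied from the new _SITE_ID_MAP in the diff.
SITE_ID_MAP = {
    'canalplus': 'cplus',
    'piwiplus': 'teletoon',
    'd8': 'd8',
    'c8': 'd8',
    'd17': 'd17',
    'itele': 'itele',
}


def site_id_for(url):
    """Map a URL to a site id via the second-to-last host label."""
    host = urlparse(url).netloc
    # 'www.canalplus.fr' -> ['www', 'canalplus', 'fr'] -> 'canalplus'
    site_key = host.rsplit('.', 2)[-2]
    return SITE_ID_MAP[site_key]


if __name__ == '__main__':
    print(site_id_for('http://m.canalplus.fr/?vid=1398231'))
```

Because the key no longer depends on the subdomain, `www.`, `m.` and `player.` hosts all resolve to the same entry. The sketch does not handle hosts with a port or with fewer than two labels.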
diff --git a/youtube_dl/extractor/canvas.py b/youtube_dl/extractor/canvas.py
index ec6d24d96cac80379e15eba267829f89c7b53df7..d183d5d527fb8ab4163b16fcaffd0aeedbf0dd0c 100644
--- a/youtube_dl/extractor/canvas.py
+++ b/youtube_dl/extractor/canvas.py
@@ -1,11 +1,13 @@
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..utils import float_or_none
  
  
  class CanvasIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?P<site_id>canvas|een)\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
      _TESTS = [{
          'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
          'md5': 'ea838375a547ac787d4064d8c7860a6c',
@@ -38,22 +40,42 @@ class CanvasIE(InfoExtractor):
          'params': {
              'skip_download': True,
          }
+    }, {
+        'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
+        'info_dict': {
+            'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
+            'display_id': 'herbekijk-sorry-voor-alles',
+            'ext': 'mp4',
+            'title': 'Herbekijk Sorry voor alles',
+            'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 3788.06,
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        display_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        site_id, display_id = mobj.group('site_id'), mobj.group('id')
  
          webpage = self._download_webpage(url, display_id)
  
-        title = self._search_regex(
+        title = (self._search_regex(
              r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
-            webpage, 'title', default=None) or self._og_search_title(webpage)
+            webpage, 'title', default=None) or self._og_search_title(
+            webpage)).strip()
  
          video_id = self._html_search_regex(
-            r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id')
+            r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id', group='id')
  
          data = self._download_json(
-            'https://mediazone.vrt.be/api/v1/canvas/assets/%s' % video_id, display_id)
+            'https://mediazone.vrt.be/api/v1/%s/assets/%s'
+            % (site_id, video_id), display_id)
  
          formats = []
          for target in data['targetUrls']:
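Reviewer note on canvas.py: the `data-video` pattern changes from `.+?` to `(?:(?!\1).)+`, which forbids the opening quote character inside the captured value. A minimal sketch of the difference follows; the helper name `extract_data_video` and the sample HTML strings are made up for illustration.

```python
import re

# New pattern from the diff: the value may not contain the opening quote.
NEW_RE = re.compile(r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1')
# Old pattern from the diff, kept for comparison.
OLD_RE = re.compile(r'data-video=(["\'])(?P<id>.+?)\1')


def extract_data_video(html, pattern=NEW_RE):
    """Return the data-video attribute value, or None if absent or empty."""
    m = pattern.search(html)
    return m.group('id') if m else None


if __name__ == '__main__':
    sample = '<div data-video="" title="x">'  # hypothetical markup
    print(extract_data_video(sample, OLD_RE))  # runs into the next attribute
    print(extract_data_video(sample, NEW_RE))  # no match for an empty value
```

With an empty attribute value the old pattern can capture text up to a later quote, while the new one simply fails to match.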
diff --git a/youtube_dl/extractor/carambatv.py b/youtube_dl/extractor/carambatv.py
new file mode 100644
index 0000000..66c0f90
--- /dev/null
+++ b/youtube_dl/extractor/carambatv.py
@@ -0,0 +1,102 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    float_or_none,
+    int_or_none,
+    try_get,
+)
+
+from .videomore import VideomoreIE
+
+
+class CarambaTVIE(InfoExtractor):
+    _VALID_URL = r'(?:carambatv:|https?://video1\.carambatv\.ru/v/)(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://video1.carambatv.ru/v/191910501',
+        'md5': '2f4a81b7cfd5ab866ee2d7270cb34a2a',
+        'info_dict': {
+            'id': '191910501',
+            'ext': 'mp4',
+            'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 2678.31,
+        },
+    }, {
+        'url': 'carambatv:191910501',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'http://video1.carambatv.ru/v/%s/videoinfo.js' % video_id,
+            video_id)
+
+        title = video['title']
+
+        base_url = video.get('video') or 'http://video1.carambatv.ru/v/%s/' % video_id
+
+        formats = [{
+            'url': base_url + f['fn'],
+            'height': int_or_none(f.get('height')),
+            'format_id': '%sp' % f['height'] if f.get('height') else None,
+        } for f in video['qualities'] if f.get('fn')]
+        self._sort_formats(formats)
+
+        thumbnail = video.get('splash')
+        duration = float_or_none(try_get(
+            video, lambda x: x['annotations'][0]['end_time'], compat_str))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'formats': formats,
+        }
+
+
+class CarambaTVPageIE(InfoExtractor):
+    _VALID_URL = r'https?://carambatv\.ru/(?:[^/]+/)+(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'http://carambatv.ru/movie/bad-comedian/razborka-v-manile/',
+        'md5': 'a49fb0ec2ad66503eeb46aac237d3c86',
+        'info_dict': {
+            'id': '475222',
+            'ext': 'flv',
+            'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
+            'thumbnail': 're:^https?://.*\.jpg',
+            # duration reported by videomore is incorrect
+            'duration': int,
+        },
+        'add_ie': [VideomoreIE.ie_key()],
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        videomore_url = VideomoreIE._extract_url(webpage)
+        if videomore_url:
+            title = self._og_search_title(webpage)
+            return {
+                '_type': 'url_transparent',
+                'url': videomore_url,
+                'ie_key': VideomoreIE.ie_key(),
+                'title': title,
+            }
+
+        video_url = self._og_search_property('video:iframe', webpage, default=None)
+
+        if not video_url:
+            video_id = self._search_regex(
+                r'(?:video_id|crmb_vuid)\s*[:=]\s*["\']?(\d+)',
+                webpage, 'video id')
+            video_url = 'carambatv:%s' % video_id
+
+        return self.url_result(video_url, CarambaTVIE.ie_key())
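Reviewer note on carambatv.py: the format list in `CarambaTVIE` can be exercised without the network. The sketch below assumes a `videoinfo.js` payload shaped like the keys the extractor reads (`video`, `qualities`, `fn`, `height`); the sample payload is invented, `build_formats` is a hypothetical name, and a plain `int()` call stands in for youtube-dl's `int_or_none`.

```python
def build_formats(video, video_id):
    """Build a formats list the way the diff does, skipping entries without 'fn'."""
    base_url = video.get('video') or 'http://video1.carambatv.ru/v/%s/' % video_id
    return [{
        'url': base_url + f['fn'],
        # Stand-in for int_or_none(f.get('height')) from youtube_dl.utils.
        'height': int(f['height']) if f.get('height') else None,
        'format_id': '%sp' % f['height'] if f.get('height') else None,
    } for f in video.get('qualities', []) if f.get('fn')]


if __name__ == '__main__':
    # Invented payload; real responses may carry more fields.
    sample = {'qualities': [{'fn': '720.mp4', 'height': 720}, {'height': 480}]}
    for fmt in build_formats(sample, '191910501'):
        print(fmt)
```

Entries without a file name are dropped rather than raising, which matches the `if f.get('fn')` filter in the diff.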
diff --git a/youtube_dl/extractor/cartoonnetwork.py b/youtube_dl/extractor/cartoonnetwork.py
new file mode 100644
index 0000000..086ec90
--- /dev/null
+++ b/youtube_dl/extractor/cartoonnetwork.py
@@ -0,0 +1,42 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .turner import TurnerBaseIE
+
+
+class CartoonNetworkIE(TurnerBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?cartoonnetwork\.com/video/(?:[^/]+/)+(?P<id>[^/?#]+)-(?:clip|episode)\.html'
+    _TEST = {
+        'url': 'http://www.cartoonnetwork.com/video/teen-titans-go/starfire-the-cat-lady-clip.html',
+        'info_dict': {
+            'id': '8a250ab04ed07e6c014ef3f1e2f9016c',
+            'ext': 'mp4',
+            'title': 'Starfire the Cat Lady',
+            'description': 'Robin decides to become a cat so that Starfire will finally love him.',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        id_type, video_id = re.search(r"_cnglobal\.cvp(Video|Title)Id\s*=\s*'([^']+)';", webpage).groups()
+        query = ('id' if id_type == 'Video' else 'titleId') + '=' + video_id
+        return self._extract_cvp_info(
+            'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, {
+                'secure': {
+                    'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
+                    'tokenizer_src': 'http://www.cartoonnetwork.com/cntv/mvpd/processors/services/token_ipadAdobe.do',
+                },
+            }, {
+                'url': url,
+                'site_name': 'CartoonNetwork',
+                'auth_required': self._search_regex(
+                    r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
+                    webpage, 'auth required', default='false') == 'true',
+            })
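Reviewer note on cartoonnetwork.py: the query string depends on whether the page sets `cvpVideoId` or `cvpTitleId`. A minimal sketch of that branch follows, using the regex from the diff; `build_cvp_query` is a hypothetical name. Unlike the extractor, which calls `.groups()` directly on the `re.search` result, this sketch returns `None` when the variable is missing.

```python
import re


def build_cvp_query(webpage):
    """Return 'id=<value>' or 'titleId=<value>' from the page's _cnglobal variables."""
    m = re.search(r"_cnglobal\.cvp(Video|Title)Id\s*=\s*'([^']+)';", webpage)
    if m is None:
        # The extractor in the diff would raise AttributeError here instead.
        return None
    id_type, video_id = m.groups()
    return ('id' if id_type == 'Video' else 'titleId') + '=' + video_id


if __name__ == '__main__':
    page = "_cnglobal.cvpVideoId = '8a250ab04ed07e6c014ef3f1e2f9016c';"
    print(build_cvp_query(page))
```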
diff --git a/youtube_dl/extractor/cbc.py b/youtube_dl/extractor/cbc.py
index d8aa31038bfb85f6e5123fe8e7831a2eb22c0c45..d71fddf58a068461cd2d377b31e4c3981d6c2b3d 100644
--- a/youtube_dl/extractor/cbc.py
+++ b/youtube_dl/extractor/cbc.py
@@ -4,64 +4,92 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import js_to_json
+from ..compat import compat_str
+from ..utils import (
+    js_to_json,
+    smuggle_url,
+    try_get,
+    xpath_text,
+    xpath_element,
+    xpath_with_ns,
+    find_xpath_attr,
+    parse_iso8601,
+    parse_age_limit,
+    int_or_none,
+    ExtractorError,
+)
  
  
  class CBCIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?:[^/]+/)+(?P<id>[^/?#]+)'
+    IE_NAME = 'cbc.ca'
+    _VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?!player/)(?:[^/]+/)+(?P<id>[^/?#]+)'
      _TESTS = [{
          # with mediaId
          'url': 'http://www.cbc.ca/22minutes/videos/clips-season-23/don-cherry-play-offs',
+        'md5': '97e24d09672fc4cf56256d6faa6c25bc',
          'info_dict': {
              'id': '2682904050',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Don Cherry – All-Stars',
              'description': 'Don Cherry has a bee in his bonnet about AHL player John Scott because that guy’s got heart.',
-            'timestamp': 1454475540,
+            'timestamp': 1454463000,
              'upload_date': '20160203',
+            'uploader': 'CBCC-NEW',
          },
-        'params': {
-            # rtmp download
-            'skip_download': True,
+        'skip': 'Geo-restricted to Canada',
+    }, {
+        # with clipId, feed available via tpfeed.cbc.ca and feed.theplatform.com
+        'url': 'http://www.cbc.ca/22minutes/videos/22-minutes-update/22-minutes-update-episode-4',
+        'md5': '162adfa070274b144f4fdc3c3b8207db',
+        'info_dict': {
+            'id': '2414435309',
+            'ext': 'mp4',
+            'title': '22 Minutes Update: What Not To Wear Quebec',
+            'description': "This week's latest Canadian top political story is What Not To Wear Quebec.",
+            'upload_date': '20131025',
+            'uploader': 'CBCC-NEW',
+            'timestamp': 1382717907,
          },
      }, {
-        # with clipId
+        # with clipId, feed only available via tpfeed.cbc.ca
          'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
+        'md5': '0274a90b51a9b4971fe005c63f592f12',
          'info_dict': {
              'id': '2487345465',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Robin Williams freestyles on 90 Minutes Live',
              'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
-            'upload_date': '19700101',
-        },
-        'params': {
-            # rtmp download
-            'skip_download': True,
+            'upload_date': '19780210',
+            'uploader': 'CBCC-NEW',
+            'timestamp': 255977160,
          },
      }, {
          # multiple iframes
          'url': 'http://www.cbc.ca/natureofthings/blog/birds-eye-view-from-vancouvers-burrard-street-bridge-how-we-got-the-shot',
          'playlist': [{
+            'md5': '377572d0b49c4ce0c9ad77470e0b96b4',
              'info_dict': {
                  'id': '2680832926',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'An Eagle\'s-Eye View Off Burrard Bridge',
                  'description': 'Hercules the eagle flies from Vancouver\'s Burrard Bridge down to a nearby park with a mini-camera strapped to his back.',
-                'upload_date': '19700101',
+                'upload_date': '20160201',
+                'timestamp': 1454342820,
+                'uploader': 'CBCC-NEW',
              },
          }, {
+            'md5': '415a0e3f586113894174dfb31aa5bb1a',
              'info_dict': {
                  'id': '2658915080',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Fly like an eagle!',
                  'description': 'Eagle equipped with a mini camera flies from the world\'s tallest tower',
-                'upload_date': '19700101',
+                'upload_date': '20150315',
+                'timestamp': 1426443984,
+                'uploader': 'CBCC-NEW',
              },
          }],
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
+        'skip': 'Geo-restricted to Canada',
      }]
  
      @classmethod
@@ -79,9 +107,15 @@ class CBCIE(InfoExtractor):
              media_id = player_info.get('mediaId')
              if not media_id:
                  clip_id = player_info['clipId']
-                media_id = self._download_json(
-                    'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
-                    clip_id)['entries'][0]['id'].split('/')[-1]
+                feed = self._download_json(
+                    'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id,
+                    clip_id, fatal=False)
+                if feed:
+                    media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
+                if not media_id:
+                    media_id = self._download_json(
+                        'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
+                        clip_id)['entries'][0]['id'].split('/')[-1]
              return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
          else:
              entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
@@ -89,25 +123,219 @@ class CBCIE(InfoExtractor):
  
  
  class CBCPlayerIE(InfoExtractor):
+    IE_NAME = 'cbc.ca:player'
      _VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.cbc.ca/player/play/2683190193',
+        'md5': '64d25f841ddf4ddb28a235338af32e2c',
          'info_dict': {
              'id': '2683190193',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Gerry Runs a Sweat Shop',
              'description': 'md5:b457e1c01e8ff408d9d801c1c2cd29b0',
-            'timestamp': 1455067800,
+            'timestamp': 1455071400,
              'upload_date': '20160210',
+            'uploader': 'CBCC-NEW',
+        },
+        'skip': 'Geo-restricted to Canada',
+    }, {
+        # Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/
+        'url': 'http://www.cbc.ca/player/play/2657631896',
+        'md5': 'e5e708c34ae6fca156aafe17c43e8b75',
+        'info_dict': {
+            'id': '2657631896',
+            'ext': 'mp3',
+            'title': 'CBC Montreal is organizing its first ever community hackathon!',
+            'description': 'The modern technology we tend to depend on so heavily, is never without it\'s share of hiccups and headaches. Next weekend - CBC Montreal will be getting members of the public for its first Hackathon.',
+            'timestamp': 1425704400,
+            'upload_date': '20150307',
+            'uploader': 'CBCC-NEW',
+        },
+    }, {
+        # available only when we add `formats=MPEG4,FLV,MP3` to theplatform url
+        'url': 'http://www.cbc.ca/player/play/2164402062',
+        'md5': '17a61eb813539abea40618d6323a7f82',
+        'info_dict': {
+            'id': '2164402062',
+            'ext': 'flv',
+            'title': 'Cancer survivor four times over',
+            'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
+            'timestamp': 1320410746,
+            'upload_date': '20111104',
+            'uploader': 'CBCC-NEW',
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(
+                'http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/%s?mbr=true&formats=MPEG4,FLV,MP3' % video_id, {
+                    'force_smil_url': True
+                }),
+            'id': video_id,
+        }
+
+
+class CBCWatchBaseIE(InfoExtractor):
+    _device_id = None
+    _device_token = None
+    _API_BASE_URL = 'https://api-cbc.cloud.clearleap.com/cloffice/client/'
+    _NS_MAP = {
+        'media': 'http://search.yahoo.com/mrss/',
+        'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
+    }
+
+    def _call_api(self, path, video_id):
+        url = path if path.startswith('http') else self._API_BASE_URL + path
+        result = self._download_xml(url, video_id, headers={
+            'X-Clearleap-DeviceId': self._device_id,
+            'X-Clearleap-DeviceToken': self._device_token,
+        })
+        error_message = xpath_text(result, 'userMessage') or xpath_text(result, 'systemMessage')
+        if error_message:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message))
+        return result
+
+    def _real_initialize(self):
+        if not self._device_id or not self._device_token:
+            device = self._downloader.cache.load('cbcwatch', 'device') or {}
+            self._device_id, self._device_token = device.get('id'), device.get('token')
+            if not self._device_id or not self._device_token:
+                result = self._download_xml(
+                    self._API_BASE_URL + 'device/register',
+                    None, data=b'<device><type>web</type></device>')
+                self._device_id = xpath_text(result, 'deviceId', fatal=True)
+                self._device_token = xpath_text(result, 'deviceToken', fatal=True)
+                self._downloader.cache.store(
+                    'cbcwatch', 'device', {
+                        'id': self._device_id,
+                        'token': self._device_token,
+                    })
+
+    def _parse_rss_feed(self, rss):
+        channel = xpath_element(rss, 'channel', fatal=True)
+
+        def _add_ns(path):
+            return xpath_with_ns(path, self._NS_MAP)
+
+        entries = []
+        for item in channel.findall('item'):
+            guid = xpath_text(item, 'guid', fatal=True)
+            title = xpath_text(item, 'title', fatal=True)
+
+            media_group = xpath_element(item, _add_ns('media:group'), fatal=True)
+            content = xpath_element(media_group, _add_ns('media:content'), fatal=True)
+            content_url = content.attrib['url']
+
+            thumbnails = []
+            for thumbnail in media_group.findall(_add_ns('media:thumbnail')):
+                thumbnail_url = thumbnail.get('url')
+                if not thumbnail_url:
+                    continue
+                thumbnails.append({
+                    'id': thumbnail.get('profile'),
+                    'url': thumbnail_url,
+                    'width': int_or_none(thumbnail.get('width')),
+                    'height': int_or_none(thumbnail.get('height')),
+                })
+
+            timestamp = None
+            release_date = find_xpath_attr(
+                item, _add_ns('media:credit'), 'role', 'releaseDate')
+            if release_date is not None:
+                timestamp = parse_iso8601(release_date.text)
+
+            entries.append({
+                '_type': 'url_transparent',
+                'url': content_url,
+                'id': guid,
+                'title': title,
+                'description': xpath_text(item, 'description'),
+                'timestamp': timestamp,
+                'duration': int_or_none(content.get('duration')),
+                'age_limit': parse_age_limit(xpath_text(item, _add_ns('media:rating'))),
+                'episode': xpath_text(item, _add_ns('clearleap:episode')),
+                'episode_number': int_or_none(xpath_text(item, _add_ns('clearleap:episodeInSeason'))),
+                'series': xpath_text(item, _add_ns('clearleap:series')),
+                'season_number': int_or_none(xpath_text(item, _add_ns('clearleap:season'))),
+                'thumbnails': thumbnails,
+                'ie_key': 'CBCWatchVideo',
+            })
+
+        return self.playlist_result(
+            entries, xpath_text(channel, 'guid'),
+            xpath_text(channel, 'title'),
+            xpath_text(channel, 'description'))
+
+
+class CBCWatchVideoIE(CBCWatchBaseIE):
+    IE_NAME = 'cbc.ca:watch:video'
+    _VALID_URL = r'https?://api-cbc\.cloud\.clearleap\.com/cloffice/client/web/play/?\?.*?\bcontentId=(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        result = self._call_api(url, video_id)
+
+        m3u8_url = xpath_text(result, 'url', fatal=True)
+        formats = self._extract_m3u8_formats(re.sub(r'/([^/]+)/[^/?]+\.m3u8', r'/\1/\1.m3u8', m3u8_url), video_id, 'mp4', fatal=False)
+        if len(formats) < 2:
+            formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
+        # Despite metadata in m3u8 all video+audio formats are
+        # actually video-only (no audio)
+        for f in formats:
+            if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
+                f['acodec'] = 'none'
+        self._sort_formats(formats)
+
+        info = {
+            'id': video_id,
+            'title': video_id,
+            'formats': formats,
+        }
+
+        rss = xpath_element(result, 'rss')
+        if rss is not None:
+            info.update(self._parse_rss_feed(rss)['entries'][0])
+            del info['url']
+            del info['_type']
+            del info['ie_key']
+        return info
+
+
+class CBCWatchIE(CBCWatchBaseIE):
+    IE_NAME = 'cbc.ca:watch'
+    _VALID_URL = r'https?://watch\.cbc\.ca/(?:[^/]+/)+(?P<id>[0-9a-f-]+)'
+    _TESTS = [{
+        'url': 'http://watch.cbc.ca/doc-zone/season-6/customer-disservice/38e815a-009e3ab12e4',
+        'info_dict': {
+            'id': '38e815a-009e3ab12e4',
+            'ext': 'mp4',
+            'title': 'Customer (Dis)Service',
+            'description': 'md5:8bdd6913a0fe03d4b2a17ebe169c7c87',
+            'upload_date': '20160219',
+            'timestamp': 1455840000,
          },
          'params': {
-            # rtmp download
+            # m3u8 download
              'skip_download': True,
+            'format': 'bestvideo',
          },
-    }
+        'skip': 'Geo-restricted to Canada',
+    }, {
+        'url': 'http://watch.cbc.ca/arthur/all/1ed4b385-cd84-49cf-95f0-80f004680057',
+        'info_dict': {
+            'id': '1ed4b385-cd84-49cf-95f0-80f004680057',
+            'title': 'Arthur',
+            'description': 'Arthur, the sweetest 8-year-old aardvark, and his pals solve all kinds of problems with humour, kindness and teamwork.',
+        },
+        'playlist_mincount': 30,
+        'skip': 'Geo-restricted to Canada',
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        return self.url_result(
-            'http://feed.theplatform.com/f/ExhSPC/vms_5akSXx4Ng_Zn?byGuid=%s' % video_id,
-            'ThePlatformFeed', video_id)
+        rss = self._call_api('web/browse/' + video_id, video_id)
+        return self._parse_rss_feed(rss)
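
Review note on the CBCWatchVideoIE hunk above: the `re.sub` call appears to rewrite the playlist URL returned by the API so that it points at a playlist named after its parent directory (presumably the master playlist), and the extractor falls back to the original URL when fewer than two formats come back. A minimal standalone sketch of just that rewrite follows. The function name and the example URL are hypothetical and are not part of youtube-dl.

```python
import re


def guess_master_playlist_url(m3u8_url):
    # Same pattern and replacement as in the hunk above: swap the playlist
    # file name for "<parent directory>.m3u8" and keep any query string.
    return re.sub(r'/([^/]+)/[^/?]+\.m3u8', r'/\1/\1.m3u8', m3u8_url)


# Hypothetical URL, for illustration only.
print(guess_master_playlist_url(
    'https://example.com/hls/abc123/index_720.m3u8?token=x'))
```

A URL that does not contain a `.m3u8` path component is returned unchanged, which is why the extractor can safely retry with the original URL afterwards.
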
diff --git a/youtube_dl/extractor/cbs.py b/youtube_dl/extractor/cbs.py
index 40d07ab181ff8462599f89e520a1ebc1711fe1b3..58f258c54b059b09888cf0e26a4718a69c704faa 100644
--- a/youtube_dl/extractor/cbs.py
+++ b/youtube_dl/extractor/cbs.py
@@ -1,42 +1,43 @@
  from __future__ import unicode_literals
  
-from .common import InfoExtractor
+from .theplatform import ThePlatformFeedIE
  from ..utils import (
-    sanitized_Request,
-    smuggle_url,
+    int_or_none,
+    find_xpath_attr,
+    xpath_element,
+    xpath_text,
+    update_url_query,
  )
  
  
-class CBSIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<id>[^/]+)'
+class CBSBaseIE(ThePlatformFeedIE):
+    def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
+        closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
+        return {
+            'en': [{
+                'ext': 'ttml',
+                'url': closed_caption_e.attrib['value'],
+            }]
+        } if closed_caption_e is not None and closed_caption_e.attrib.get('value') else {}
+
+
+class CBSIE(CBSBaseIE):
+    _VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
  
      _TESTS = [{
          'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
          'info_dict': {
-            'id': '4JUVEwq3wUT7',
-            'display_id': 'connect-chat-feat-garth-brooks',
-            'ext': 'flv',
+            'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
+            'ext': 'mp4',
              'title': 'Connect Chat feat. Garth Brooks',
              'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
              'duration': 1495,
+            'timestamp': 1385585425,
+            'upload_date': '20131127',
+            'uploader': 'CBSI-NEW',
          },
          'params': {
-            # rtmp download
-            'skip_download': True,
-        },
-        '_skip': 'Blocked outside the US',
-    }, {
-        'url': 'http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/',
-        'info_dict': {
-            'id': 'WWF_5KqY3PK1',
-            'display_id': 'st-vincent',
-            'ext': 'flv',
-            'title': 'Live on Letterman - St. Vincent',
-            'description': 'Live On Letterman: St. Vincent in concert from New York\'s Ed Sullivan Theater on Tuesday, July 16, 2014.',
-            'duration': 3221,
-        },
-        'params': {
-            # rtmp download
+            # m3u8 download
              'skip_download': True,
          },
          '_skip': 'Blocked outside the US',
@@ -48,21 +49,52 @@ class CBSIE(InfoExtractor):
          'only_matching': True,
      }]
  
+    def _extract_video_info(self, content_id):
+        items_data = self._download_xml(
+            'http://can.cbs.com/thunder/player/videoPlayerService.php',
+            content_id, query={'partner': 'cbs', 'contentId': content_id})
+        video_data = xpath_element(items_data, './/item')
+        title = xpath_text(video_data, 'videoTitle', 'title', True)
+        tp_path = 'dJ5BDC/media/guid/2198311517/%s' % content_id
+        tp_release_url = 'http://link.theplatform.com/s/' + tp_path
+
+        asset_types = []
+        subtitles = {}
+        formats = []
+        for item in items_data.findall('.//item'):
+            asset_type = xpath_text(item, 'assetType')
+            if not asset_type or asset_type in asset_types:
+                continue
+            asset_types.append(asset_type)
+            query = {
+                'mbr': 'true',
+                'assetTypes': asset_type,
+            }
+            if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
+                query['formats'] = 'MPEG4,M3U'
+            elif asset_type in ('RTMP', 'WIFI', '3G'):
+                query['formats'] = 'MPEG4,FLV'
+            tp_formats, tp_subtitles = self._extract_theplatform_smil(
+                update_url_query(tp_release_url, query), content_id,
+                'Downloading %s SMIL data' % asset_type)
+            formats.extend(tp_formats)
+            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
+        self._sort_formats(formats)
+
+        info = self._extract_theplatform_metadata(tp_path, content_id)
+        info.update({
+            'id': content_id,
+            'title': title,
+            'series': xpath_text(video_data, 'seriesTitle'),
+            'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
+            'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
+            'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
+            'thumbnail': xpath_text(video_data, 'previewImageURL'),
+            'formats': formats,
+            'subtitles': subtitles,
+        })
+        return info
+
      def _real_extract(self, url):
-        display_id = self._match_id(url)
-        request = sanitized_Request(url)
-        # Android UA is served with higher quality (720p) streams (see
-        # https://github.com/rg3/youtube-dl/issues/7490)
-        request.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.4; Nexus 5)')
-        webpage = self._download_webpage(request, display_id)
-        real_id = self._search_regex(
-            [r"video\.settings\.pid\s*=\s*'([^']+)';", r"cbsplayer\.pid\s*=\s*'([^']+)';"],
-            webpage, 'real video ID')
-        return {
-            '_type': 'url_transparent',
-            'ie_key': 'ThePlatform',
-            'url': smuggle_url(
-                'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true&manifest=m3u' % real_id,
-                {'force_smil_url': True}),
-            'display_id': display_id,
-        }
+        content_id = self._match_id(url)
+        return self._extract_video_info(content_id)
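
Review note on `_extract_video_info` above: the loop requests one SMIL document per distinct asset type and picks the `formats` query value from the asset type. The sketch below isolates that URL-building step so it can be checked without network access. It is an illustration under assumptions: `build_smil_urls` is a hypothetical helper, it uses Python 3's `urllib.parse.urlencode` in place of youtube-dl's `update_url_query`, and the content id in the usage line is made up.

```python
from urllib.parse import urlencode

# Release URL prefix taken from the hunk above.
TP_RELEASE_URL = 'http://link.theplatform.com/s/dJ5BDC/media/guid/2198311517/%s'


def build_smil_urls(content_id, asset_types):
    """Return one release URL per distinct asset type, in first-seen order."""
    seen = []
    urls = []
    for asset_type in asset_types:
        # Skip missing and already handled asset types, as the extractor does.
        if not asset_type or asset_type in seen:
            continue
        seen.append(asset_type)
        query = {'mbr': 'true', 'assetTypes': asset_type}
        if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
            query['formats'] = 'MPEG4,M3U'
        elif asset_type in ('RTMP', 'WIFI', '3G'):
            query['formats'] = 'MPEG4,FLV'
        # urlencode percent-encodes the comma in the formats value.
        urls.append(TP_RELEASE_URL % content_id + '?' + urlencode(query))
    return urls


# Hypothetical content id and asset types, for illustration only.
for smil_url in build_smil_urls('abc', ['HLS_AES', None, 'HLS_AES', 'RTMP']):
    print(smil_url)
```

Asset types that match neither branch get no `formats` parameter at all, which mirrors the behaviour of the extractor code.
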
diff --git a/youtube_dl/extractor/cbsinteractive.py b/youtube_dl/extractor/cbsinteractive.py
new file mode 100644
index 0000000..57b18e8
--- /dev/null
+++ b/youtube_dl/extractor/cbsinteractive.py
@@ -0,0 +1,105 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .theplatform import ThePlatformIE
+from ..utils import int_or_none
+
+
+class CBSInteractiveIE(ThePlatformIE):
+    _VALID_URL = r'https?://(?:www\.)?(?P<site>cnet|zdnet)\.com/(?:videos|video/share)/(?P<id>[^/?]+)'
+    _TESTS = [{
+        'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
+        'info_dict': {
+            'id': '56f4ea68-bd21-4852-b08c-4de5b8354c60',
+            'ext': 'flv',
+            'title': 'Hands-on with Microsoft Windows 8.1 Update',
+            'description': 'The new update to the Windows 8 OS brings improved performance for mouse and keyboard users.',
+            'uploader_id': '6085384d-619e-11e3-b231-14feb5ca9861',
+            'uploader': 'Sarah Mitroff',
+            'duration': 70,
+            'timestamp': 1396479627,
+            'upload_date': '20140402',
+        },
+    }, {
+        'url': 'http://www.cnet.com/videos/whiny-pothole-tweets-at-local-government-when-hit-by-cars-tomorrow-daily-187/',
+        'info_dict': {
+            'id': '56527b93-d25d-44e3-b738-f989ce2e49ba',
+            'ext': 'flv',
+            'title': 'Whiny potholes tweet at local government when hit by cars (Tomorrow Daily 187)',
+            'description': 'Khail and Ashley wonder what other civic woes can be solved by self-tweeting objects, investigate a new kind of VR camera and watch an origami robot self-assemble, walk, climb, dig and dissolve. #TDPothole',
+            'uploader_id': 'b163284d-6b73-44fc-b3e6-3da66c392d40',
+            'uploader': 'Ashley Esqueda',
+            'duration': 1482,
+            'timestamp': 1433289889,
+            'upload_date': '20150603',
+        },
+    }, {
+        'url': 'http://www.zdnet.com/video/share/video-keeping-android-smartphones-and-tablets-secure/',
+        'info_dict': {
+            'id': 'bc1af9f0-a2b5-4e54-880d-0d95525781c0',
+            'ext': 'mp4',
+            'title': 'Video: Keeping Android smartphones and tablets secure',
+            'description': 'Here\'s the best way to keep Android devices secure, and what you do when they\'ve come to the end of their lives.',
+            'uploader_id': 'f2d97ea2-8175-11e2-9d12-0018fe8a00b0',
+            'uploader': 'Adrian Kingsley-Hughes',
+            'timestamp': 1448961720,
+            'upload_date': '20151201',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }]
+    TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/kYEXFC/%s?mbr=true'
+    MPX_ACCOUNTS = {
+        'cnet': 2288573011,
+        'zdnet': 2387448114,
+    }
+
+    def _real_extract(self, url):
+        site, display_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(url, display_id)
+
+        data_json = self._html_search_regex(
+            r"data-(?:cnet|zdnet)-video(?:-uvp(?:js)?)?-options='([^']+)'",
+            webpage, 'data json')
+        data = self._parse_json(data_json, display_id)
+        vdata = data.get('video') or data['videos'][0]
+
+        video_id = vdata['id']
+        title = vdata['title']
+        author = vdata.get('author')
+        if author:
+            uploader = '%s %s' % (author['firstName'], author['lastName'])
+            uploader_id = author.get('id')
+        else:
+            uploader = None
+            uploader_id = None
+
+        media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
+        formats, subtitles = [], {}
+        for (fkey, vid) in vdata['files'].items():
+            if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
+                continue
+            release_url = self.TP_RELEASE_URL_TEMPLATE % vid
+            if fkey == 'hds':
+                release_url += '&manifest=f4m'
+            tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey)
+            formats.extend(tp_formats)
+            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
+        self._sort_formats(formats)
+
+        info = self._extract_theplatform_metadata('kYEXFC/%s' % media_guid_path, video_id)
+        info.update({
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'duration': int_or_none(vdata.get('duration')),
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'subtitles': subtitles,
+            'formats': formats,
+        })
+        return info
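
Review note on `CBSInteractiveIE._real_extract` above: the player options are read from a single-quoted `data-...-options` attribute, unescaped, parsed as JSON, and the first video entry is used. The sketch below reproduces that flow with the standard library only. It is an approximation: `extract_video_options` is a hypothetical helper, `html.unescape` stands in for the entity handling done by `_html_search_regex`, and the sample markup is invented.

```python
import html
import json
import re

# Same attribute pattern as in the extractor above.
OPTIONS_RE = r"data-(?:cnet|zdnet)-video(?:-uvp(?:js)?)?-options='([^']+)'"


def extract_video_options(webpage):
    """Return the dict describing the first video embedded in the page."""
    match = re.search(OPTIONS_RE, webpage)
    if match is None:
        raise ValueError('player options attribute not found')
    # The attribute value is HTML-escaped JSON.
    data = json.loads(html.unescape(match.group(1)))
    # Prefer a single "video" object, else the first entry of "videos".
    return data.get('video') or data['videos'][0]


# Invented markup, for illustration only.
sample = ("<div data-cnet-video-uvpjs-options='"
          "{&quot;videos&quot;:[{&quot;id&quot;:&quot;x1&quot;,"
          "&quot;title&quot;:&quot;T&quot;}]}'></div>")
print(extract_video_options(sample))
```
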
diff --git a/youtube_dl/extractor/cbslocal.py b/youtube_dl/extractor/cbslocal.py
new file mode 100644
index 0000000..289709c
--- /dev/null
+++ b/youtube_dl/extractor/cbslocal.py
@@ -0,0 +1,75 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .anvato import AnvatoIE
+from .sendtonews import SendtoNewsIE
+from ..compat import compat_urlparse
+from ..utils import unified_timestamp
+
+
+class CBSLocalIE(AnvatoIE):
+    _VALID_URL = r'https?://[a-z]+\.cbslocal\.com/\d+/\d+/\d+/(?P<id>[0-9a-z-]+)'
+
+    _TESTS = [{
+        # Anvato backend
+        'url': 'http://losangeles.cbslocal.com/2016/05/16/safety-advocates-say-fatal-car-seat-failures-are-public-health-crisis',
+        'md5': 'f0ee3081e3843f575fccef901199b212',
+        'info_dict': {
+            'id': '3401037',
+            'ext': 'mp4',
+            'title': 'Safety Advocates Say Fatal Car Seat Failures Are \'Public Health Crisis\'',
+            'description': 'Collapsing seats have been the focus of scrutiny for decades, though experts say remarkably little has been done to address the issue. Randy Paige reports.',
+            'thumbnail': 're:^https?://.*',
+            'timestamp': 1463440500,
+            'upload_date': '20160516',
+            'uploader': 'CBS',
+            'subtitles': {
+                'en': 'mincount:5',
+            },
+            'categories': [
+                'Stations\\Spoken Word\\KCBSTV',
+                'Syndication\\MSN',
+                'Syndication\\NDN',
+                'Syndication\\AOL',
+                'Syndication\\Yahoo',
+                'Syndication\\Tribune',
+                'Syndication\\Curb.tv',
+                'Content\\News'
+            ],
+            'tags': ['CBS 2 News Evening'],
+        },
+    }, {
+        # SendtoNews embed
+        'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/',
+        'info_dict': {
+            'id': 'GxfCe0Zo7D-175909-5588',
+        },
+        'playlist_count': 9,
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        sendtonews_url = SendtoNewsIE._extract_url(webpage)
+        if sendtonews_url:
+            return self.url_result(
+                compat_urlparse.urljoin(url, sendtonews_url),
+                ie=SendtoNewsIE.ie_key())
+
+        info_dict = self._extract_anvato_videos(webpage, display_id)
+
+        time_str = self._html_search_regex(
+            r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)
+        timestamp = unified_timestamp(time_str)
+
+        info_dict.update({
+            'display_id': display_id,
+            'timestamp': timestamp,
+        })
+
+        return info_dict
diff --git a/youtube_dl/extractor/cbsnews.py b/youtube_dl/extractor/cbsnews.py
index e6b7f3584543b0cedc8263828a274d467241e924..91b0f5fa94c7ba919e01fd097cbdfc71fe6992b4 100644
--- a/youtube_dl/extractor/cbsnews.py
+++ b/youtube_dl/extractor/cbsnews.py
@@ -1,15 +1,15 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from .theplatform import ThePlatformIE
+from .cbs import CBSIE
  from ..utils import (
      parse_duration,
-    find_xpath_attr,
  )
  
  
-class CBSNewsIE(ThePlatformIE):
+class CBSNewsIE(CBSIE):
+    IE_NAME = 'cbsnews'
      IE_DESC = 'CBS News'
      _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
  
@@ -27,13 +27,18 @@ class CBSNewsIE(ThePlatformIE):
                  # rtmp download
                  'skip_download': True,
              },
+            'skip': 'Subscribers only',
          },
          {
              'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
              'info_dict': {
-                'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
+                'id': 'SNJBOYzXiWBOvaLsdzwH8fmtP1SCd91Y',
                  'ext': 'mp4',
                  'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
+                'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
+                'upload_date': '20140404',
+                'timestamp': 1396650660,
+                'uploader': 'CBSI-NEW',
                  'thumbnail': 're:^https?://.*\.jpg$',
                  'duration': 205,
                  'subtitles': {
@@ -49,15 +54,6 @@ class CBSNewsIE(ThePlatformIE):
          },
      ]
  
-    def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
-        closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
-        return {
-            'en': [{
-                'ext': 'ttml',
-                'url': closed_caption_e.attrib['value'],
-            }]
-        } if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
-
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
@@ -68,66 +64,44 @@ class CBSNewsIE(ThePlatformIE):
              webpage, 'video JSON info'), video_id)
  
          item = video_info['item'] if 'item' in video_info else video_info
-        title = item.get('articleTitle') or item.get('hed')
-        duration = item.get('duration')
-        thumbnail = item.get('mediaImage') or item.get('thumbnail')
-
-        subtitles = {}
-        formats = []
-        for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
-            pid = item.get('media' + format_id)
-            if not pid:
-                continue
-            release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' % pid
-            tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
-            formats.extend(tp_formats)
-            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
+        guid = item['mpxRefId']
+        return self._extract_video_info(guid)
  
  
  class CBSNewsLiveVideoIE(InfoExtractor):
+    IE_NAME = 'cbsnews:livevideo'
      IE_DESC = 'CBS News Live Videos'
-    _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
+    _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[^/?#]+)'
  
+    # Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
      _TEST = {
          'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
          'info_dict': {
              'id': 'clinton-sanders-prepare-to-face-off-in-nh',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Clinton, Sanders Prepare To Face Off In NH',
              'duration': 334,
          },
+        'skip': 'Video gone',
      }
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        display_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, video_id)
+        video_info = self._download_json(
+            'http://feeds.cbsn.cbsnews.com/rundown/story', display_id, query={
+                'device': 'desktop',
+                'dvr_slug': display_id,
+            })
  
-        video_info = self._parse_json(self._html_search_regex(
-            r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
-
-        hdcore_sign = 'hdcore=3.3.1'
-        f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
-        if f4m_formats:
-            for entry in f4m_formats:
-                # URLs without the extra param induce an 404 error
-                entry.update({'extra_param_to_segment_url': hdcore_sign})
-        self._sort_formats(f4m_formats)
+        formats = self._extract_akamai_formats(video_info['url'], display_id)
+        self._sort_formats(formats)
  
          return {
-            'id': video_id,
+            'id': display_id,
+            'display_id': display_id,
              'title': video_info['headline'],
              'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
              'duration': parse_duration(video_info.get('segmentDur')),
-            'formats': f4m_formats,
+            'formats': formats,
          }
diff --git a/youtube_dl/extractor/cbssports.py b/youtube_dl/extractor/cbssports.py
index 549ae32f36c8ebd258896d4189ba90ae501c40d0..3a62c840b42bace9993ddb3cb77fc89201b0578e 100644
--- a/youtube_dl/extractor/cbssports.py
+++ b/youtube_dl/extractor/cbssports.py
@@ -1,30 +1,31 @@
  from __future__ import unicode_literals
  
-import re
+from .cbs import CBSBaseIE
  
-from .common import InfoExtractor
  
+class CBSSportsIE(CBSBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
  
-class CBSSportsIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
-
-    _TEST = {
-        'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',
+    _TESTS = [{
+        'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',
          'info_dict': {
-            'id': '_d5_GbO8p1sT',
-            'ext': 'flv',
-            'title': 'US Open flashbacks: 1990s',
-            'description': 'Bill Macatee relives the best moments in US Open history from the 1990s.',
+            'id': '708337219968',
+            'ext': 'mp4',
+            'title': 'Ben Simmons the next LeBron? Not so fast',
+            'description': 'md5:854294f627921baba1f4b9a990d87197',
+            'timestamp': 1466293740,
+            'upload_date': '20160618',
+            'uploader': 'CBSI-NEW',
          },
-    }
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }]
+
+    def _extract_video_info(self, filter_query, video_id):
+        return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        section = mobj.group('section')
-        video_id = mobj.group('id')
-        all_videos = self._download_json(
-            'http://www.cbssports.com/data/video/player/getVideos/%s?as=json' % section,
-            video_id)
-        # The json file contains the info of all the videos in the section
-        video_info = next(v for v in all_videos if v['pcid'] == video_id)
-        return self.url_result('theplatform:%s' % video_info['pid'], 'ThePlatform')
+        video_id = self._match_id(url)
+        return self._extract_video_info('byId=%s' % video_id, video_id)
diff --git a/youtube_dl/extractor/ccc.py b/youtube_dl/extractor/ccc.py
index dda2c0959882c3cd3c5de56b817ccd7815ef0068..8f7f09e22dad6eda3ca08edfbf9edc118146e893 100644
--- a/youtube_dl/extractor/ccc.py
+++ b/youtube_dl/extractor/ccc.py
@@ -1,13 +1,9 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  from ..utils import (
      int_or_none,
-    parse_duration,
-    qualities,
-    unified_strdate,
+    parse_iso8601,
  )
  
  
@@ -19,14 +15,14 @@ class CCCIE(InfoExtractor):
          'url': 'https://media.ccc.de/v/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor#video',
          'md5': '3a1eda8f3a29515d27f5adb967d7e740',
          'info_dict': {
-            'id': '30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor',
+            'id': '1839',
              'ext': 'mp4',
              'title': 'Introduction to Processor Design',
-            'description': 'md5:80be298773966f66d56cb11260b879af',
+            'description': 'md5:df55f6d073d4ceae55aae6f2fd98a0ac',
              'thumbnail': 're:^https?://.*\.jpg$',
-            'view_count': int,
              'upload_date': '20131228',
-            'duration': 3660,
+            'timestamp': 1388188800,
+            'duration': 3710,
          }
      }, {
          'url': 'https://media.ccc.de/v/32c3-7368-shopshifting#download',
@@ -34,79 +30,48 @@ class CCCIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        if self._downloader.params.get('prefer_free_formats'):
-            preference = qualities(['mp3', 'opus', 'mp4-lq', 'webm-lq', 'h264-sd', 'mp4-sd', 'webm-sd', 'mp4', 'webm', 'mp4-hd', 'h264-hd', 'webm-hd'])
-        else:
-            preference = qualities(['opus', 'mp3', 'webm-lq', 'mp4-lq', 'webm-sd', 'h264-sd', 'mp4-sd', 'webm', 'mp4', 'webm-hd', 'mp4-hd', 'h264-hd'])
-
-        title = self._html_search_regex(
-            r'(?s)<h1>(.*?)</h1>', webpage, 'title')
-        description = self._html_search_regex(
-            r'(?s)<h3>About</h3>(.+?)<h3>',
-            webpage, 'description', fatal=False)
-        upload_date = unified_strdate(self._html_search_regex(
-            r"(?s)<span[^>]+class='[^']*fa-calendar-o'[^>]*>(.+?)</span>",
-            webpage, 'upload date', fatal=False))
-        view_count = int_or_none(self._html_search_regex(
-            r"(?s)<span class='[^']*fa-eye'></span>(.*?)</li>",
-            webpage, 'view count', fatal=False))
-        duration = parse_duration(self._html_search_regex(
-            r'(?s)<span[^>]+class=(["\']).*?fa-clock-o.*?\1[^>]*></span>(?P<duration>.+?)</li',
-            webpage, 'duration', fatal=False, group='duration'))
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        event_id = self._search_regex(r"data-id='(\d+)'", webpage, 'event id')
+        event_data = self._download_json('https://media.ccc.de/public/events/%s' % event_id, event_id)
  
-        matches = re.finditer(r'''(?xs)
-            <(?:span|div)\s+class='label\s+filetype'>(?P<format>[^<]*)</(?:span|div)>\s*
-            <(?:span|div)\s+class='label\s+filetype'>(?P<lang>[^<]*)</(?:span|div)>\s*
-            <a\s+download\s+href='(?P<http_url>[^']+)'>\s*
-            (?:
-                .*?
-                <a\s+(?:download\s+)?href='(?P<torrent_url>[^']+\.torrent)'
-            )?''', webpage)
          formats = []
-        for m in matches:
-            format = m.group('format')
-            format_id = self._search_regex(
-                r'.*/([a-z0-9_-]+)/[^/]*$',
-                m.group('http_url'), 'format id', default=None)
-            if format_id:
-                format_id = m.group('lang') + '-' + format_id
-            vcodec = 'h264' if 'h264' in format_id else (
-                'none' if format_id in ('mp3', 'opus') else None
+        for recording in event_data.get('recordings', []):
+            recording_url = recording.get('recording_url')
+            if not recording_url:
+                continue
+            language = recording.get('language')
+            folder = recording.get('folder')
+            format_id = None
+            if language:
+                format_id = language
+            if folder:
+                if language:
+                    format_id += '-' + folder
+                else:
+                    format_id = folder
+            vcodec = 'h264' if 'h264' in (folder or '') else (
+                'none' if folder in ('mp3', 'opus') else None
              )
              formats.append({
                  'format_id': format_id,
-                'format': format,
-                'language': m.group('lang'),
-                'url': m.group('http_url'),
+                'url': recording_url,
+                'width': int_or_none(recording.get('width')),
+                'height': int_or_none(recording.get('height')),
+                'filesize': int_or_none(recording.get('size'), invscale=1024 * 1024),
+                'language': language,
                  'vcodec': vcodec,
-                'preference': preference(format_id),
              })
-
-            if m.group('torrent_url'):
-                formats.append({
-                    'format_id': 'torrent-%s' % (format if format_id is None else format_id),
-                    'format': '%s (torrent)' % format,
-                    'proto': 'torrent',
-                    'format_note': '(unsupported; will just download the .torrent file)',
-                    'vcodec': vcodec,
-                    'preference': -100 + preference(format_id),
-                    'url': m.group('torrent_url'),
-                })
          self._sort_formats(formats)
  
-        thumbnail = self._html_search_regex(
-            r"<video.*?poster='([^']+)'", webpage, 'thumbnail', fatal=False)
-
          return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'view_count': view_count,
-            'upload_date': upload_date,
-            'duration': duration,
+            'id': event_id,
+            'display_id': display_id,
+            'title': event_data['title'],
+            'description': event_data.get('description'),
+            'thumbnail': event_data.get('thumb_url'),
+            'timestamp': parse_iso8601(event_data.get('date')),
+            'duration': int_or_none(event_data.get('length')),
+            'tags': event_data.get('tags'),
              'formats': formats,
          }
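The format-building loop in the new `CCCIE._real_extract` can be exercised on its own. The following is a minimal sketch, assuming each recording carries the optional `language` and `folder` values used in the diff; `build_format_id` and `guess_vcodec` are hypothetical helper names that do not exist in youtube-dl, and the guard against a missing `folder` is an assumption of this sketch, not necessarily the committed behaviour.

```python
def build_format_id(language, folder):
    """Join language and folder with '-', skipping whichever is missing.

    Hypothetical helper mirroring the nested if/else in the diff.
    """
    parts = [part for part in (language, folder) if part]
    return '-'.join(parts) if parts else None


def guess_vcodec(folder):
    """Derive the video codec from the folder name.

    Hypothetical helper; treats a missing folder as an empty string so the
    substring test cannot raise TypeError.
    """
    folder = folder or ''
    if 'h264' in folder:
        return 'h264'
    if folder in ('mp3', 'opus'):
        return 'none'  # audio-only recording
    return None  # codec unknown
```

Collapsing the branches into a join keeps the result identical to the diff for all four combinations of present and missing values.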
diff --git a/youtube_dl/extractor/cctv.py b/youtube_dl/extractor/cctv.py
new file mode 100644
index 0000000..72a72cb
--- /dev/null
+++ b/youtube_dl/extractor/cctv.py
@@ -0,0 +1,53 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import float_or_none
+
+
+class CCTVIE(InfoExtractor):
+    _VALID_URL = r'''(?x)https?://(?:.+?\.)?
+        (?:
+            cctv\.(?:com|cn)|
+            cntv\.cn
+        )/
+        (?:
+            video/[^/]+/(?P<id>[0-9a-f]{32})|
+            \d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+)
+        )'''
+    _TESTS = [{
+        'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml',
+        'md5': '819c7b49fc3927d529fb4cd555621823',
+        'info_dict': {
+            'id': '454368eb19ad44a1925bf1eb96140a61',
+            'ext': 'mp4',
+            'title': 'Portrait of Real Current Life 09/03/2016 Modern Inventors Part 1',
+        }
+    }, {
+        'url': 'http://tv.cctv.com/2016/09/07/VIDE5C1FnlX5bUywlrjhxXOV160907.shtml',
+        'only_matching': True,
+    }, {
+        'url': 'http://tv.cntv.cn/video/C39296/95cfac44cabd3ddc4a9438780a4e5c44',
+        'only_matching': True
+    }]
+
+    def _real_extract(self, url):
+        video_id, display_id = re.match(self._VALID_URL, url).groups()
+        if not video_id:
+            webpage = self._download_webpage(url, display_id)
+            video_id = self._search_regex(
+                r'(?:fo\.addVariable\("videoCenterId",\s*|guid\s*=\s*)"([0-9a-f]{32})',
+                webpage, 'video_id')
+        api_data = self._download_json(
+            'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + video_id, video_id)
+        m3u8_url = re.sub(r'maxbr=\d+&?', '', api_data['hls_url'])
+
+        return {
+            'id': video_id,
+            'title': api_data['title'],
+            'formats': self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False),
+            'duration': float_or_none(api_data.get('video', {}).get('totalLength')),
+        }
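Two regular expressions carry most of the logic in `CCTVIE`: `_VALID_URL`, which yields either a 32-character hex `id` or a `display_id`, and the substitution that drops the `maxbr` parameter from `hls_url`. The sketch below reproduces both outside the extractor. The function names `split_ids` and `strip_maxbr` are hypothetical, and the `example.com` playlist URL is made up for illustration.

```python
import re

# Pattern copied from the diff above.
VALID_URL = r'''(?x)https?://(?:.+?\.)?
    (?:
        cctv\.(?:com|cn)|
        cntv\.cn
    )/
    (?:
        video/[^/]+/(?P<id>[0-9a-f]{32})|
        \d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+)
    )'''


def split_ids(url):
    """Return (video_id, display_id); exactly one of them is None."""
    return re.match(VALID_URL, url).groups()


def strip_maxbr(hls_url):
    """Remove the maxbr=<digits> query parameter and a trailing '&' if present."""
    return re.sub(r'maxbr=\d+&?', '', hls_url)
```

When only `display_id` is available, the extractor has to download the page to find the hex id, which is why `_real_extract` branches on `if not video_id`.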
diff --git a/youtube_dl/extractor/cda.py b/youtube_dl/extractor/cda.py
index 498d2c0d8a1dad129c8a1a9681da2f06503129f9..e00bdaf66a6d9eb6ac051cc169cabbf02844770b 100755
--- a/youtube_dl/extractor/cda.py
+++ b/youtube_dl/extractor/cda.py
@@ -5,14 +5,16 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
-    decode_packed_codes,
      ExtractorError,
-    parse_duration
+    float_or_none,
+    int_or_none,
+    parse_duration,
  )
  
  
  class CDAIE(InfoExtractor):
      _VALID_URL = r'https?://(?:(?:www\.)?cda\.pl/video|ebd\.cda\.pl/[0-9]+x[0-9]+)/(?P<id>[0-9a-z]+)'
+    _BASE_URL = 'http://www.cda.pl/'
      _TESTS = [{
          'url': 'http://www.cda.pl/video/5749950c',
          'md5': '6f844bf51b15f31fae165365707ae970',
@@ -21,6 +23,9 @@ class CDAIE(InfoExtractor):
              'ext': 'mp4',
              'height': 720,
              'title': 'Oto dlaczego przed zakrętem należy zwolnić.',
+            'description': 'md5:269ccd135d550da90d1662651fcb9772',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'average_rating': float,
              'duration': 39
          }
      }, {
@@ -30,6 +35,11 @@ class CDAIE(InfoExtractor):
              'id': '57413289',
              'ext': 'mp4',
              'title': 'Lądowanie na lotnisku na Maderze',
+            'description': 'md5:60d76b71186dcce4e0ba6d4bbdb13e1a',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'crash404',
+            'view_count': int,
+            'average_rating': float,
              'duration': 137
          }
      }, {
@@ -39,30 +49,55 @@ class CDAIE(InfoExtractor):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        webpage = self._download_webpage('http://ebd.cda.pl/0x0/' + video_id, video_id)
+        self._set_cookie('cda.pl', 'cda.player', 'html5')
+        webpage = self._download_webpage(
+            self._BASE_URL + 'video/' + video_id, video_id)
  
          if 'Ten film jest dostępny dla użytkowników premium' in webpage:
              raise ExtractorError('This video is only available for premium users.', expected=True)
  
-        title = self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title')
-
          formats = []
  
+        uploader = self._search_regex(r'''(?x)
+            <(span|meta)[^>]+itemprop=(["\'])author\2[^>]*>
+            (?:<\1[^>]*>[^<]*</\1>|(?!</\1>)(?:.|\n))*?
+            <(span|meta)[^>]+itemprop=(["\'])name\4[^>]*>(?P<uploader>[^<]+)</\3>
+        ''', webpage, 'uploader', default=None, group='uploader')
+        view_count = self._search_regex(
+            r'Odsłony:(?:\s|&nbsp;)*([0-9]+)', webpage,
+            'view_count', default=None)
+        average_rating = self._search_regex(
+            r'<(?:span|meta)[^>]+itemprop=(["\'])ratingValue\1[^>]*>(?P<rating_value>[0-9.]+)',
+            webpage, 'rating', fatal=False, group='rating_value')
+
          info_dict = {
              'id': video_id,
-            'title': title,
+            'title': self._og_search_title(webpage),
+            'description': self._og_search_description(webpage),
+            'uploader': uploader,
+            'view_count': int_or_none(view_count),
+            'average_rating': float_or_none(average_rating),
+            'thumbnail': self._og_search_thumbnail(webpage),
              'formats': formats,
              'duration': None,
          }
  
          def extract_format(page, version):
-            unpacked = decode_packed_codes(page)
-            format_url = self._search_regex(
-                r"url:\\'(.+?)\\'", unpacked, '%s url' % version, fatal=False)
-            if not format_url:
+            json_str = self._search_regex(
+                r'player_data=(\\?["\'])(?P<player_data>.+?)\1', page,
+                '%s player_json' % version, fatal=False, group='player_data')
+            if not json_str:
+                return
+            player_data = self._parse_json(
+                json_str, '%s player_data' % version, fatal=False)
+            if not player_data:
+                return
+            video = player_data.get('video')
+            if not video or 'file' not in video:
+                self.report_warning('Unable to extract %s version information' % version)
                  return
              f = {
-                'url': format_url,
+                'url': video['file'],
              }
              m = re.search(
                  r'<a[^>]+data-quality="(?P<format_id>[^"]+)"[^>]+href="[^"]+"[^>]+class="[^"]*quality-btn-active[^"]*">(?P<height>[0-9]+)p',
@@ -74,8 +109,7 @@ class CDAIE(InfoExtractor):
                  })
              info_dict['formats'].append(f)
              if not info_dict['duration']:
-                info_dict['duration'] = parse_duration(self._search_regex(
-                    r"duration:\\'(.+?)\\'", unpacked, 'duration', fatal=False))
+                info_dict['duration'] = parse_duration(video.get('duration'))
  
          extract_format(webpage, 'default')
  
@@ -83,7 +117,8 @@ class CDAIE(InfoExtractor):
                  r'<a[^>]+data-quality="[^"]+"[^>]+href="([^"]+)"[^>]+class="quality-btn"[^>]*>([0-9]+p)',
                  webpage):
              webpage = self._download_webpage(
-                href, video_id, 'Downloading %s version information' % resolution, fatal=False)
+                self._BASE_URL + href, video_id,
+                'Downloading %s version information' % resolution, fatal=False)
              if not webpage:
                  # Manually report warning because empty page is returned when
                  # invalid version is requested.
diff --git a/youtube_dl/extractor/ceskatelevize.py b/youtube_dl/extractor/ceskatelevize.py
index 6652c8e42a279f45bdbbc1af3d36ad2500a454eb..4ec79d19dd9db6402752ee65d462631985009cbf 100644
--- a/youtube_dl/extractor/ceskatelevize.py
+++ b/youtube_dl/extractor/ceskatelevize.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -17,7 +17,7 @@ from ..utils import (
  
  
  class CeskaTelevizeIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
+    _VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
      _TESTS = [{
          'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
          'info_dict': {
@@ -33,19 +33,33 @@ class CeskaTelevizeIE(InfoExtractor):
              'skip_download': True,
          },
      }, {
-        'url': 'http://www.ceskatelevize.cz/ivysilani/10532695142-prvni-republika/bonus/14716-zpevacka-z-duparny-bobina',
+        'url': 'http://www.ceskatelevize.cz/ivysilani/10441294653-hyde-park-civilizace/215411058090502/bonus/20641-bonus-01-en',
          'info_dict': {
-            'id': '61924494876844374',
+            'id': '61924494877028507',
              'ext': 'mp4',
-            'title': 'První republika: Zpěvačka z Dupárny Bobina',
-            'description': 'Sága mapující atmosféru první republiky od r. 1918 do r. 1945.',
+            'title': 'Hyde Park Civilizace: Bonus 01 - En',
+            'description': 'English Subtittles',
              'thumbnail': 're:^https?://.*\.jpg',
-            'duration': 88.4,
+            'duration': 81.3,
          },
          'params': {
              # m3u8 download
              'skip_download': True,
          },
+    }, {
+        # live stream
+        'url': 'http://www.ceskatelevize.cz/ivysilani/zive/ct4/',
+        'info_dict': {
+            'id': 402,
+            'ext': 'mp4',
+            'title': 're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'is_live': True,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'skip': 'Georestricted to Czech Republic',
      }, {
          # video with 18+ caution trailer
          'url': 'http://www.ceskatelevize.cz/porady/10520528904-queer/215562210900007-bogotart/',
@@ -118,19 +132,21 @@ class CeskaTelevizeIE(InfoExtractor):
          req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
          req.add_header('Referer', url)
  
-        playlist_title = self._og_search_title(webpage)
-        playlist_description = self._og_search_description(webpage)
+        playlist_title = self._og_search_title(webpage, default=None)
+        playlist_description = self._og_search_description(webpage, default=None)
  
          playlist = self._download_json(req, playlist_id)['playlist']
          playlist_len = len(playlist)
  
          entries = []
          for item in playlist:
+            is_live = item.get('type') == 'LIVE'
              formats = []
              for format_id, stream_url in item['streamUrls'].items():
                  formats.extend(self._extract_m3u8_formats(
                      stream_url, playlist_id, 'mp4',
-                    entry_protocol='m3u8_native', fatal=False))
+                    entry_protocol='m3u8' if is_live else 'm3u8_native',
+                    fatal=False))
              self._sort_formats(formats)
  
              item_id = item.get('id') or item['assetId']
@@ -145,14 +161,22 @@ class CeskaTelevizeIE(InfoExtractor):
                  if subs:
                      subtitles = self.extract_subtitles(episode_id, subs)
  
+            if playlist_len == 1:
+                final_title = playlist_title or title
+                if is_live:
+                    final_title = self._live_title(final_title)
+            else:
+                final_title = '%s (%s)' % (playlist_title, title)
+
              entries.append({
                  'id': item_id,
-                'title': playlist_title if playlist_len == 1 else '%s (%s)' % (playlist_title, title),
+                'title': final_title,
                  'description': playlist_description if playlist_len == 1 else None,
                  'thumbnail': thumbnail,
                  'duration': duration,
                  'formats': formats,
                  'subtitles': subtitles,
+                'is_live': is_live,
              })
  
          return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
diff --git a/youtube_dl/extractor/channel9.py b/youtube_dl/extractor/channel9.py
index c74553dcfa7c689b7fc8d69147625b1169e1e178..34d4e61569b110b49998768f13bb81cdda75bd75 100644
--- a/youtube_dl/extractor/channel9.py
+++ b/youtube_dl/extractor/channel9.py
@@ -20,54 +20,64 @@ class Channel9IE(InfoExtractor):
      '''
      IE_DESC = 'Channel 9'
      IE_NAME = 'channel9'
-    _VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'
-
-    _TESTS = [
-        {
-            'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
-            'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
-            'info_dict': {
-                'id': 'Events/TechEd/Australia/2013/KOS002',
-                'ext': 'mp4',
-                'title': 'Developer Kick-Off Session: Stuff We Love',
-                'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
-                'duration': 4576,
-                'thumbnail': 're:http://.*\.jpg',
-                'session_code': 'KOS002',
-                'session_day': 'Day 1',
-                'session_room': 'Arena 1A',
-                'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug', 'Mads Kristensen'],
-            },
+    _VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'
+
+    _TESTS = [{
+        'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
+        'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
+        'info_dict': {
+            'id': 'Events/TechEd/Australia/2013/KOS002',
+            'ext': 'mp4',
+            'title': 'Developer Kick-Off Session: Stuff We Love',
+            'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
+            'duration': 4576,
+            'thumbnail': 're:http://.*\.jpg',
+            'session_code': 'KOS002',
+            'session_day': 'Day 1',
+            'session_room': 'Arena 1A',
+            'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug',
+                                 'Mads Kristensen'],
          },
-        {
-            'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
-            'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
-            'info_dict': {
-                'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing',
-                'ext': 'mp4',
-                'title': 'Self-service BI with Power BI - nuclear testing',
-                'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
-                'duration': 1540,
-                'thumbnail': 're:http://.*\.jpg',
-                'authors': ['Mike Wilmot'],
-            },
+    }, {
+        'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
+        'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
+        'info_dict': {
+            'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing',
+            'ext': 'mp4',
+            'title': 'Self-service BI with Power BI - nuclear testing',
+            'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
+            'duration': 1540,
+            'thumbnail': 're:http://.*\.jpg',
+            'authors': ['Mike Wilmot'],
          },
-        {
-            # low quality mp4 is best
-            'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
-            'info_dict': {
-                'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
-                'ext': 'mp4',
-                'title': 'Ranges for the Standard Library',
-                'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
-                'duration': 5646,
-                'thumbnail': 're:http://.*\.jpg',
-            },
-            'params': {
-                'skip_download': True,
-            },
-        }
-    ]
+    }, {
+        # low quality mp4 is best
+        'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
+        'info_dict': {
+            'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
+            'ext': 'mp4',
+            'title': 'Ranges for the Standard Library',
+            'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
+            'duration': 5646,
+            'thumbnail': 're:http://.*\.jpg',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
+        'info_dict': {
+            'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
+            'title': 'Channel 9',
+        },
+        'playlist_count': 2,
+    }, {
+        'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
+        'only_matching': True,
+    }, {
+        'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
+        'only_matching': True,
+    }]
  
      _RSS_URL = 'http://channel9.msdn.com/%s/RSS'
  
@@ -254,22 +264,30 @@ class Channel9IE(InfoExtractor):
  
          return self.playlist_result(contents)
  
-    def _extract_list(self, content_path):
-        rss = self._download_xml(self._RSS_URL % content_path, content_path, 'Downloading RSS')
+    def _extract_list(self, video_id, rss_url=None):
+        if not rss_url:
+            rss_url = self._RSS_URL % video_id
+        rss = self._download_xml(rss_url, video_id, 'Downloading RSS')
          entries = [self.url_result(session_url.text, 'Channel9')
                     for session_url in rss.findall('./channel/item/link')]
          title_text = rss.find('./channel/title').text
-        return self.playlist_result(entries, content_path, title_text)
+        return self.playlist_result(entries, video_id, title_text)
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          content_path = mobj.group('contentpath')
+        rss = mobj.group('rss')
+
+        if rss:
+            return self._extract_list(content_path, url)
  
-        webpage = self._download_webpage(url, content_path, 'Downloading web page')
+        webpage = self._download_webpage(
+            url, content_path, 'Downloading web page')
  
-        page_type_m = re.search(r'<meta name="WT.entryid" content="(?P<pagetype>[^:]+)[^"]+"/>', webpage)
-        if page_type_m is not None:
-            page_type = page_type_m.group('pagetype')
+        page_type = self._search_regex(
+            r'<meta[^>]+name=(["\'])WT\.entryid\1[^>]+content=(["\'])(?P<pagetype>[^:]+).+?\2',
+            webpage, 'page type', default=None, group='pagetype')
+        if page_type:
              if page_type == 'Entry':      # Any 'item'-like page, may contain downloadable content
                  return self._extract_entry_item(webpage, content_path)
              elif page_type == 'Session':  # Event session page, may contain downloadable content
@@ -278,6 +296,5 @@ class Channel9IE(InfoExtractor):
                  return self._extract_list(content_path)
              else:
                  raise ExtractorError('Unexpected WT.entryid %s' % page_type, expected=True)
-
          else:  # Assuming list
              return self._extract_list(content_path)
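The reworked `Channel9IE._VALID_URL` separates an optional trailing `/RSS` from the content path, which is what lets `_real_extract` hand RSS URLs straight to `_extract_list`. The sketch below checks that split against the test URLs from the diff; `split_channel9_url` is a hypothetical name used only here.

```python
import re

# Pattern copied from the diff above.
VALID_URL = (r'https?://(?:www\.)?channel9\.msdn\.com/'
             r'(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)')


def split_channel9_url(url):
    """Return (content_path, is_rss) for a Channel 9 URL."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('contentpath'), mobj.group('rss') is not None
```

Because `contentpath` is matched lazily and the pattern must end at `?`, `#`, `&`, or the end of the string, the `/RSS` suffix and any query string stay out of the content path.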
diff --git a/youtube_dl/extractor/charlierose.py b/youtube_dl/extractor/charlierose.py
new file mode 100644
index 0000000..4bf2cf7
--- /dev/null
+++ b/youtube_dl/extractor/charlierose.py
@@ -0,0 +1,51 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import remove_end
+
+
+class CharlieRoseIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?charlierose\.com/video(?:s|/player)/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://charlierose.com/videos/27996',
+        'md5': 'fda41d49e67d4ce7c2411fd2c4702e09',
+        'info_dict': {
+            'id': '27996',
+            'ext': 'mp4',
+            'title': 'Remembering Zaha Hadid',
+            'thumbnail': 're:^https?://.*\.jpg\?\d+',
+            'description': 'We revisit past conversations with Zaha Hadid, in memory of the world renowned Iraqi architect.',
+            'subtitles': {
+                'en': [{
+                    'ext': 'vtt',
+                }],
+            },
+        },
+    }, {
+        'url': 'https://charlierose.com/videos/27996',
+        'only_matching': True,
+    }]
+
+    _PLAYER_BASE = 'https://charlierose.com/video/player/%s'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(self._PLAYER_BASE % video_id, video_id)
+
+        title = remove_end(self._og_search_title(webpage), ' - Charlie Rose')
+
+        info_dict = self._parse_html5_media_entries(
+            self._PLAYER_BASE % video_id, webpage, video_id,
+            m3u8_entry_protocol='m3u8_native')[0]
+
+        self._sort_formats(info_dict['formats'])
+        self._remove_duplicate_formats(info_dict['formats'])
+
+        info_dict.update({
+            'id': video_id,
+            'title': title,
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'description': self._og_search_description(webpage),
+        })
+
+        return info_dict
diff --git a/youtube_dl/extractor/chaturbate.py b/youtube_dl/extractor/chaturbate.py
index b2234549e8b6747cd76d6d1936af893522e59cae..29a8820d5835b1b3cf7aca3840705a2fb2f2e1e3 100644
--- a/youtube_dl/extractor/chaturbate.py
+++ b/youtube_dl/extractor/chaturbate.py
@@ -17,7 +17,8 @@ class ChaturbateIE(InfoExtractor):
          },
          'params': {
              'skip_download': True,
-        }
+        },
+        'skip': 'Room is offline',
      }, {
          'url': 'https://en.chaturbate.com/siswet19/',
          'only_matching': True,
diff --git a/youtube_dl/extractor/chirbit.py b/youtube_dl/extractor/chirbit.py
index b1eeaf101dda3a4a962862fa854db97ae1809329..f35df143a604695c0b1fe7b0e33d7384192d1d98 100644
--- a/youtube_dl/extractor/chirbit.py
+++ b/youtube_dl/extractor/chirbit.py
@@ -1,30 +1,34 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import base64
+import re
+
  from .common import InfoExtractor
-from ..utils import (
-    parse_duration,
-    int_or_none,
-)
+from ..utils import parse_duration
  
  
  class ChirbitIE(InfoExtractor):
      IE_NAME = 'chirbit'
      _VALID_URL = r'https?://(?:www\.)?chirb\.it/(?:(?:wp|pl)/|fb_chirbit_player\.swf\?key=)?(?P<id>[\da-zA-Z]+)'
      _TESTS = [{
-        'url': 'http://chirb.it/PrIPv5',
-        'md5': '9847b0dad6ac3e074568bf2cfb197de8',
+        'url': 'http://chirb.it/be2abG',
          'info_dict': {
-            'id': 'PrIPv5',
+            'id': 'be2abG',
              'ext': 'mp3',
-            'title': 'Фасадстрой',
-            'duration': 52,
-            'view_count': int,
-            'comment_count': int,
+            'title': 'md5:f542ea253f5255240be4da375c6a5d7e',
+            'description': 'md5:f24a4e22a71763e32da5fed59e47c770',
+            'duration': 306,
+        },
+        'params': {
+            'skip_download': True,
          }
      }, {
          'url': 'https://chirb.it/fb_chirbit_player.swf?key=PrIPv5',
          'only_matching': True,
+    }, {
+        'url': 'https://chirb.it/wp/MN58c2',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -33,38 +37,40 @@ class ChirbitIE(InfoExtractor):
          webpage = self._download_webpage(
              'http://chirb.it/%s' % audio_id, audio_id)
  
-        audio_url = self._search_regex(
-            r'"setFile"\s*,\s*"([^"]+)"', webpage, 'audio url')
+        data_fd = self._search_regex(
+            r'data-fd=(["\'])(?P<url>(?:(?!\1).)+)\1',
+            webpage, 'data fd', group='url')
+
+        # Reverse engineered from https://chirb.it/js/chirbit.player.js (look
+        # for soundURL)
+        audio_url = base64.b64decode(
+            data_fd[::-1].encode('ascii')).decode('utf-8')
  
          title = self._search_regex(
-            r'itemprop="name">([^<]+)', webpage, 'title')
-        duration = parse_duration(self._html_search_meta(
-            'duration', webpage, 'duration', fatal=False))
-        view_count = int_or_none(self._search_regex(
-            r'itemprop="playCount"\s*>(\d+)', webpage,
-            'listen count', fatal=False))
-        comment_count = int_or_none(self._search_regex(
-            r'>(\d+) Comments?:', webpage,
-            'comment count', fatal=False))
+            r'class=["\']chirbit-title["\'][^>]*>([^<]+)', webpage, 'title')
+        description = self._search_regex(
+            r'<h3>Description</h3>\s*<pre[^>]*>([^<]+)</pre>',
+            webpage, 'description', default=None)
+        duration = parse_duration(self._search_regex(
+            r'class=["\']c-length["\'][^>]*>([^<]+)',
+            webpage, 'duration', fatal=False))
  
          return {
              'id': audio_id,
              'url': audio_url,
              'title': title,
+            'description': description,
              'duration': duration,
-            'view_count': view_count,
-            'comment_count': comment_count,
          }
  
  
  class ChirbitProfileIE(InfoExtractor):
      IE_NAME = 'chirbit:profile'
-    _VALID_URL = r'https?://(?:www\.)?chirbit.com/(?:rss/)?(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?chirbit\.com/(?:rss/)?(?P<id>[^/]+)'
      _TEST = {
          'url': 'http://chirbit.com/ScarletBeauty',
          'info_dict': {
              'id': 'ScarletBeauty',
-            'title': 'Chirbits by ScarletBeauty',
          },
          'playlist_mincount': 3,
      }
@@ -72,13 +78,10 @@ class ChirbitProfileIE(InfoExtractor):
      def _real_extract(self, url):
          profile_id = self._match_id(url)
  
-        rss = self._download_xml(
-            'http://chirbit.com/rss/%s' % profile_id, profile_id)
+        webpage = self._download_webpage(url, profile_id)
  
          entries = [
-            self.url_result(audio_url.text, 'Chirbit')
-            for audio_url in rss.findall('./channel/item/link')]
-
-        title = rss.find('./channel/title').text
+            self.url_result(self._proto_relative_url('//chirb.it/' + video_id))
+            for _, video_id in re.findall(r'<input[^>]+id=([\'"])copy-btn-(?P<id>[0-9a-zA-Z]+)\1', webpage)]
  
-        return self.playlist_result(entries, profile_id, title)
+        return self.playlist_result(entries, profile_id)
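The `data-fd` handling in `ChirbitIE` reverses the attribute value and then base64-decodes it to obtain the audio URL. A minimal round-trip sketch follows. `decode_data_fd` mirrors the expression in the diff, while `encode_data_fd` is a hypothetical inverse written only so the decoding can be checked without a real page; the URL in the usage line is a placeholder.

```python
import base64


def decode_data_fd(data_fd):
    """Reverse the string, then base64-decode it to a UTF-8 URL."""
    return base64.b64decode(data_fd[::-1].encode('ascii')).decode('utf-8')


def encode_data_fd(url):
    """Hypothetical inverse: base64-encode the URL, then reverse the result."""
    return base64.b64encode(url.encode('utf-8')).decode('ascii')[::-1]


# Usage: a value produced by the inverse decodes back to the original URL.
original = 'http://example.com/audio.mp3'
assert decode_data_fd(encode_data_fd(original)) == original
```

Any base64 padding (`=`) sits at the start of the obfuscated value, and reversing the string first restores it to the end before decoding.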
diff --git a/youtube_dl/extractor/cinemassacre.py b/youtube_dl/extractor/cinemassacre.py
deleted file mode 100644
index 042c4f2..0000000
--- a/youtube_dl/extractor/cinemassacre.py
+++ /dev/null
@@ -1,119 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import ExtractorError
-from .screenwavemedia import ScreenwaveMediaIE
-
-
-class CinemassacreIE(InfoExtractor):
-    _VALID_URL = 'https?://(?:www\.)?cinemassacre\.com/(?P<date_y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/(?P<display_id>[^?#/]+)'
-    _TESTS = [
-        {
-            'url': 'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
-            'md5': 'fde81fbafaee331785f58cd6c0d46190',
-            'info_dict': {
-                'id': 'Cinemassacre-19911',
-                'ext': 'mp4',
-                'upload_date': '20121110',
-                'title': '“Angry Video Game Nerd: The Movie” – Trailer',
-                'description': 'md5:fb87405fcb42a331742a0dce2708560b',
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
-            'md5': 'd72f10cd39eac4215048f62ab477a511',
-            'info_dict': {
-                'id': 'Cinemassacre-521be8ef82b16',
-                'ext': 'mp4',
-                'upload_date': '20131002',
-                'title': 'The Mummy’s Hand (1940)',
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            # Youtube embedded video
-            'url': 'http://cinemassacre.com/2006/12/07/chronologically-confused-about-bad-movie-and-video-game-sequel-titles/',
-            'md5': 'ec9838a5520ef5409b3e4e42fcb0a3b9',
-            'info_dict': {
-                'id': 'OEVzPCY2T-g',
-                'ext': 'webm',
-                'title': 'AVGN: Chronologically Confused about Bad Movie and Video Game Sequel Titles',
-                'upload_date': '20061207',
-                'uploader': 'Cinemassacre',
-                'uploader_id': 'JamesNintendoNerd',
-                'description': 'md5:784734696c2b8b7f4b8625cc799e07f6',
-            }
-        },
-        {
-            # Youtube embedded video
-            'url': 'http://cinemassacre.com/2006/09/01/mckids/',
-            'md5': '7393c4e0f54602ad110c793eb7a6513a',
-            'info_dict': {
-                'id': 'FnxsNhuikpo',
-                'ext': 'webm',
-                'upload_date': '20060901',
-                'uploader': 'Cinemassacre Extra',
-                'description': 'md5:de9b751efa9e45fbaafd9c8a1123ed53',
-                'uploader_id': 'Cinemassacre',
-                'title': 'AVGN: McKids',
-            }
-        },
-        {
-            'url': 'http://cinemassacre.com/2015/05/25/mario-kart-64-nintendo-64-james-mike-mondays/',
-            'md5': '1376908e49572389e7b06251a53cdd08',
-            'info_dict': {
-                'id': 'Cinemassacre-555779690c440',
-                'ext': 'mp4',
-                'description': 'Let’s Play Mario Kart 64 !! Mario Kart 64 is a classic go-kart racing game released for the Nintendo 64 (N64). Today James & Mike do 4 player Battle Mode with Kyle and Bootsy!',
-                'title': 'Mario Kart 64 (Nintendo 64) James & Mike Mondays',
-                'upload_date': '20150525',
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        }
-    ]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('display_id')
-        video_date = mobj.group('date_y') + mobj.group('date_m') + mobj.group('date_d')
-
-        webpage = self._download_webpage(url, display_id)
-
-        playerdata_url = self._search_regex(
-            [
-                ScreenwaveMediaIE.EMBED_PATTERN,
-                r'<iframe[^>]+src="(?P<url>(?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"',
-            ],
-            webpage, 'player data URL', default=None, group='url')
-        if not playerdata_url:
-            raise ExtractorError('Unable to find player data')
-
-        video_title = self._html_search_regex(
-            r'<title>(?P<title>.+?)\|', webpage, 'title')
-        video_description = self._html_search_regex(
-            r'<div class="entry-content">(?P<description>.+?)</div>',
-            webpage, 'description', flags=re.DOTALL, fatal=False)
-        video_thumbnail = self._og_search_thumbnail(webpage)
-
-        return {
-            '_type': 'url_transparent',
-            'display_id': display_id,
-            'title': video_title,
-            'description': video_description,
-            'upload_date': video_date,
-            'thumbnail': video_thumbnail,
-            'url': playerdata_url,
-        }
diff --git a/youtube_dl/extractor/clipfish.py b/youtube_dl/extractor/clipfish.py
index 3a47f6fa4e1cdf734670ff64abb9aa4c02c94a6e..bb52e0c6ff75178626f83cd0a6d2de6607e861ad 100644
--- a/youtube_dl/extractor/clipfish.py
+++ b/youtube_dl/extractor/clipfish.py
@@ -1,3 +1,4 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -10,15 +11,15 @@ from ..utils import (
  class ClipfishIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?clipfish\.de/(?:[^/]+/)+video/(?P<id>[0-9]+)'
      _TEST = {
-        'url': 'http://www.clipfish.de/special/game-trailer/video/3966754/fifa-14-e3-2013-trailer/',
-        'md5': '79bc922f3e8a9097b3d68a93780fd475',
+        'url': 'http://www.clipfish.de/special/ugly-americans/video/4343170/s01-e01-ugly-americans-date-in-der-hoelle/',
+        'md5': '720563e467b86374c194bdead08d207d',
          'info_dict': {
-            'id': '3966754',
+            'id': '4343170',
              'ext': 'mp4',
-            'title': 'FIFA 14 - E3 2013 Trailer',
-            'description': 'Video zu FIFA 14: E3 2013 Trailer',
-            'upload_date': '20130611',
-            'duration': 82,
+            'title': 'S01 E01 - Ugly Americans - Date in der Hölle',
+            'description': 'Mark Lilly arbeitet im Sozialdienst der Stadt New York und soll Immigranten bei ihrer Einbürgerung in die USA zur Seite stehen.',
+            'upload_date': '20161005',
+            'duration': 1291,
              'view_count': int,
          }
      }
@@ -50,10 +51,14 @@ class ClipfishIE(InfoExtractor):
                  'tbr': int_or_none(video_info.get('bitrate')),
              })
  
+        descr = video_info.get('descr')
+        if descr:
+            descr = descr.strip()
+
          return {
              'id': video_id,
              'title': video_info['title'],
-            'description': video_info.get('descr'),
+            'description': descr,
              'formats': formats,
              'thumbnail': video_info.get('media_content_thumbnail_large') or video_info.get('media_thumbnail'),
              'duration': int_or_none(video_info.get('media_length')),
diff --git a/youtube_dl/extractor/cliphunter.py b/youtube_dl/extractor/cliphunter.py
index 19f8b397e44a679ea936ad638048ccb488dc4b93..252c2e846969c96d733911f2f471054286ec0777 100644
--- a/youtube_dl/extractor/cliphunter.py
+++ b/youtube_dl/extractor/cliphunter.py
@@ -23,7 +23,7 @@ class CliphunterIE(InfoExtractor):
          (?P<id>[0-9]+)/
          (?P<seo>.+?)(?:$|[#\?])
      '''
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.cliphunter.com/w/1012420/Fun_Jynx_Maze_solo',
          'md5': 'b7c9bbd4eb3a226ab91093714dcaa480',
          'info_dict': {
@@ -32,8 +32,19 @@ class CliphunterIE(InfoExtractor):
              'title': 'Fun Jynx Maze solo',
              'thumbnail': 're:^https?://.*\.jpg$',
              'age_limit': 18,
-        }
-    }
+        },
+        'skip': 'Video gone',
+    }, {
+        'url': 'http://www.cliphunter.com/w/2019449/ShesNew__My_booty_girlfriend_Victoria_Paradices_pussy_filled_with_jizz',
+        'md5': '55a723c67bfc6da6b0cfa00d55da8a27',
+        'info_dict': {
+            'id': '2019449',
+            'ext': 'mp4',
+            'title': 'ShesNew - My booty girlfriend, Victoria Paradice\'s pussy filled with jizz',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'age_limit': 18,
+        },
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
diff --git a/youtube_dl/extractor/cliprs.py b/youtube_dl/extractor/cliprs.py
new file mode 100644
index 0000000..d55b26d
--- /dev/null
+++ b/youtube_dl/extractor/cliprs.py
@@ -0,0 +1,33 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .onet import OnetBaseIE
+
+
+class ClipRsIE(OnetBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
+    _TEST = {
+        'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
+        'md5': 'c412d57815ba07b56f9edc7b5d6a14e5',
+        'info_dict': {
+            'id': '1488842.1399140381',
+            'ext': 'mp4',
+            'title': 'PREMIJERA Frajle predstavljaju novi spot za pesmu Moli me, moli',
+            'description': 'md5:56ce2c3b4ab31c5a2e0b17cb9a453026',
+            'duration': 229,
+            'timestamp': 1459850243,
+            'upload_date': '20160405',
+        }
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        mvp_id = self._search_mvp_id(webpage)
+
+        info_dict = self._extract_from_id(mvp_id, webpage)
+        info_dict['display_id'] = display_id
+
+        return info_dict
diff --git a/youtube_dl/extractor/closertotruth.py b/youtube_dl/extractor/closertotruth.py
new file mode 100644
index 0000000..26243d5
--- /dev/null
+++ b/youtube_dl/extractor/closertotruth.py
@@ -0,0 +1,92 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class CloserToTruthIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?closertotruth\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://closertotruth.com/series/solutions-the-mind-body-problem#video-3688',
+        'info_dict': {
+            'id': '0_zof1ktre',
+            'display_id': 'solutions-the-mind-body-problem',
+            'ext': 'mov',
+            'title': 'Solutions to the Mind-Body Problem?',
+            'upload_date': '20140221',
+            'timestamp': 1392956007,
+            'uploader_id': 'CTTXML'
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://closertotruth.com/episodes/how-do-brains-work',
+        'info_dict': {
+            'id': '0_iuxai6g6',
+            'display_id': 'how-do-brains-work',
+            'ext': 'mov',
+            'title': 'How do Brains Work?',
+            'upload_date': '20140221',
+            'timestamp': 1392956024,
+            'uploader_id': 'CTTXML'
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://closertotruth.com/interviews/1725',
+        'info_dict': {
+            'id': '1725',
+            'title': 'AyaFr-002',
+        },
+        'playlist_mincount': 2,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        partner_id = self._search_regex(
+            r'<script[^>]+src=["\'].*?\b(?:partner_id|p)/(\d+)',
+            webpage, 'kaltura partner_id')
+
+        title = self._search_regex(
+            r'<title>(.+?)\s*\|\s*.+?</title>', webpage, 'video title')
+
+        select = self._search_regex(
+            r'(?s)<select[^>]+id="select-version"[^>]*>(.+?)</select>',
+            webpage, 'select version', default=None)
+        if select:
+            entry_ids = set()
+            entries = []
+            for mobj in re.finditer(
+                    r'<option[^>]+value=(["\'])(?P<id>[0-9a-z_]+)(?:#.+?)?\1[^>]*>(?P<title>[^<]+)',
+                    webpage):
+                entry_id = mobj.group('id')
+                if entry_id in entry_ids:
+                    continue
+                entry_ids.add(entry_id)
+                entries.append({
+                    '_type': 'url_transparent',
+                    'url': 'kaltura:%s:%s' % (partner_id, entry_id),
+                    'ie_key': 'Kaltura',
+                    'title': mobj.group('title'),
+                })
+            if entries:
+                return self.playlist_result(entries, display_id, title)
+
+        entry_id = self._search_regex(
+            r'<a[^>]+id=(["\'])embed-kaltura\1[^>]+data-kaltura=(["\'])(?P<id>[0-9a-z_]+)\2',
+            webpage, 'kaltura entry_id', group='id')
+
+        return {
+            '_type': 'url_transparent',
+            'display_id': display_id,
+            'url': 'kaltura:%s:%s' % (partner_id, entry_id),
+            'ie_key': 'Kaltura',
+            'title': title
+        }
diff --git a/youtube_dl/extractor/cloudy.py b/youtube_dl/extractor/cloudy.py
index 9e267e6c0260e0391ff04b61c613a2fb6d916313..ae5ba0015a0e5026f2db38eaa103417ddc8ce1da 100644
--- a/youtube_dl/extractor/cloudy.py
+++ b/youtube_dl/extractor/cloudy.py
@@ -6,7 +6,6 @@ import re
  from .common import InfoExtractor
  from ..compat import (
      compat_parse_qs,
-    compat_urllib_parse_urlencode,
      compat_HTTPError,
  )
  from ..utils import (
@@ -17,37 +16,26 @@ from ..utils import (
  
  
  class CloudyIE(InfoExtractor):
-    _IE_DESC = 'cloudy.ec and videoraj.ch'
+    _IE_DESC = 'cloudy.ec'
      _VALID_URL = r'''(?x)
-        https?://(?:www\.)?(?P<host>cloudy\.ec|videoraj\.ch)/
+        https?://(?:www\.)?cloudy\.ec/
          (?:v/|embed\.php\?id=)
          (?P<id>[A-Za-z0-9]+)
          '''
-    _EMBED_URL = 'http://www.%s/embed.php?id=%s'
-    _API_URL = 'http://www.%s/api/player.api.php?%s'
+    _EMBED_URL = 'http://www.cloudy.ec/embed.php?id=%s'
+    _API_URL = 'http://www.cloudy.ec/api/player.api.php'
      _MAX_TRIES = 2
-    _TESTS = [
-        {
-            'url': 'https://www.cloudy.ec/v/af511e2527aac',
-            'md5': '5cb253ace826a42f35b4740539bedf07',
-            'info_dict': {
-                'id': 'af511e2527aac',
-                'ext': 'flv',
-                'title': 'Funny Cats and Animals Compilation june 2013',
-            }
-        },
-        {
-            'url': 'http://www.videoraj.ch/v/47f399fd8bb60',
-            'md5': '7d0f8799d91efd4eda26587421c3c3b0',
-            'info_dict': {
-                'id': '47f399fd8bb60',
-                'ext': 'flv',
-                'title': 'Burning a New iPhone 5 with Gasoline - Will it Survive?',
-            }
+    _TEST = {
+        'url': 'https://www.cloudy.ec/v/af511e2527aac',
+        'md5': '5cb253ace826a42f35b4740539bedf07',
+        'info_dict': {
+            'id': 'af511e2527aac',
+            'ext': 'flv',
+            'title': 'Funny Cats and Animals Compilation june 2013',
          }
-    ]
+    }
  
-    def _extract_video(self, video_host, video_id, file_key, error_url=None, try_num=0):
+    def _extract_video(self, video_id, file_key, error_url=None, try_num=0):
  
          if try_num > self._MAX_TRIES - 1:
              raise ExtractorError('Unable to extract video URL', expected=True)
@@ -64,9 +52,8 @@ class CloudyIE(InfoExtractor):
                  'errorUrl': error_url,
              })
  
-        data_url = self._API_URL % (video_host, compat_urllib_parse_urlencode(form))
          player_data = self._download_webpage(
-            data_url, video_id, 'Downloading player data')
+            self._API_URL, video_id, 'Downloading player data', query=form)
          data = compat_parse_qs(player_data)
  
          try_num += 1
@@ -88,7 +75,7 @@ class CloudyIE(InfoExtractor):
              except ExtractorError as e:
                  if isinstance(e.cause, compat_HTTPError) and e.cause.code in [404, 410]:
                      self.report_warning('Invalid video URL, requesting another', video_id)
-                    return self._extract_video(video_host, video_id, file_key, video_url, try_num)
+                    return self._extract_video(video_id, file_key, video_url, try_num)
  
          return {
              'id': video_id,
@@ -98,14 +85,13 @@ class CloudyIE(InfoExtractor):
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
-        video_host = mobj.group('host')
          video_id = mobj.group('id')
  
-        url = self._EMBED_URL % (video_host, video_id)
+        url = self._EMBED_URL % video_id
          webpage = self._download_webpage(url, video_id)
  
          file_key = self._search_regex(
              [r'key\s*:\s*"([^"]+)"', r'filekey\s*=\s*"([^"]+)"'],
              webpage, 'file_key')
  
-        return self._extract_video(video_host, video_id, file_key)
+        return self._extract_video(video_id, file_key)
diff --git a/youtube_dl/extractor/clubic.py b/youtube_dl/extractor/clubic.py
index 2fba93543474cd7ebd53848aca62848c32bf7164..f7ee3a8f8ebe4715b2d2a5f4634bc50836cc33f7 100644
--- a/youtube_dl/extractor/clubic.py
+++ b/youtube_dl/extractor/clubic.py
@@ -1,9 +1,6 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import json
-import re
-
  from .common import InfoExtractor
  from ..utils import (
      clean_html,
@@ -30,16 +27,14 @@ class ClubicIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
  
          player_url = 'http://player.m6web.fr/v1/player/clubic/%s.html' % video_id
          player_page = self._download_webpage(player_url, video_id)
  
-        config_json = self._search_regex(
+        config = self._parse_json(self._search_regex(
              r'(?m)M6\.Player\.config\s*=\s*(\{.+?\});$', player_page,
-            'configuration')
-        config = json.loads(config_json)
+            'configuration'), video_id)
  
          video_info = config['videoInfo']
          sources = config['sources']
diff --git a/youtube_dl/extractor/cmt.py b/youtube_dl/extractor/cmt.py
index f1311b14f8f1c572c9647bcd36de3f19de676373..7d3e9b0c9ce89fff9b8094f2d86beaa5fb35e7e0 100644
--- a/youtube_dl/extractor/cmt.py
+++ b/youtube_dl/extractor/cmt.py
@@ -1,10 +1,12 @@
  from __future__ import unicode_literals
+
  from .mtv import MTVIE
+from ..utils import ExtractorError
  
  
  class CMTIE(MTVIE):
      IE_NAME = 'cmt.com'
-    _VALID_URL = r'https?://www\.cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
      _FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
  
      _TESTS = [{
@@ -16,7 +18,32 @@ class CMTIE(MTVIE):
              'title': 'Garth Brooks - "The Call (featuring Trisha Yearwood)"',
              'description': 'Blame It All On My Roots',
          },
+        'skip': 'Video not available',
+    }, {
+        'url': 'http://www.cmt.com/videos/misc/1504699/still-the-king-ep-109-in-3-minutes.jhtml#id=1739908',
+        'md5': 'e61a801ca4a183a466c08bd98dccbb1c',
+        'info_dict': {
+            'id': '1504699',
+            'ext': 'mp4',
+            'title': 'Still The King Ep. 109 in 3 Minutes',
+            'description': 'Relive or catch up with Still The King by watching this recap of season 1, episode 9.',
+            'timestamp': 1469421000.0,
+            'upload_date': '20160725',
+        },
      }, {
          'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172',
          'only_matching': True,
      }]
+
+    @classmethod
+    def _transform_rtmp_url(cls, rtmp_video_url):
+        if 'error_not_available.swf' in rtmp_video_url:
+            raise ExtractorError(
+                '%s said: video is not available' % cls.IE_NAME, expected=True)
+
+        return super(CMTIE, cls)._transform_rtmp_url(rtmp_video_url)
+
+    def _extract_mgid(self, webpage):
+        return self._search_regex(
+            r'MTVN\.VIDEO\.contentUri\s*=\s*([\'"])(?P<mgid>.+?)\1',
+            webpage, 'mgid', group='mgid')
diff --git a/youtube_dl/extractor/cnbc.py b/youtube_dl/extractor/cnbc.py
new file mode 100644
index 0000000..d354d9f
--- /dev/null
+++ b/youtube_dl/extractor/cnbc.py
@@ -0,0 +1,36 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import smuggle_url
+
+
+class CNBCIE(InfoExtractor):
+    _VALID_URL = r'https?://video\.cnbc\.com/gallery/\?video=(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://video.cnbc.com/gallery/?video=3000503714',
+        'info_dict': {
+            'id': '3000503714',
+            'ext': 'mp4',
+            'title': 'Fighting zombies is big business',
+            'description': 'md5:0c100d8e1a7947bd2feec9a5550e519e',
+            'timestamp': 1459332000,
+            'upload_date': '20160330',
+            'uploader': 'NBCU-CNBC',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(
+                'http://link.theplatform.com/s/gZWlPC/media/guid/2408950221/%s?mbr=true&manifest=m3u' % video_id,
+                {'force_smil_url': True}),
+            'id': video_id,
+        }
diff --git a/youtube_dl/extractor/cnet.py b/youtube_dl/extractor/cnet.py
deleted file mode 100644
index c154b3e..0000000
--- a/youtube_dl/extractor/cnet.py
+++ /dev/null
@@ -1,82 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .theplatform import ThePlatformIE
-from ..utils import int_or_none
-
-
-class CNETIE(ThePlatformIE):
-    _VALID_URL = r'https?://(?:www\.)?cnet\.com/videos/(?P<id>[^/]+)/'
-    _TESTS = [{
-        'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
-        'info_dict': {
-            'id': '56f4ea68-bd21-4852-b08c-4de5b8354c60',
-            'ext': 'flv',
-            'title': 'Hands-on with Microsoft Windows 8.1 Update',
-            'description': 'The new update to the Windows 8 OS brings improved performance for mouse and keyboard users.',
-            'uploader_id': '6085384d-619e-11e3-b231-14feb5ca9861',
-            'uploader': 'Sarah Mitroff',
-            'duration': 70,
-        },
-    }, {
-        'url': 'http://www.cnet.com/videos/whiny-pothole-tweets-at-local-government-when-hit-by-cars-tomorrow-daily-187/',
-        'info_dict': {
-            'id': '56527b93-d25d-44e3-b738-f989ce2e49ba',
-            'ext': 'flv',
-            'title': 'Whiny potholes tweet at local government when hit by cars (Tomorrow Daily 187)',
-            'description': 'Khail and Ashley wonder what other civic woes can be solved by self-tweeting objects, investigate a new kind of VR camera and watch an origami robot self-assemble, walk, climb, dig and dissolve. #TDPothole',
-            'uploader_id': 'b163284d-6b73-44fc-b3e6-3da66c392d40',
-            'uploader': 'Ashley Esqueda',
-            'duration': 1482,
-        },
-    }]
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-
-        data_json = self._html_search_regex(
-            r"data-cnet-video(?:-uvp)?-options='([^']+)'",
-            webpage, 'data json')
-        data = self._parse_json(data_json, display_id)
-        vdata = data.get('video') or data['videos'][0]
-
-        video_id = vdata['id']
-        title = vdata['title']
-        author = vdata.get('author')
-        if author:
-            uploader = '%s %s' % (author['firstName'], author['lastName'])
-            uploader_id = author.get('id')
-        else:
-            uploader = None
-            uploader_id = None
-
-        metadata = self.get_metadata('kYEXFC/%s' % list(vdata['files'].values())[0], video_id)
-        description = vdata.get('description') or metadata.get('description')
-        duration = int_or_none(vdata.get('duration')) or metadata.get('duration')
-
-        formats = []
-        subtitles = {}
-        for (fkey, vid) in vdata['files'].items():
-            if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
-                continue
-            release_url = 'http://link.theplatform.com/s/kYEXFC/%s?mbr=true' % vid
-            if fkey == 'hds':
-                release_url += '&manifest=f4m'
-            tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey)
-            formats.extend(tp_formats)
-            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'description': description,
-            'thumbnail': metadata.get('thumbnail'),
-            'duration': duration,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'subtitles': subtitles,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/cnn.py b/youtube_dl/extractor/cnn.py
index 53489a14e38399680c8338f4f22a521f7fa6ad45..5fc311f538eb23b0b16e99f6c5623f0db4290b40 100644
--- a/youtube_dl/extractor/cnn.py
+++ b/youtube_dl/extractor/cnn.py
@@ -3,15 +3,12 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    parse_duration,
-    url_basename,
-)
+from .turner import TurnerBaseIE
+from ..utils import url_basename
  
  
-class CNNIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://(?:(?:edition|www)\.)?cnn\.com/video/(?:data/.+?|\?)/
+class CNNIE(TurnerBaseIE):
+    _VALID_URL = r'''(?x)https?://(?:(?P<sub_domain>edition|www|money)\.)?cnn\.com/(?:video/(?:data/.+?|\?)/)?videos?/
          (?P<path>.+?/(?P<title>[^/]+?)(?:\.(?:[a-z\-]+)|(?=&)))'''
  
      _TESTS = [{
@@ -25,6 +22,7 @@ class CNNIE(InfoExtractor):
              'duration': 135,
              'upload_date': '20130609',
          },
+        'expected_warnings': ['Failed to download m3u8 information'],
      }, {
          'url': 'http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29',
          'md5': 'b5cc60c60a3477d185af8f19a2a26f4e',
@@ -34,7 +32,8 @@ class CNNIE(InfoExtractor):
              'title': "Student's epic speech stuns new freshmen",
              'description': "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"",
              'upload_date': '20130821',
-        }
+        },
+        'expected_warnings': ['Failed to download m3u8 information'],
      }, {
          'url': 'http://www.cnn.com/video/data/2.0/video/living/2014/12/22/growing-america-nashville-salemtown-board-episode-1.hln.html',
          'md5': 'f14d02ebd264df951feb2400e2c25a1b',
@@ -44,80 +43,61 @@ class CNNIE(InfoExtractor):
              'title': 'Nashville Ep. 1: Hand crafted skateboards',
              'description': 'md5:e7223a503315c9f150acac52e76de086',
              'upload_date': '20141222',
-        }
+        },
+        'expected_warnings': ['Failed to download m3u8 information'],
+    }, {
+        'url': 'http://money.cnn.com/video/news/2016/08/19/netflix-stunning-stats.cnnmoney/index.html',
+        'md5': '52a515dc1b0f001cd82e4ceda32be9d1',
+        'info_dict': {
+            'id': '/video/news/2016/08/19/netflix-stunning-stats.cnnmoney',
+            'ext': 'mp4',
+            'title': '5 stunning stats about Netflix',
+            'description': 'Did you know that Netflix has more than 80 million members? Here are five facts about the online video distributor that you probably didn\'t know.',
+            'upload_date': '20160819',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
      }, {
          'url': 'http://cnn.com/video/?/video/politics/2015/03/27/pkg-arizona-senator-church-attendance-mandatory.ktvk',
          'only_matching': True,
      }, {
          'url': 'http://cnn.com/video/?/video/us/2015/04/06/dnt-baker-refuses-anti-gay-order.wkmg',
          'only_matching': True,
+    }, {
+        'url': 'http://edition.cnn.com/videos/arts/2016/04/21/olympic-games-cultural-a-z-brazil.cnn',
+        'only_matching': True,
      }]
  
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        path = mobj.group('path')
-        page_title = mobj.group('title')
-        info_url = 'http://edition.cnn.com/video/data/3.0/%s/index.xml' % path
-        info = self._download_xml(info_url, page_title)
-
-        formats = []
-        rex = re.compile(r'''(?x)
-            (?P<width>[0-9]+)x(?P<height>[0-9]+)
-            (?:_(?P<bitrate>[0-9]+)k)?
-        ''')
-        for f in info.findall('files/file'):
-            video_url = 'http://ht.cdn.turner.com/cnn/big%s' % (f.text.strip())
-            fdct = {
-                'format_id': f.attrib['bitrate'],
-                'url': video_url,
-            }
-
-            mf = rex.match(f.attrib['bitrate'])
-            if mf:
-                fdct['width'] = int(mf.group('width'))
-                fdct['height'] = int(mf.group('height'))
-                fdct['tbr'] = int_or_none(mf.group('bitrate'))
-            else:
-                mf = rex.search(f.text)
-                if mf:
-                    fdct['width'] = int(mf.group('width'))
-                    fdct['height'] = int(mf.group('height'))
-                    fdct['tbr'] = int_or_none(mf.group('bitrate'))
-                else:
-                    mi = re.match(r'ios_(audio|[0-9]+)$', f.attrib['bitrate'])
-                    if mi:
-                        if mi.group(1) == 'audio':
-                            fdct['vcodec'] = 'none'
-                            fdct['ext'] = 'm4a'
-                        else:
-                            fdct['tbr'] = int(mi.group(1))
-
-            formats.append(fdct)
-
-        self._sort_formats(formats)
-
-        thumbnails = [{
-            'height': int(t.attrib['height']),
-            'width': int(t.attrib['width']),
-            'url': t.text,
-        } for t in info.findall('images/image')]
-
-        metas_el = info.find('metas')
-        upload_date = (
-            metas_el.attrib.get('version') if metas_el is not None else None)
+    _CONFIG = {
+        # http://edition.cnn.com/.element/apps/cvp/3.0/cfg/spider/cnn/expansion/config.xml
+        'edition': {
+            'data_src': 'http://edition.cnn.com/video/data/3.0/video/%s/index.xml',
+            'media_src': 'http://pmd.cdn.turner.com/cnn/big',
+        },
+        # http://money.cnn.com/.element/apps/cvp2/cfg/config.xml
+        'money': {
+            'data_src': 'http://money.cnn.com/video/data/4.0/video/%s.xml',
+            'media_src': 'http://ht3.cdn.turner.com/money/big',
+        },
+    }
  
-        duration_el = info.find('length')
-        duration = parse_duration(duration_el.text)
+    def _extract_timestamp(self, video_data):
+        # TODO: fix timestamp extraction
+        return None
  
-        return {
-            'id': info.attrib['id'],
-            'title': info.find('headline').text,
-            'formats': formats,
-            'thumbnails': thumbnails,
-            'description': info.find('description').text,
-            'duration': duration,
-            'upload_date': upload_date,
-        }
+    def _real_extract(self, url):
+        sub_domain, path, page_title = re.match(self._VALID_URL, url).groups()
+        if sub_domain not in ('money', 'edition'):
+            sub_domain = 'edition'
+        config = self._CONFIG[sub_domain]
+        return self._extract_cvp_info(
+            config['data_src'] % path, page_title, {
+                'default': {
+                    'media_src': config['media_src'],
+                }
+            })
  
  
  class CNNBlogsIE(InfoExtractor):
@@ -132,6 +112,7 @@ class CNNBlogsIE(InfoExtractor):
              'description': 'Glenn Greenwald responds to comments made this week on Capitol Hill that journalists could be criminal accessories.',
              'upload_date': '20140209',
          },
+        'expected_warnings': ['Failed to download m3u8 information'],
          'add_ie': ['CNN'],
      }
  
@@ -146,7 +127,7 @@ class CNNBlogsIE(InfoExtractor):
  
  
  class CNNArticleIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!video/)'
+    _VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!videos?/)'
      _TEST = {
          'url': 'http://www.cnn.com/2014/12/21/politics/obama-north-koreas-hack-not-war-but-cyber-vandalism/',
          'md5': '689034c2a3d9c6dc4aa72d65a81efd01',
@@ -154,9 +135,10 @@ class CNNArticleIE(InfoExtractor):
              'id': 'bestoftv/2014/12/21/ip-north-korea-obama.cnn',
              'ext': 'mp4',
              'title': 'Obama: Cyberattack not an act of war',
-            'description': 'md5:51ce6750450603795cad0cdfbd7d05c5',
+            'description': 'md5:0a802a40d2376f60e6b04c8d5bcebc4b',
              'upload_date': '20141221',
          },
+        'expected_warnings': ['Failed to download m3u8 information'],
          'add_ie': ['CNN'],
      }
  
diff --git a/youtube_dl/extractor/collegehumor.py b/youtube_dl/extractor/collegehumor.py
deleted file mode 100644
index 002b240..0000000
--- a/youtube_dl/extractor/collegehumor.py
+++ /dev/null
@@ -1,101 +0,0 @@
-from __future__ import unicode_literals
-
-import json
-import re
-
-from .common import InfoExtractor
-from ..utils import int_or_none
-
-
-class CollegeHumorIE(InfoExtractor):
-    _VALID_URL = r'^(?:https?://)?(?:www\.)?collegehumor\.com/(video|embed|e)/(?P<videoid>[0-9]+)/?(?P<shorttitle>.*)$'
-
-    _TESTS = [
-        {
-            'url': 'http://www.collegehumor.com/video/6902724/comic-con-cosplay-catastrophe',
-            'md5': 'dcc0f5c1c8be98dc33889a191f4c26bd',
-            'info_dict': {
-                'id': '6902724',
-                'ext': 'mp4',
-                'title': 'Comic-Con Cosplay Catastrophe',
-                'description': "Fans get creative this year at San Diego.  Too creative.  And yes, that's really Joss Whedon.",
-                'age_limit': 13,
-                'duration': 187,
-            },
-        }, {
-            'url': 'http://www.collegehumor.com/video/3505939/font-conference',
-            'md5': '72fa701d8ef38664a4dbb9e2ab721816',
-            'info_dict': {
-                'id': '3505939',
-                'ext': 'mp4',
-                'title': 'Font Conference',
-                'description': "This video wasn't long enough, so we made it double-spaced.",
-                'age_limit': 10,
-                'duration': 179,
-            },
-        }, {
-            # embedded youtube video
-            'url': 'http://www.collegehumor.com/embed/6950306',
-            'info_dict': {
-                'id': 'Z-bao9fg6Yc',
-                'ext': 'mp4',
-                'title': 'Young Americans Think President John F. Kennedy Died THIS MORNING IN A CAR ACCIDENT!!!',
-                'uploader': 'Mark Dice',
-                'uploader_id': 'MarkDice',
-                'description': 'md5:62c3dab9351fac7bb44b53b69511d87f',
-                'upload_date': '20140127',
-            },
-            'params': {
-                'skip_download': True,
-            },
-            'add_ie': ['Youtube'],
-        },
-    ]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('videoid')
-
-        jsonUrl = 'http://www.collegehumor.com/moogaloop/video/' + video_id + '.json'
-        data = json.loads(self._download_webpage(
-            jsonUrl, video_id, 'Downloading info JSON'))
-        vdata = data['video']
-        if vdata.get('youtubeId') is not None:
-            return {
-                '_type': 'url',
-                'url': vdata['youtubeId'],
-                'ie_key': 'Youtube',
-            }
-
-        AGE_LIMITS = {'nc17': 18, 'r': 18, 'pg13': 13, 'pg': 10, 'g': 0}
-        rating = vdata.get('rating')
-        if rating:
-            age_limit = AGE_LIMITS.get(rating.lower())
-        else:
-            age_limit = None  # None = No idea
-
-        PREFS = {'high_quality': 2, 'low_quality': 0}
-        formats = []
-        for format_key in ('mp4', 'webm'):
-            for qname, qurl in vdata.get(format_key, {}).items():
-                formats.append({
-                    'format_id': format_key + '_' + qname,
-                    'url': qurl,
-                    'format': format_key,
-                    'preference': PREFS.get(qname),
-                })
-        self._sort_formats(formats)
-
-        duration = int_or_none(vdata.get('duration'), 1000)
-        like_count = int_or_none(vdata.get('likes'))
-
-        return {
-            'id': video_id,
-            'title': vdata['title'],
-            'description': vdata.get('description'),
-            'thumbnail': vdata.get('thumbnail'),
-            'formats': formats,
-            'age_limit': age_limit,
-            'duration': duration,
-            'like_count': like_count,
-        }
diff --git a/youtube_dl/extractor/comcarcoff.py b/youtube_dl/extractor/comcarcoff.py
index e697d14107534e57845ea661864826ec4843735d..588aad0d911038229a4a3a97e5c74284f7bafc56 100644
--- a/youtube_dl/extractor/comcarcoff.py
+++ b/youtube_dl/extractor/comcarcoff.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -41,7 +41,13 @@ class ComCarCoffIE(InfoExtractor):
  
          display_id = full_data['activeVideo']['video']
          video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
+
          video_id = compat_str(video_data['mediaId'])
+        title = video_data['title']
+        formats = self._extract_m3u8_formats(
+            video_data['mediaUrl'], video_id, 'mp4')
+        self._sort_formats(formats)
+
          thumbnails = [{
              'url': video_data['images']['thumb'],
          }, {
@@ -54,15 +60,14 @@ class ComCarCoffIE(InfoExtractor):
              video_data.get('duration'))
  
          return {
-            '_type': 'url_transparent',
-            'url': 'crackle:%s' % video_id,
              'id': video_id,
              'display_id': display_id,
-            'title': video_data['title'],
+            'title': title,
              'description': video_data.get('description'),
              'timestamp': timestamp,
              'duration': duration,
              'thumbnails': thumbnails,
+            'formats': formats,
              'season_number': int_or_none(video_data.get('season')),
              'episode_number': int_or_none(video_data.get('episode')),
              'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),
diff --git a/youtube_dl/extractor/comedycentral.py b/youtube_dl/extractor/comedycentral.py
index 0c59102e072594857cc0f1c53e15c183b1885a93..88346dde7754a124e2b1d88d5ab8291dca4ca632 100644
--- a/youtube_dl/extractor/comedycentral.py
+++ b/youtube_dl/extractor/comedycentral.py
@@ -1,17 +1,7 @@
  from __future__ import unicode_literals
  
-import re
-
  from .mtv import MTVServicesInfoExtractor
-from ..compat import (
-    compat_str,
-    compat_urllib_parse_urlencode,
-)
-from ..utils import (
-    ExtractorError,
-    float_or_none,
-    unified_strdate,
-)
+from .common import InfoExtractor
  
  
  class ComedyCentralIE(MTVServicesInfoExtractor):
@@ -26,8 +16,10 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
          'info_dict': {
              'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354',
              'ext': 'mp4',
-            'title': 'CC:Stand-Up|Greg Fitzsimmons: Life on Stage|Uncensored - Too Good of a Mother',
+            'title': 'CC:Stand-Up|August 18, 2013|1|0101|Uncensored - Too Good of a Mother',
              'description': 'After a certain point, breastfeeding becomes c**kblocking.',
+            'timestamp': 1376798400,
+            'upload_date': '20130818',
          },
      }, {
          'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview',
@@ -35,238 +27,92 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
      }]
  
  
-class ComedyCentralShowsIE(MTVServicesInfoExtractor):
-    IE_DESC = 'The Daily Show / The Colbert Report'
-    # urls can be abbreviations like :thedailyshow
-    # urls for episodes like:
-    # or urls for clips like: http://www.thedailyshow.com/watch/mon-december-10-2012/any-given-gun-day
-    #                     or: http://www.colbertnation.com/the-colbert-report-videos/421667/november-29-2012/moon-shattering-news
-    #                     or: http://www.colbertnation.com/the-colbert-report-collections/422008/festival-of-lights/79524
-    _VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow)
-                      |https?://(:www\.)?
-                          (?P<showname>thedailyshow|thecolbertreport)\.(?:cc\.)?com/
-                         ((?:full-)?episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|
-                          (?P<clip>
-                              (?:(?:guests/[^/]+|videos|video-playlists|special-editions|news-team/[^/]+)/[^/]+/(?P<videotitle>[^/?#]+))
-                              |(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
-                              |(watch/(?P<date>[^/]*)/(?P<tdstitle>.*))
-                          )|
-                          (?P<interview>
-                              extended-interviews/(?P<interID>[0-9a-z]+)/
-                              (?:playlist_tds_extended_)?(?P<interview_title>[^/?#]*?)
-                              (?:/[^/?#]?|[?#]|$))))
-                     '''
+class ToshIE(MTVServicesInfoExtractor):
+    IE_DESC = 'Tosh.0'
+    _VALID_URL = r'^https?://tosh\.cc\.com/video-(?:clips|collections)/[^/]+/(?P<videotitle>[^/?#]+)'
+    _FEED_URL = 'http://tosh.cc.com/feeds/mrss'
+
      _TESTS = [{
-        'url': 'http://thedailyshow.cc.com/watch/thu-december-13-2012/kristen-stewart',
-        'md5': '4e2f5cb088a83cd8cdb7756132f9739d',
+        'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans',
          'info_dict': {
-            'id': 'ab9ab3e7-5a98-4dbe-8b21-551dc0523d55',
-            'ext': 'mp4',
-            'upload_date': '20121213',
-            'description': 'Kristen Stewart learns to let loose in "On the Road."',
-            'uploader': 'thedailyshow',
-            'title': 'thedailyshow kristen-stewart part 1',
-        }
-    }, {
-        'url': 'http://thedailyshow.cc.com/extended-interviews/b6364d/sarah-chayes-extended-interview',
-        'info_dict': {
-            'id': 'sarah-chayes-extended-interview',
-            'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
-            'title': 'thedailyshow Sarah Chayes Extended Interview',
+            'description': 'Tosh asked fans to share their summer plans.',
+            'title': 'Twitter Users Share Summer Plans',
          },
-        'playlist': [
-            {
-                'info_dict': {
-                    'id': '0baad492-cbec-4ec1-9e50-ad91c291127f',
-                    'ext': 'mp4',
-                    'upload_date': '20150129',
-                    'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
-                    'uploader': 'thedailyshow',
-                    'title': 'thedailyshow sarah-chayes-extended-interview part 1',
+        'playlist': [{
+            'md5': 'f269e88114c1805bb6d7653fecea9e06',
+            'info_dict': {
+                'id': '90498ec2-ed00-11e0-aca6-0026b9414f30',
+                'ext': 'mp4',
+                'title': 'Tosh.0|June 9, 2077|2|211|Twitter Users Share Summer Plans',
+                'description': 'Tosh asked fans to share their summer plans.',
+                'thumbnail': 're:^https?://.*\.jpg',
+                # It's really reported to be published on year 2077
+                'upload_date': '20770610',
+                'timestamp': 3390510600,
+                'subtitles': {
+                    'en': 'mincount:3',
                  },
              },
-            {
-                'info_dict': {
-                    'id': '1e4fb91b-8ce7-4277-bd7c-98c9f1bbd283',
-                    'ext': 'mp4',
-                    'upload_date': '20150129',
-                    'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
-                    'uploader': 'thedailyshow',
-                    'title': 'thedailyshow sarah-chayes-extended-interview part 2',
-                },
-            },
-        ],
+        }]
+    }, {
+        'url': 'http://tosh.cc.com/video-collections/x2iz7k/just-plain-foul/m5q4fp',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def _transform_rtmp_url(cls, rtmp_video_url):
+        new_urls = super(ToshIE, cls)._transform_rtmp_url(rtmp_video_url)
+        new_urls['rtmp'] = rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm')
+        return new_urls
+
+
+class ComedyCentralTVIE(MTVServicesInfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/(?:staffeln|shows)/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://www.comedycentral.tv/staffeln/7436-the-mindy-project-staffel-4',
+        'info_dict': {
+            'id': 'local_playlist-f99b626bdfe13568579a',
+            'ext': 'flv',
+            'title': 'Episode_the-mindy-project_shows_season-4_episode-3_full-episode_part1',
+        },
          'params': {
+            # rtmp download
              'skip_download': True,
          },
      }, {
-        'url': 'http://thedailyshow.cc.com/extended-interviews/xm3fnq/andrew-napolitano-extended-interview',
+        'url': 'http://www.comedycentral.tv/shows/1074-workaholics',
          'only_matching': True,
      }, {
-        'url': 'http://thecolbertreport.cc.com/videos/29w6fx/-realhumanpraise-for-fox-news',
-        'only_matching': True,
-    }, {
-        'url': 'http://thecolbertreport.cc.com/videos/gh6urb/neil-degrasse-tyson-pt--1?xrs=eml_col_031114',
-        'only_matching': True,
-    }, {
-        'url': 'http://thedailyshow.cc.com/guests/michael-lewis/3efna8/exclusive---michael-lewis-extended-interview-pt--3',
-        'only_matching': True,
-    }, {
-        'url': 'http://thedailyshow.cc.com/episodes/sy7yv0/april-8--2014---denis-leary',
-        'only_matching': True,
-    }, {
-        'url': 'http://thecolbertreport.cc.com/episodes/8ase07/april-8--2014---jane-goodall',
-        'only_matching': True,
-    }, {
-        'url': 'http://thedailyshow.cc.com/video-playlists/npde3s/the-daily-show-19088-highlights',
-        'only_matching': True,
-    }, {
-        'url': 'http://thedailyshow.cc.com/video-playlists/t6d9sg/the-daily-show-20038-highlights/be3cwo',
-        'only_matching': True,
-    }, {
-        'url': 'http://thedailyshow.cc.com/special-editions/2l8fdb/special-edition---a-look-back-at-food',
-        'only_matching': True,
-    }, {
-        'url': 'http://thedailyshow.cc.com/news-team/michael-che/7wnfel/we-need-to-talk-about-israel',
+        'url': 'http://www.comedycentral.tv/shows/1727-the-mindy-project/bonus',
          'only_matching': True,
      }]
  
-    _available_formats = ['3500', '2200', '1700', '1200', '750', '400']
-
-    _video_extensions = {
-        '3500': 'mp4',
-        '2200': 'mp4',
-        '1700': 'mp4',
-        '1200': 'mp4',
-        '750': 'mp4',
-        '400': 'mp4',
-    }
-    _video_dimensions = {
-        '3500': (1280, 720),
-        '2200': (960, 540),
-        '1700': (768, 432),
-        '1200': (640, 360),
-        '750': (512, 288),
-        '400': (384, 216),
-    }
-
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-
-        if mobj.group('shortname'):
-            return self.url_result('http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes')
-
-        if mobj.group('clip'):
-            if mobj.group('videotitle'):
-                epTitle = mobj.group('videotitle')
-            elif mobj.group('showname') == 'thedailyshow':
-                epTitle = mobj.group('tdstitle')
-            else:
-                epTitle = mobj.group('cntitle')
-            dlNewest = False
-        elif mobj.group('interview'):
-            epTitle = mobj.group('interview_title')
-            dlNewest = False
-        else:
-            dlNewest = not mobj.group('episode')
-            if dlNewest:
-                epTitle = mobj.group('showname')
-            else:
-                epTitle = mobj.group('episode')
-        show_name = mobj.group('showname')
-
-        webpage, htmlHandle = self._download_webpage_handle(url, epTitle)
-        if dlNewest:
-            url = htmlHandle.geturl()
-            mobj = re.match(self._VALID_URL, url, re.VERBOSE)
-            if mobj is None:
-                raise ExtractorError('Invalid redirected URL: ' + url)
-            if mobj.group('episode') == '':
-                raise ExtractorError('Redirected URL is still not specific: ' + url)
-            epTitle = (mobj.group('episode') or mobj.group('videotitle')).rpartition('/')[-1]
-
-        mMovieParams = re.findall('(?:<param name="movie" value="|var url = ")(http://media.mtvnservices.com/([^"]*(?:episode|video).*?:.*?))"', webpage)
-        if len(mMovieParams) == 0:
-            # The Colbert Report embeds the information in a without
-            # a URL prefix; so extract the alternate reference
-            # and then add the URL prefix manually.
+        video_id = self._match_id(url)
  
-            altMovieParams = re.findall('data-mgid="([^"]*(?:episode|video|playlist).*?:.*?)"', webpage)
-            if len(altMovieParams) == 0:
-                raise ExtractorError('unable to find Flash URL in webpage ' + url)
-            else:
-                mMovieParams = [('http://media.mtvnservices.com/' + altMovieParams[0], altMovieParams[0])]
+        webpage = self._download_webpage(url, video_id)
  
-        uri = mMovieParams[0][1]
-        # Correct cc.com in uri
-        uri = re.sub(r'(episode:[^.]+)(\.cc)?\.com', r'\1.com', uri)
+        mrss_url = self._search_regex(
+            r'data-mrss=(["\'])(?P<url>(?:(?!\1).)+)\1',
+            webpage, 'mrss url', group='url')
  
-        index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse_urlencode({'uri': uri}))
-        idoc = self._download_xml(
-            index_url, epTitle,
-            'Downloading show index', 'Unable to download episode index')
+        return self._get_videos_info_from_url(mrss_url, video_id)
  
-        title = idoc.find('./channel/title').text
-        description = idoc.find('./channel/description').text
  
-        entries = []
-        item_els = idoc.findall('.//item')
-        for part_num, itemEl in enumerate(item_els):
-            upload_date = unified_strdate(itemEl.findall('./pubDate')[0].text)
-            thumbnail = itemEl.find('.//{http://search.yahoo.com/mrss/}thumbnail').attrib.get('url')
-
-            content = itemEl.find('.//{http://search.yahoo.com/mrss/}content')
-            duration = float_or_none(content.attrib.get('duration'))
-            mediagen_url = content.attrib['url']
-            guid = itemEl.find('./guid').text.rpartition(':')[-1]
-
-            cdoc = self._download_xml(
-                mediagen_url, epTitle,
-                'Downloading configuration for segment %d / %d' % (part_num + 1, len(item_els)))
-
-            turls = []
-            for rendition in cdoc.findall('.//rendition'):
-                finfo = (rendition.attrib['bitrate'], rendition.findall('./src')[0].text)
-                turls.append(finfo)
-
-            formats = []
-            for format, rtmp_video_url in turls:
-                w, h = self._video_dimensions.get(format, (None, None))
-                formats.append({
-                    'format_id': 'vhttp-%s' % format,
-                    'url': self._transform_rtmp_url(rtmp_video_url),
-                    'ext': self._video_extensions.get(format, 'mp4'),
-                    'height': h,
-                    'width': w,
-                })
-                formats.append({
-                    'format_id': 'rtmp-%s' % format,
-                    'url': rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm'),
-                    'ext': self._video_extensions.get(format, 'mp4'),
-                    'height': h,
-                    'width': w,
-                })
-                self._sort_formats(formats)
-
-            subtitles = self._extract_subtitles(cdoc, guid)
-
-            virtual_id = show_name + ' ' + epTitle + ' part ' + compat_str(part_num + 1)
-            entries.append({
-                'id': guid,
-                'title': virtual_id,
-                'formats': formats,
-                'uploader': show_name,
-                'upload_date': upload_date,
-                'duration': duration,
-                'thumbnail': thumbnail,
-                'description': description,
-                'subtitles': subtitles,
-            })
+class ComedyCentralShortnameIE(InfoExtractor):
+    _VALID_URL = r'^:(?P<id>tds|thedailyshow)$'
+    _TESTS = [{
+        'url': ':tds',
+        'only_matching': True,
+    }, {
+        'url': ':thedailyshow',
+        'only_matching': True,
+    }]
  
-        return {
-            '_type': 'playlist',
-            'id': epTitle,
-            'entries': entries,
-            'title': show_name + ' ' + title,
-            'description': description,
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        shortcut_map = {
+            'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
+            'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
          }
+        return self.url_result(shortcut_map[video_id])
diff --git a/youtube_dl/extractor/common.py b/youtube_dl/extractor/common.py
index 9b7ab8924153e3a10e3e405d058d6199d2aad550..05c51fac9b0b4162fb126cb79a79d871b591ead8 100644
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -21,13 +21,16 @@ from ..compat import (
      compat_os_name,
      compat_str,
      compat_urllib_error,
+    compat_urllib_parse_unquote,
      compat_urllib_parse_urlencode,
+    compat_urllib_request,
      compat_urlparse,
  )
  from ..downloader.f4m import remove_encrypted_media
  from ..utils import (
      NO_DEFAULT,
      age_restricted,
+    base_url,
      bug_reports_message,
      clean_html,
      compiled_regex_type,
@@ -43,13 +46,19 @@ from ..utils import (
      sanitized_Request,
      unescapeHTML,
      unified_strdate,
+    unified_timestamp,
      url_basename,
+    xpath_element,
      xpath_text,
      xpath_with_ns,
      determine_protocol,
      parse_duration,
      mimetype2ext,
+    update_Request,
      update_url_query,
+    parse_m3u8_attributes,
+    extract_attributes,
+    parse_codecs,
  )
  
  
@@ -80,6 +89,9 @@ class InfoExtractor(object):
  
                      Potential fields:
                      * url        Mandatory. The URL of the video file
+                    * manifest_url
+                                 The URL of the manifest file in case of
+                                 fragmented media (DASH, hls, hds)
                      * ext        Will be calculated from URL if missing
                      * format     A human-readable description of the format
                                   ("mp4 container with h264/opus").
@@ -108,6 +120,11 @@ class InfoExtractor(object):
                                   download, lower-case.
                                   "http", "https", "rtsp", "rtmp", "rtmpe",
                                   "m3u8", "m3u8_native" or "http_dash_segments".
+                    * fragments  A list of fragments of the fragmented media,
+                                 with the following entries:
+                                 * "url" (mandatory) - fragment's URL
+                                 * "duration" (optional, int or float)
+                                 * "filesize" (optional, int)
                      * preference Order number of this format. If this field is
                                   present and not None, the formats get sorted
                                   by this field, regardless of all other values.
@@ -157,11 +174,12 @@ class InfoExtractor(object):
                          * "height" (optional, int)
                          * "resolution" (optional, string "{width}x{height"},
                                          deprecated)
+                        * "filesize" (optional, int)
      thumbnail:      Full URL to a video thumbnail image.
      description:    Full video description.
      uploader:       Full name of the video uploader.
      license:        License name the video is licensed under.
-    creator:        The main artist who created the video.
+    creator:        The creator of the video.
      release_date:   The date (YYYYMMDD) when the video was released.
      timestamp:      UNIX timestamp of the moment the video became available.
      upload_date:    Video upload date (YYYYMMDD).
@@ -218,7 +236,7 @@ class InfoExtractor(object):
      chapter_id:     Id of the chapter the video belongs to, as a unicode string.
  
      The following fields should only be used when the video is an episode of some
-    series or programme:
+    series, programme or podcast:
  
      series:         Title of the series or programme the video episode belongs to.
      season:         Title of the season the video episode belongs to.
@@ -230,6 +248,24 @@ class InfoExtractor(object):
      episode_number: Number of the video episode within a season, as an integer.
      episode_id:     Id of the video episode, as a unicode string.
  
+    The following fields should only be used when the media is a track or a part of
+    a music album:
+
+    track:          Title of the track.
+    track_number:   Number of the track within an album or a disc, as an integer.
+    track_id:       Id of the track (useful in case of custom indexing, e.g. 6.iii),
+                    as a unicode string.
+    artist:         Artist(s) of the track.
+    genre:          Genre(s) of the track.
+    album:          Title of the album the track belongs to.
+    album_type:     Type of the album (e.g. "Demo", "Full-length", "Split", "Compilation", etc).
+    album_artist:   List of all artists appeared on the album (e.g.
+                    "Ash Borer / Fell Voices" or "Various Artists", useful for splits
+                    and compilations).
+    disc_number:    Number of the disc or other physical medium the track belongs to,
+                    as an integer.
+    release_year:   Year (YYYY) when the album was released.
+
      Unless mentioned otherwise, the fields should be Unicode strings.
  
      Unless mentioned otherwise, None is equivalent to absence of information.
@@ -347,7 +383,7 @@ class InfoExtractor(object):
      def IE_NAME(self):
          return compat_str(type(self).__name__[:-2])
  
-    def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers=None, query=None):
+    def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
          """ Returns the response handle """
          if note is None:
              self.report_download_webpage(video_id)
@@ -356,12 +392,14 @@ class InfoExtractor(object):
                  self.to_screen('%s' % (note,))
              else:
                  self.to_screen('%s: %s' % (video_id, note))
-        # data, headers and query params will be ignored for `Request` objects
-        if isinstance(url_or_request, compat_str):
+        if isinstance(url_or_request, compat_urllib_request.Request):
+            url_or_request = update_Request(
+                url_or_request, data=data, headers=headers, query=query)
+        else:
              if query:
                  url_or_request = update_url_query(url_or_request, query)
-            if data or headers:
-                url_or_request = sanitized_Request(url_or_request, data, headers or {})
+            if data is not None or headers:
+                url_or_request = sanitized_Request(url_or_request, data, headers)
          try:
              return self._downloader.urlopen(url_or_request)
          except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
@@ -377,7 +415,7 @@ class InfoExtractor(object):
                  self._downloader.report_warning(errmsg)
                  return False
  
-    def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers=None, query=None):
+    def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers={}, query={}):
          """ Returns a tuple (page content as string, URL handle) """
          # Strip hashes from the URL (#1038)
          if isinstance(url_or_request, (compat_str, str)):
@@ -470,7 +508,7 @@ class InfoExtractor(object):
  
          return content
  
-    def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers=None, query=None):
+    def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers={}, query={}):
          """ Returns the data of the page as a string """
          success = False
          try_count = 0
@@ -491,7 +529,7 @@ class InfoExtractor(object):
  
      def _download_xml(self, url_or_request, video_id,
                        note='Downloading XML', errnote='Unable to download XML',
-                      transform_source=None, fatal=True, encoding=None, data=None, headers=None, query=None):
+                      transform_source=None, fatal=True, encoding=None, data=None, headers={}, query={}):
          """Return the xml as an xml.etree.ElementTree.Element"""
          xml_string = self._download_webpage(
              url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding, data=data, headers=headers, query=query)
@@ -505,7 +543,7 @@ class InfoExtractor(object):
                         note='Downloading JSON metadata',
                         errnote='Unable to download JSON metadata',
                         transform_source=None,
-                       fatal=True, encoding=None, data=None, headers=None, query=None):
+                       fatal=True, encoding=None, data=None, headers={}, query={}):
          json_string = self._download_webpage(
              url_or_request, video_id, note, errnote, fatal=fatal,
              encoding=encoding, data=data, headers=headers, query=query)
@@ -634,35 +672,48 @@ class InfoExtractor(object):
          else:
              return res
  
-    def _get_login_info(self):
+    def _get_netrc_login_info(self, netrc_machine=None):
+        username = None
+        password = None
+        netrc_machine = netrc_machine or self._NETRC_MACHINE
+
+        if self._downloader.params.get('usenetrc', False):
+            try:
+                info = netrc.netrc().authenticators(netrc_machine)
+                if info is not None:
+                    username = info[0]
+                    password = info[2]
+                else:
+                    raise netrc.NetrcParseError(
+                        'No authenticators for %s' % netrc_machine)
+            except (IOError, netrc.NetrcParseError) as err:
+                self._downloader.report_warning(
+                    'parsing .netrc: %s' % error_to_compat_str(err))
+
+        return username, password
+
+    def _get_login_info(self, username_option='username', password_option='password', netrc_machine=None):
          """
          Get the login info as (username, password)
-        It will look in the netrc file using the _NETRC_MACHINE value
+        First look for manually specified credentials, using username_option
+        and password_option as keys in the params dictionary. If no such
+        credentials are available, look in the netrc file using netrc_machine
+        or the _NETRC_MACHINE value.
          If there's no info available, return (None, None)
          """
          if self._downloader is None:
              return (None, None)
  
-        username = None
-        password = None
          downloader_params = self._downloader.params
  
          # Attempt to use provided username and password or .netrc data
-        if downloader_params.get('username') is not None:
-            username = downloader_params['username']
-            password = downloader_params['password']
-        elif downloader_params.get('usenetrc', False):
-            try:
-                info = netrc.netrc().authenticators(self._NETRC_MACHINE)
-                if info is not None:
-                    username = info[0]
-                    password = info[2]
-                else:
-                    raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
-            except (IOError, netrc.NetrcParseError) as err:
-                self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
+        if downloader_params.get(username_option) is not None:
+            username = downloader_params[username_option]
+            password = downloader_params[password_option]
+        else:
+            username, password = self._get_netrc_login_info(netrc_machine)
  
-        return (username, password)
+        return username, password
  
      def _get_tfa_info(self, note='two-factor verification code'):
          """
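Review note: the lookup order described in the new `_get_login_info` docstring can be sketched in isolation as below. This is a minimal sketch, not the extractor code: `get_login_info`, `fake_netrc`, and the `site_username`/`site_password` keys are hypothetical names used only for illustration, and the sketch reads the password with `.get()` where the hunk indexes the params dictionary directly.

```python
def get_login_info(params, netrc_lookup,
                   username_option='username', password_option='password'):
    """Return (username, password), preferring explicit params over netrc."""
    # Explicitly supplied credentials always win.
    if params.get(username_option) is not None:
        return params[username_option], params.get(password_option)
    # Otherwise consult the netrc fallback, but only when it is enabled.
    if params.get('usenetrc', False):
        return netrc_lookup()
    return None, None


# Hypothetical stand-in for a successful netrc lookup.
def fake_netrc():
    return ('netrc_user', 'netrc_pass')


explicit = get_login_info({'username': 'alice', 'password': 's3cret'}, fake_netrc)
per_site = get_login_info(
    {'site_username': 'bob', 'site_password': 'pw', 'usenetrc': True},
    fake_netrc, username_option='site_username', password_option='site_password')
fallback = get_login_info({'usenetrc': True}, fake_netrc)
nothing = get_login_info({}, fake_netrc)
```

The configurable option names let a caller read credentials from keys other than `username` and `password` without duplicating the netrc fallback.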
@@ -699,9 +750,14 @@ class InfoExtractor(object):
                      [^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop)
  
      def _og_search_property(self, prop, html, name=None, **kargs):
+        if not isinstance(prop, (list, tuple)):
+            prop = [prop]
          if name is None:
-            name = 'OpenGraph %s' % prop
-        escaped = self._search_regex(self._og_regexes(prop), html, name, flags=re.DOTALL, **kargs)
+            name = 'OpenGraph %s' % prop[0]
+        og_regexes = []
+        for p in prop:
+            og_regexes.extend(self._og_regexes(p))
+        escaped = self._search_regex(og_regexes, html, name, flags=re.DOTALL, **kargs)
          if escaped is None:
              return None
          return unescapeHTML(escaped)
@@ -725,10 +781,12 @@ class InfoExtractor(object):
          return self._og_search_property('url', html, **kargs)
  
      def _html_search_meta(self, name, html, display_name=None, fatal=False, **kwargs):
+        if not isinstance(name, (list, tuple)):
+            name = [name]
          if display_name is None:
-            display_name = name
+            display_name = name[0]
          return self._html_search_regex(
-            self._meta_regex(name),
+            [self._meta_regex(n) for n in name],
              html, display_name, fatal=fatal, group='content', **kwargs)
  
      def _dc_search_uploader(self, html):
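Review note: both `_og_search_property` and `_html_search_meta` now accept either one name or a list of candidate names and try them in order. A minimal sketch of that normalisation follows, assuming a simplified, hypothetical `meta_regex` (the real `_meta_regex` is not shown in this hunk) and a hypothetical `search_meta` helper.

```python
import re


def meta_regex(name):
    # Hypothetical, simplified pattern for <meta name="..." content="...">.
    return (r'<meta[^>]+name=(["\'])%s\1[^>]+content=(["\'])(?P<content>.*?)\2'
            % re.escape(name))


def search_meta(name, html):
    # Accept a single name or a list/tuple of candidate names.
    if not isinstance(name, (list, tuple)):
        name = [name]
    # Try each candidate in order and return the first hit.
    for n in name:
        mobj = re.search(meta_regex(n), html)
        if mobj:
            return mobj.group('content')
    return None


html = '<meta name="title" content="Big Buck Bunny">'
from_list = search_meta(['twitter:title', 'title'], html)
from_scalar = search_meta('title', html)
missing = search_meta('description', html)
```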
@@ -777,56 +835,82 @@ class InfoExtractor(object):
          return self._html_search_meta('twitter:player', html,
                                        'twitter card player')
  
-    def _search_json_ld(self, html, video_id, **kwargs):
+    def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
          json_ld = self._search_regex(
              r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
              html, 'JSON-LD', group='json_ld', **kwargs)
+        default = kwargs.get('default', NO_DEFAULT)
          if not json_ld:
-            return {}
-        return self._json_ld(json_ld, video_id, fatal=kwargs.get('fatal', True))
-
-    def _json_ld(self, json_ld, video_id, fatal=True):
+            return default if default is not NO_DEFAULT else {}
+        # JSON-LD may be malformed and thus `fatal` should be respected.
+        # At the same time `default` may be passed that assumes `fatal=False`
+        # for _search_regex. Let's simulate the same behavior here as well.
+        fatal = kwargs.get('fatal', True) if default is NO_DEFAULT else False
+        return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
+
+    def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
          if isinstance(json_ld, compat_str):
              json_ld = self._parse_json(json_ld, video_id, fatal=fatal)
          if not json_ld:
              return {}
          info = {}
-        if json_ld.get('@context') == 'http://schema.org':
-            item_type = json_ld.get('@type')
-            if item_type == 'TVEpisode':
-                info.update({
-                    'episode': unescapeHTML(json_ld.get('name')),
-                    'episode_number': int_or_none(json_ld.get('episodeNumber')),
-                    'description': unescapeHTML(json_ld.get('description')),
-                })
-                part_of_season = json_ld.get('partOfSeason')
-                if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
-                    info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
-                part_of_series = json_ld.get('partOfSeries')
-                if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
-                    info['series'] = unescapeHTML(part_of_series.get('name'))
-            elif item_type == 'Article':
-                info.update({
-                    'timestamp': parse_iso8601(json_ld.get('datePublished')),
-                    'title': unescapeHTML(json_ld.get('headline')),
-                    'description': unescapeHTML(json_ld.get('articleBody')),
-                })
+        if not isinstance(json_ld, (list, tuple, dict)):
+            return info
+        if isinstance(json_ld, dict):
+            json_ld = [json_ld]
+        for e in json_ld:
+            if e.get('@context') == 'http://schema.org':
+                item_type = e.get('@type')
+                if expected_type is not None and expected_type != item_type:
+                    return info
+                if item_type == 'TVEpisode':
+                    info.update({
+                        'episode': unescapeHTML(e.get('name')),
+                        'episode_number': int_or_none(e.get('episodeNumber')),
+                        'description': unescapeHTML(e.get('description')),
+                    })
+                    part_of_season = e.get('partOfSeason')
+                    if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
+                        info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
+                    part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
+                    if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
+                        info['series'] = unescapeHTML(part_of_series.get('name'))
+                elif item_type == 'Article':
+                    info.update({
+                        'timestamp': parse_iso8601(e.get('datePublished')),
+                        'title': unescapeHTML(e.get('headline')),
+                        'description': unescapeHTML(e.get('articleBody')),
+                    })
+                elif item_type == 'VideoObject':
+                    info.update({
+                        'url': e.get('contentUrl'),
+                        'title': unescapeHTML(e.get('name')),
+                        'description': unescapeHTML(e.get('description')),
+                        'thumbnail': e.get('thumbnailUrl') or e.get('thumbnailURL'),
+                        'duration': parse_duration(e.get('duration')),
+                        'timestamp': unified_timestamp(e.get('uploadDate')),
+                        'filesize': float_or_none(e.get('contentSize')),
+                        'tbr': int_or_none(e.get('bitrate')),
+                        'width': int_or_none(e.get('width')),
+                        'height': int_or_none(e.get('height')),
+                    })
+                break
          return dict((k, v) for k, v in info.items() if v is not None)
  
      @staticmethod
      def _hidden_inputs(html):
          html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
          hidden_inputs = {}
-        for input in re.findall(r'(?i)<input([^>]+)>', html):
-            if not re.search(r'type=(["\'])(?:hidden|submit)\1', input):
+        for input in re.findall(r'(?i)(<input[^>]+>)', html):
+            attrs = extract_attributes(input)
+            if not attrs:
                  continue
-            name = re.search(r'name=(["\'])(?P<value>.+?)\1', input)
-            if not name:
+            if attrs.get('type') not in ('hidden', 'submit'):
                  continue
-            value = re.search(r'value=(["\'])(?P<value>.*?)\1', input)
-            if not value:
-                continue
-            hidden_inputs[name.group('value')] = value.group('value')
+            name = attrs.get('name') or attrs.get('id')
+            value = attrs.get('value')
+            if name and value is not None:
+                hidden_inputs[name] = value
          return hidden_inputs
  
      def _form_hidden_inputs(self, form_id, html):
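Review note: the reworked `_json_ld` accepts a single JSON-LD object or a list of them, uses the first entry whose `@context` is `http://schema.org`, and bails out when `expected_type` does not match. The selection step alone can be sketched as follows; `pick_schema_org_entry` is a hypothetical helper, and the `isinstance(e, dict)` guard is an addition that is not present in the hunk.

```python
import json


def pick_schema_org_entry(json_ld, expected_type=None):
    """Return the first schema.org entry, or None if absent or of another type."""
    data = json.loads(json_ld) if isinstance(json_ld, str) else json_ld
    if isinstance(data, dict):
        data = [data]  # normalise a single object to a one-element list
    if not isinstance(data, (list, tuple)):
        return None
    for e in data:
        if isinstance(e, dict) and e.get('@context') == 'http://schema.org':
            if expected_type is not None and e.get('@type') != expected_type:
                return None
            return e  # only the first schema.org entry is considered
    return None


doc = json.dumps([
    {'@context': 'http://example.org'},
    {'@context': 'http://schema.org', '@type': 'VideoObject', 'name': 'Clip'},
])
entry = pick_schema_org_entry(doc)
wrong_type = pick_schema_org_entry(doc, expected_type='Article')
single = pick_schema_org_entry({'@context': 'http://schema.org', '@type': 'Article'})
```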
@@ -852,7 +936,11 @@ class InfoExtractor(object):
                  f['ext'] = determine_ext(f['url'])
  
              if isinstance(field_preference, (list, tuple)):
-                return tuple(f.get(field) if f.get(field) is not None else -1 for field in field_preference)
+                return tuple(
+                    f.get(field)
+                    if f.get(field) is not None
+                    else ('' if field == 'format_id' else -1)
+                    for field in field_preference)
  
              preference = f.get('preference')
              if preference is None:
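Review note: the new fallback uses `''` instead of `-1` for a missing `format_id`. The commit does not state why; presumably it avoids comparing a string with an integer when sort keys tie on earlier fields, which raises `TypeError` on Python 3. A minimal sketch, with a hypothetical standalone `sort_key` and made-up formats:

```python
def sort_key(f, field_preference):
    # Missing values fall back to '' for the string-valued format_id
    # and to -1 for every other (numeric) field.
    return tuple(
        f.get(field)
        if f.get(field) is not None
        else ('' if field == 'format_id' else -1)
        for field in field_preference)


formats = [
    {'format_id': 'hd', 'height': 720},
    {'height': 720},       # no format_id: ties on height with the entry above
    {'format_id': 'sd'},   # no height
]
ordered = sorted(formats, key=lambda f: sort_key(f, ('height', 'format_id')))
ordered_ids = [f.get('format_id') for f in ordered]
```

With a uniform `-1` fallback the two 720p entries would compare `'hd'` against `-1`, which Python 3 rejects.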
@@ -860,7 +948,8 @@ class InfoExtractor(object):
                  if f.get('ext') in ['f4f', 'f4m']:  # Not yet supported
                      preference -= 0.5
  
-            proto_preference = 0 if determine_protocol(f) in ['http', 'https'] else -0.1
+            protocol = f.get('protocol') or determine_protocol(f)
+            proto_preference = 0 if protocol in ['http', 'https'] else (-0.5 if protocol == 'rtsp' else -0.1)
  
              if f.get('vcodec') == 'none':  # audio only
                  preference -= 50
@@ -965,7 +1054,7 @@ class InfoExtractor(object):
  
      def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None,
                               transform_source=lambda s: fix_xml_ampersands(s).strip(),
-                             fatal=True):
+                             fatal=True, m3u8_id=None):
          manifest = self._download_xml(
              manifest_url, video_id, 'Downloading f4m manifest',
              'Unable to download f4m manifest',
@@ -979,11 +1068,18 @@ class InfoExtractor(object):
  
          return self._parse_f4m_formats(
              manifest, manifest_url, video_id, preference=preference, f4m_id=f4m_id,
-            transform_source=transform_source, fatal=fatal)
+            transform_source=transform_source, fatal=fatal, m3u8_id=m3u8_id)
  
      def _parse_f4m_formats(self, manifest, manifest_url, video_id, preference=None, f4m_id=None,
                             transform_source=lambda s: fix_xml_ampersands(s).strip(),
-                           fatal=True):
+                           fatal=True, m3u8_id=None):
+        # Currently youtube-dl cannot decode the playerVerificationChallenge because Akamai uses Adobe Alchemy
+        akamai_pv = manifest.find('{http://ns.adobe.com/f4m/1.0}pv-2.0')
+        if akamai_pv is not None and ';' in akamai_pv.text:
+            playerVerificationChallenge = akamai_pv.text.split(';')[0]
+            if playerVerificationChallenge.strip() != '':
+                return []
+
          formats = []
          manifest_version = '1.0'
          media_nodes = manifest.findall('{http://ns.adobe.com/f4m/1.0}media')
@@ -1000,9 +1096,33 @@ class InfoExtractor(object):
              'base URL', default=None)
          if base_url:
              base_url = base_url.strip()
+
+        bootstrap_info = xpath_element(
+            manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'],
+            'bootstrap info', default=None)
+
+        vcodec = None
+        mime_type = xpath_text(
+            manifest, ['{http://ns.adobe.com/f4m/1.0}mimeType', '{http://ns.adobe.com/f4m/2.0}mimeType'],
+            'mime type', default=None)
+        if mime_type and mime_type.startswith('audio/'):
+            vcodec = 'none'
+
          for i, media_el in enumerate(media_nodes):
-            if manifest_version == '2.0':
-                media_url = media_el.attrib.get('href') or media_el.attrib.get('url')
+            tbr = int_or_none(media_el.attrib.get('bitrate'))
+            width = int_or_none(media_el.attrib.get('width'))
+            height = int_or_none(media_el.attrib.get('height'))
+            format_id = '-'.join(filter(None, [f4m_id, compat_str(i if tbr is None else tbr)]))
+            # If <bootstrapInfo> is present, the specified f4m is a
+            # stream-level manifest, and only set-level manifests may refer to
+            # external resources.  See section 11.4 and section 4 of F4M spec
+            if bootstrap_info is None:
+                media_url = None
+                # @href is introduced in 2.0, see section 11.6 of F4M spec
+                if manifest_version == '2.0':
+                    media_url = media_el.attrib.get('href')
+                if media_url is None:
+                    media_url = media_el.attrib.get('url')
                  if not media_url:
                      continue
                  manifest_url = (
@@ -1012,42 +1132,59 @@ class InfoExtractor(object):
                  # since bitrates in parent manifest (this one) and media_url manifest
                  # may differ leading to inability to resolve the format by requested
                  # bitrate in f4m downloader
-                if determine_ext(manifest_url) == 'f4m':
-                    formats.extend(self._extract_f4m_formats(
+                ext = determine_ext(manifest_url)
+                if ext == 'f4m':
+                    f4m_formats = self._extract_f4m_formats(
                          manifest_url, video_id, preference=preference, f4m_id=f4m_id,
-                        transform_source=transform_source, fatal=fatal))
+                        transform_source=transform_source, fatal=fatal)
+                    # Sometimes a stream-level manifest contains a single media entry that
+                    # does not contain any quality metadata (e.g. http://matchtv.ru/#live-player).
+                    # At the same time, the parent's media entry in the set-level manifest
+                    # may contain it. In such cases we copy it from the parent.
+                    if len(f4m_formats) == 1:
+                        f = f4m_formats[0]
+                        f.update({
+                            'tbr': f.get('tbr') or tbr,
+                            'width': f.get('width') or width,
+                            'height': f.get('height') or height,
+                            'format_id': f.get('format_id') if not tbr else format_id,
+                            'vcodec': vcodec,
+                        })
+                    formats.extend(f4m_formats)
+                    continue
+                elif ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        manifest_url, video_id, 'mp4', preference=preference,
+                        m3u8_id=m3u8_id, fatal=fatal))
                      continue
-            tbr = int_or_none(media_el.attrib.get('bitrate'))
              formats.append({
-                'format_id': '-'.join(filter(None, [f4m_id, compat_str(i if tbr is None else tbr)])),
+                'format_id': format_id,
                  'url': manifest_url,
-                'ext': 'flv',
+                'manifest_url': manifest_url,
+                'ext': 'flv' if bootstrap_info is not None else None,
                  'tbr': tbr,
-                'width': int_or_none(media_el.attrib.get('width')),
-                'height': int_or_none(media_el.attrib.get('height')),
+                'width': width,
+                'height': height,
+                'vcodec': vcodec,
                  'preference': preference,
              })
          return formats
  
-    def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
-                              entry_protocol='m3u8', preference=None,
-                              m3u8_id=None, note=None, errnote=None,
-                              fatal=True):
-
-        formats = [{
+    def _m3u8_meta_format(self, m3u8_url, ext=None, preference=None, m3u8_id=None):
+        return {
              'format_id': '-'.join(filter(None, [m3u8_id, 'meta'])),
              'url': m3u8_url,
              'ext': ext,
              'protocol': 'm3u8',
-            'preference': preference - 1 if preference else -1,
+            'preference': preference - 100 if preference else -100,
              'resolution': 'multiple',
              'format_note': 'Quality selection URL',
-        }]
+        }
  
-        format_url = lambda u: (
-            u
-            if re.match(r'^https?://', u)
-            else compat_urlparse.urljoin(m3u8_url, u))
+    def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
+                              entry_protocol='m3u8', preference=None,
+                              m3u8_id=None, note=None, errnote=None,
+                              fatal=True, live=False):
  
          res = self._download_webpage_handle(
              m3u8_url, video_id,
@@ -1059,6 +1196,13 @@ class InfoExtractor(object):
          m3u8_doc, urlh = res
          m3u8_url = urlh.geturl()
  
+        formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
+
+        format_url = lambda u: (
+            u
+            if re.match(r'^https?://', u)
+            else compat_urlparse.urljoin(m3u8_url, u))
+
          # We should try extracting formats only from master playlists [1], i.e.
          # playlists that describe available qualities. On the other hand media
          # playlists [2] should be returned as is since they contain just the media
@@ -1080,73 +1224,80 @@ class InfoExtractor(object):
                  'protocol': entry_protocol,
                  'preference': preference,
              }]
-        last_info = None
-        last_media = None
-        kv_rex = re.compile(
-            r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')
+        last_info = {}
+        last_media = {}
          for line in m3u8_doc.splitlines():
              if line.startswith('#EXT-X-STREAM-INF:'):
-                last_info = {}
-                for m in kv_rex.finditer(line):
-                    v = m.group('val')
-                    if v.startswith('"'):
-                        v = v[1:-1]
-                    last_info[m.group('key')] = v
+                last_info = parse_m3u8_attributes(line)
              elif line.startswith('#EXT-X-MEDIA:'):
-                last_media = {}
-                for m in kv_rex.finditer(line):
-                    v = m.group('val')
-                    if v.startswith('"'):
-                        v = v[1:-1]
-                    last_media[m.group('key')] = v
+                media = parse_m3u8_attributes(line)
+                media_type = media.get('TYPE')
+                if media_type in ('VIDEO', 'AUDIO'):
+                    media_url = media.get('URI')
+                    if media_url:
+                        format_id = []
+                        for v in (media.get('GROUP-ID'), media.get('NAME')):
+                            if v:
+                                format_id.append(v)
+                        formats.append({
+                            'format_id': '-'.join(format_id),
+                            'url': format_url(media_url),
+                            'language': media.get('LANGUAGE'),
+                            'vcodec': 'none' if media_type == 'AUDIO' else None,
+                            'ext': ext,
+                            'protocol': entry_protocol,
+                            'preference': preference,
+                        })
+                    else:
+                        # When there is no URI in EXT-X-MEDIA let this tag's
+                        # data be used by regular URI lines below
+                        last_media = media
              elif line.startswith('#') or not line.strip():
                  continue
              else:
-                if last_info is None:
-                    formats.append({'url': format_url(line)})
-                    continue
-                tbr = int_or_none(last_info.get('BANDWIDTH'), scale=1000)
+                tbr = int_or_none(last_info.get('AVERAGE-BANDWIDTH') or last_info.get('BANDWIDTH'), scale=1000)
                  format_id = []
                  if m3u8_id:
                      format_id.append(m3u8_id)
-                last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') != 'SUBTITLES' else None
-                format_id.append(last_media_name if last_media_name else '%d' % (tbr if tbr else len(formats)))
+                # Although the specification does not mention a NAME attribute
+                # for EXT-X-STREAM-INF, it may still sometimes be present
+                stream_name = last_info.get('NAME') or last_media.get('NAME')
+                # The bandwidth of live streams may vary over time, making
+                # format_id unpredictable, so it is better to keep the provided
+                # format_id intact.
+                if not live:
+                    format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
+                manifest_url = format_url(line.strip())
                  f = {
                      'format_id': '-'.join(format_id),
-                    'url': format_url(line.strip()),
+                    'url': manifest_url,
+                    'manifest_url': manifest_url,
                      'tbr': tbr,
                      'ext': ext,
+                    'fps': float_or_none(last_info.get('FRAME-RATE')),
                      'protocol': entry_protocol,
                      'preference': preference,
                  }
                  resolution = last_info.get('RESOLUTION')
                  if resolution:
-                    width_str, height_str = resolution.split('x')
-                    f['width'] = int(width_str)
-                    f['height'] = int(height_str)
-                codecs = last_info.get('CODECS')
-                if codecs:
-                    vcodec, acodec = [None] * 2
-                    va_codecs = codecs.split(',')
-                    if len(va_codecs) == 1:
-                        # Audio only entries usually come with single codec and
-                        # no resolution. For more robustness we also check it to
-                        # be mp4 audio.
-                        if not resolution and va_codecs[0].startswith('mp4a'):
-                            vcodec, acodec = 'none', va_codecs[0]
-                        else:
-                            vcodec = va_codecs[0]
-                    else:
-                        vcodec, acodec = va_codecs[:2]
+                    mobj = re.search(r'(?P<width>\d+)[xX](?P<height>\d+)', resolution)
+                    if mobj:
+                        f['width'] = int(mobj.group('width'))
+                        f['height'] = int(mobj.group('height'))
+                # Unified Streaming Platform
+                mobj = re.search(
+                    r'audio.*?(?:%3D|=)(\d+)(?:-video.*?(?:%3D|=)(\d+))?', f['url'])
+                if mobj:
+                    abr, vbr = mobj.groups()
+                    abr, vbr = float_or_none(abr, 1000), float_or_none(vbr, 1000)
                      f.update({
-                        'acodec': acodec,
-                        'vcodec': vcodec,
+                        'vbr': vbr,
+                        'abr': abr,
                      })
-                if last_media is not None:
-                    f['m3u8_media'] = last_media
-                    last_media = None
+                f.update(parse_codecs(last_info.get('CODECS')))
                  formats.append(f)
                  last_info = {}
+                last_media = {}
          return formats
  
      @staticmethod
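Review note: this hunk replaces the inline `kv_rex` loops with `parse_m3u8_attributes`, whose implementation is not part of this hunk. The sketch below reuses the removed regex in a hypothetical `parse_attributes` helper to show what the attribute dictionary looks like and how `tbr`, `width`, and `height` are derived from it. The playlist line is made up for illustration.

```python
import re

# Regex taken from the removed lines of this hunk.
KV_REX = re.compile(r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')


def parse_attributes(line):
    info = {}
    for m in KV_REX.finditer(line):
        v = m.group('val')
        if v.startswith('"'):
            v = v[1:-1]  # strip the surrounding quotes
        info[m.group('key')] = v
    return info


line = ('#EXT-X-STREAM-INF:BANDWIDTH=1280000,'
        'CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360')
info = parse_attributes(line)

# AVERAGE-BANDWIDTH is preferred when present; bits/s are scaled to kbit/s.
tbr = int(info.get('AVERAGE-BANDWIDTH') or info['BANDWIDTH']) // 1000

# Tolerant resolution parsing, as in the new code (accepts 'x' or 'X').
mobj = re.search(r'(?P<width>\d+)[xX](?P<height>\d+)', info['RESOLUTION'])
width, height = int(mobj.group('width')), int(mobj.group('height'))
```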
@@ -1242,21 +1393,21 @@ class InfoExtractor(object):
          m3u8_count = 0
  
          srcs = []
-        videos = smil.findall(self._xpath_ns('.//video', namespace))
-        for video in videos:
-            src = video.get('src')
+        media = smil.findall(self._xpath_ns('.//video', namespace)) + smil.findall(self._xpath_ns('.//audio', namespace))
+        for medium in media:
+            src = medium.get('src')
              if not src or src in srcs:
                  continue
              srcs.append(src)
  
-            bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
-            filesize = int_or_none(video.get('size') or video.get('fileSize'))
-            width = int_or_none(video.get('width'))
-            height = int_or_none(video.get('height'))
-            proto = video.get('proto')
-            ext = video.get('ext')
+            bitrate = float_or_none(medium.get('system-bitrate') or medium.get('systemBitrate'), 1000)
+            filesize = int_or_none(medium.get('size') or medium.get('fileSize'))
+            width = int_or_none(medium.get('width'))
+            height = int_or_none(medium.get('height'))
+            proto = medium.get('proto')
+            ext = medium.get('ext')
              src_ext = determine_ext(src)
-            streamer = video.get('streamer') or base
+            streamer = medium.get('streamer') or base
  
              if proto == 'rtmp' or streamer.startswith('rtmp'):
                  rtmp_count += 1
@@ -1330,7 +1481,7 @@ class InfoExtractor(object):
              if not src or src in urls:
                  continue
              urls.append(src)
-            ext = textstream.get('ext') or determine_ext(src) or mimetype2ext(textstream.get('type'))
+            ext = textstream.get('ext') or mimetype2ext(textstream.get('type')) or determine_ext(src)
              lang = textstream.get('systemLanguage') or textstream.get('systemLanguageName') or textstream.get('lang') or subtitles_lang
              subtitles.setdefault(lang, []).append({
                  'url': src,
@@ -1390,12 +1541,20 @@ class InfoExtractor(object):
          if res is False:
              return []
          mpd, urlh = res
-        mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group()
+        mpd_base_url = base_url(urlh.geturl())
  
          return self._parse_mpd_formats(
-            compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict)
+            compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url,
+            formats_dict=formats_dict, mpd_url=mpd_url)
  
-    def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}):
+    def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
+        """
+        Parse formats from MPD manifest.
+        References:
+         1. MPEG-DASH Standard, ISO/IEC 23009-1:2014(E),
+            http://standards.iso.org/ittf/PubliclyAvailableStandards/c065274_ISO_IEC_23009-1_2014.zip
+         2. https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP
+        """
          if mpd_doc.get('type') == 'dynamic':
              return []
  
@@ -1409,34 +1568,52 @@ class InfoExtractor(object):
  
          def extract_multisegment_info(element, ms_parent_info):
              ms_info = ms_parent_info.copy()
+
+            # As per [1, 5.3.9.2.2], SegmentList and SegmentTemplate share some
+            # common attributes and elements.  We only extract those relevant
+            # to us.
+            def extract_common(source):
+                segment_timeline = source.find(_add_ns('SegmentTimeline'))
+                if segment_timeline is not None:
+                    s_e = segment_timeline.findall(_add_ns('S'))
+                    if s_e:
+                        ms_info['total_number'] = 0
+                        ms_info['s'] = []
+                        for s in s_e:
+                            r = int(s.get('r', 0))
+                            ms_info['total_number'] += 1 + r
+                            ms_info['s'].append({
+                                't': int(s.get('t', 0)),
+                                # @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
+                                'd': int(s.attrib['d']),
+                                'r': r,
+                            })
+                start_number = source.get('startNumber')
+                if start_number:
+                    ms_info['start_number'] = int(start_number)
+                timescale = source.get('timescale')
+                if timescale:
+                    ms_info['timescale'] = int(timescale)
+                segment_duration = source.get('duration')
+                if segment_duration:
+                    ms_info['segment_duration'] = int(segment_duration)
+
+            def extract_Initialization(source):
+                initialization = source.find(_add_ns('Initialization'))
+                if initialization is not None:
+                    ms_info['initialization_url'] = initialization.attrib['sourceURL']
+
              segment_list = element.find(_add_ns('SegmentList'))
              if segment_list is not None:
+                extract_common(segment_list)
+                extract_Initialization(segment_list)
                  segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
                  if segment_urls_e:
                      ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
-                initialization = segment_list.find(_add_ns('Initialization'))
-                if initialization is not None:
-                    ms_info['initialization_url'] = initialization.attrib['sourceURL']
              else:
                  segment_template = element.find(_add_ns('SegmentTemplate'))
                  if segment_template is not None:
-                    start_number = segment_template.get('startNumber')
-                    if start_number:
-                        ms_info['start_number'] = int(start_number)
-                    segment_timeline = segment_template.find(_add_ns('SegmentTimeline'))
-                    if segment_timeline is not None:
-                        s_e = segment_timeline.findall(_add_ns('S'))
-                        if s_e:
-                            ms_info['total_number'] = 0
-                            for s in s_e:
-                                ms_info['total_number'] += 1 + int(s.get('r', '0'))
-                    else:
-                        timescale = segment_template.get('timescale')
-                        if timescale:
-                            ms_info['timescale'] = int(timescale)
-                        segment_duration = segment_template.get('duration')
-                        if segment_duration:
-                            ms_info['segment_duration'] = int(segment_duration)
+                    extract_common(segment_template)
                      media_template = segment_template.get('media')
                      if media_template:
                          ms_info['media_template'] = media_template
@@ -1444,11 +1621,14 @@ class InfoExtractor(object):
                      if initialization:
                          ms_info['initialization_url'] = initialization
                      else:
-                        initialization = segment_template.find(_add_ns('Initialization'))
-                        if initialization is not None:
-                            ms_info['initialization_url'] = initialization.attrib['sourceURL']
+                        extract_Initialization(segment_template)
              return ms_info
  
+        def combine_url(base_url, target_url):
+            if re.match(r'^https?://', target_url):
+                return target_url
+            return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
+
          mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
          formats = []
          for period in mpd_doc.findall(_add_ns('Period')):
@@ -1466,7 +1646,7 @@ class InfoExtractor(object):
                          continue
                      representation_attrib = adaptation_set.attrib.copy()
                      representation_attrib.update(representation.attrib)
-                    # According to page 41 of ISO/IEC 29001-1:2014, @mimeType is mandatory
+                    # According to [1, 5.3.7.2, Table 9, page 41], @mimeType is mandatory
                      mime_type = representation_attrib['mimeType']
                      content_type = mime_type.split('/')[0]
                      if content_type == 'text':
@@ -1491,6 +1671,7 @@ class InfoExtractor(object):
                          f = {
                              'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
                              'url': base_url,
+                            'manifest_url': mpd_url,
                              'ext': mimetype2ext(mime_type),
                              'width': int_or_none(representation_attrib.get('width')),
                              'height': int_or_none(representation_attrib.get('height')),
@@ -1505,26 +1686,88 @@ class InfoExtractor(object):
                          }
                          representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
                          if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
-                            if 'total_number' not in representation_ms_info and 'segment_duration':
-                                segment_duration = float(representation_ms_info['segment_duration']) / float(representation_ms_info['timescale'])
-                                representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
+
                              media_template = representation_ms_info['media_template']
                              media_template = media_template.replace('$RepresentationID$', representation_id)
-                            media_template = re.sub(r'\$(Number|Bandwidth)(?:%(0\d+)d)?\$', r'%(\1)\2d', media_template)
+                            media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
+                            media_template = re.sub(r'\$(Number|Bandwidth|Time)%([^$]+)\$', r'%(\1)\2', media_template)
                              media_template.replace('$$', '$')
-                            representation_ms_info['segment_urls'] = [media_template % {'Number': segment_number, 'Bandwidth': representation_attrib.get('bandwidth')} for segment_number in range(representation_ms_info['start_number'], representation_ms_info['total_number'] + representation_ms_info['start_number'])]
-                        if 'segment_urls' in representation_ms_info:
+
+                            # As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
+                            # can't be used at the same time
+                            if '%(Number' in media_template and 's' not in representation_ms_info:
+                                segment_duration = None
+                                if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
+                                    segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
+                                    representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
+                                representation_ms_info['fragments'] = [{
+                                    'url': media_template % {
+                                        'Number': segment_number,
+                                        'Bandwidth': int_or_none(representation_attrib.get('bandwidth')),
+                                    },
+                                    'duration': segment_duration,
+                                } for segment_number in range(
+                                    representation_ms_info['start_number'],
+                                    representation_ms_info['total_number'] + representation_ms_info['start_number'])]
+                            else:
+                                # $Number*$ or $Time$ in media template with S list available
+                                # Example $Number*$: http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg
+                                # Example $Time$: https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411
+                                representation_ms_info['fragments'] = []
+                                segment_time = 0
+                                segment_d = None
+                                segment_number = representation_ms_info['start_number']
+
+                                def add_segment_url():
+                                    segment_url = media_template % {
+                                        'Time': segment_time,
+                                        'Bandwidth': int_or_none(representation_attrib.get('bandwidth')),
+                                        'Number': segment_number,
+                                    }
+                                    representation_ms_info['fragments'].append({
+                                        'url': segment_url,
+                                        'duration': float_or_none(segment_d, representation_ms_info['timescale']),
+                                    })
+
+                                for num, s in enumerate(representation_ms_info['s']):
+                                    segment_time = s.get('t') or segment_time
+                                    segment_d = s['d']
+                                    add_segment_url()
+                                    segment_number += 1
+                                    for r in range(s.get('r', 0)):
+                                        segment_time += segment_d
+                                        add_segment_url()
+                                        segment_number += 1
+                                    segment_time += segment_d
+                        elif 'segment_urls' in representation_ms_info and 's' in representation_ms_info:
+                            # No media template
+                            # Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
+                            # or any YouTube dashsegments video
+                            fragments = []
+                            s_num = 0
+                            for segment_url in representation_ms_info['segment_urls']:
+                                s = representation_ms_info['s'][s_num]
+                                for r in range(s.get('r', 0) + 1):
+                                    fragments.append({
+                                        'url': segment_url,
+                                        'duration': float_or_none(s['d'], representation_ms_info['timescale']),
+                                    })
+                            representation_ms_info['fragments'] = fragments
+                        # NB: MPD manifest may contain direct URLs to unfragmented media.
+                        # No fragments key is present in this case.
+                        if 'fragments' in representation_ms_info:
                              f.update({
-                                'segment_urls': representation_ms_info['segment_urls'],
+                                'fragments': [],
                                  'protocol': 'http_dash_segments',
                              })
                              if 'initialization_url' in representation_ms_info:
                                  initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
-                                f.update({
-                                    'initialization_url': initialization_url,
-                                })
                                  if not f.get('url'):
                                      f['url'] = initialization_url
+                                f['fragments'].append({'url': initialization_url})
+                            f['fragments'].extend(representation_ms_info['fragments'])
+                            for fragment in f['fragments']:
+                                fragment['url'] = combine_url(base_url, fragment['url'])
                          try:
                              existing_format = next(
                                  fo for fo in formats
@@ -1539,6 +1782,239 @@ class InfoExtractor(object):
                          self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
          return formats
  
+    def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True):
+        res = self._download_webpage_handle(
+            ism_url, video_id,
+            note=note or 'Downloading ISM manifest',
+            errnote=errnote or 'Failed to download ISM manifest',
+            fatal=fatal)
+        if res is False:
+            return []
+        ism, urlh = res
+
+        return self._parse_ism_formats(
+            compat_etree_fromstring(ism.encode('utf-8')), urlh.geturl(), ism_id)
+
+    def _parse_ism_formats(self, ism_doc, ism_url, ism_id=None):
+        if ism_doc.get('IsLive') == 'TRUE' or ism_doc.find('Protection') is not None:
+            return []
+
+        duration = int(ism_doc.attrib['Duration'])
+        timescale = int_or_none(ism_doc.get('TimeScale')) or 10000000
+
+        formats = []
+        for stream in ism_doc.findall('StreamIndex'):
+            stream_type = stream.get('Type')
+            if stream_type not in ('video', 'audio'):
+                continue
+            url_pattern = stream.attrib['Url']
+            stream_timescale = int_or_none(stream.get('TimeScale')) or timescale
+            stream_name = stream.get('Name')
+            for track in stream.findall('QualityLevel'):
+                fourcc = track.get('FourCC')
+                # TODO: add support for WVC1 and WMAP
+                if fourcc not in ('H264', 'AVC1', 'AACL'):
+                    self.report_warning('%s is not a supported codec' % fourcc)
+                    continue
+                tbr = int(track.attrib['Bitrate']) // 1000
+                width = int_or_none(track.get('MaxWidth'))
+                height = int_or_none(track.get('MaxHeight'))
+                sampling_rate = int_or_none(track.get('SamplingRate'))
+
+                track_url_pattern = re.sub(r'{[Bb]itrate}', track.attrib['Bitrate'], url_pattern)
+                track_url_pattern = compat_urlparse.urljoin(ism_url, track_url_pattern)
+
+                fragments = []
+                fragment_ctx = {
+                    'time': 0,
+                }
+                stream_fragments = stream.findall('c')
+                for stream_fragment_index, stream_fragment in enumerate(stream_fragments):
+                    fragment_ctx['time'] = int_or_none(stream_fragment.get('t')) or fragment_ctx['time']
+                    fragment_repeat = int_or_none(stream_fragment.get('r')) or 1
+                    fragment_ctx['duration'] = int_or_none(stream_fragment.get('d'))
+                    if not fragment_ctx['duration']:
+                        try:
+                            next_fragment_time = int(stream_fragments[stream_fragment_index + 1].attrib['t'])
+                        except IndexError:
+                            next_fragment_time = duration
+                        fragment_ctx['duration'] = (next_fragment_time - fragment_ctx['time']) / fragment_repeat
+                    for _ in range(fragment_repeat):
+                        fragments.append({
+                            'url': re.sub(r'{start[ _]time}', compat_str(fragment_ctx['time']), track_url_pattern),
+                            'duration': fragment_ctx['duration'] / stream_timescale,
+                        })
+                        fragment_ctx['time'] += fragment_ctx['duration']
+
+                format_id = []
+                if ism_id:
+                    format_id.append(ism_id)
+                if stream_name:
+                    format_id.append(stream_name)
+                format_id.append(compat_str(tbr))
+
+                formats.append({
+                    'format_id': '-'.join(format_id),
+                    'url': ism_url,
+                    'manifest_url': ism_url,
+                    'ext': 'ismv' if stream_type == 'video' else 'isma',
+                    'width': width,
+                    'height': height,
+                    'tbr': tbr,
+                    'asr': sampling_rate,
+                    'vcodec': 'none' if stream_type == 'audio' else fourcc,
+                    'acodec': 'none' if stream_type == 'video' else fourcc,
+                    'protocol': 'ism',
+                    'fragments': fragments,
+                    '_download_params': {
+                        'duration': duration,
+                        'timescale': stream_timescale,
+                        'width': width or 0,
+                        'height': height or 0,
+                        'fourcc': fourcc,
+                        'codec_private_data': track.get('CodecPrivateData'),
+                        'sampling_rate': sampling_rate,
+                        'channels': int_or_none(track.get('Channels', 2)),
+                        'bits_per_sample': int_or_none(track.get('BitsPerSample', 16)),
+                        'nal_unit_length_field': int_or_none(track.get('NALUnitLengthField', 4)),
+                    },
+                })
+        return formats
+
+    def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8'):
+        def absolute_url(video_url):
+            return compat_urlparse.urljoin(base_url, video_url)
+
+        def parse_content_type(content_type):
+            if not content_type:
+                return {}
+            ctr = re.search(r'(?P<mimetype>[^/]+/[^;]+)(?:;\s*codecs="?(?P<codecs>[^"]+))?', content_type)
+            if ctr:
+                mimetype, codecs = ctr.groups()
+                f = parse_codecs(codecs)
+                f['ext'] = mimetype2ext(mimetype)
+                return f
+            return {}
+
+        def _media_formats(src, cur_media_type):
+            full_url = absolute_url(src)
+            if determine_ext(full_url) == 'm3u8':
+                is_plain_url = False
+                formats = self._extract_m3u8_formats(
+                    full_url, video_id, ext='mp4',
+                    entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id)
+            else:
+                is_plain_url = True
+                formats = [{
+                    'url': full_url,
+                    'vcodec': 'none' if cur_media_type == 'audio' else None,
+                }]
+            return is_plain_url, formats
+
+        entries = []
+        media_tags = [(media_tag, media_type, '')
+                      for media_tag, media_type
+                      in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)]
+        media_tags.extend(re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage))
+        for media_tag, media_type, media_content in media_tags:
+            media_info = {
+                'formats': [],
+                'subtitles': {},
+            }
+            media_attributes = extract_attributes(media_tag)
+            src = media_attributes.get('src')
+            if src:
+                _, formats = _media_formats(src, media_type)
+                media_info['formats'].extend(formats)
+            media_info['thumbnail'] = media_attributes.get('poster')
+            if media_content:
+                for source_tag in re.findall(r'<source[^>]+>', media_content):
+                    source_attributes = extract_attributes(source_tag)
+                    src = source_attributes.get('src')
+                    if not src:
+                        continue
+                    is_plain_url, formats = _media_formats(src, media_type)
+                    if is_plain_url:
+                        f = parse_content_type(source_attributes.get('type'))
+                        f.update(formats[0])
+                        media_info['formats'].append(f)
+                    else:
+                        media_info['formats'].extend(formats)
+                for track_tag in re.findall(r'<track[^>]+>', media_content):
+                    track_attributes = extract_attributes(track_tag)
+                    kind = track_attributes.get('kind')
+                    if not kind or kind in ('subtitles', 'captions'):
+                        src = track_attributes.get('src')
+                        if not src:
+                            continue
+                        lang = track_attributes.get('srclang') or track_attributes.get('lang') or track_attributes.get('label')
+                        media_info['subtitles'].setdefault(lang, []).append({
+                            'url': absolute_url(src),
+                        })
+            if media_info['formats'] or media_info['subtitles']:
+                entries.append(media_info)
+        return entries
+
+    def _extract_akamai_formats(self, manifest_url, video_id):
+        formats = []
+        hdcore_sign = 'hdcore=3.7.0'
+        f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
+        if 'hdcore=' not in f4m_url:
+            f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
+        f4m_formats = self._extract_f4m_formats(
+            f4m_url, video_id, f4m_id='hds', fatal=False)
+        for entry in f4m_formats:
+            entry.update({'extra_param_to_segment_url': hdcore_sign})
+        formats.extend(f4m_formats)
+        m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
+        formats.extend(self._extract_m3u8_formats(
+            m3u8_url, video_id, 'mp4', 'm3u8_native',
+            m3u8_id='hls', fatal=False))
+        return formats
+
+    def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
+        url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url)
+        url_base = self._search_regex(r'(?:https?|rtmp|rtsp)(://[^?]+)', url, 'format url')
+        http_base_url = 'http' + url_base
+        formats = []
+        if 'm3u8' not in skip_protocols:
+            formats.extend(self._extract_m3u8_formats(
+                http_base_url + '/playlist.m3u8', video_id, 'mp4',
+                m3u8_entry_protocol, m3u8_id='hls', fatal=False))
+        if 'f4m' not in skip_protocols:
+            formats.extend(self._extract_f4m_formats(
+                http_base_url + '/manifest.f4m',
+                video_id, f4m_id='hds', fatal=False))
+        if 'dash' not in skip_protocols:
+            formats.extend(self._extract_mpd_formats(
+                http_base_url + '/manifest.mpd',
+                video_id, mpd_id='dash', fatal=False))
+        if re.search(r'(?:/smil:|\.smil)', url_base):
+            if 'smil' not in skip_protocols:
+                rtmp_formats = self._extract_smil_formats(
+                    http_base_url + '/jwplayer.smil',
+                    video_id, fatal=False)
+                for rtmp_format in rtmp_formats:
+                    rtsp_format = rtmp_format.copy()
+                    rtsp_format['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
+                    del rtsp_format['play_path']
+                    del rtsp_format['ext']
+                    rtsp_format.update({
+                        'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
+                        'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
+                        'protocol': 'rtsp',
+                    })
+                    formats.extend([rtmp_format, rtsp_format])
+        else:
+            for protocol in ('rtmp', 'rtsp'):
+                if protocol not in skip_protocols:
+                    formats.append({
+                        'url': protocol + url_base,
+                        'format_id': protocol,
+                        'protocol': protocol,
+                    })
+        return formats
+
      def _live_title(self, name):
          """ Generate the title for a live video """
          now = datetime.datetime.now()
@@ -1599,7 +2075,7 @@ class InfoExtractor(object):
  
          any_restricted = False
          for tc in self.get_testcases(include_onlymatching=False):
-            if 'playlist' in tc:
+            if tc.get('playlist', []):
                  tc = tc['playlist'][0]
              is_restricted = age_restricted(
                  tc.get('info_dict', {}).get('age_limit'), age_limit)
@@ -1652,6 +2128,19 @@ class InfoExtractor(object):
      def _mark_watched(self, *args, **kwargs):
          raise NotImplementedError('This method must be implemented by subclasses')
  
+    def geo_verification_headers(self):
+        headers = {}
+        geo_verification_proxy = self._downloader.params.get('geo_verification_proxy')
+        if geo_verification_proxy:
+            headers['Ytdl-request-proxy'] = geo_verification_proxy
+        return headers
+
+    def _generic_id(self, url):
+        return compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
+
+    def _generic_title(self, url):
+        return compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
+
  
  class SearchInfoExtractor(InfoExtractor):
      """
diff --git a/youtube_dl/extractor/commonprotocols.py b/youtube_dl/extractor/commonprotocols.py

index 5d130a170ed79454e05087cae24ab0d132448b32..d98331a4e400b23389bee55e4b648b31d464b8d6 100644 (file)
--- a/youtube_dl/extractor/commonprotocols.py
+++ b/youtube_dl/extractor/commonprotocols.py
@@ -1,13 +1,9 @@
  from __future__ import unicode_literals
  
-import os
-
  from .common import InfoExtractor
  from ..compat import (
-    compat_urllib_parse_unquote,
      compat_urlparse,
  )
-from ..utils import url_basename
  
  
  class RtmpIE(InfoExtractor):
@@ -23,8 +19,8 @@ class RtmpIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
-        title = compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
+        video_id = self._generic_id(url)
+        title = self._generic_title(url)
          return {
              'id': video_id,
              'title': title,
@@ -34,3 +30,31 @@ class RtmpIE(InfoExtractor):
                  'format_id': compat_urlparse.urlparse(url).scheme,
              }],
          }
+
+
+class MmsIE(InfoExtractor):
+    IE_DESC = False  # Do not list
+    _VALID_URL = r'(?i)mms://.+'
+
+    _TEST = {
+        # Direct MMS link
+        'url': 'mms://kentro.kaist.ac.kr/200907/MilesReid(0709).wmv',
+        'info_dict': {
+            'id': 'MilesReid(0709)',
+            'ext': 'wmv',
+            'title': 'MilesReid(0709)',
+        },
+        'params': {
+            'skip_download': True,  # rtsp downloads, requiring mplayer or mpv
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._generic_id(url)
+        title = self._generic_title(url)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'url': url,
+        }
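Review note: `RtmpIE` and the new `MmsIE` both derive `id` and `title` from the last path component of the URL through `_generic_id` and `_generic_title`. The snippet below is a minimal standalone sketch of that derivation using only the Python 3 standard library. The function names `generic_id` and `generic_title` are illustrative, and the `urlparse`-based basename is an assumed stand-in for youtube-dl's `url_basename`, not its actual implementation.

```python
import os.path
from urllib.parse import unquote, urlparse


def generic_id(url):
    # Take the last '/'-separated component, drop the extension,
    # then decode percent-escapes.
    last_component = url.rstrip('/').split('/')[-1]
    return unquote(os.path.splitext(last_component)[0])


def generic_title(url):
    # Assumed stand-in for url_basename(): last component of the URL path
    # only, so any query string is ignored.
    basename = urlparse(url).path.strip('/').split('/')[-1]
    return unquote(os.path.splitext(basename)[0])


mms_url = 'mms://kentro.kaist.ac.kr/200907/MilesReid(0709).wmv'
print(generic_id(mms_url), generic_title(mms_url))
```

For the test URL in `MmsIE._TEST` both helpers yield the same string, which matches the `id` and `title` values the test expects.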
diff --git a/youtube_dl/extractor/condenast.py b/youtube_dl/extractor/condenast.py

index e8f2b5a07591410c16fe6fe096678a12006abe48..8d8f605980bdf531c49a0c5d5067223d1f41dc4d 100644 (file)
--- a/youtube_dl/extractor/condenast.py
+++ b/youtube_dl/extractor/condenast.py
@@ -5,13 +5,17 @@ import re
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_urllib_parse_urlencode,
      compat_urllib_parse_urlparse,
      compat_urlparse,
  )
  from ..utils import (
      orderedSet,
      remove_end,
+    extract_attributes,
+    mimetype2ext,
+    determine_ext,
+    int_or_none,
+    parse_iso8601,
  )
  
  
@@ -58,6 +62,9 @@ class CondeNastIE(InfoExtractor):
              'ext': 'mp4',
              'title': '3D Printed Speakers Lit With LED',
              'description': 'Check out these beautiful 3D printed LED speakers.  You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.',
+            'uploader': 'wired',
+            'upload_date': '20130314',
+            'timestamp': 1363219200,
          }
      }, {
          # JS embed
@@ -67,70 +74,93 @@ class CondeNastIE(InfoExtractor):
              'id': '55f9cf8b61646d1acf00000c',
              'ext': 'mp4',
              'title': '3D printed TSA Travel Sentry keys really do open TSA locks',
+            'uploader': 'arstechnica',
+            'upload_date': '20150916',
+            'timestamp': 1442434955,
          }
      }]
  
      def _extract_series(self, url, webpage):
-        title = self._html_search_regex(r'<div class="cne-series-info">.*?<h1>(.+?)</h1>',
-                                        webpage, 'series title', flags=re.DOTALL)
+        title = self._html_search_regex(
+            r'(?s)<div class="cne-series-info">.*?<h1>(.+?)</h1>',
+            webpage, 'series title')
          url_object = compat_urllib_parse_urlparse(url)
          base_url = '%s://%s' % (url_object.scheme, url_object.netloc)
-        m_paths = re.finditer(r'<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]',
-                              webpage, flags=re.DOTALL)
+        m_paths = re.finditer(
+            r'(?s)<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]', webpage)
          paths = orderedSet(m.group(1) for m in m_paths)
          build_url = lambda path: compat_urlparse.urljoin(base_url, path)
          entries = [self.url_result(build_url(path), 'CondeNast') for path in paths]
          return self.playlist_result(entries, playlist_title=title)
  
      def _extract_video(self, webpage, url_type):
-        if url_type != 'embed':
-            description = self._html_search_regex(
-                [
-                    r'<div class="cne-video-description">(.+?)</div>',
-                    r'<div class="video-post-content">(.+?)</div>',
-                ],
-                webpage, 'description', fatal=False, flags=re.DOTALL)
+        query = {}
+        params = self._search_regex(
+            r'(?s)var params = {(.+?)}[;,]', webpage, 'player params', default=None)
+        if params:
+            query.update({
+                'videoId': self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id'),
+                'playerId': self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id'),
+                'target': self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target'),
+            })
          else:
-            description = None
-        params = self._search_regex(r'var params = {(.+?)}[;,]', webpage,
-                                    'player params', flags=re.DOTALL)
-        video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id')
-        player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id')
-        target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target')
-        data = compat_urllib_parse_urlencode({'videoId': video_id,
-                                              'playerId': player_id,
-                                              'target': target,
-                                              })
-        base_info_url = self._search_regex(r'url = [\'"](.+?)[\'"][,;]',
-                                           webpage, 'base info url',
-                                           default='http://player.cnevids.com/player/loader.js?')
-        info_url = base_info_url + data
-        info_page = self._download_webpage(info_url, video_id,
-                                           'Downloading video info')
-        video_info = self._search_regex(r'var\s+video\s*=\s*({.+?});', info_page, 'video info')
-        video_info = self._parse_json(video_info, video_id)
-
-        formats = [{
-            'format_id': '%s-%s' % (fdata['type'].split('/')[-1], fdata['quality']),
-            'url': fdata['src'],
-            'ext': fdata['type'].split('/')[-1],
-            'quality': 1 if fdata['quality'] == 'high' else 0,
-        } for fdata in video_info['sources'][0]]
+            params = extract_attributes(self._search_regex(
+                r'(<[^>]+data-js="video-player"[^>]+>)',
+                webpage, 'player params element'))
+            query.update({
+                'videoId': params['data-video'],
+                'playerId': params['data-player'],
+                'target': params['id'],
+            })
+        video_id = query['videoId']
+        video_info = None
+        info_page = self._download_webpage(
+            'http://player.cnevids.com/player/video.js',
+            video_id, 'Downloading video info', query=query, fatal=False)
+        if info_page:
+            video_info = self._parse_json(self._search_regex(
+                r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
+        else:
+            info_page = self._download_webpage(
+                'http://player.cnevids.com/player/loader.js',
+                video_id, 'Downloading loader info', query=query)
+            video_info = self._parse_json(self._search_regex(
+                r'var\s+video\s*=\s*({.+?});', info_page, 'video info'), video_id)
+        title = video_info['title']
+
+        formats = []
+        for fdata in video_info.get('sources', [{}])[0]:
+            src = fdata.get('src')
+            if not src:
+                continue
+            ext = mimetype2ext(fdata.get('type')) or determine_ext(src)
+            quality = fdata.get('quality')
+            formats.append({
+                'format_id': ext + ('-%s' % quality if quality else ''),
+                'url': src,
+                'ext': ext,
+                'quality': 1 if quality == 'high' else 0,
+            })
          self._sort_formats(formats)
  
-        return {
+        info = self._search_json_ld(
+            webpage, video_id, fatal=False) if url_type != 'embed' else {}
+        info.update({
              'id': video_id,
              'formats': formats,
-            'title': video_info['title'],
-            'thumbnail': video_info['poster_frame'],
-            'description': description,
-        }
+            'title': title,
+            'thumbnail': video_info.get('poster_frame'),
+            'uploader': video_info.get('brand'),
+            'duration': int_or_none(video_info.get('duration')),
+            'tags': video_info.get('tags'),
+            'series': video_info.get('series_title'),
+            'season': video_info.get('season_title'),
+            'timestamp': parse_iso8601(video_info.get('premiere_date')),
+        })
+        return info
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        site = mobj.group('site')
-        url_type = mobj.group('type')
-        item_id = mobj.group('id')
+        site, url_type, item_id = re.match(self._VALID_URL, url).groups()
  
          # Convert JS embed to regular embed
          if url_type == 'embedjs':
diff --git a/youtube_dl/extractor/coub.py b/youtube_dl/extractor/coub.py
new file mode 100644
index 0000000..a901b8d
--- /dev/null
+++ b/youtube_dl/extractor/coub.py
@@ -0,0 +1,143 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    float_or_none,
+    int_or_none,
+    parse_iso8601,
+    qualities,
+)
+
+
+class CoubIE(InfoExtractor):
+    _VALID_URL = r'(?:coub:|https?://(?:coub\.com/(?:view|embed|coubs)/|c-cdn\.coub\.com/fb-player\.swf\?.*\bcoub(?:ID|id)=))(?P<id>[\da-z]+)'
+
+    _TESTS = [{
+        'url': 'http://coub.com/view/5u5n1',
+        'info_dict': {
+            'id': '5u5n1',
+            'ext': 'mp4',
+            'title': 'The Matrix Moonwalk',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 4.6,
+            'timestamp': 1428527772,
+            'upload_date': '20150408',
+            'uploader': 'Артём Лоскутников',
+            'uploader_id': 'artyom.loskutnikov',
+            'view_count': int,
+            'like_count': int,
+            'repost_count': int,
+            'comment_count': int,
+            'age_limit': 0,
+        },
+    }, {
+        'url': 'http://c-cdn.coub.com/fb-player.swf?bot_type=vk&coubID=7w5a4',
+        'only_matching': True,
+    }, {
+        'url': 'coub:5u5n1',
+        'only_matching': True,
+    }, {
+        # longer video id
+        'url': 'http://coub.com/view/237d5l5h',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        coub = self._download_json(
+            'http://coub.com/api/v2/coubs/%s.json' % video_id, video_id)
+
+        if coub.get('error'):
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, coub['error']), expected=True)
+
+        title = coub['title']
+
+        file_versions = coub['file_versions']
+
+        QUALITIES = ('low', 'med', 'high')
+
+        MOBILE = 'mobile'
+        IPHONE = 'iphone'
+        HTML5 = 'html5'
+
+        SOURCE_PREFERENCE = (MOBILE, IPHONE, HTML5)
+
+        quality_key = qualities(QUALITIES)
+        preference_key = qualities(SOURCE_PREFERENCE)
+
+        formats = []
+
+        for kind, items in file_versions.get(HTML5, {}).items():
+            if kind not in ('video', 'audio'):
+                continue
+            if not isinstance(items, dict):
+                continue
+            for quality, item in items.items():
+                if not isinstance(item, dict):
+                    continue
+                item_url = item.get('url')
+                if not item_url:
+                    continue
+                formats.append({
+                    'url': item_url,
+                    'format_id': '%s-%s-%s' % (HTML5, kind, quality),
+                    'filesize': int_or_none(item.get('size')),
+                    'vcodec': 'none' if kind == 'audio' else None,
+                    'quality': quality_key(quality),
+                    'preference': preference_key(HTML5),
+                })
+
+        iphone_url = file_versions.get(IPHONE, {}).get('url')
+        if iphone_url:
+            formats.append({
+                'url': iphone_url,
+                'format_id': IPHONE,
+                'preference': preference_key(IPHONE),
+            })
+
+        mobile_url = file_versions.get(MOBILE, {}).get('audio_url')
+        if mobile_url:
+            formats.append({
+                'url': mobile_url,
+                'format_id': '%s-audio' % MOBILE,
+                'preference': preference_key(MOBILE),
+            })
+
+        self._sort_formats(formats)
+
+        thumbnail = coub.get('picture')
+        duration = float_or_none(coub.get('duration'))
+        timestamp = parse_iso8601(coub.get('published_at') or coub.get('created_at'))
+        uploader = coub.get('channel', {}).get('title')
+        uploader_id = coub.get('channel', {}).get('permalink')
+
+        view_count = int_or_none(coub.get('views_count') or coub.get('views_increase_count'))
+        like_count = int_or_none(coub.get('likes_count'))
+        repost_count = int_or_none(coub.get('recoubs_count'))
+        comment_count = int_or_none(coub.get('comments_count'))
+
+        age_restricted = coub.get('age_restricted', coub.get('age_restricted_by_admin'))
+        if age_restricted is not None:
+            age_limit = 18 if age_restricted is True else 0
+        else:
+            age_limit = None
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'timestamp': timestamp,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'view_count': view_count,
+            'like_count': like_count,
+            'repost_count': repost_count,
+            'comment_count': comment_count,
+            'age_limit': age_limit,
+            'formats': formats,
+        }
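Review note on coub.py: format ordering in the new extractor relies on `qualities()` from `..utils`, which turns an ordered tuple of labels into a ranking function. The sketch below is a standalone re-implementation written only for illustration; the name `make_ranker` is hypothetical and the real helper may differ in detail. It shows why `SOURCE_PREFERENCE = (MOBILE, IPHONE, HTML5)` ranks HTML5 sources highest and why an unrecognised label sorts below every known one.

```python
def make_ranker(ordered_ids):
    """Return a function mapping an id to its index in ordered_ids, or -1 if unknown."""
    def rank(item_id):
        try:
            return ordered_ids.index(item_id)
        except ValueError:
            # Labels missing from the tuple rank below every known label.
            return -1
    return rank


quality_key = make_ranker(('low', 'med', 'high'))
preference_key = make_ranker(('mobile', 'iphone', 'html5'))

# Sorting ascending by rank puts the best quality last.
ranked = sorted(['med', 'unknown', 'high', 'low'], key=quality_key)
```

A later position in the tuple means a higher rank, so the tuple should list the least preferred source first.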
diff --git a/youtube_dl/extractor/crackle.py b/youtube_dl/extractor/crackle.py
index 79238cce7a22d040e70c977219a86438bea4dfc7..cc68f1c0082674eaf850c2a0c1e3d6ae0f670d74 100644
--- a/youtube_dl/extractor/crackle.py
+++ b/youtube_dl/extractor/crackle.py
@@ -1,5 +1,5 @@
  # coding: utf-8
-from __future__ import unicode_literals
+from __future__ import unicode_literals, division
  
  from .common import InfoExtractor
  from ..utils import int_or_none
@@ -8,12 +8,22 @@ from ..utils import int_or_none
  class CrackleIE(InfoExtractor):
      _VALID_URL = r'(?:crackle:|https?://(?:www\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
      _TEST = {
-        'url': 'http://www.crackle.com/the-art-of-more/2496419',
+        'url': 'http://www.crackle.com/comedians-in-cars-getting-coffee/2498934',
          'info_dict': {
-            'id': '2496419',
+            'id': '2498934',
              'ext': 'mp4',
-            'title': 'Heavy Lies the Head',
-            'description': 'md5:bb56aa0708fe7b9a4861535f15c3abca',
+            'title': 'Everybody Respects A Bloody Nose',
+            'description': 'Jerry is kaffeeklatsching in L.A. with funnyman J.B. Smoove (Saturday Night Live, Real Husbands of Hollywood). They’re headed for brew at 10 Speed Coffee in a 1964 Studebaker Avanti.',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 906,
+            'series': 'Comedians In Cars Getting Coffee',
+            'season_number': 8,
+            'episode_number': 4,
+            'subtitles': {
+                'en-US': [{
+                    'ext': 'ttml',
+                }]
+            },
          },
          'params': {
              # m3u8 download
@@ -21,12 +31,8 @@ class CrackleIE(InfoExtractor):
          }
      }
  
-    # extracted from http://legacyweb-us.crackle.com/flash/QueryReferrer.ashx
-    _SUBTITLE_SERVER = 'http://web-us-az.crackle.com'
-    _UPLYNK_OWNER_ID = 'e8773f7770a44dbd886eee4fca16a66b'
-    _THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
-
      # extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx
+    _THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
      _MEDIA_FILE_SLOTS = {
          'c544.flv': {
              'width': 544,
@@ -48,16 +54,21 @@ class CrackleIE(InfoExtractor):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
+
+        config_doc = self._download_xml(
+            'http://legacyweb-us.crackle.com/flash/QueryReferrer.ashx?site=16',
+            video_id, 'Downloading config')
+
          item = self._download_xml(
              'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id,
              video_id).find('i')
          title = item.attrib['t']
  
-        thumbnail = None
          subtitles = {}
          formats = self._extract_m3u8_formats(
-            'http://content.uplynk.com/ext/%s/%s.m3u8' % (self._UPLYNK_OWNER_ID, video_id),
+            'http://content.uplynk.com/ext/%s/%s.m3u8' % (config_doc.attrib['strUplynkOwnerId'], video_id),
              video_id, 'mp4', m3u8_id='hls', fatal=None)
+        thumbnail = None
          path = item.attrib.get('p')
          if path:
              thumbnail = self._THUMBNAIL_TEMPLATE % path
@@ -76,7 +87,7 @@ class CrackleIE(InfoExtractor):
                      if locale not in subtitles:
                          subtitles[locale] = []
                      subtitles[locale] = [{
-                        'url': '%s/%s%s_%s.xml' % (self._SUBTITLE_SERVER, path, locale, v),
+                        'url': '%s/%s%s_%s.xml' % (config_doc.attrib['strSubtitleServer'], path, locale, v),
                          'ext': 'ttml',
                      }]
          self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
@@ -85,7 +96,7 @@ class CrackleIE(InfoExtractor):
              'id': video_id,
              'title': title,
              'description': item.attrib.get('d'),
-            'duration': int(item.attrib.get('r'), 16) if item.attrib.get('r') else None,
+            'duration': int(item.attrib.get('r'), 16) / 1000 if item.attrib.get('r') else None,
              'series': item.attrib.get('sn'),
              'season_number': int_or_none(item.attrib.get('se')),
              'episode_number': int_or_none(item.attrib.get('ep')),
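Review note on the crackle.py duration change: the new expression parses the `r` attribute as a hexadecimal number and divides by 1000, which suggests the feed stores the runtime as hex-encoded milliseconds. The sketch below illustrates that conversion under this assumption. The function name `parse_runtime` is hypothetical, and the sample value `'DD310'` was chosen to match the test's expected duration of 906 seconds rather than taken from a real response.

```python
def parse_runtime(hex_ms):
    """Convert a hex-encoded millisecond count to seconds; return None if missing."""
    if not hex_ms:
        return None
    # Dividing by a float keeps the result fractional on Python 2 as well;
    # the patch gets the same effect from `from __future__ import division`.
    return int(hex_ms, 16) / 1000.0


duration = parse_runtime('DD310')  # 0xDD310 == 906000 ms
```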
diff --git a/youtube_dl/extractor/criterion.py b/youtube_dl/extractor/criterion.py
index dedb810a092618a090641dfaf582939efabf3fc0..cf6a5d6cbe906443b1db592616cd89926860bbdd 100644
--- a/youtube_dl/extractor/criterion.py
+++ b/youtube_dl/extractor/criterion.py
@@ -1,13 +1,11 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  
  
  class CriterionIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.criterion\.com/films/(?P<id>[0-9]+)-.+'
+    _VALID_URL = r'https?://(?:www\.)?criterion\.com/films/(?P<id>[0-9]+)-.+'
      _TEST = {
          'url': 'http://www.criterion.com/films/184-le-samourai',
          'md5': 'bc51beba55685509883a9a7830919ec3',
@@ -16,20 +14,20 @@ class CriterionIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Le Samouraï',
              'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
+            'thumbnail': 're:^https?://.*\.jpg$',
          }
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
          final_url = self._search_regex(
-            r'so.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
+            r'so\.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
          title = self._og_search_title(webpage)
          description = self._html_search_meta('description', webpage)
          thumbnail = self._search_regex(
-            r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
+            r'so\.addVariable\("thumbnailURL", "(.+?)"\)\;',
              webpage, 'thumbnail url')
  
          return {
diff --git a/youtube_dl/extractor/crunchyroll.py b/youtube_dl/extractor/crunchyroll.py
index 44c720aaab59737b5d7749e0909f0e529f4bdf09..8d5b69f68d3ddb345dc67487db998cf164b2765c 100644
--- a/youtube_dl/extractor/crunchyroll.py
+++ b/youtube_dl/extractor/crunchyroll.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -11,7 +11,6 @@ from math import pow, sqrt, floor
  from .common import InfoExtractor
  from ..compat import (
      compat_etree_fromstring,
-    compat_urllib_parse_unquote,
      compat_urllib_parse_urlencode,
      compat_urllib_request,
      compat_urlparse,
@@ -27,6 +26,7 @@ from ..utils import (
      unified_strdate,
      urlencode_postdata,
      xpath_text,
+    extract_attributes,
  )
  from ..aes import (
      aes_cbc_decrypt,
@@ -34,22 +34,58 @@ from ..aes import (
  
  
  class CrunchyrollBaseIE(InfoExtractor):
+    _LOGIN_URL = 'https://www.crunchyroll.com/login'
+    _LOGIN_FORM = 'login_form'
      _NETRC_MACHINE = 'crunchyroll'
  
      def _login(self):
          (username, password) = self._get_login_info()
          if username is None:
              return
-        self.report_login()
-        login_url = 'https://www.crunchyroll.com/?a=formhandler'
-        data = urlencode_postdata({
-            'formname': 'RpcApiUser_Login',
-            'name': username,
-            'password': password,
+
+        login_page = self._download_webpage(
+            self._LOGIN_URL, None, 'Downloading login page')
+
+        def is_logged(webpage):
+            return '<title>Redirecting' in webpage
+
+        # Already logged in
+        if is_logged(login_page):
+            return
+
+        login_form_str = self._search_regex(
+            r'(?P<form><form[^>]+?id=(["\'])%s\2[^>]*>)' % self._LOGIN_FORM,
+            login_page, 'login form', group='form')
+
+        post_url = extract_attributes(login_form_str).get('action')
+        if not post_url:
+            post_url = self._LOGIN_URL
+        elif not post_url.startswith('http'):
+            post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
+
+        login_form = self._form_hidden_inputs(self._LOGIN_FORM, login_page)
+
+        login_form.update({
+            'login_form[name]': username,
+            'login_form[password]': password,
          })
-        login_request = sanitized_Request(login_url, data)
-        login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        self._download_webpage(login_request, None, False, 'Wrong login info')
+
+        response = self._download_webpage(
+            post_url, None, 'Logging in', 'Wrong login info',
+            data=urlencode_postdata(login_form),
+            headers={'Content-Type': 'application/x-www-form-urlencoded'})
+
+        # Successful login
+        if is_logged(response):
+            return
+
+        error = self._html_search_regex(
+            '(?s)<ul[^>]+class=["\']messages["\'][^>]*>(.+?)</ul>',
+            response, 'error message', default=None)
+        if error:
+            raise ExtractorError('Unable to log in: %s' % error, expected=True)
+
+        raise ExtractorError('Unable to log in')
  
      def _real_initialize(self):
          self._login()
@@ -114,6 +150,22 @@ class CrunchyrollIE(CrunchyrollBaseIE):
              # rtmp
              'skip_download': True,
          },
+        'skip': 'Video gone',
+    }, {
+        'url': 'http://www.crunchyroll.com/rezero-starting-life-in-another-world-/episode-5-the-morning-of-our-promise-is-still-distant-702409',
+        'info_dict': {
+            'id': '702409',
+            'ext': 'mp4',
+            'title': 'Re:ZERO -Starting Life in Another World- Episode 5 – The Morning of Our Promise Is Still Distant',
+            'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'TV TOKYO',
+            'upload_date': '20160508',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
      }, {
          'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
          'only_matching': True,
@@ -306,31 +358,48 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
              r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', webpage,
              'video_uploader', fatal=False)
  
-        playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
-        playerdata_req = sanitized_Request(playerdata_url)
-        playerdata_req.data = urlencode_postdata({'current_page': webpage_url})
-        playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
-
-        stream_id = self._search_regex(r'<media_id>([^<]+)', playerdata, 'stream_id')
-        video_thumbnail = self._search_regex(r'<episode_image_url>([^<]+)', playerdata, 'thumbnail', fatal=False)
-
+        available_fmts = []
+        for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
+            attrs = extract_attributes(a)
+            href = attrs.get('href')
+            if href and '/freetrial' in href:
+                continue
+            available_fmts.append(fmt)
+        if not available_fmts:
+            for p in (r'token=["\']showmedia\.([0-9]{3,4})p["\']', r'showmedia\.([0-9]{3,4})p'):
+                available_fmts = re.findall(p, webpage)
+                if available_fmts:
+                    break
+        video_encode_ids = []
          formats = []
-        for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
+        for fmt in available_fmts:
              stream_quality, stream_format = self._FORMAT_IDS[fmt]
              video_format = fmt + 'p'
              streamdata_req = sanitized_Request(
                  'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
-                % (stream_id, stream_format, stream_quality),
+                % (video_id, stream_format, stream_quality),
                  compat_urllib_parse_urlencode({'current_page': url}).encode('utf-8'))
              streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
              streamdata = self._download_xml(
                  streamdata_req, video_id,
                  note='Downloading media info for %s' % video_format)
              stream_info = streamdata.find('./{default}preload/stream_info')
+            video_encode_id = xpath_text(stream_info, './video_encode_id')
+            if video_encode_id in video_encode_ids:
+                continue
+            video_encode_ids.append(video_encode_id)
+
+            video_file = xpath_text(stream_info, './file')
+            if not video_file:
+                continue
+            if video_file.startswith('http'):
+                formats.extend(self._extract_m3u8_formats(
+                    video_file, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+                continue
+
              video_url = xpath_text(stream_info, './host')
-            video_play_path = xpath_text(stream_info, './file')
-            if not video_url or not video_play_path:
+            if not video_url:
                  continue
              metadata = stream_info.find('./metadata')
              format_info = {
@@ -345,7 +414,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
                  parsed_video_url = compat_urlparse.urlparse(video_url)
                  direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
                      netloc='v.lvlt.crcdn.net',
-                    path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_play_path.split(':')[-1])))
+                    path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_file.split(':')[-1])))
                  if self._is_valid_url(direct_video_url, video_id, video_format):
                      format_info.update({
                          'url': direct_video_url,
@@ -355,10 +424,18 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
  
              format_info.update({
                  'url': video_url,
-                'play_path': video_play_path,
+                'play_path': video_file,
                  'ext': 'flv',
              })
              formats.append(format_info)
+        self._sort_formats(formats)
+
+        metadata = self._download_xml(
+            'http://www.crunchyroll.com/xml', video_id,
+            note='Downloading media info', query={
+                'req': 'RpcApiVideoPlayer_GetMediaMetadata',
+                'media_id': video_id,
+            })
  
          subtitles = self.extract_subtitles(video_id, webpage)
  
@@ -366,9 +443,12 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
              'id': video_id,
              'title': video_title,
              'description': video_description,
-            'thumbnail': video_thumbnail,
+            'thumbnail': xpath_text(metadata, 'episode_image_url'),
              'uploader': video_uploader,
              'upload_date': video_upload_date,
+            'series': xpath_text(metadata, 'series_title'),
+            'episode': xpath_text(metadata, 'episode_title'),
+            'episode_number': int_or_none(xpath_text(metadata, 'episode_number')),
              'subtitles': subtitles,
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/cspan.py b/youtube_dl/extractor/cspan.py
index 84b36f44cfac7bd45a8a7d28adb6767093a7d19b..7e5d4f2276385a363eade175dba78519cea515fe 100644
--- a/youtube_dl/extractor/cspan.py
+++ b/youtube_dl/extractor/cspan.py
@@ -51,8 +51,11 @@ class CSpanIE(InfoExtractor):
          'url': 'http://www.c-span.org/video/?104517-1/immigration-reforms-needed-protect-skilled-american-workers',
          'info_dict': {
              'id': 'judiciary031715',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Immigration Reforms Needed to Protect Skilled American Workers',
+        },
+        'params': {
+            'skip_download': True,  # m3u8 downloads
          }
      }]
  
diff --git a/youtube_dl/extractor/ctsnews.py b/youtube_dl/extractor/ctsnews.py
index 1622fc844a1b8d4794fc12694f03f37c00076f15..83ca90c3b68a66c8c612bd29cda89ae6d91f1478 100644
--- a/youtube_dl/extractor/ctsnews.py
+++ b/youtube_dl/extractor/ctsnews.py
@@ -1,13 +1,12 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import parse_iso8601, ExtractorError
+from ..utils import unified_timestamp
  
  
  class CtsNewsIE(InfoExtractor):
      IE_DESC = '華視新聞'
-    # https connection failed (Connection reset)
      _VALID_URL = r'https?://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
      _TESTS = [{
          'url': 'http://news.cts.com.tw/cts/international/201501/201501291578109.html',
@@ -16,7 +15,7 @@ class CtsNewsIE(InfoExtractor):
              'id': '201501291578109',
              'ext': 'mp4',
              'title': '以色列.真主黨交火 3人死亡',
-            'description': 'md5:95e9b295c898b7ff294f09d450178d7d',
+            'description': '以色列和黎巴嫩真主黨，爆發五年最嚴重衝突，雙方砲轟交火，兩名以軍死亡，還有一名西班牙籍的聯合國維和人...',
              'timestamp': 1422528540,
              'upload_date': '20150129',
          }
@@ -28,7 +27,7 @@ class CtsNewsIE(InfoExtractor):
              'id': '201309031304098',
              'ext': 'mp4',
              'title': '韓國31歲童顏男 貌如十多歲小孩',
-            'description': 'md5:f183feeba3752b683827aab71adad584',
+            'description': '越有年紀的人，越希望看起來年輕一點，而南韓卻有一位31歲的男子，看起來像是11、12歲的小孩，身...',
              'thumbnail': 're:^https?://.*\.jpg$',
              'timestamp': 1378205880,
              'upload_date': '20130903',
@@ -36,8 +35,7 @@ class CtsNewsIE(InfoExtractor):
      }, {
          # With Youtube embedded video
          'url': 'http://news.cts.com.tw/cts/money/201501/201501291578003.html',
-        'md5': '1d842c771dc94c8c3bca5af2cc1db9c5',
-        'add_ie': ['Youtube'],
+        'md5': 'e4726b2ccd70ba2c319865e28f0a91d1',
          'info_dict': {
              'id': 'OVbfO7d0_hQ',
              'ext': 'mp4',
@@ -47,42 +45,37 @@ class CtsNewsIE(InfoExtractor):
              'upload_date': '20150128',
              'uploader_id': 'TBSCTS',
              'uploader': '中華電視公司',
-        }
+        },
+        'add_ie': ['Youtube'],
      }]
  
      def _real_extract(self, url):
          news_id = self._match_id(url)
          page = self._download_webpage(url, news_id)
  
-        if self._search_regex(r'(CTSPlayer2)', page, 'CTSPlayer2 identifier', default=None):
-            feed_url = self._html_search_regex(
-                r'(http://news\.cts\.com\.tw/action/mp4feed\.php\?news_id=\d+)',
-                page, 'feed url')
-            video_url = self._download_webpage(
-                feed_url, news_id, note='Fetching feed')
+        news_id = self._hidden_inputs(page).get('get_id')
+
+        if news_id:
+            mp4_feed = self._download_json(
+                'http://news.cts.com.tw/action/test_mp4feed.php',
+                news_id, note='Fetching feed', query={'news_id': news_id})
+            video_url = mp4_feed['source_url']
          else:
              self.to_screen('Not CTSPlayer video, trying Youtube...')
              youtube_url = self._search_regex(
-                r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url',
-                default=None)
-            if not youtube_url:
-                raise ExtractorError('The news includes no videos!', expected=True)
+                r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url')
  
-            return {
-                '_type': 'url',
-                'url': youtube_url,
-                'ie_key': 'Youtube',
-            }
+            return self.url_result(youtube_url, ie='Youtube')
  
          description = self._html_search_meta('description', page)
-        title = self._html_search_meta('title', page)
+        title = self._html_search_meta('title', page, fatal=True)
          thumbnail = self._html_search_meta('image', page)
  
          datetime_str = self._html_search_regex(
-            r'(\d{4}/\d{2}/\d{2} \d{2}:\d{2})', page, 'date and time')
-        # Transform into ISO 8601 format with timezone info
-        datetime_str = datetime_str.replace('/', '-') + ':00+0800'
-        timestamp = parse_iso8601(datetime_str, delimiter=' ')
+            r'(\d{4}/\d{2}/\d{2} \d{2}:\d{2})', page, 'date and time', fatal=False)
+        timestamp = None
+        if datetime_str:
+            timestamp = unified_timestamp(datetime_str) - 8 * 3600
  
          return {
              'id': news_id,
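Review note on the ctsnews.py timestamp change: the new code parses the page's date string as if it were UTC and then subtracts eight hours, which assumes the site reports Taiwan local time (UTC+8). The sketch below reproduces that arithmetic with the standard library only. The name `local_to_utc_timestamp` is hypothetical, and the sample string is derived from the test's expected timestamp rather than copied from the page. Unlike the patched line, it also returns `None` when the string cannot be parsed instead of raising on the subtraction.

```python
import calendar
from datetime import datetime

TAIWAN_UTC_OFFSET = 8 * 3600  # seconds; assumes the page shows UTC+8 local time


def local_to_utc_timestamp(datetime_str):
    """Parse 'YYYY/MM/DD HH:MM' local time and return a UTC epoch, or None."""
    try:
        parsed = datetime.strptime(datetime_str, '%Y/%m/%d %H:%M')
    except (TypeError, ValueError):
        return None
    # timegm() treats the struct as UTC, so remove the local offset afterwards.
    return calendar.timegm(parsed.timetuple()) - TAIWAN_UTC_OFFSET


timestamp = local_to_utc_timestamp('2015/01/29 18:49')
```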
diff --git a/youtube_dl/extractor/ctvnews.py b/youtube_dl/extractor/ctvnews.py
new file mode 100644
index 0000000..1023b61
--- /dev/null
+++ b/youtube_dl/extractor/ctvnews.py
@@ -0,0 +1,65 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import orderedSet
+
+
+class CTVNewsIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
+    _TESTS = [{
+        'url': 'http://www.ctvnews.ca/video?clipId=901995',
+        'md5': '10deb320dc0ccb8d01d34d12fc2ea672',
+        'info_dict': {
+            'id': '901995',
+            'ext': 'mp4',
+            'title': 'Extended: \'That person cannot be me\' Johnson says',
+            'description': 'md5:958dd3b4f5bbbf0ed4d045c790d89285',
+            'timestamp': 1467286284,
+            'upload_date': '20160630',
+        }
+    }, {
+        'url': 'http://www.ctvnews.ca/video?playlistId=1.2966224',
+        'info_dict':
+        {
+            'id': '1.2966224',
+        },
+        'playlist_mincount': 19,
+    }, {
+        'url': 'http://www.ctvnews.ca/video?binId=1.2876780',
+        'info_dict':
+        {
+            'id': '1.2876780',
+        },
+        'playlist_mincount': 100,
+    }, {
+        'url': 'http://www.ctvnews.ca/1.810401',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.ctvnews.ca/canadiens-send-p-k-subban-to-nashville-in-blockbuster-trade-1.2967231',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        page_id = self._match_id(url)
+
+        def ninecninemedia_url_result(clip_id):
+            return {
+                '_type': 'url_transparent',
+                'id': clip_id,
+                'url': '9c9media:ctvnews_web:%s' % clip_id,
+                'ie_key': 'NineCNineMedia',
+            }
+
+        if page_id.isdigit():
+            return ninecninemedia_url_result(page_id)
+        else:
+            webpage = self._download_webpage('http://www.ctvnews.ca/%s' % page_id, page_id, query={
+                'ot': 'example.AjaxPageLayout.ot',
+                'maxItemsPerPage': 1000000,
+            })
+            entries = [ninecninemedia_url_result(clip_id) for clip_id in orderedSet(
+                re.findall(r'clip\.id\s*=\s*(\d+);', webpage))]
+            return self.playlist_result(entries, page_id)
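Review note on the ctvnews.py playlist branch: it collects every `clip.id = N;` assignment from the page and removes duplicates while keeping first-seen order before building the entries. The sketch below shows that step in isolation. `ordered_unique` is a hypothetical stand-in for `orderedSet` from `..utils`, and the page fragment is invented for illustration.

```python
import re


def ordered_unique(items):
    """Drop duplicates while preserving first-seen order."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


# Invented page fragment, for illustration only.
sample_page = 'clip.id = 901995; clip.id=12345; clip.id = 901995;'
clip_ids = ordered_unique(re.findall(r'clip\.id\s*=\s*(\d+);', sample_page))

# Each id becomes a URL that is handed to the NineCNineMedia extractor.
urls = ['9c9media:ctvnews_web:%s' % clip_id for clip_id in clip_ids]
```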
diff --git a/youtube_dl/extractor/cultureunplugged.py b/youtube_dl/extractor/cultureunplugged.py
index 9c764fe68c57314d8524b2705f8bae7c30520c26..9f26fa5878777d3302383646ad581056f429841a 100644
--- a/youtube_dl/extractor/cultureunplugged.py
+++ b/youtube_dl/extractor/cultureunplugged.py
@@ -1,9 +1,13 @@
  from __future__ import unicode_literals
  
  import re
+import time
  
  from .common import InfoExtractor
-from ..utils import int_or_none
+from ..utils import (
+    int_or_none,
+    HEADRequest,
+)
  
  
  class CultureUnpluggedIE(InfoExtractor):
@@ -32,6 +36,9 @@ class CultureUnpluggedIE(InfoExtractor):
          video_id = mobj.group('id')
          display_id = mobj.group('display_id') or video_id
  
+        # request setClientTimezone.php to get the PHPSESSID cookie, which is needed to get valid JSON data in the next request
+        self._request_webpage(HEADRequest(
+            'http://www.cultureunplugged.com/setClientTimezone.php?timeOffset=%d' % -(time.timezone / 3600)), display_id)
          movie_data = self._download_json(
              'http://www.cultureunplugged.com/movie-data/cu-%s.json' % video_id, display_id)
  
diff --git a/youtube_dl/extractor/curiositystream.py b/youtube_dl/extractor/curiositystream.py
new file mode 100644
index 0000000..e3c9946
--- /dev/null
+++ b/youtube_dl/extractor/curiositystream.py
@@ -0,0 +1,120 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    urlencode_postdata,
+    compat_str,
+    ExtractorError,
+)
+
+
+class CuriosityStreamBaseIE(InfoExtractor):
+    _NETRC_MACHINE = 'curiositystream'
+    _auth_token = None
+    _API_BASE_URL = 'https://api.curiositystream.com/v1/'
+
+    def _handle_errors(self, result):
+        error = result.get('error', {}).get('message')
+        if error:
+            if isinstance(error, dict):
+                error = ', '.join(error.values())
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, error), expected=True)
+
+    def _call_api(self, path, video_id):
+        headers = {}
+        if self._auth_token:
+            headers['X-Auth-Token'] = self._auth_token
+        result = self._download_json(
+            self._API_BASE_URL + path, video_id, headers=headers)
+        self._handle_errors(result)
+        return result['data']
+
+    def _real_initialize(self):
+        (email, password) = self._get_login_info()
+        if email is None:
+            return
+        result = self._download_json(
+            self._API_BASE_URL + 'login', None, data=urlencode_postdata({
+                'email': email,
+                'password': password,
+            }))
+        self._handle_errors(result)
+        self._auth_token = result['message']['auth_token']
+
+    def _extract_media_info(self, media):
+        video_id = compat_str(media['id'])
+        limelight_media_id = media['limelight_media_id']
+        title = media['title']
+
+        subtitles = {}
+        for closed_caption in media.get('closed_captions', []):
+            sub_url = closed_caption.get('file')
+            if not sub_url:
+                continue
+            lang = closed_caption.get('code') or closed_caption.get('language') or 'en'
+            subtitles.setdefault(lang, []).append({
+                'url': sub_url,
+            })
+
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': 'limelight:media:' + limelight_media_id,
+            'title': title,
+            'description': media.get('description'),
+            'thumbnail': media.get('image_large') or media.get('image_medium') or media.get('image_small'),
+            'duration': int_or_none(media.get('duration')),
+            'tags': media.get('tags'),
+            'subtitles': subtitles,
+            'ie_key': 'LimelightMedia',
+        }
+
+
+class CuriosityStreamIE(CuriosityStreamBaseIE):
+    IE_NAME = 'curiositystream'
+    _VALID_URL = r'https?://app\.curiositystream\.com/video/(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://app.curiositystream.com/video/2',
+        'md5': 'a0074c190e6cddaf86900b28d3e9ee7a',
+        'info_dict': {
+            'id': '2',
+            'ext': 'mp4',
+            'title': 'How Did You Develop The Internet?',
+            'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
+            'timestamp': 1448388615,
+            'upload_date': '20151124',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        media = self._call_api('media/' + video_id, video_id)
+        return self._extract_media_info(media)
+
+
+class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
+    IE_NAME = 'curiositystream:collection'
+    _VALID_URL = r'https?://app\.curiositystream\.com/collection/(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://app.curiositystream.com/collection/2',
+        'info_dict': {
+            'id': '2',
+            'title': 'Curious Minds: The Internet',
+            'description': 'How is the internet shaping our lives in the 21st Century?',
+        },
+        'playlist_mincount': 17,
+    }
+
+    def _real_extract(self, url):
+        collection_id = self._match_id(url)
+        collection = self._call_api(
+            'collections/' + collection_id, collection_id)
+        entries = []
+        for media in collection.get('media', []):
+            entries.append(self._extract_media_info(media))
+        return self.playlist_result(
+            entries, collection_id,
+            collection.get('title'), collection.get('description'))
diff --git a/youtube_dl/extractor/cwtv.py b/youtube_dl/extractor/cwtv.py
index f5cefd9660829d1ab65ec789c208ba6938e0de5a..1ab9333b2b15c4ec96bcf1583aa6d0f491aafb76 100644
--- a/youtube_dl/extractor/cwtv.py
+++ b/youtube_dl/extractor/cwtv.py
@@ -9,7 +9,7 @@ from ..utils import (
  
  
  class CWTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/shows/(?:[^/]+/){2}\?play=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
+    _VALID_URL = r'https?://(?:www\.)?cw(?:tv(?:pr)?|seed)\.com/(?:shows/)?(?:[^/]+/)+[^?]*\?.*\b(?:play|watch)=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
      _TESTS = [{
          'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
          'info_dict': {
@@ -28,7 +28,8 @@ class CWTVIE(InfoExtractor):
          'params': {
              # m3u8 download
              'skip_download': True,
-        }
+        },
+        'skip': 'redirect to http://cwtv.com/shows/arrow/',
      }, {
          'url': 'http://www.cwseed.com/shows/whose-line-is-it-anyway/jeff-davis-4/?play=24282b12-ead2-42f2-95ad-26770c2c6088',
          'info_dict': {
@@ -44,19 +45,43 @@ class CWTVIE(InfoExtractor):
              'upload_date': '20151006',
              'timestamp': 1444107300,
          },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        }
+    }, {
+        'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
+        'only_matching': True,
+    }, {
+        'url': 'http://cwtvpr.com/the-cw/video?watch=9eee3f60-ef4e-440b-b3b2-49428ac9c54e',
+        'only_matching': True,
+    }, {
+        'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?watch=6b15e985-9345-4f60-baf8-56e96be57c63',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        video_data = self._download_json(
-            'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/132?format=json' % video_id, video_id)
-
-        formats = self._extract_m3u8_formats(
-            video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
+        video_data = None
+        formats = []
+        for partner in (154, 213):
+            vdata = self._download_json(
+                'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/%d?format=json' % (video_id, partner), video_id, fatal=False)
+            if not vdata:
+                continue
+            video_data = vdata
+            for quality, quality_data in vdata.get('videos', {}).items():
+                quality_url = quality_data.get('uri')
+                if not quality_url:
+                    continue
+                if quality == 'variantplaylist':
+                    formats.extend(self._extract_m3u8_formats(
+                        quality_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+                else:
+                    tbr = int_or_none(quality_data.get('bitrate'))
+                    format_id = 'http' + ('-%d' % tbr if tbr else '')
+                    if self._is_valid_url(quality_url, video_id, format_id):
+                        formats.append({
+                            'format_id': format_id,
+                            'url': quality_url,
+                            'tbr': tbr,
+                        })
          self._sort_formats(formats)
  
          thumbnails = [{
diff --git a/youtube_dl/extractor/dailymail.py b/youtube_dl/extractor/dailymail.py
new file mode 100644
index 0000000..98c835b
--- /dev/null
+++ b/youtube_dl/extractor/dailymail.py
@@ -0,0 +1,62 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    determine_protocol,
+    unescapeHTML,
+)
+
+
+class DailyMailIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/video/[^/]+/video-(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.dailymail.co.uk/video/tvshowbiz/video-1295863/The-Mountain-appears-sparkling-water-ad-Heavy-Bubbles.html',
+        'md5': 'f6129624562251f628296c3a9ffde124',
+        'info_dict': {
+            'id': '1295863',
+            'ext': 'mp4',
+            'title': 'The Mountain appears in sparkling water ad for \'Heavy Bubbles\'',
+            'description': 'md5:a93d74b6da172dd5dc4d973e0b766a84',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        video_data = self._parse_json(self._search_regex(
+            r"data-opts='({.+?})'", webpage, 'video data'), video_id)
+        title = unescapeHTML(video_data['title'])
+        video_sources = self._download_json(video_data.get(
+            'sources', {}).get('url') or 'http://www.dailymail.co.uk/api/player/%s/video-sources.json' % video_id, video_id)
+
+        formats = []
+        for rendition in video_sources['renditions']:
+            rendition_url = rendition.get('url')
+            if not rendition_url:
+                continue
+            tbr = int_or_none(rendition.get('encodingRate'), 1000)
+            container = rendition.get('videoContainer')
+            is_hls = container == 'M2TS'
+            protocol = 'm3u8_native' if is_hls else determine_protocol({'url': rendition_url})
+            formats.append({
+                'format_id': ('hls' if is_hls else protocol) + ('-%d' % tbr if tbr else ''),
+                'url': rendition_url,
+                'width': int_or_none(rendition.get('frameWidth')),
+                'height': int_or_none(rendition.get('frameHeight')),
+                'tbr': tbr,
+                'vcodec': rendition.get('videoCodec'),
+                'container': container,
+                'protocol': protocol,
+                'ext': 'mp4' if is_hls else None,
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': unescapeHTML(video_data.get('descr')),
+            'thumbnail': video_data.get('poster') or video_data.get('thumbnail'),
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/dailymotion.py b/youtube_dl/extractor/dailymotion.py
index 2e6226ea0774af2e636cbc4b4a4ca9f1ecb763a3..4a3314ea7d4fc2df95543cda554d32a8caf586ac 100644
--- a/youtube_dl/extractor/dailymotion.py
+++ b/youtube_dl/extractor/dailymotion.py
@@ -16,6 +16,7 @@ from ..utils import (
      sanitized_Request,
      str_to_int,
      unescapeHTML,
+    mimetype2ext,
  )
  
  
@@ -93,7 +94,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                  'title': 'Leanna Decker - Cyber Girl Of The Year Desires Nude [Playboy Plus]',
                  'uploader': 'HotWaves1012',
                  'age_limit': 18,
-            }
+            },
+            'skip': 'video gone',
          },
          # geo-restricted, player v5
          {
@@ -111,6 +113,13 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
          }
      ]
  
+    @staticmethod
+    def _extract_urls(webpage):
+        # Look for embedded Dailymotion player
+        matches = re.findall(
+            r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
+        return list(map(lambda m: unescapeHTML(m[1]), matches))
+
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
@@ -136,7 +145,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
          player_v5 = self._search_regex(
              [r'buildPlayer\(({.+?})\);\n',  # See https://github.com/rg3/youtube-dl/issues/7826
               r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
-             r'buildPlayer\(({.+?})\);'],
+             r'buildPlayer\(({.+?})\);',
+             r'var\s+config\s*=\s*({.+?});'],
              webpage, 'player v5', default=None)
          if player_v5:
              player = self._parse_json(player_v5, video_id)
@@ -153,18 +163,19 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                      type_ = media.get('type')
                      if type_ == 'application/vnd.lumberjack.manifest':
                          continue
-                    ext = determine_ext(media_url)
-                    if type_ == 'application/x-mpegURL' or ext == 'm3u8':
+                    ext = mimetype2ext(type_) or determine_ext(media_url)
+                    if ext == 'm3u8':
                          formats.extend(self._extract_m3u8_formats(
                              media_url, video_id, 'mp4', preference=-1,
                              m3u8_id='hls', fatal=False))
-                    elif type_ == 'application/f4m' or ext == 'f4m':
+                    elif ext == 'f4m':
                          formats.extend(self._extract_f4m_formats(
                              media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
                      else:
                          f = {
                              'url': media_url,
                              'format_id': 'http-%s' % quality,
+                            'ext': ext,
                          }
                          m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
                          if m:
@@ -322,7 +333,9 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
  
              for video_id in re.findall(r'data-xid="(.+?)"', webpage):
                  if video_id not in video_ids:
-                    yield self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
+                    yield self.url_result(
+                        'http://www.dailymotion.com/video/%s' % video_id,
+                        DailymotionIE.ie_key(), video_id)
                      video_ids.add(video_id)
  
              if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
@@ -383,7 +396,7 @@ class DailymotionUserIE(DailymotionPlaylistIE):
  
  
  class DailymotionCloudIE(DailymotionBaseInfoExtractor):
-    _VALID_URL_PREFIX = r'http://api\.dmcloud\.net/(?:player/)?embed/'
+    _VALID_URL_PREFIX = r'https?://api\.dmcloud\.net/(?:player/)?embed/'
      _VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % _VALID_URL_PREFIX
      _VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % _VALID_URL_PREFIX
  
diff --git a/youtube_dl/extractor/daum.py b/youtube_dl/extractor/daum.py
index 86024a745661dda2da9d3fb883ccf4db017a722c..732b4362a96488e67f4b1858f83429a85e877555 100644
--- a/youtube_dl/extractor/daum.py
+++ b/youtube_dl/extractor/daum.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  
  from __future__ import unicode_literals
  
@@ -66,22 +66,32 @@ class DaumIE(InfoExtractor):
              'view_count': int,
              'comment_count': int,
          },
+    }, {
+        # Requires dte_type=WEB (#9972)
+        'url': 'http://tvpot.daum.net/v/s3794Uf1NZeZ1qMpGpeqeRU',
+        'md5': 'a8917742069a4dd442516b86e7d66529',
+        'info_dict': {
+            'id': 's3794Uf1NZeZ1qMpGpeqeRU',
+            'ext': 'mp4',
+            'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny) [쇼! 음악중심] 508회 20160611',
+            'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\n\n[쇼! 음악중심] 20160611, 507회',
+            'upload_date': '20160611',
+        },
      }]
  
      def _real_extract(self, url):
          video_id = compat_urllib_parse_unquote(self._match_id(url))
-        query = compat_urllib_parse_urlencode({'vid': video_id})
          movie_data = self._download_json(
-            'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query,
-            video_id, 'Downloading video formats info')
+            'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json',
+            video_id, 'Downloading video formats info', query={'vid': video_id, 'dte_type': 'WEB'})
  
          # For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
          if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
              return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
  
          info = self._download_xml(
-            'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
-            'Downloading video info')
+            'http://tvpot.daum.net/clip/ClipInfoXml.do', video_id,
+            'Downloading video info', query={'vid': video_id})
  
          formats = []
          for format_el in movie_data['output_list']['output_list']:
diff --git a/youtube_dl/extractor/dbtv.py b/youtube_dl/extractor/dbtv.py
index 133cdc50b8c8379021d6d7fffb7c7c28dd2ba30d..6d880d43d6507077018f9489749947d83a36f64b 100644
--- a/youtube_dl/extractor/dbtv.py
+++ b/youtube_dl/extractor/dbtv.py
@@ -4,78 +4,53 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
-    float_or_none,
-    int_or_none,
-    clean_html,
-)
  
  
  class DBTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:(?:lazyplayer|player)/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
+    _VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:[^/]+/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
      _TESTS = [{
          'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
-        'md5': 'b89953ed25dacb6edb3ef6c6f430f8bc',
+        'md5': '2e24f67936517b143a234b4cadf792ec',
          'info_dict': {
-            'id': '33100',
+            'id': '3649835190001',
              'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
              'ext': 'mp4',
              'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
              'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
-            'thumbnail': 're:https?://.*\.jpg$',
-            'timestamp': 1404039863.438,
+            'thumbnail': 're:https?://.*\.jpg',
+            'timestamp': 1404039863,
              'upload_date': '20140629',
              'duration': 69.544,
-            'view_count': int,
-            'categories': list,
-        }
+            'uploader_id': '1027729757001',
+        },
+        'add_ie': ['BrightcoveNew']
      }, {
          'url': 'http://dbtv.no/3649835190001',
          'only_matching': True,
      }, {
          'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
          'only_matching': True,
+    }, {
+        'url': 'http://dbtv.no/vice/5000634109001',
+        'only_matching': True,
+    }, {
+        'url': 'http://dbtv.no/filmtrailer/3359293614001',
+        'only_matching': True,
      }]
  
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id') or video_id
+    @staticmethod
+    def _extract_urls(webpage):
+        return [url for _, url in re.findall(
+            r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dbtv\.no/(?:lazy)?player/\d+.*?)\1',
+            webpage)]
  
-        data = self._download_json(
-            'http://api.dbtv.no/discovery/%s' % video_id, display_id)
-
-        video = data['playlist'][0]
-
-        formats = [{
-            'url': f['URL'],
-            'vcodec': f.get('container'),
-            'width': int_or_none(f.get('width')),
-            'height': int_or_none(f.get('height')),
-            'vbr': float_or_none(f.get('rate'), 1000),
-            'filesize': int_or_none(f.get('size')),
-        } for f in video['renditions'] if 'URL' in f]
-
-        if not formats:
-            for url_key, format_id in [('URL', 'mp4'), ('HLSURL', 'hls')]:
-                if url_key in video:
-                    formats.append({
-                        'url': video[url_key],
-                        'format_id': format_id,
-                    })
-
-        self._sort_formats(formats)
+    def _real_extract(self, url):
+        video_id, display_id = re.match(self._VALID_URL, url).groups()
  
          return {
-            'id': compat_str(video['id']),
+            '_type': 'url_transparent',
+            'url': 'http://players.brightcove.net/1027729757001/default_default/index.html?videoId=%s' % video_id,
+            'id': video_id,
              'display_id': display_id,
-            'title': video['title'],
-            'description': clean_html(video['desc']),
-            'thumbnail': video.get('splash') or video.get('thumb'),
-            'timestamp': float_or_none(video.get('publishedAt'), 1000),
-            'duration': float_or_none(video.get('length'), 1000),
-            'view_count': int_or_none(video.get('views')),
-            'categories': video.get('tags'),
-            'formats': formats,
+            'ie_key': 'BrightcoveNew',
          }
diff --git a/youtube_dl/extractor/dcn.py b/youtube_dl/extractor/dcn.py
deleted file mode 100644
index 5deff5f..0000000
--- a/youtube_dl/extractor/dcn.py
+++ /dev/null
@@ -1,197 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-import base64
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlencode,
-    compat_str,
-)
-from ..utils import (
-    int_or_none,
-    parse_iso8601,
-    sanitized_Request,
-    smuggle_url,
-    unsmuggle_url,
-    urlencode_postdata,
-)
-
-
-class DCNIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
-
-    def _real_extract(self, url):
-        show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
-        if video_id and int(video_id) > 0:
-            return self.url_result(
-                'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo')
-        elif season_id and int(season_id) > 0:
-            return self.url_result(smuggle_url(
-                'http://www.dcndigital.ae/program/season/%s' % season_id,
-                {'show_id': show_id}), 'DCNSeason')
-        else:
-            return self.url_result(
-                'http://www.dcndigital.ae/program/%s' % show_id, 'DCNSeason')
-
-
-class DCNBaseIE(InfoExtractor):
-    def _extract_video_info(self, video_data, video_id, is_live):
-        title = video_data.get('title_en') or video_data['title_ar']
-        img = video_data.get('img')
-        thumbnail = 'http://admin.mangomolo.com/analytics/%s' % img if img else None
-        duration = int_or_none(video_data.get('duration'))
-        description = video_data.get('description_en') or video_data.get('description_ar')
-        timestamp = parse_iso8601(video_data.get('create_time'), ' ')
-
-        return {
-            'id': video_id,
-            'title': self._live_title(title) if is_live else title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'timestamp': timestamp,
-            'is_live': is_live,
-        }
-
-    def _extract_video_formats(self, webpage, video_id, entry_protocol):
-        formats = []
-        m3u8_url = self._html_search_regex(
-            r'file\s*:\s*"([^"]+)', webpage, 'm3u8 url', fatal=False)
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', entry_protocol, m3u8_id='hls', fatal=None))
-
-        rtsp_url = self._search_regex(
-            r'<a[^>]+href="(rtsp://[^"]+)"', webpage, 'rtsp url', fatal=False)
-        if rtsp_url:
-            formats.append({
-                'url': rtsp_url,
-                'format_id': 'rtsp',
-            })
-
-        self._sort_formats(formats)
-        return formats
-
-
-class DCNVideoIE(DCNBaseIE):
-    IE_NAME = 'dcn:video'
-    _VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?(?:video/[^/]+|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
-    _TEST = {
-        'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
-        'info_dict':
-        {
-            'id': '17375',
-            'ext': 'mp4',
-            'title': 'رحلة العمر : الحلقة 1',
-            'description': 'md5:0156e935d870acb8ef0a66d24070c6d6',
-            'duration': 2041,
-            'timestamp': 1227504126,
-            'upload_date': '20081124',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        request = sanitized_Request(
-            'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
-            headers={'Origin': 'http://www.dcndigital.ae'})
-        video_data = self._download_json(request, video_id)
-        info = self._extract_video_info(video_data, video_id, False)
-
-        webpage = self._download_webpage(
-            'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
-            compat_urllib_parse_urlencode({
-                'id': video_data['id'],
-                'user_id': video_data['user_id'],
-                'signature': video_data['signature'],
-                'countries': 'Q0M=',
-                'filter': 'DENY',
-            }), video_id)
-        info['formats'] = self._extract_video_formats(webpage, video_id, 'm3u8_native')
-        return info
-
-
-class DCNLiveIE(DCNBaseIE):
-    IE_NAME = 'dcn:live'
-    _VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?live/(?P<id>\d+)'
-
-    def _real_extract(self, url):
-        channel_id = self._match_id(url)
-
-        request = sanitized_Request(
-            'http://admin.mangomolo.com/analytics/index.php/plus/getchanneldetails?channel_id=%s' % channel_id,
-            headers={'Origin': 'http://www.dcndigital.ae'})
-
-        channel_data = self._download_json(request, channel_id)
-        info = self._extract_video_info(channel_data, channel_id, True)
-
-        webpage = self._download_webpage(
-            'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
-            compat_urllib_parse_urlencode({
-                'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
-                'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
-                'signature': channel_data['signature'],
-                'countries': 'Q0M=',
-                'filter': 'DENY',
-            }), channel_id)
-        info['formats'] = self._extract_video_formats(webpage, channel_id, 'm3u8')
-        return info
-
-
-class DCNSeasonIE(InfoExtractor):
-    IE_NAME = 'dcn:season'
-    _VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
-    _TEST = {
-        'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
-        'info_dict':
-        {
-            'id': '7910',
-            'title': 'محاضرات الشيخ الشعراوي',
-        },
-        'playlist_mincount': 27,
-    }
-
-    def _real_extract(self, url):
-        url, smuggled_data = unsmuggle_url(url, {})
-        show_id, season_id = re.match(self._VALID_URL, url).groups()
-
-        data = {}
-        if season_id:
-            data['season'] = season_id
-            show_id = smuggled_data.get('show_id')
-            if show_id is None:
-                request = sanitized_Request(
-                    'http://admin.mangomolo.com/analytics/index.php/plus/season_info?id=%s' % season_id,
-                    headers={'Origin': 'http://www.dcndigital.ae'})
-                season = self._download_json(request, season_id)
-                show_id = season['id']
-        data['show_id'] = show_id
-        request = sanitized_Request(
-            'http://admin.mangomolo.com/analytics/index.php/plus/show',
-            urlencode_postdata(data),
-            {
-                'Origin': 'http://www.dcndigital.ae',
-                'Content-Type': 'application/x-www-form-urlencoded'
-            })
-
-        show = self._download_json(request, show_id)
-        if not season_id:
-            season_id = show['default_season']
-        for season in show['seasons']:
-            if season['id'] == season_id:
-                title = season.get('title_en') or season['title_ar']
-
-                entries = []
-                for video in show['videos']:
-                    video_id = compat_str(video['id'])
-                    entries.append(self.url_result(
-                        'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo', video_id))
-
-                return self.playlist_result(entries, season_id, title)
diff --git a/youtube_dl/extractor/dctp.py b/youtube_dl/extractor/dctp.py
index 9099f5046a14ad7c769a6da50d813076f8b9231e..14ba88715887caeb9144e68384417b2e7b518b07 100644
--- a/youtube_dl/extractor/dctp.py
+++ b/youtube_dl/extractor/dctp.py
@@ -1,61 +1,54 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..compat import compat_str
+from ..utils import unified_strdate
  
  
  class DctpTvIE(InfoExtractor):
-    _VALID_URL = r'https?://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
+    _VALID_URL = r'https?://(?:www\.)?dctp\.tv/(#/)?filme/(?P<id>.+?)/$'
      _TEST = {
          'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
+        'md5': '174dd4a8a6225cf5655952f969cfbe24',
          'info_dict': {
-            'id': '1324',
+            'id': '95eaa4f33dad413aa17b4ee613cccc6c',
              'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
-            'ext': 'flv',
-            'title': 'Videoinstallation für eine Kaufhausfassade'
+            'ext': 'mp4',
+            'title': 'Videoinstallation für eine Kaufhausfassade',
+            'description': 'Kurzfilm',
+            'upload_date': '20110407',
+            'thumbnail': 're:^https?://.*\.jpg$',
          },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        }
      }
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        base_url = 'http://dctp-ivms2-restapi.s3.amazonaws.com/'
-        version_json = self._download_json(
-            base_url + 'version.json',
-            video_id, note='Determining file version')
-        version = version_json['version_name']
-        info_json = self._download_json(
-            '{0}{1}/restapi/slugs/{2}.json'.format(base_url, version, video_id),
-            video_id, note='Fetching object ID')
-        object_id = compat_str(info_json['object_id'])
-        meta_json = self._download_json(
-            '{0}{1}/restapi/media/{2}.json'.format(base_url, version, object_id),
-            video_id, note='Downloading metadata')
-        uuid = meta_json['uuid']
-        title = meta_json['title']
-        wide = meta_json['is_wide']
-        if wide:
-            ratio = '16x9'
-        else:
-            ratio = '4x3'
-        play_path = 'mp4:{0}_dctp_0500_{1}.m4v'.format(uuid, ratio)
+        webpage = self._download_webpage(url, video_id)
+
+        object_id = self._html_search_meta('DC.identifier', webpage)
  
          servers_json = self._download_json(
-            'http://www.dctp.tv/streaming_servers/',
+            'http://www.dctp.tv/elastic_streaming_client/get_streaming_server/',
              video_id, note='Downloading server list')
-        url = servers_json[0]['endpoint']
+        server = servers_json[0]['server']
+        m3u8_path = self._search_regex(
+            r'\'([^\'"]+/playlist\.m3u8)"', webpage, 'm3u8 path')
+        formats = self._extract_m3u8_formats(
+            'http://%s%s' % (server, m3u8_path), video_id, ext='mp4',
+            entry_protocol='m3u8_native')
+
+        title = self._og_search_title(webpage)
+        description = self._html_search_meta('DC.description', webpage)
+        upload_date = unified_strdate(
+            self._html_search_meta('DC.date.created', webpage))
+        thumbnail = self._og_search_thumbnail(webpage)
  
          return {
              'id': object_id,
              'title': title,
-            'format': 'rtmp',
-            'url': url,
-            'play_path': play_path,
-            'rtmp_real_time': True,
-            'ext': 'flv',
-            'display_id': video_id
+            'formats': formats,
+            'display_id': video_id,
+            'description': description,
+            'upload_date': upload_date,
+            'thumbnail': thumbnail,
          }
diff --git a/youtube_dl/extractor/deezer.py b/youtube_dl/extractor/deezer.py

index c3205ff5fc243c069494e9d50f3522299ea84d34..7a07f3267db874649e5bcc5228a1c7881ebe19d3 100644 (file)
--- a/youtube_dl/extractor/deezer.py
+++ b/youtube_dl/extractor/deezer.py
@@ -41,7 +41,9 @@ class DeezerPlaylistIE(InfoExtractor):
                  'Deezer said: %s' % geoblocking_msg, expected=True)
  
          data_json = self._search_regex(
-            r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n', webpage, 'data JSON')
+            (r'__DZR_APP_STATE__\s*=\s*({.+?})\s*</script>',
+             r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n'),
+            webpage, 'data JSON')
          data = json.loads(data_json)
  
          playlist_title = data.get('DATA', {}).get('TITLE')
diff --git a/youtube_dl/extractor/democracynow.py b/youtube_dl/extractor/democracynow.py

index 6cd395e1169d8253c589efe3da2d24f1632b0356..bdfe638b4d7bd6fa39a4d22a0718033eb130cfa7 100644 (file)
--- a/youtube_dl/extractor/democracynow.py
+++ b/youtube_dl/extractor/democracynow.py
@@ -13,41 +13,57 @@ from ..utils import (
  
  
  class DemocracynowIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?democracynow.org/(?P<id>[^\?]*)'
+    _VALID_URL = r'https?://(?:www\.)?democracynow\.org/(?P<id>[^\?]*)'
      IE_NAME = 'democracynow'
      _TESTS = [{
          'url': 'http://www.democracynow.org/shows/2015/7/3',
-        'md5': 'fbb8fe3d7a56a5e12431ce2f9b2fab0d',
+        'md5': '3757c182d3d84da68f5c8f506c18c196',
          'info_dict': {
              'id': '2015-0703-001',
              'ext': 'mp4',
-            'title': 'July 03, 2015 - Democracy Now!',
-            'description': 'A daily independent global news hour with Amy Goodman & Juan González "What to the Slave is 4th of July?": James Earl Jones Reads Frederick Douglass\u2019 Historic Speech : "This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag : "We Shall Overcome": Remembering Folk Icon, Activist Pete Seeger in His Own Words & Songs',
+            'title': 'Daily Show',
          },
      }, {
          'url': 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree',
-        'md5': 'fbb8fe3d7a56a5e12431ce2f9b2fab0d',
          'info_dict': {
              'id': '2015-0703-001',
              'ext': 'mp4',
              'title': '"This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag',
              'description': 'md5:4d2bc4f0d29f5553c2210a4bc7761a21',
          },
+        'params': {
+            'skip_download': True,
+        },
      }]
  
      def _real_extract(self, url):
          display_id = self._match_id(url)
+
          webpage = self._download_webpage(url, display_id)
-        description = self._og_search_description(webpage)
  
          json_data = self._parse_json(self._search_regex(
              r'<script[^>]+type="text/json"[^>]*>\s*({[^>]+})', webpage, 'json'),
              display_id)
-        video_id = None
+
+        title = json_data['title']
          formats = []
  
-        default_lang = 'en'
+        video_id = None
+
+        for key in ('file', 'audio', 'video', 'high_res_video'):
+            media_url = json_data.get(key, '')
+            if not media_url:
+                continue
+            media_url = re.sub(r'\?.*', '', compat_urlparse.urljoin(url, media_url))
+            video_id = video_id or remove_start(os.path.splitext(url_basename(media_url))[0], 'dn')
+            formats.append({
+                'url': media_url,
+                'vcodec': 'none' if key == 'audio' else None,
+            })
+
+        self._sort_formats(formats)
  
+        default_lang = 'en'
          subtitles = {}
  
          def add_subtitle_item(lang, info_dict):
@@ -67,22 +83,13 @@ class DemocracynowIE(InfoExtractor):
                  'url': compat_urlparse.urljoin(url, subtitle_item['url']),
              })
  
-        for key in ('file', 'audio', 'video'):
-            media_url = json_data.get(key, '')
-            if not media_url:
-                continue
-            media_url = re.sub(r'\?.*', '', compat_urlparse.urljoin(url, media_url))
-            video_id = video_id or remove_start(os.path.splitext(url_basename(media_url))[0], 'dn')
-            formats.append({
-                'url': media_url,
-            })
-
-        self._sort_formats(formats)
+        description = self._og_search_description(webpage, default=None)
  
          return {
              'id': video_id or display_id,
-            'title': json_data['title'],
+            'title': title,
              'description': description,
+            'thumbnail': json_data.get('image'),
              'subtitles': subtitles,
              'formats': formats,
          }
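
Note on the democracynow.py hunk: the media loop joins each JSON value against the page URL, drops the query string, and derives the video id from the file name minus a leading 'dn'. The following is a minimal standalone sketch of those steps for Python 3, not the extractor itself. The helper names and the sample media path are hypothetical, and url_basename is approximated here with the standard library.

```python
import os.path
import re
from urllib.parse import urljoin, urlparse


def normalize_media_url(page_url, media_url):
    """Resolve media_url against page_url and strip any query string."""
    return re.sub(r'\?.*', '', urljoin(page_url, media_url))


def video_id_from_media_url(media_url):
    """Take the last path segment, drop the extension and a leading 'dn'."""
    # Stand-in for youtube-dl's url_basename (assumption: last path segment)
    basename = urlparse(media_url).path.strip('/').split('/')[-1]
    stem = os.path.splitext(basename)[0]
    # Stand-in for remove_start(stem, 'dn')
    return stem[2:] if stem.startswith('dn') else stem


page = 'http://www.democracynow.org/shows/2015/7/3'
# Hypothetical media path, used only for illustration
media = normalize_media_url(page, '/media/dn2015-0703-001.mp4?start=10')
print(media, video_id_from_media_url(media))
```

Because the id is taken from the first non-empty key, the order of the key tuple decides which media file names the video.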
diff --git a/youtube_dl/extractor/dfb.py b/youtube_dl/extractor/dfb.py

index cdfeccacb447591f4dcc776a9c1a374a794fa5ba..a4d0448c26149429ebd7d5813f432b56bf0e6020 100644 (file)
--- a/youtube_dl/extractor/dfb.py
+++ b/youtube_dl/extractor/dfb.py
@@ -12,39 +12,46 @@ class DFBIE(InfoExtractor):
  
      _TEST = {
          'url': 'http://tv.dfb.de/video/u-19-em-stimmen-zum-spiel-gegen-russland/11633/',
-        # The md5 is different each time
+        'md5': 'ac0f98a52a330f700b4b3034ad240649',
          'info_dict': {
              'id': '11633',
              'display_id': 'u-19-em-stimmen-zum-spiel-gegen-russland',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'U 19-EM: Stimmen zum Spiel gegen Russland',
              'upload_date': '20150714',
          },
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        display_id, video_id = re.match(self._VALID_URL, url).groups()
  
-        webpage = self._download_webpage(url, display_id)
          player_info = self._download_xml(
              'http://tv.dfb.de/server/hd_video.php?play=%s' % video_id,
              display_id)
          video_info = player_info.find('video')
-
-        f4m_info = self._download_xml(
-            self._proto_relative_url(video_info.find('url').text.strip()), display_id)
-        token_el = f4m_info.find('token')
-        manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
-        formats = self._extract_f4m_formats(manifest_url, display_id)
+        stream_access_url = self._proto_relative_url(video_info.find('url').text.strip())
+
+        formats = []
+        # see http://tv.dfb.de/player/js/ajax.js for the method to extract m3u8 formats
+        for sa_url in (stream_access_url, stream_access_url + '&area=&format=iphone'):
+            stream_access_info = self._download_xml(sa_url, display_id)
+            token_el = stream_access_info.find('token')
+            manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth']
+            if '.f4m' in manifest_url:
+                formats.extend(self._extract_f4m_formats(
+                    manifest_url + '&hdcore=3.2.0',
+                    display_id, f4m_id='hds', fatal=False))
+            else:
+                formats.extend(self._extract_m3u8_formats(
+                    manifest_url, display_id, 'mp4',
+                    'm3u8_native', m3u8_id='hls', fatal=False))
          self._sort_formats(formats)
  
          return {
              'id': video_id,
              'display_id': display_id,
              'title': video_info.find('title').text,
-            'thumbnail': self._og_search_thumbnail(webpage),
+            'thumbnail': 'http://tv.dfb.de/images/%s_640x360.jpg' % video_id,
              'upload_date': unified_strdate(video_info.find('time_date').text),
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/discovery.py b/youtube_dl/extractor/discovery.py

index 5f1275b39a1e40048dfdbe865f38bb5d72d89426..55853f76f91e97db19423c6cf8c1b8b56e006447 100644 (file)
--- a/youtube_dl/extractor/discovery.py
+++ b/youtube_dl/extractor/discovery.py
@@ -33,6 +33,7 @@ class DiscoveryIE(InfoExtractor):
              'duration': 156,
              'timestamp': 1302032462,
              'upload_date': '20110405',
+            'uploader_id': '103207',
          },
          'params': {
              'skip_download': True,  # requires ffmpeg
@@ -54,7 +55,11 @@ class DiscoveryIE(InfoExtractor):
              'upload_date': '20140725',
              'timestamp': 1406246400,
              'duration': 116,
+            'uploader_id': '103207',
          },
+        'params': {
+            'skip_download': True,  # requires ffmpeg
+        }
      }]
  
      def _real_extract(self, url):
@@ -66,13 +71,19 @@ class DiscoveryIE(InfoExtractor):
          entries = []
  
          for idx, video_info in enumerate(info['playlist']):
-            formats = self._extract_m3u8_formats(
-                video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls',
-                note='Download m3u8 information for video %d' % (idx + 1))
-            self._sort_formats(formats)
+            subtitles = {}
+            caption_url = video_info.get('captionsUrl')
+            if caption_url:
+                subtitles = {
+                    'en': [{
+                        'url': caption_url,
+                    }]
+                }
+
              entries.append({
+                '_type': 'url_transparent',
+                'url': 'http://players.brightcove.net/103207/default_default/index.html?videoId=ref:%s' % video_info['referenceId'],
                  'id': compat_str(video_info['id']),
-                'formats': formats,
                  'title': video_info['title'],
                  'description': video_info.get('description'),
                  'duration': parse_duration(video_info.get('video_length')),
@@ -80,6 +91,7 @@ class DiscoveryIE(InfoExtractor):
                  'thumbnail': video_info.get('thumbnailURL'),
                  'alt_title': video_info.get('secondary_title'),
                  'timestamp': parse_iso8601(video_info.get('publishedDate')),
+                'subtitles': subtitles,
              })
  
          return self.playlist_result(entries, display_id, video_title)
diff --git a/youtube_dl/extractor/discoverygo.py b/youtube_dl/extractor/discoverygo.py

new file mode 100644 (file)

index 0000000..c4e83b2
--- /dev/null
+++ b/youtube_dl/extractor/discoverygo.py
@@ -0,0 +1,116 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    extract_attributes,
+    int_or_none,
+    parse_age_limit,
+    unescapeHTML,
+    ExtractorError,
+)
+
+
+class DiscoveryGoIE(InfoExtractor):
+    _VALID_URL = r'''(?x)https?://(?:www\.)?(?:
+            discovery|
+            investigationdiscovery|
+            discoverylife|
+            animalplanet|
+            ahctv|
+            destinationamerica|
+            sciencechannel|
+            tlc|
+            velocitychannel
+        )go\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'''
+    _TEST = {
+        'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
+        'info_dict': {
+            'id': '57a33c536b66d1cd0345eeb1',
+            'ext': 'mp4',
+            'title': 'Kiss First, Ask Questions Later!',
+            'description': 'md5:fe923ba34050eae468bffae10831cb22',
+            'duration': 2579,
+            'series': 'Love at First Kiss',
+            'season_number': 1,
+            'episode_number': 1,
+            'age_limit': 14,
+        },
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        container = extract_attributes(
+            self._search_regex(
+                r'(<div[^>]+class=["\']video-player-container[^>]+>)',
+                webpage, 'video container'))
+
+        video = self._parse_json(
+            unescapeHTML(container.get('data-video') or container.get('data-json')),
+            display_id)
+
+        title = video['name']
+
+        stream = video.get('stream')
+        if not stream:
+            if video.get('authenticated') is True:
+                raise ExtractorError(
+                    'This video is only available via cable service provider subscription that'
+                    ' is not currently supported. You may want to use --cookies.', expected=True)
+            else:
+                raise ExtractorError('Unable to find stream')
+        STREAM_URL_SUFFIX = 'streamUrl'
+        formats = []
+        for stream_kind in ('', 'hds'):
+            suffix = STREAM_URL_SUFFIX[:1].upper() + STREAM_URL_SUFFIX[1:] if stream_kind else STREAM_URL_SUFFIX
+            stream_url = stream.get('%s%s' % (stream_kind, suffix))
+            if not stream_url:
+                continue
+            if stream_kind == '':
+                formats.extend(self._extract_m3u8_formats(
+                    stream_url, display_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif stream_kind == 'hds':
+                formats.extend(self._extract_f4m_formats(
+                    stream_url, display_id, f4m_id=stream_kind, fatal=False))
+        self._sort_formats(formats)
+
+        video_id = video.get('id') or display_id
+        description = video.get('description', {}).get('detailed')
+        duration = int_or_none(video.get('duration'))
+
+        series = video.get('show', {}).get('name')
+        season_number = int_or_none(video.get('season', {}).get('number'))
+        episode_number = int_or_none(video.get('episodeNumber'))
+
+        tags = video.get('tags')
+        age_limit = parse_age_limit(video.get('parental', {}).get('rating'))
+
+        subtitles = {}
+        captions = stream.get('captions')
+        if isinstance(captions, list):
+            for caption in captions:
+                subtitle_url = caption.get('fileUrl')
+                if (not subtitle_url or not isinstance(subtitle_url, compat_str) or
+                        not subtitle_url.startswith('http')):
+                    continue
+                lang = caption.get('fileLang', 'en')
+                subtitles.setdefault(lang, []).append({'url': subtitle_url})
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'series': series,
+            'season_number': season_number,
+            'episode_number': episode_number,
+            'tags': tags,
+            'age_limit': age_limit,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
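
Note on the discoverygo.py stream keys: when a prefixed camelCase key such as 'hds' + 'StreamUrl' is built from a suffix constant, keep in mind that str.capitalize() upper-cases the first character and lower-cases all the others. The sketch below upper-cases only the first character instead. It assumes the JSON key is spelled hdsStreamUrl, which the diff does not show, and the function name is hypothetical.

```python
def stream_url_keys(suffix='streamUrl', kinds=('', 'hds')):
    """Build the dictionary keys to probe, one per stream kind."""
    keys = []
    for kind in kinds:
        if kind:
            # Upper-case only the first character; the rest stays untouched
            keys.append(kind + suffix[:1].upper() + suffix[1:])
        else:
            keys.append(suffix)
    return keys


print(stream_url_keys())
print('streamUrl'.capitalize())  # lower-cases the inner 'U'
```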
diff --git a/youtube_dl/extractor/dispeak.py b/youtube_dl/extractor/dispeak.py

new file mode 100644 (file)

index 0000000..a78cb8a
--- /dev/null
+++ b/youtube_dl/extractor/dispeak.py
@@ -0,0 +1,114 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    remove_end,
+    xpath_element,
+    xpath_text,
+)
+
+
+class DigitallySpeakingIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
+
+    _TESTS = [{
+        # From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface
+        'url': 'http://evt.dispeak.com/ubm/gdc/sf16/xml/840376_BQRC.xml',
+        'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
+        'info_dict': {
+            'id': '840376_BQRC',
+            'ext': 'mp4',
+            'title': 'Tenacious Design and The Interface of \'Destiny\'',
+        },
+    }, {
+        # From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC
+        'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
+        'only_matching': True,
+    }]
+
+    def _parse_mp4(self, metadata):
+        video_formats = []
+        video_root = None
+
+        mp4_video = xpath_text(metadata, './mp4video', default=None)
+        if mp4_video is not None:
+            mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video)
+            video_root = mobj.group('root')
+        if video_root is None:
+            http_host = xpath_text(metadata, 'httpHost', default=None)
+            if http_host:
+                video_root = 'http://%s/' % http_host
+        if video_root is None:
+            # Hard-coded in http://evt.dispeak.com/ubm/gdc/sf16/custom/player2.js
+            # Works for GPUTechConf, too
+            video_root = 'http://s3-2u.digitallyspeaking.com/'
+
+        formats = metadata.findall('./MBRVideos/MBRVideo')
+        if not formats:
+            return None
+        for a_format in formats:
+            stream_name = xpath_text(a_format, 'streamName', fatal=True)
+            video_path = re.match(r'mp4\:(?P<path>.*)', stream_name).group('path')
+            url = video_root + video_path
+            vbr = xpath_text(a_format, 'bitrate')
+            video_formats.append({
+                'url': url,
+                'vbr': int_or_none(vbr),
+            })
+        return video_formats
+
+    def _parse_flv(self, metadata):
+        formats = []
+        akamai_url = xpath_text(metadata, './akamaiHost', fatal=True)
+        audios = metadata.findall('./audios/audio')
+        for audio in audios:
+            formats.append({
+                'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
+                'play_path': remove_end(audio.get('url'), '.flv'),
+                'ext': 'flv',
+                'vcodec': 'none',
+                'format_id': audio.get('code'),
+            })
+        slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
+        formats.append({
+            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
+            'play_path': remove_end(slide_video_path, '.flv'),
+            'ext': 'flv',
+            'format_note': 'slide deck video',
+            'quality': -2,
+            'preference': -2,
+            'format_id': 'slides',
+        })
+        speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
+        formats.append({
+            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
+            'play_path': remove_end(speaker_video_path, '.flv'),
+            'ext': 'flv',
+            'format_note': 'speaker video',
+            'quality': -1,
+            'preference': -1,
+            'format_id': 'speaker',
+        })
+        return formats
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        xml_description = self._download_xml(url, video_id)
+        metadata = xpath_element(xml_description, 'metadata')
+
+        video_formats = self._parse_mp4(metadata)
+        if video_formats is None:
+            video_formats = self._parse_flv(metadata)
+
+        return {
+            'id': video_id,
+            'formats': video_formats,
+            'title': xpath_text(metadata, 'title', fatal=True),
+            'duration': parse_duration(xpath_text(metadata, 'endTime')),
+            'creator': xpath_text(metadata, 'speaker'),
+        }
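
Note on the dispeak.py MP4 path: _parse_mp4 picks the video root from three sources in order (the mp4video URL, the httpHost element, a hard-coded fallback) and then strips the 'mp4:' prefix from each stream name. The following is a minimal sketch of that fallback chain on plain strings. The function names and sample URLs are hypothetical, and the None checks on the regex matches are an addition for illustration, not part of the commit.

```python
import re

# Fallback root as hard-coded in the diff
FALLBACK_ROOT = 'http://s3-2u.digitallyspeaking.com/'


def resolve_video_root(mp4_video=None, http_host=None):
    """Return the base URL that stream paths are appended to."""
    if mp4_video:
        mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video)
        if mobj:  # added guard: skip values that are not absolute URLs
            return mobj.group('root')
    if http_host:
        return 'http://%s/' % http_host
    return FALLBACK_ROOT


def stream_name_to_url(video_root, stream_name):
    """Turn 'mp4:<path>' into a full URL, or None if the prefix is missing."""
    mobj = re.match(r'mp4:(?P<path>.*)', stream_name)
    return video_root + mobj.group('path') if mobj else None


root = resolve_video_root(mp4_video='http://example.com/assets/talk.mp4')
print(stream_name_to_url(root, 'mp4:assets/talk_500.mp4'))
```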
diff --git a/youtube_dl/extractor/dotsub.py b/youtube_dl/extractor/dotsub.py

index e9ca236d4a03c13b1b29b3386535c4262332dab0..1f75352ca945c3e63ddf85e1ec204b5787cafeb6 100644 (file)
--- a/youtube_dl/extractor/dotsub.py
+++ b/youtube_dl/extractor/dotsub.py
@@ -9,22 +9,39 @@ from ..utils import (
  
  class DotsubIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?dotsub\.com/view/(?P<id>[^/]+)'
-    _TEST = {
-        'url': 'http://dotsub.com/view/aed3b8b2-1889-4df5-ae63-ad85f5572f27',
-        'md5': '0914d4d69605090f623b7ac329fea66e',
+    _TESTS = [{
+        'url': 'https://dotsub.com/view/9c63db2a-fa95-4838-8e6e-13deafe47f09',
+        'md5': '21c7ff600f545358134fea762a6d42b6',
          'info_dict': {
-            'id': 'aed3b8b2-1889-4df5-ae63-ad85f5572f27',
+            'id': '9c63db2a-fa95-4838-8e6e-13deafe47f09',
              'ext': 'flv',
-            'title': 'Pyramids of Waste (2010), AKA The Lightbulb Conspiracy - Planned obsolescence documentary',
-            'description': 'md5:699a0f7f50aeec6042cb3b1db2d0d074',
-            'thumbnail': 're:^https?://dotsub.com/media/aed3b8b2-1889-4df5-ae63-ad85f5572f27/p',
-            'duration': 3169,
-            'uploader': '4v4l0n42',
-            'timestamp': 1292248482.625,
-            'upload_date': '20101213',
+            'title': 'MOTIVATION - "It\'s Possible" Best Inspirational Video Ever',
+            'description': 'md5:41af1e273edbbdfe4e216a78b9d34ac6',
+            'thumbnail': 're:^https?://dotsub.com/media/9c63db2a-fa95-4838-8e6e-13deafe47f09/p',
+            'duration': 198,
+            'uploader': 'liuxt',
+            'timestamp': 1385778501.104,
+            'upload_date': '20131130',
              'view_count': int,
          }
-    }
+    }, {
+        'url': 'https://dotsub.com/view/747bcf58-bd59-45b7-8c8c-ac312d084ee6',
+        'md5': '2bb4a83896434d5c26be868c609429a3',
+        'info_dict': {
+            'id': '168006778',
+            'ext': 'mp4',
+            'title': 'Apartments and flats in Raipur the white symphony',
+            'description': 'md5:784d0639e6b7d1bc29530878508e38fe',
+            'thumbnail': 're:^https?://dotsub.com/media/747bcf58-bd59-45b7-8c8c-ac312d084ee6/p',
+            'duration': 290,
+            'timestamp': 1476767794.2809999,
+            'upload_date': '20160525',
+            'uploader': 'parthivi001',
+            'uploader_id': 'user52596202',
+            'view_count': int,
+        },
+        'add_ie': ['Vimeo'],
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -37,12 +54,23 @@ class DotsubIE(InfoExtractor):
              webpage = self._download_webpage(url, video_id)
              video_url = self._search_regex(
                  [r'<source[^>]+src="([^"]+)"', r'"file"\s*:\s*\'([^\']+)'],
-                webpage, 'video url')
+                webpage, 'video url', default=None)
+            info_dict = {
+                'id': video_id,
+                'url': video_url,
+                'ext': 'flv',
+            }
  
-        return {
-            'id': video_id,
-            'url': video_url,
-            'ext': 'flv',
+        if not video_url:
+            setup_data = self._parse_json(self._html_search_regex(
+                r'(?s)data-setup=([\'"])(?P<content>(?!\1).+?)\1',
+                webpage, 'setup data', group='content'), video_id)
+            info_dict = {
+                '_type': 'url_transparent',
+                'url': setup_data['src'],
+            }
+
+        info_dict.update({
              'title': info['title'],
              'description': info.get('description'),
              'thumbnail': info.get('screenshotURI'),
@@ -50,4 +78,6 @@ class DotsubIE(InfoExtractor):
              'uploader': info.get('user'),
              'timestamp': float_or_none(info.get('dateCreated'), 1000),
              'view_count': int_or_none(info.get('numberOfViews')),
-        }
+        })
+
+        return info_dict
diff --git a/youtube_dl/extractor/douyutv.py b/youtube_dl/extractor/douyutv.py

index 3915cb182961711873b7a75b48958ac50602aa45..e366e17e68139288543243667d637544488a6a23 100644 (file)
--- a/youtube_dl/extractor/douyutv.py
+++ b/youtube_dl/extractor/douyutv.py
@@ -3,9 +3,17 @@ from __future__ import unicode_literals
  
  import hashlib
  import time
+import uuid
+
  from .common import InfoExtractor
-from ..utils import (ExtractorError, unescapeHTML)
-from ..compat import (compat_str, compat_basestring)
+from ..compat import (
+    compat_str,
+    compat_urllib_parse_urlencode,
+)
+from ..utils import (
+    ExtractorError,
+    unescapeHTML,
+)
  
  
  class DouyuTVIE(InfoExtractor):
@@ -18,10 +26,9 @@ class DouyuTVIE(InfoExtractor):
              'display_id': 'iseven',
              'ext': 'flv',
              'title': 're:^清晨醒脑！T-ara根本停不下来！ [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
-            'description': 'md5:f34981259a03e980a3c6404190a3ed61',
+            'description': 're:.*m7show@163\.com.*',
              'thumbnail': 're:^https?://.*\.jpg$',
              'uploader': '7师傅',
-            'uploader_id': '431925',
              'is_live': True,
          },
          'params': {
@@ -37,13 +44,12 @@ class DouyuTVIE(InfoExtractor):
              'description': 'md5:746a2f7a253966a06755a912f0acc0d2',
              'thumbnail': 're:^https?://.*\.jpg$',
              'uploader': 'douyu小漠',
-            'uploader_id': '3769985',
              'is_live': True,
          },
          'params': {
              'skip_download': True,
          },
-        'skip': 'Romm not found',
+        'skip': 'Room not found',
      }, {
          'url': 'http://www.douyutv.com/17732',
          'info_dict': {
@@ -51,10 +57,9 @@ class DouyuTVIE(InfoExtractor):
              'display_id': '17732',
              'ext': 'flv',
              'title': 're:^清晨醒脑！T-ara根本停不下来！ [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
-            'description': 'md5:f34981259a03e980a3c6404190a3ed61',
+            'description': 're:.*m7show@163\.com.*',
              'thumbnail': 're:^https?://.*\.jpg$',
              'uploader': '7师傅',
-            'uploader_id': '431925',
              'is_live': True,
          },
          'params': {
@@ -65,6 +70,10 @@ class DouyuTVIE(InfoExtractor):
          'only_matching': True,
      }]
  
+    # Decompile core.swf in webpage by ffdec "Search SWFs in memory". core.swf
+    # is encrypted originally, but ffdec can dump memory to get the decrypted one.
+    _API_KEY = 'A12Svb&%1UUmf@hC'
+
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
@@ -75,59 +84,56 @@ class DouyuTVIE(InfoExtractor):
              room_id = self._html_search_regex(
                  r'"room_id"\s*:\s*(\d+),', page, 'room id')
  
-        prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
-            room_id, int(time.time()))
+        room = self._download_json(
+            'http://m.douyu.com/html5/live?roomId=%s' % room_id, video_id,
+            note='Downloading room info')['data']
  
-        auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
-        config = self._download_json(
-            'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
-            video_id)
+        # 1 = live, 2 = offline
+        if room.get('show_status') == '2':
+            raise ExtractorError('Live stream is offline', expected=True)
  
-        data = config['data']
+        tt = compat_str(int(time.time() / 60))
+        did = uuid.uuid4().hex.upper()
  
-        error_code = config.get('error', 0)
-        if error_code is not 0:
-            error_desc = 'Server reported error %i' % error_code
-            if isinstance(data, (compat_str, compat_basestring)):
-                error_desc += ': ' + data
-            raise ExtractorError(error_desc, expected=True)
+        sign_content = ''.join((room_id, did, self._API_KEY, tt))
+        sign = hashlib.md5((sign_content).encode('utf-8')).hexdigest()
  
-        show_status = data.get('show_status')
-        # 1 = live, 2 = offline
-        if show_status == '2':
-            raise ExtractorError(
-                'Live stream is offline', expected=True)
+        flv_data = compat_urllib_parse_urlencode({
+            'cdn': 'ws',
+            'rate': '0',
+            'tt': tt,
+            'did': did,
+            'sign': sign,
+        })
  
-        base_url = data['rtmp_url']
-        live_path = data['rtmp_live']
+        video_info = self._download_json(
+            'http://www.douyu.com/lapi/live/getPlay/%s' % room_id, video_id,
+            data=flv_data, note='Downloading video info',
+            headers={'Content-Type': 'application/x-www-form-urlencoded'})
  
-        title = self._live_title(unescapeHTML(data['room_name']))
-        description = data.get('show_details')
-        thumbnail = data.get('room_src')
+        error_code = video_info.get('error', 0)
+        if error_code != 0:
+            raise ExtractorError(
+                '%s reported error %i' % (self.IE_NAME, error_code),
+                expected=True)
  
-        uploader = data.get('nickname')
-        uploader_id = data.get('owner_uid')
+        base_url = video_info['data']['rtmp_url']
+        live_path = video_info['data']['rtmp_live']
  
-        multi_formats = data.get('rtmp_multi_bitrate')
-        if not isinstance(multi_formats, dict):
-            multi_formats = {}
-        multi_formats['live'] = live_path
+        video_url = '%s/%s' % (base_url, live_path)
  
-        formats = [{
-            'url': '%s/%s' % (base_url, format_path),
-            'format_id': format_id,
-            'preference': 1 if format_id == 'live' else 0,
-        } for format_id, format_path in multi_formats.items()]
-        self._sort_formats(formats)
+        title = self._live_title(unescapeHTML(room['room_name']))
+        description = room.get('notice')
+        thumbnail = room.get('room_src')
+        uploader = room.get('nickname')
  
          return {
              'id': room_id,
              'display_id': video_id,
+            'url': video_url,
              'title': title,
              'description': description,
              'thumbnail': thumbnail,
              'uploader': uploader,
-            'uploader_id': uploader_id,
-            'formats': formats,
              'is_live': True,
          }
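
Note on the douyutv.py request signing: the new code signs the getPlay request with an MD5 over the room id, a random device id, the API key and a minute-granularity timestamp, then posts the result as a form body. The following is a minimal sketch of just that computation for Python 3. The function name and its now/device_id parameters are hypothetical and exist only to make the result reproducible. No network request is made.

```python
import hashlib
import time
import uuid
from urllib.parse import urlencode

# Key as it appears in the diff (DouyuTVIE._API_KEY)
API_KEY = 'A12Svb&%1UUmf@hC'


def build_play_request(room_id, now=None, device_id=None):
    """Return the url-encoded form body for the getPlay request."""
    now = time.time() if now is None else now
    tt = str(int(now / 60))  # timestamp in whole minutes
    did = device_id or uuid.uuid4().hex.upper()
    sign_content = ''.join((str(room_id), did, API_KEY, tt))
    sign = hashlib.md5(sign_content.encode('utf-8')).hexdigest()
    return urlencode({
        'cdn': 'ws',
        'rate': '0',
        'tt': tt,
        'did': did,
        'sign': sign,
    })


print(build_play_request('17732', now=125.0, device_id='ABC'))
```

Since tt has minute granularity, two requests built within the same minute for the same room and device id carry the same signature.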
diff --git a/youtube_dl/extractor/dplay.py b/youtube_dl/extractor/dplay.py

index 66bbfc6ca0b32ad37feaaa42d1480214cb8d89d2..5790553f38ca29107bad44317fedb271dce0883a 100644 (file)
--- a/youtube_dl/extractor/dplay.py
+++ b/youtube_dl/extractor/dplay.py
@@ -6,13 +6,18 @@ import re
  import time
  
  from .common import InfoExtractor
-from ..utils import int_or_none
+from ..compat import compat_urlparse
+from ..utils import (
+    int_or_none,
+    update_url_query,
+)
  
  
  class DPlayIE(InfoExtractor):
      _VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
  
      _TESTS = [{
+        # geo restricted, via direct unsigned hls URL
          'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
          'info_dict': {
              'id': '1255600',
@@ -31,11 +36,12 @@ class DPlayIE(InfoExtractor):
          },
          'expected_warnings': ['Unable to download f4m manifest'],
      }, {
+        # non geo restricted, via secure api, unsigned download hls URL
          'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
          'info_dict': {
              'id': '3172',
              'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Svensken lär sig njuta av livet',
              'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
              'duration': 2650,
@@ -48,23 +54,25 @@ class DPlayIE(InfoExtractor):
              'age_limit': 0,
          },
      }, {
+        # geo restricted, via secure api, unsigned download hls URL
          'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
          'info_dict': {
              'id': '70816',
              'display_id': 'season-6-episode-12',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Episode 12',
              'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
              'duration': 2563,
              'timestamp': 1429696800,
              'upload_date': '20150422',
-            'creator': 'Kanal 4',
+            'creator': 'Kanal 4 (Home)',
              'series': 'Mig og min mor',
              'season_number': 6,
              'episode_number': 12,
              'age_limit': 0,
          },
      }, {
+        # geo restricted, via direct unsigned hls URL
          'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
          'only_matching': True,
      }]
@@ -90,17 +98,24 @@ class DPlayIE(InfoExtractor):
  
          def extract_formats(protocol, manifest_url):
              if protocol == 'hls':
-                formats.extend(self._extract_m3u8_formats(
+                m3u8_formats = self._extract_m3u8_formats(
                      manifest_url, video_id, ext='mp4',
-                    entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False))
+                    entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
+                # Sometimes final URLs inside m3u8 are unsigned, so copy the
+                # manifest URL's query onto each of them
+                query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
+                for m3u8_format in m3u8_formats:
+                    m3u8_format['url'] = update_url_query(m3u8_format['url'], query)
+                formats.extend(m3u8_formats)
              elif protocol == 'hds':
                  formats.extend(self._extract_f4m_formats(
                      manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
                      video_id, f4m_id=protocol, fatal=False))
  
          domain_tld = domain.split('.')[-1]
-        if domain_tld in ('se', 'dk'):
+        if domain_tld in ('se', 'dk', 'no'):
              for protocol in PROTOCOLS:
+                # Providing dsc-geo allows bypassing geo restriction in some cases
                  self._set_cookie(
                      'secure.dplay.%s' % domain_tld, 'dsc-geo',
                      json.dumps({
@@ -113,13 +128,24 @@ class DPlayIE(InfoExtractor):
                      'Downloading %s stream JSON' % protocol, fatal=False)
                  if stream and stream.get(protocol):
                      extract_formats(protocol, stream[protocol])
-        else:
+
+        # The last resort is to try direct unsigned hls/hds URLs from info dictionary.
+        # Sometimes this does work even when secure API with dsc-geo has failed (e.g.
+        # http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/).
+        if not formats:
              for protocol in PROTOCOLS:
                  if info.get(protocol):
                      extract_formats(protocol, info[protocol])
  
          self._sort_formats(formats)
  
+        subtitles = {}
+        for lang in ('se', 'sv', 'da', 'nl', 'no'):
+            for format_id in ('web_vtt', 'vtt', 'srt'):
+                subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id))
+                if subtitle_url:
+                    subtitles.setdefault(lang, []).append({'url': subtitle_url})
+
          return {
              'id': video_id,
              'display_id': display_id,
@@ -133,4 +159,5 @@ class DPlayIE(InfoExtractor):
              'episode_number': int_or_none(info.get('episode')),
              'age_limit': int_or_none(info.get('minimum_age')),
              'formats': formats,
+            'subtitles': subtitles,
          }
diff --git a/youtube_dl/extractor/dramafever.py b/youtube_dl/extractor/dramafever.py

index 3b6529f4b108052e3019c8e400bbc2cd0eb5a9a1..c115956121a242920ec8016e8c9f3558c34060c6 100644 (file)
--- a/youtube_dl/extractor/dramafever.py
+++ b/youtube_dl/extractor/dramafever.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import itertools
diff --git a/youtube_dl/extractor/dreisat.py b/youtube_dl/extractor/dreisat.py

index 0040e70d4929828ebf2dc7dc74199ed639dcfebd..908c9e514c41ea72bac0e6f6ede41def4ba0b20b 100644 (file)
--- a/youtube_dl/extractor/dreisat.py
+++ b/youtube_dl/extractor/dreisat.py
@@ -17,8 +17,12 @@ class DreiSatIE(ZDFIE):
                  'ext': 'mp4',
                  'title': 'Waidmannsheil',
                  'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
-                'uploader': '3sat',
+                'uploader': 'SCHWEIZWEIT',
+                'uploader_id': '100000210',
                  'upload_date': '20140913'
+            },
+            'params': {
+                'skip_download': True,  # m3u8 downloads
              }
          },
          {
diff --git a/youtube_dl/extractor/drtuber.py b/youtube_dl/extractor/drtuber.py

index 639f9182c5484a22f0056e25fc6aa7e56f193df5..22da8e48105e5e8ee81a9cc948c67f6ec7d72eb8 100644 (file)
--- a/youtube_dl/extractor/drtuber.py
+++ b/youtube_dl/extractor/drtuber.py
@@ -3,12 +3,15 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import str_to_int
+from ..utils import (
+    NO_DEFAULT,
+    str_to_int,
+)
  
  
  class DrTuberIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?drtuber\.com/video/(?P<id>\d+)/(?P<display_id>[\w-]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?drtuber\.com/(?:video|embed)/(?P<id>\d+)(?:/(?P<display_id>[\w-]+))?'
+    _TESTS = [{
          'url': 'http://www.drtuber.com/video/1740434/hot-perky-blonde-naked-golf',
          'md5': '93e680cf2536ad0dfb7e74d94a89facd',
          'info_dict': {
@@ -17,44 +20,57 @@ class DrTuberIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'hot perky blonde naked golf',
              'like_count': int,
-            'dislike_count': int,
              'comment_count': int,
              'categories': ['Babe', 'Blonde', 'Erotic', 'Outdoor', 'Softcore', 'Solo'],
              'thumbnail': 're:https?://.*\.jpg$',
              'age_limit': 18,
          }
-    }
+    }, {
+        'url': 'http://www.drtuber.com/embed/489939',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?drtuber\.com/embed/\d+)',
+            webpage)
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        display_id = mobj.group('display_id') or video_id
  
-        webpage = self._download_webpage(url, display_id)
+        webpage = self._download_webpage(
+            'http://www.drtuber.com/video/%s' % video_id, display_id)
  
          video_url = self._html_search_regex(
              r'<source src="([^"]+)"', webpage, 'video URL')
  
          title = self._html_search_regex(
-            [r'<p[^>]+class="title_substrate">([^<]+)</p>', r'<title>([^<]+) - \d+'],
+            (r'class="title_watch"[^>]*><(?:p|h\d+)[^>]*>([^<]+)<',
+             r'<p[^>]+class="title_substrate">([^<]+)</p>',
+             r'<title>([^<]+) - \d+'),
              webpage, 'title')
  
          thumbnail = self._html_search_regex(
              r'poster="([^"]+)"',
              webpage, 'thumbnail', fatal=False)
  
-        def extract_count(id_, name):
+        def extract_count(id_, name, default=NO_DEFAULT):
              return str_to_int(self._html_search_regex(
                  r'<span[^>]+(?:class|id)="%s"[^>]*>([\d,\.]+)</span>' % id_,
-                webpage, '%s count' % name, fatal=False))
+                webpage, '%s count' % name, default=default, fatal=False))
  
          like_count = extract_count('rate_likes', 'like')
-        dislike_count = extract_count('rate_dislikes', 'dislike')
+        dislike_count = extract_count('rate_dislikes', 'dislike', default=None)
          comment_count = extract_count('comments_count', 'comment')
  
          cats_str = self._search_regex(
-            r'<div[^>]+class="categories_list">(.+?)</div>', webpage, 'categories', fatal=False)
-        categories = [] if not cats_str else re.findall(r'<a title="([^"]+)"', cats_str)
+            r'<div[^>]+class="categories_list">(.+?)</div>',
+            webpage, 'categories', fatal=False)
+        categories = [] if not cats_str else re.findall(
+            r'<a title="([^"]+)"', cats_str)
  
          return {
              'id': video_id,
diff --git a/youtube_dl/extractor/drtv.py b/youtube_dl/extractor/drtv.py

index 2d74ff855f1670e0dcb46e35d1875e8e9c9fd144..88d096b307cdf6d484ef6b89253f6cdbcb82deb0 100644 (file)
--- a/youtube_dl/extractor/drtv.py
+++ b/youtube_dl/extractor/drtv.py
@@ -4,26 +4,45 @@ from __future__ import unicode_literals
  from .common import InfoExtractor
  from ..utils import (
      ExtractorError,
+    int_or_none,
+    float_or_none,
+    mimetype2ext,
      parse_iso8601,
+    remove_end,
  )
  
  
  class DRTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?dr\.dk/tv/se/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
+    _VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
  
-    _TEST = {
-        'url': 'https://www.dr.dk/tv/se/boern/ultra/panisk-paske/panisk-paske-5',
-        'md5': 'dc515a9ab50577fa14cc4e4b0265168f',
+    _TESTS = [{
+        'url': 'https://www.dr.dk/tv/se/boern/ultra/klassen-ultra/klassen-darlig-taber-10',
+        'md5': '25e659cccc9a2ed956110a299fdf5983',
          'info_dict': {
-            'id': 'panisk-paske-5',
+            'id': 'klassen-darlig-taber-10',
              'ext': 'mp4',
-            'title': 'Panisk Påske (5)',
-            'description': 'md5:ca14173c5ab24cd26b0fcc074dff391c',
-            'timestamp': 1426984612,
-            'upload_date': '20150322',
-            'duration': 1455,
+            'title': 'Klassen - Dårlig taber (10)',
+            'description': 'md5:815fe1b7fa656ed80580f31e8b3c79aa',
+            'timestamp': 1471991907,
+            'upload_date': '20160823',
+            'duration': 606.84,
          },
-    }
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://www.dr.dk/nyheder/indland/live-christianias-rydning-af-pusher-street-er-i-gang',
+        'md5': '2c37175c718155930f939ef59952474a',
+        'info_dict': {
+            'id': 'christiania-pusher-street-ryddes-drdkrjpo',
+            'ext': 'mp4',
+            'title': 'LIVE Christianias rydning af Pusher Street er i gang',
+            'description': '- Det er det fedeste, der er sket i 20 år, fortæller christianit til DR Nyheder.',
+            'timestamp': 1472800279,
+            'upload_date': '20160902',
+            'duration': 131.4,
+        },
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -35,7 +54,8 @@ class DRTVIE(InfoExtractor):
                  'Video %s is not available' % video_id, expected=True)
  
          video_id = self._search_regex(
-            r'data-(?:material-identifier|episode-slug)="([^"]+)"',
+            (r'data-(?:material-identifier|episode-slug)="([^"]+)"',
+                r'data-resource="[^>"]+mu/programcard/expanded/([^"]+)"'),
              webpage, 'video id')
  
          programcard = self._download_json(
@@ -43,9 +63,12 @@ class DRTVIE(InfoExtractor):
              video_id, 'Downloading video JSON')
          data = programcard['Data'][0]
  
-        title = data['Title']
-        description = data['Description']
-        timestamp = parse_iso8601(data['CreatedTime'])
+        title = remove_end(self._og_search_title(
+            webpage, default=None), ' | TV | DR') or data['Title']
+        description = self._og_search_description(
+            webpage, default=None) or data.get('Description')
+
+        timestamp = parse_iso8601(data.get('CreatedTime'))
  
          thumbnail = None
          duration = None
@@ -56,16 +79,18 @@ class DRTVIE(InfoExtractor):
          subtitles = {}
  
          for asset in data['Assets']:
-            if asset['Kind'] == 'Image':
-                thumbnail = asset['Uri']
-            elif asset['Kind'] == 'VideoResource':
-                duration = asset['DurationInMilliseconds'] / 1000.0
-                restricted_to_denmark = asset['RestrictedToDenmark']
-                spoken_subtitles = asset['Target'] == 'SpokenSubtitles'
-                for link in asset['Links']:
-                    uri = link['Uri']
-                    target = link['Target']
-                    format_id = target
+            if asset.get('Kind') == 'Image':
+                thumbnail = asset.get('Uri')
+            elif asset.get('Kind') == 'VideoResource':
+                duration = float_or_none(asset.get('DurationInMilliseconds'), 1000)
+                restricted_to_denmark = asset.get('RestrictedToDenmark')
+                spoken_subtitles = asset.get('Target') == 'SpokenSubtitles'
+                for link in asset.get('Links', []):
+                    uri = link.get('Uri')
+                    if not uri:
+                        continue
+                    target = link.get('Target')
+                    format_id = target or ''
                      preference = None
                      if spoken_subtitles:
                          preference = -1
@@ -76,8 +101,8 @@ class DRTVIE(InfoExtractor):
                              video_id, preference, f4m_id=format_id))
                      elif target == 'HLS':
                          formats.extend(self._extract_m3u8_formats(
-                            uri, video_id, 'mp4', preference=preference,
-                            m3u8_id=format_id))
+                            uri, video_id, 'mp4', entry_protocol='m3u8_native',
+                            preference=preference, m3u8_id=format_id))
                      else:
                          bitrate = link.get('Bitrate')
                          if bitrate:
@@ -85,7 +110,7 @@ class DRTVIE(InfoExtractor):
                          formats.append({
                              'url': uri,
                              'format_id': format_id,
-                            'tbr': bitrate,
+                            'tbr': int_or_none(bitrate),
                              'ext': link.get('FileFormat'),
                          })
                  subtitles_list = asset.get('SubtitlesList')
@@ -94,12 +119,18 @@ class DRTVIE(InfoExtractor):
                          'Danish': 'da',
                      }
                      for subs in subtitles_list:
-                        lang = subs['Language']
-                        subtitles[LANGS.get(lang, lang)] = [{'url': subs['Uri'], 'ext': 'vtt'}]
+                        if not subs.get('Uri'):
+                            continue
+                        lang = subs.get('Language') or 'da'
+                        subtitles.setdefault(LANGS.get(lang, lang), []).append({
+                            'url': subs['Uri'],
+                            'ext': mimetype2ext(subs.get('MimeType')) or 'vtt'
+                        })
  
          if not formats and restricted_to_denmark:
-            raise ExtractorError(
-                'Unfortunately, DR is not allowed to show this program outside Denmark.', expected=True)
+            self.raise_geo_restricted(
+                'Unfortunately, DR is not allowed to show this program outside '
+                'Denmark.')
  
          self._sort_formats(formats)
  
diff --git a/youtube_dl/extractor/dump.py b/youtube_dl/extractor/dump.py

deleted file mode 100644 (file)

index ff78d4f..0000000
--- a/youtube_dl/extractor/dump.py
+++ /dev/null
@@ -1,39 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class DumpIE(InfoExtractor):
-    _VALID_URL = r'^https?://(?:www\.)?dump\.com/(?P<id>[a-zA-Z0-9]+)/'
-
-    _TEST = {
-        'url': 'http://www.dump.com/oneus/',
-        'md5': 'ad71704d1e67dfd9e81e3e8b42d69d99',
-        'info_dict': {
-            'id': 'oneus',
-            'ext': 'flv',
-            'title': "He's one of us.",
-            'thumbnail': 're:^https?://.*\.jpg$',
-        },
-    }
-
-    def _real_extract(self, url):
-        m = re.match(self._VALID_URL, url)
-        video_id = m.group('id')
-
-        webpage = self._download_webpage(url, video_id)
-        video_url = self._search_regex(
-            r's1.addVariable\("file",\s*"([^"]+)"', webpage, 'video URL')
-
-        title = self._og_search_title(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'url': video_url,
-            'thumbnail': thumbnail,
-        }
diff --git a/youtube_dl/extractor/dw.py b/youtube_dl/extractor/dw.py

index ae7c571bd3d06b41895d46f1df0dbd803170de0b..d740652f172c1dc9b61b19205af20b6721e10211 100644 (file)
--- a/youtube_dl/extractor/dw.py
+++ b/youtube_dl/extractor/dw.py
@@ -2,13 +2,16 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import int_or_none
+from ..utils import (
+    int_or_none,
+    unified_strdate,
+)
  from ..compat import compat_urlparse
  
  
  class DWIE(InfoExtractor):
      IE_NAME = 'dw'
-    _VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+av-(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+(?:av|e)-(?P<id>\d+)'
      _TESTS = [{
          # video
          'url': 'http://www.dw.com/en/intelligent-light/av-19112290',
@@ -31,6 +34,18 @@ class DWIE(InfoExtractor):
              'description': 'md5:bc9ca6e4e063361e21c920c53af12405',
              'upload_date': '20160311',
          }
+    }, {
+        # DW documentaries are only available for one or two weeks
+        'url': 'http://www.dw.com/en/documentaries-welcome-to-the-90s-2016-05-21/e-19220158-9798',
+        'md5': '56b6214ef463bfb9a3b71aeb886f3cf1',
+        'info_dict': {
+            'id': '19274438',
+            'ext': 'mp4',
+            'title': 'Welcome to the 90s – Hip Hop',
+            'description': 'Welcome to the 90s - The Golden Decade of Hip Hop',
+            'upload_date': '20160521',
+        },
+        'skip': 'Video removed',
      }]
  
      def _real_extract(self, url):
@@ -38,6 +53,7 @@ class DWIE(InfoExtractor):
          webpage = self._download_webpage(url, media_id)
          hidden_inputs = self._hidden_inputs(webpage)
          title = hidden_inputs['media_title']
+        media_id = hidden_inputs.get('media_id') or media_id
  
          if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
              formats = self._extract_smil_formats(
@@ -49,13 +65,20 @@ class DWIE(InfoExtractor):
          else:
              formats = [{'url': hidden_inputs['file_name']}]
  
+        upload_date = hidden_inputs.get('display_date')
+        if not upload_date:
+            upload_date = self._html_search_regex(
+                r'<span[^>]+class="date">([0-9.]+)\s*\|', webpage,
+                'upload date', default=None)
+            upload_date = unified_strdate(upload_date)
+
          return {
              'id': media_id,
              'title': title,
              'description': self._og_search_description(webpage),
              'thumbnail': hidden_inputs.get('preview_image'),
              'duration': int_or_none(hidden_inputs.get('file_duration')),
-            'upload_date': hidden_inputs.get('display_date'),
+            'upload_date': upload_date,
              'formats': formats,
          }
  
diff --git a/youtube_dl/extractor/eagleplatform.py b/youtube_dl/extractor/eagleplatform.py

index 7bbf617d468f9d5295ee2e1123452679a21a7e4b..c2f593eca201a42f7023cc64d4237b5052fbc722 100644 (file)
--- a/youtube_dl/extractor/eagleplatform.py
+++ b/youtube_dl/extractor/eagleplatform.py
@@ -4,6 +4,10 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import (
+    compat_HTTPError,
+    compat_str,
+)
  from ..utils import (
      ExtractorError,
      int_or_none,
@@ -21,7 +25,7 @@ class EaglePlatformIE(InfoExtractor):
      _TESTS = [{
          # http://lenta.ru/news/2015/03/06/navalny/
          'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201',
-        'md5': '70f5187fb620f2c1d503b3b22fd4efe3',
+        # Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
          'info_dict': {
              'id': '227304',
              'ext': 'mp4',
@@ -36,7 +40,7 @@ class EaglePlatformIE(InfoExtractor):
          # http://muz-tv.ru/play/7129/
          # http://media.clipyou.ru/index/player?record_id=12820&width=730&height=415&autoplay=true
          'url': 'eagleplatform:media.clipyou.ru:12820',
-        'md5': '90b26344ba442c8e44aa4cf8f301164a',
+        'md5': '358597369cf8ba56675c1df15e7af624',
          'info_dict': {
              'id': '12820',
              'ext': 'mp4',
@@ -48,15 +52,41 @@ class EaglePlatformIE(InfoExtractor):
          'skip': 'Georestricted',
      }]
  
+    @staticmethod
+    def _extract_url(webpage):
+        # Regular iframe embedding
+        mobj = re.search(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//.+?\.media\.eagleplatform\.com/index/player\?.+?)\1',
+            webpage)
+        if mobj is not None:
+            return mobj.group('url')
+        # Basic usage embedding (see http://dultonmedia.github.io/eplayer/)
+        mobj = re.search(
+            r'''(?xs)
+                    <script[^>]+
+                        src=(?P<q1>["\'])(?:https?:)?//(?P<host>.+?\.media\.eagleplatform\.com)/player/player\.js(?P=q1)
+                    .+?
+                    <div[^>]+
+                        class=(?P<q2>["\'])eagleplayer(?P=q2)[^>]+
+                        data-id=["\'](?P<id>\d+)
+            ''', webpage)
+        if mobj is not None:
+            return 'eagleplatform:%(host)s:%(id)s' % mobj.groupdict()
+
      @staticmethod
      def _handle_error(response):
          status = int_or_none(response.get('status', 200))
          if status != 200:
              raise ExtractorError(' '.join(response['errors']), expected=True)
  
-    def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'):
-        response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
-        self._handle_error(response)
+    def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', *args, **kwargs):
+        try:
+            response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note, *args, **kwargs)
+        except ExtractorError as ee:
+            if isinstance(ee.cause, compat_HTTPError):
+                response = self._parse_json(ee.cause.read().decode('utf-8'), video_id)
+                self._handle_error(response)
+            raise
          return response
  
      def _get_video_url(self, url_or_request, video_id, note='Downloading JSON metadata'):
@@ -84,17 +114,42 @@ class EaglePlatformIE(InfoExtractor):
  
          secure_m3u8 = self._proto_relative_url(media['sources']['secure_m3u8']['auto'], 'http:')
  
+        formats = []
+
          m3u8_url = self._get_video_url(secure_m3u8, video_id, 'Downloading m3u8 JSON')
-        formats = self._extract_m3u8_formats(
-            m3u8_url, video_id,
-            'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
+        m3u8_formats = self._extract_m3u8_formats(
+            m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+            m3u8_id='hls', fatal=False)
+        formats.extend(m3u8_formats)
+
+        m3u8_formats_dict = {}
+        for f in m3u8_formats:
+            if f.get('height') is not None:
+                m3u8_formats_dict[f['height']] = f
  
-        mp4_url = self._get_video_url(
+        mp4_data = self._download_json(
              # Secure mp4 URL is constructed according to Player.prototype.mp4 from
              # http://lentaru.media.eagleplatform.com/player/player.js
-            re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4', secure_m3u8),
-            video_id, 'Downloading mp4 JSON')
-        formats.append({'url': mp4_url, 'format_id': 'mp4'})
+            re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4s', secure_m3u8),
+            video_id, 'Downloading mp4 JSON', fatal=False)
+        if mp4_data:
+            for format_id, format_url in mp4_data.get('data', {}).items():
+                if not isinstance(format_url, compat_str):
+                    continue
+                height = int_or_none(format_id)
+                if height is not None and m3u8_formats_dict.get(height):
+                    f = m3u8_formats_dict[height].copy()
+                    f.update({
+                        'format_id': f['format_id'].replace('hls', 'http'),
+                        'protocol': 'http',
+                    })
+                else:
+                    f = {
+                        'format_id': 'http-%s' % format_id,
+                        'height': int_or_none(format_id),
+                    }
+                f['url'] = format_url
+                formats.append(f)
  
          self._sort_formats(formats)
  
diff --git a/youtube_dl/extractor/ebaumsworld.py b/youtube_dl/extractor/ebaumsworld.py

index b6bfd2b2dedc5388ef383a3cd8853bbb0c541f68..c97682cd367edebfd9fc6a476ad073cb03240054 100644 (file)
--- a/youtube_dl/extractor/ebaumsworld.py
+++ b/youtube_dl/extractor/ebaumsworld.py
@@ -4,10 +4,10 @@ from .common import InfoExtractor
  
  
  class EbaumsWorldIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?ebaumsworld\.com/videos/[^/]+/(?P<id>\d+)'
  
      _TEST = {
-        'url': 'http://www.ebaumsworld.com/video/watch/83367677/',
+        'url': 'http://www.ebaumsworld.com/videos/a-giant-python-opens-the-door/83367677/',
          'info_dict': {
              'id': '83367677',
              'ext': 'mp4',
diff --git a/youtube_dl/extractor/einthusan.py b/youtube_dl/extractor/einthusan.py

index f7339702cad3ed2804fe276b9d1fc6857c368206..443865ad27ba96eea8f78c56d14b72a54bc86389 100644 (file)
--- a/youtube_dl/extractor/einthusan.py
+++ b/youtube_dl/extractor/einthusan.py
@@ -14,7 +14,7 @@ class EinthusanIE(InfoExtractor):
      _TESTS = [
          {
              'url': 'http://www.einthusan.com/movies/watch.php?id=2447',
-            'md5': 'af244f4458cd667205e513d75da5b8b1',
+            'md5': 'd71379996ff5b7f217eca034c34e3461',
              'info_dict': {
                  'id': '2447',
                  'ext': 'mp4',
@@ -25,13 +25,13 @@ class EinthusanIE(InfoExtractor):
          },
          {
              'url': 'http://www.einthusan.com/movies/watch.php?id=1671',
-            'md5': 'ef63c7a803e22315880ed182c10d1c5c',
+            'md5': 'b16a6fd3c67c06eb7c79c8a8615f4213',
              'info_dict': {
                  'id': '1671',
                  'ext': 'mp4',
                  'title': 'Soodhu Kavvuum',
                  'thumbnail': 're:^https?://.*\.jpg$',
-                'description': 'md5:05d8a0c0281a4240d86d76e14f2f4d51',
+                'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
              }
          },
      ]
@@ -50,9 +50,11 @@ class EinthusanIE(InfoExtractor):
          video_id = self._search_regex(
              r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id)
  
-        video_url = self._download_webpage(
+        m3u8_url = self._download_webpage(
              'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/'
-            % video_id, video_id)
+            % video_id, video_id, headers={'Referer': url})
+        formats = self._extract_m3u8_formats(
+            m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native')
  
          description = self._html_search_meta('description', webpage)
          thumbnail = self._html_search_regex(
@@ -64,7 +66,7 @@ class EinthusanIE(InfoExtractor):
          return {
              'id': video_id,
              'title': title,
-            'url': video_url,
+            'formats': formats,
              'thumbnail': thumbnail,
              'description': description,
          }
diff --git a/youtube_dl/extractor/eitb.py b/youtube_dl/extractor/eitb.py

index 713cb7b329208d3c761b12858cc265b401c16dd0..ee5ead18b0834b7c2e27258b4fc6950fa93ad960 100644 (file)
--- a/youtube_dl/extractor/eitb.py
+++ b/youtube_dl/extractor/eitb.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/ellentv.py b/youtube_dl/extractor/ellentv.py

index 4c8190d68d712bf702b5e015c95bbdda4643cefd..74bbc5c51c576880465e76453604891b6b5f48ca 100644 (file)
--- a/youtube_dl/extractor/ellentv.py
+++ b/youtube_dl/extractor/ellentv.py
@@ -6,12 +6,13 @@ import json
  from .common import InfoExtractor
  from ..utils import (
      ExtractorError,
+    NO_DEFAULT,
  )
  
  
  class EllenTVIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.ellentv.com/videos/0-ipq1gsai/',
          'md5': '4294cf98bc165f218aaa0b89e0fd8042',
          'info_dict': {
@@ -22,24 +23,47 @@ class EllenTVIE(InfoExtractor):
              'timestamp': 1428035648,
              'upload_date': '20150403',
              'uploader_id': 'batchUser',
-        }
-    }
+        },
+    }, {
+        # not available via http://widgets.ellentube.com/
+        'url': 'http://www.ellentv.com/videos/1-szkgu2m2/',
+        'info_dict': {
+            'id': '1_szkgu2m2',
+            'ext': 'flv',
+            'title': "Ellen's Amazingly Talented Audience",
+            'description': 'md5:86ff1e376ff0d717d7171590e273f0a5',
+            'timestamp': 1255140900,
+            'upload_date': '20091010',
+            'uploader_id': 'ellenkaltura@gmail.com',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        webpage = self._download_webpage(
-            'http://widgets.ellentube.com/videos/%s' % video_id,
-            video_id)
+        URLS = ('http://widgets.ellentube.com/videos/%s' % video_id, url)
+
+        for num, url_ in enumerate(URLS, 1):
+            webpage = self._download_webpage(
+                url_, video_id, fatal=num == len(URLS))
+
+            default = NO_DEFAULT if num == len(URLS) else None
+
+            partner_id = self._search_regex(
+                r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id',
+                default=default)
  
-        partner_id = self._search_regex(
-            r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id')
+            kaltura_id = self._search_regex(
+                [r'id="kaltura_player_([^"]+)"',
+                 r"_wb_entry_id\s*:\s*'([^']+)",
+                 r'data-kaltura-entry-id="([^"]+)'],
+                webpage, 'kaltura id', default=default)
  
-        kaltura_id = self._search_regex(
-            [r'id="kaltura_player_([^"]+)"',
-             r"_wb_entry_id\s*:\s*'([^']+)",
-             r'data-kaltura-entry-id="([^"]+)'],
-            webpage, 'kaltura id')
+            if partner_id and kaltura_id:
+                break
  
          return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura')
  
diff --git a/youtube_dl/extractor/embedly.py b/youtube_dl/extractor/embedly.py

index 1cdb11e34804186e05cdca81d978ab944d49b4db..a5820b21e05a721fd654ff8c1d1313eb80239a73 100644 (file)
--- a/youtube_dl/extractor/embedly.py
+++ b/youtube_dl/extractor/embedly.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/engadget.py b/youtube_dl/extractor/engadget.py

index e5e57d48518d3dd3999dad650d0c32406079ce33..65635c18b7153ec188437f9c24cbe939c65304d7 100644 (file)
--- a/youtube_dl/extractor/engadget.py
+++ b/youtube_dl/extractor/engadget.py
@@ -4,9 +4,10 @@ from .common import InfoExtractor
  
  
  class EngadgetIE(InfoExtractor):
-    _VALID_URL = r'https?://www.engadget.com/video/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?engadget\.com/video/(?P<id>[^/?#]+)'
  
-    _TEST = {
+    _TESTS = [{
+        # video with 5min ID
          'url': 'http://www.engadget.com/video/518153925/',
          'md5': 'c6820d4828a5064447a4d9fc73f312c9',
          'info_dict': {
@@ -15,8 +16,12 @@ class EngadgetIE(InfoExtractor):
              'title': 'Samsung Galaxy Tab Pro 8.4 Review',
          },
          'add_ie': ['FiveMin'],
-    }
+    }, {
+        # video with vidible ID
+        'url': 'https://www.engadget.com/video/57a28462134aa15a39f0421a/',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        return self.url_result('5min:%s' % video_id)
+        return self.url_result('aol-video:%s' % video_id)
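Reviewer note: the `_VALID_URL` change above does more than make `www.` optional. A small sketch comparing the two patterns from this diff with plain `re.match`; the helper `match_id` is hypothetical and only loosely mirrors what `_match_id` does.

```python
import re

OLD = r'https?://www.engadget.com/video/(?P<id>\d+)'
NEW = r'https?://(?:www\.)?engadget\.com/video/(?P<id>[^/?#]+)'


def match_id(pattern, url):
    """Return the named group 'id' if the pattern matches at the start of the URL, else None."""
    m = re.match(pattern, url)
    return m.group('id') if m else None


VIDIBLE_URL = 'https://www.engadget.com/video/57a28462134aa15a39f0421a/'
NO_WWW_URL = 'http://engadget.com/video/518153925/'

# The old pattern still matches the Vidible-style URL, but \d+ stops at the
# first non-digit, so the captured ID is truncated.
old_vidible = match_id(OLD, VIDIBLE_URL)
new_vidible = match_id(NEW, VIDIBLE_URL)

# The old pattern requires the literal "www" prefix; the new one does not.
old_no_www = match_id(OLD, NO_WWW_URL)
new_no_www = match_id(NEW, NO_WWW_URL)
```

With the old pattern the Vidible URL yields the truncated ID `57`, which is why widening the `id` group to `[^/?#]+` matters as much as the escaping fixes.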
diff --git a/youtube_dl/extractor/eporner.py b/youtube_dl/extractor/eporner.py
index e006921ec3f8d2a0aff0e6bb0595148469b1c256..f3734e9f8984ab5a1a723bbb0be171c3fd9cf7b5 100644
--- a/youtube_dl/extractor/eporner.py
+++ b/youtube_dl/extractor/eporner.py
@@ -4,54 +4,100 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
+    encode_base_n,
+    ExtractorError,
+    int_or_none,
      parse_duration,
      str_to_int,
  )
  
  
  class EpornerIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\d+)/(?P<display_id>[\w-]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\w+)(?:/(?P<display_id>[\w-]+))?'
+    _TESTS = [{
          'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/',
          'md5': '39d486f046212d8e1b911c52ab4691f8',
          'info_dict': {
-            'id': '95008',
+            'id': 'qlDUmNsj6VS',
              'display_id': 'Infamous-Tiffany-Teen-Strip-Tease-Video',
              'ext': 'mp4',
              'title': 'Infamous Tiffany Teen Strip Tease Video',
              'duration': 1838,
              'view_count': int,
              'age_limit': 18,
-        }
-    }
+        },
+    }, {
+        # New (May 2016) URL layout
+        'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        display_id = mobj.group('display_id') or video_id
+
+        webpage, urlh = self._download_webpage_handle(url, display_id)
+
+        video_id = self._match_id(compat_str(urlh.geturl()))
+
+        hash = self._search_regex(
+            r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash')
  
-        webpage = self._download_webpage(url, display_id)
-        title = self._html_search_regex(
-            r'<title>(.*?) - EPORNER', webpage, 'title')
+        title = self._og_search_title(webpage, default=None) or self._html_search_regex(
+            r'<title>(.+?) - EPORNER', webpage, 'title')
  
-        redirect_url = 'http://www.eporner.com/config5/%s' % video_id
-        player_code = self._download_webpage(
-            redirect_url, display_id, note='Downloading player config')
+        # Reverse engineered from vjs.js
+        def calc_hash(s):
+            return ''.join((encode_base_n(int(s[lb:lb + 8], 16), 36) for lb in range(0, 32, 8)))
  
-        sources = self._search_regex(
-            r'(?s)sources\s*:\s*\[\s*({.+?})\s*\]', player_code, 'sources')
+        video = self._download_json(
+            'http://www.eporner.com/xhr/video/%s' % video_id,
+            display_id, note='Downloading video JSON',
+            query={
+                'hash': calc_hash(hash),
+                'device': 'generic',
+                'domain': 'www.eporner.com',
+                'fallback': 'false',
+            })
+
+        if video.get('available') is False:
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, video['message']), expected=True)
+
+        sources = video['sources']
  
          formats = []
-        for video_url, format_id in re.findall(r'file\s*:\s*"([^"]+)",\s*label\s*:\s*"([^"]+)"', sources):
-            fmt = {
-                'url': video_url,
-                'format_id': format_id,
-            }
-            m = re.search(r'^(\d+)', format_id)
-            if m:
-                fmt['height'] = int(m.group(1))
-            formats.append(fmt)
+        for kind, formats_dict in sources.items():
+            if not isinstance(formats_dict, dict):
+                continue
+            for format_id, format_dict in formats_dict.items():
+                if not isinstance(format_dict, dict):
+                    continue
+                src = format_dict.get('src')
+                if not isinstance(src, compat_str) or not src.startswith('http'):
+                    continue
+                if kind == 'hls':
+                    formats.extend(self._extract_m3u8_formats(
+                        src, display_id, 'mp4', entry_protocol='m3u8_native',
+                        m3u8_id=kind, fatal=False))
+                else:
+                    height = int_or_none(self._search_regex(
+                        r'(\d+)[pP]', format_id, 'height', default=None))
+                    fps = int_or_none(self._search_regex(
+                        r'(\d+)fps', format_id, 'fps', default=None))
+
+                    formats.append({
+                        'url': src,
+                        'format_id': format_id,
+                        'height': height,
+                        'fps': fps,
+                    })
          self._sort_formats(formats)
  
          duration = parse_duration(self._html_search_meta('duration', webpage))
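Reviewer note: the `calc_hash` helper added above splits the 32-digit hex hash into four 8-digit chunks and base-36 encodes each one. The following is a standalone sketch of that arithmetic, assuming `encode_base_n(num, 36)` produces lowercase base-36 digits; `to_base36` is a hypothetical local replacement so the snippet runs without youtube-dl.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # '0'-'9' then 'a'-'z'


def to_base36(num):
    """Encode a non-negative integer in base 36 (local stand-in for encode_base_n(num, 36))."""
    if num == 0:
        return DIGITS[0]
    out = ''
    while num:
        num, rem = divmod(num, 36)
        out = DIGITS[rem] + out
    return out


def calc_hash(s):
    """Split a 32-digit hex string into four 8-digit chunks and base-36 encode each chunk."""
    return ''.join(to_base36(int(s[i:i + 8], 16)) for i in range(0, 32, 8))
```

The chunks are not zero-padded, so the result has variable length: `calc_hash('0' * 32)` is `'0000'`, while `calc_hash('ffffffff' * 4)` is `'1z141z3'` repeated four times.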
diff --git a/youtube_dl/extractor/espn.py b/youtube_dl/extractor/espn.py
index db4b263bcbf40a9cb133d2a9729e4fe07292bae3..8795e0ddf5e26f676a421173a0e1fd019cd112cb 100644
--- a/youtube_dl/extractor/espn.py
+++ b/youtube_dl/extractor/espn.py
@@ -1,36 +1,117 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import remove_end
+from ..compat import compat_str
+from ..utils import (
+    determine_ext,
+    int_or_none,
+    unified_timestamp,
+)
  
  
  class ESPNIE(InfoExtractor):
-    _VALID_URL = r'https?://espn\.go\.com/(?:[^/]+/)*(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/video/clip(?:\?.*?\bid=|/_/id/)(?P<id>\d+)'
      _TESTS = [{
          'url': 'http://espn.go.com/video/clip?id=10365079',
          'info_dict': {
-            'id': 'FkYWtmazr6Ed8xmvILvKLWjd4QvYZpzG',
+            'id': '10365079',
              'ext': 'mp4',
              'title': '30 for 30 Shorts: Judging Jewell',
-            'description': None,
+            'description': 'md5:39370c2e016cb4ecf498ffe75bef7f0f',
+            'timestamp': 1390936111,
+            'upload_date': '20140128',
          },
          'params': {
-            # m3u8 download
              'skip_download': True,
          },
      }, {
          # intl video, from http://www.espnfc.us/video/mls-highlights/150/video/2743663/must-see-moments-best-of-the-mls-season
          'url': 'http://espn.go.com/video/clip?id=2743663',
          'info_dict': {
-            'id': '50NDFkeTqRHB0nXBOK-RGdSG5YQPuxHg',
+            'id': '2743663',
              'ext': 'mp4',
              'title': 'Must-See Moments: Best of the MLS season',
+            'description': 'md5:4c2d7232beaea572632bec41004f0aeb',
+            'timestamp': 1449446454,
+            'upload_date': '20151207',
          },
          'params': {
-            # m3u8 download
              'skip_download': True,
          },
+        'expected_warnings': ['Unable to download f4m manifest'],
      }, {
+        'url': 'http://www.espn.com/video/clip?id=10365079',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.espn.com/video/clip/_/id/17989860',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        clip = self._download_json(
+            'http://api-app.espn.com/v1/video/clips/%s' % video_id,
+            video_id)['videos'][0]
+
+        title = clip['headline']
+
+        format_urls = set()
+        formats = []
+
+        def traverse_source(source, base_source_id=None):
+            for source_id, source in source.items():
+                if isinstance(source, compat_str):
+                    extract_source(source, base_source_id)
+                elif isinstance(source, dict):
+                    traverse_source(
+                        source,
+                        '%s-%s' % (base_source_id, source_id)
+                        if base_source_id else source_id)
+
+        def extract_source(source_url, source_id=None):
+            if source_url in format_urls:
+                return
+            format_urls.add(source_url)
+            ext = determine_ext(source_url)
+            if ext == 'smil':
+                formats.extend(self._extract_smil_formats(
+                    source_url, video_id, fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    source_url, video_id, f4m_id=source_id, fatal=False))
+            elif ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    source_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id=source_id, fatal=False))
+            else:
+                formats.append({
+                    'url': source_url,
+                    'format_id': source_id,
+                })
+
+        traverse_source(clip['links']['source'])
+        self._sort_formats(formats)
+
+        description = clip.get('caption') or clip.get('description')
+        thumbnail = clip.get('thumbnail')
+        duration = int_or_none(clip.get('duration'))
+        timestamp = unified_timestamp(clip.get('originalPublishDate'))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'timestamp': timestamp,
+            'duration': duration,
+            'formats': formats,
+        }
+
+
+class ESPNArticleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/(?:[^/]+/)*(?P<id>[^/]+)'
+    _TESTS = [{
          'url': 'https://espn.go.com/video/iframe/twitter/?cms=espn&id=10365079',
          'only_matching': True,
      }, {
@@ -47,6 +128,10 @@ class ESPNIE(InfoExtractor):
          'only_matching': True,
      }]
  
+    @classmethod
+    def suitable(cls, url):
+        return False if ESPNIE.suitable(url) else super(ESPNArticleIE, cls).suitable(url)
+
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
@@ -56,23 +141,5 @@ class ESPNIE(InfoExtractor):
              r'class=(["\']).*?video-play-button.*?\1[^>]+data-id=["\'](?P<id>\d+)',
              webpage, 'video id', group='id')
  
-        cms = 'espn'
-        if 'data-source="intl"' in webpage:
-            cms = 'intl'
-        player_url = 'https://espn.go.com/video/iframe/twitter/?id=%s&cms=%s' % (video_id, cms)
-        player = self._download_webpage(
-            player_url, video_id)
-
-        pcode = self._search_regex(
-            r'["\']pcode=([^"\']+)["\']', player, 'pcode')
-
-        title = remove_end(
-            self._og_search_title(webpage),
-            '- ESPN Video').strip()
-
-        return {
-            '_type': 'url_transparent',
-            'url': 'ooyalaexternal:%s:%s:%s' % (cms, video_id, pcode),
-            'ie_key': 'OoyalaExternal',
-            'title': title,
-        }
+        return self.url_result(
+            'http://espn.go.com/video/clip?id=%s' % video_id, ESPNIE.ie_key())
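Reviewer note: the `traverse_source` and `extract_source` helpers added to `ESPNIE` in this file walk a nested dict of source URLs, build a hyphen-joined format ID from the dict keys, and skip URLs already seen. Here is a simplified sketch of just that traversal, assuming Python 3 (`str` instead of `compat_str`). The generator `flatten_sources` and the sample data are hypothetical, and the real API response may be shaped differently.

```python
def flatten_sources(source, base_id=None, seen=None):
    """Yield (format_id, url) pairs from a nested dict of source URLs, skipping duplicate URLs."""
    if seen is None:
        seen = set()
    for key, value in source.items():
        if isinstance(value, str):
            # A leaf URL is labelled with the ID of its enclosing dict, as in the diff.
            if value not in seen:
                seen.add(value)
                yield base_id, value
        elif isinstance(value, dict):
            child_id = '%s-%s' % (base_id, key) if base_id else key
            for pair in flatten_sources(value, child_id, seen):
                yield pair


# Hypothetical sample data, not a real API response.
sample = {
    'href': 'http://example.com/a.mp4',
    'mezzanine': {'href': 'http://example.com/a.mp4'},  # duplicate URL, skipped
    'HLS': {
        'href': 'http://example.com/a.m3u8',
        'HD': {'href': 'http://example.com/a_hd.m3u8'},
    },
}
pairs = list(flatten_sources(sample))
```

In the extractor each pair is additionally dispatched on the URL extension (`smil`, `f4m`, `m3u8`, or a plain format entry); the sketch leaves that step out.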
diff --git a/youtube_dl/extractor/exfm.py b/youtube_dl/extractor/exfm.py
deleted file mode 100644
index 09ed4f2..0000000
--- a/youtube_dl/extractor/exfm.py
+++ /dev/null
@@ -1,58 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class ExfmIE(InfoExtractor):
-    IE_NAME = 'exfm'
-    IE_DESC = 'ex.fm'
-    _VALID_URL = r'https?://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
-    _SOUNDCLOUD_URL = r'http://(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
-    _TESTS = [
-        {
-            'url': 'http://ex.fm/song/eh359',
-            'md5': 'e45513df5631e6d760970b14cc0c11e7',
-            'info_dict': {
-                'id': '44216187',
-                'ext': 'mp3',
-                'title': 'Test House "Love Is Not Enough" (Extended Mix) DeadJournalist Exclusive',
-                'uploader': 'deadjournalist',
-                'upload_date': '20120424',
-                'description': 'Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive',
-            },
-            'note': 'Soundcloud song',
-            'skip': 'The site is down too often',
-        },
-        {
-            'url': 'http://ex.fm/song/wddt8',
-            'md5': '966bd70741ac5b8570d8e45bfaed3643',
-            'info_dict': {
-                'id': 'wddt8',
-                'ext': 'mp3',
-                'title': 'Safe and Sound',
-                'uploader': 'Capital Cities',
-            },
-            'skip': 'The site is down too often',
-        },
-    ]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        song_id = mobj.group('id')
-        info_url = 'http://ex.fm/api/v3/song/%s' % song_id
-        info = self._download_json(info_url, song_id)['song']
-        song_url = info['url']
-        if re.match(self._SOUNDCLOUD_URL, song_url) is not None:
-            self.to_screen('Soundcloud song detected')
-            return self.url_result(song_url.replace('/stream', ''), 'Soundcloud')
-        return {
-            'id': song_id,
-            'url': song_url,
-            'ext': 'mp3',
-            'title': info['title'],
-            'thumbnail': info['image']['large'],
-            'uploader': info['artist'],
-            'view_count': info['loved_count'],
-        }
diff --git a/youtube_dl/extractor/expotv.py b/youtube_dl/extractor/expotv.py
index 1585a03bb9235a520c63e84c306294f6958dfe13..ef11962f35035617a589e91cde5db43659099f66 100644
--- a/youtube_dl/extractor/expotv.py
+++ b/youtube_dl/extractor/expotv.py
@@ -1,7 +1,5 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  from ..utils import (
      int_or_none,
@@ -10,25 +8,24 @@ from ..utils import (
  
  
  class ExpoTVIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
+    _VALID_URL = r'https?://(?:www\.)?expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
      _TEST = {
-        'url': 'http://www.expotv.com/videos/reviews/1/24/LinneCardscom/17561',
-        'md5': '2985e6d7a392b2f7a05e0ca350fe41d0',
+        'url': 'http://www.expotv.com/videos/reviews/3/40/NYX-Butter-lipstick/667916',
+        'md5': 'fe1d728c3a813ff78f595bc8b7a707a8',
          'info_dict': {
-            'id': '17561',
+            'id': '667916',
              'ext': 'mp4',
-            'upload_date': '20060212',
-            'title': 'My Favorite Online Scrapbook Store',
-            'view_count': int,
-            'description': 'You\'ll find most everything you need at this virtual store front.',
-            'uploader': 'Anna T.',
+            'title': 'NYX Butter Lipstick Little Susie',
+            'description': 'Goes on like butter, but looks better!',
              'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'Stephanie S.',
+            'upload_date': '20150520',
+            'view_count': int,
          }
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
  
          webpage = self._download_webpage(url, video_id)
          player_key = self._search_regex(
@@ -66,7 +63,7 @@ class ExpoTVIE(InfoExtractor):
              fatal=False)
          upload_date = unified_strdate(self._search_regex(
              r'<h5>Reviewed on ([0-9/.]+)</h5>', webpage, 'upload date',
-            fatal=False))
+            fatal=False), day_first=False)
  
          return {
              'id': video_id,
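Reviewer note: the `day_first=False` argument added above matters because a slash-separated date can be read two ways. The sketch below uses `datetime.strptime` to illustrate the ambiguity. `to_upload_date` is a hypothetical, much simplified stand-in and not the actual `unified_strdate` implementation, and the date strings are illustrative values chosen to agree with the `upload_date` fields in the old and new tests.

```python
from datetime import datetime


def to_upload_date(date_str, day_first=True):
    """Parse a slash-separated date and return it as YYYYMMDD, or None if it does not parse."""
    fmt = '%d/%m/%Y' if day_first else '%m/%d/%Y'
    try:
        return datetime.strptime(date_str, fmt).strftime('%Y%m%d')
    except ValueError:
        return None


month_first = to_upload_date('05/20/2015', day_first=False)
day_first_attempt = to_upload_date('05/20/2015', day_first=True)  # month 20 is invalid
ambiguous_a = to_upload_date('02/12/2006', day_first=False)
ambiguous_b = to_upload_date('02/12/2006', day_first=True)
```

A date such as `05/20/2015` only parses month-first, while `02/12/2006` parses either way and silently yields two different results, which is the case an explicit `day_first` setting guards against.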
diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py
new file mode 100644
index 0000000..9107f0b
--- /dev/null
+++ b/youtube_dl/extractor/extractors.py
@@ -0,0 +1,1199 @@
+# flake8: noqa
+from __future__ import unicode_literals
+
+from .abc import (
+    ABCIE,
+    ABCIViewIE,
+)
+from .abcnews import (
+    AbcNewsIE,
+    AbcNewsVideoIE,
+)
+from .abcotvs import (
+    ABCOTVSIE,
+    ABCOTVSClipsIE,
+)
+from .academicearth import AcademicEarthCourseIE
+from .acast import (
+    ACastIE,
+    ACastChannelIE,
+)
+from .addanime import AddAnimeIE
+from .adobetv import (
+    AdobeTVIE,
+    AdobeTVShowIE,
+    AdobeTVChannelIE,
+    AdobeTVVideoIE,
+)
+from .adultswim import AdultSwimIE
+from .aenetworks import (
+    AENetworksIE,
+    HistoryTopicIE,
+)
+from .afreecatv import AfreecaTVIE
+from .airmozilla import AirMozillaIE
+from .aljazeera import AlJazeeraIE
+from .alphaporno import AlphaPornoIE
+from .amcnetworks import AMCNetworksIE
+from .animeondemand import AnimeOnDemandIE
+from .anitube import AnitubeIE
+from .anysex import AnySexIE
+from .aol import (
+    AolIE,
+    AolFeaturesIE,
+)
+from .allocine import AllocineIE
+from .aparat import AparatIE
+from .appleconnect import AppleConnectIE
+from .appletrailers import (
+    AppleTrailersIE,
+    AppleTrailersSectionIE,
+)
+from .archiveorg import ArchiveOrgIE
+from .arkena import ArkenaIE
+from .ard import (
+    ARDIE,
+    ARDMediathekIE,
+)
+from .arte import (
+    ArteTvIE,
+    ArteTVPlus7IE,
+    ArteTVCreativeIE,
+    ArteTVConcertIE,
+    ArteTVInfoIE,
+    ArteTVFutureIE,
+    ArteTVCinemaIE,
+    ArteTVDDCIE,
+    ArteTVMagazineIE,
+    ArteTVEmbedIE,
+    TheOperaPlatformIE,
+    ArteTVPlaylistIE,
+)
+from .atresplayer import AtresPlayerIE
+from .atttechchannel import ATTTechChannelIE
+from .audimedia import AudiMediaIE
+from .audioboom import AudioBoomIE
+from .audiomack import AudiomackIE, AudiomackAlbumIE
+from .awaan import (
+    AWAANIE,
+    AWAANVideoIE,
+    AWAANLiveIE,
+    AWAANSeasonIE,
+)
+from .azubu import AzubuIE, AzubuLiveIE
+from .baidu import BaiduVideoIE
+from .bambuser import BambuserIE, BambuserChannelIE
+from .bandcamp import BandcampIE, BandcampAlbumIE
+from .bbc import (
+    BBCCoUkIE,
+    BBCCoUkArticleIE,
+    BBCCoUkIPlayerPlaylistIE,
+    BBCCoUkPlaylistIE,
+    BBCIE,
+)
+from .beeg import BeegIE
+from .behindkink import BehindKinkIE
+from .bellmedia import BellMediaIE
+from .beatport import BeatportIE
+from .bet import BetIE
+from .bigflix import BigflixIE
+from .bild import BildIE
+from .bilibili import BiliBiliIE
+from .biobiochiletv import BioBioChileTVIE
+from .biqle import BIQLEIE
+from .bleacherreport import (
+    BleacherReportIE,
+    BleacherReportCMSIE,
+)
+from .blinkx import BlinkxIE
+from .bloomberg import BloombergIE
+from .bokecc import BokeCCIE
+from .bpb import BpbIE
+from .br import BRIE
+from .bravotv import BravoTVIE
+from .breakcom import BreakIE
+from .brightcove import (
+    BrightcoveLegacyIE,
+    BrightcoveNewIE,
+)
+from .buzzfeed import BuzzFeedIE
+from .byutv import (
+    BYUtvIE,
+    BYUtvEventIE,
+)
+from .c56 import C56IE
+from .camdemy import (
+    CamdemyIE,
+    CamdemyFolderIE
+)
+from .camwithher import CamWithHerIE
+from .canalplus import CanalplusIE
+from .canalc2 import Canalc2IE
+from .canvas import CanvasIE
+from .carambatv import (
+    CarambaTVIE,
+    CarambaTVPageIE,
+)
+from .cartoonnetwork import CartoonNetworkIE
+from .cbc import (
+    CBCIE,
+    CBCPlayerIE,
+    CBCWatchVideoIE,
+    CBCWatchIE,
+)
+from .cbs import CBSIE
+from .cbslocal import CBSLocalIE
+from .cbsinteractive import CBSInteractiveIE
+from .cbsnews import (
+    CBSNewsIE,
+    CBSNewsLiveVideoIE,
+)
+from .cbssports import CBSSportsIE
+from .ccc import CCCIE
+from .cctv import CCTVIE
+from .cda import CDAIE
+from .ceskatelevize import CeskaTelevizeIE
+from .channel9 import Channel9IE
+from .charlierose import CharlieRoseIE
+from .chaturbate import ChaturbateIE
+from .chilloutzone import ChilloutzoneIE
+from .chirbit import (
+    ChirbitIE,
+    ChirbitProfileIE,
+)
+from .cinchcast import CinchcastIE
+from .clipfish import ClipfishIE
+from .cliphunter import CliphunterIE
+from .cliprs import ClipRsIE
+from .clipsyndicate import ClipsyndicateIE
+from .closertotruth import CloserToTruthIE
+from .cloudy import CloudyIE
+from .clubic import ClubicIE
+from .clyp import ClypIE
+from .cmt import CMTIE
+from .cnbc import CNBCIE
+from .cnn import (
+    CNNIE,
+    CNNBlogsIE,
+    CNNArticleIE,
+)
+from .coub import CoubIE
+from .collegerama import CollegeRamaIE
+from .comedycentral import (
+    ComedyCentralIE,
+    ComedyCentralShortnameIE,
+    ComedyCentralTVIE,
+    ToshIE,
+)
+from .comcarcoff import ComCarCoffIE
+from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
+from .commonprotocols import (
+    MmsIE,
+    RtmpIE,
+)
+from .condenast import CondeNastIE
+from .cracked import CrackedIE
+from .crackle import CrackleIE
+from .criterion import CriterionIE
+from .crooksandliars import CrooksAndLiarsIE
+from .crunchyroll import (
+    CrunchyrollIE,
+    CrunchyrollShowPlaylistIE
+)
+from .cspan import CSpanIE
+from .ctsnews import CtsNewsIE
+from .ctvnews import CTVNewsIE
+from .cultureunplugged import CultureUnpluggedIE
+from .curiositystream import (
+    CuriosityStreamIE,
+    CuriosityStreamCollectionIE,
+)
+from .cwtv import CWTVIE
+from .dailymail import DailyMailIE
+from .dailymotion import (
+    DailymotionIE,
+    DailymotionPlaylistIE,
+    DailymotionUserIE,
+    DailymotionCloudIE,
+)
+from .daum import (
+    DaumIE,
+    DaumClipIE,
+    DaumPlaylistIE,
+    DaumUserIE,
+)
+from .dbtv import DBTVIE
+from .dctp import DctpTvIE
+from .deezer import DeezerPlaylistIE
+from .democracynow import DemocracynowIE
+from .dfb import DFBIE
+from .dhm import DHMIE
+from .dotsub import DotsubIE
+from .douyutv import DouyuTVIE
+from .dplay import DPlayIE
+from .dramafever import (
+    DramaFeverIE,
+    DramaFeverSeriesIE,
+)
+from .dreisat import DreiSatIE
+from .drbonanza import DRBonanzaIE
+from .drtuber import DrTuberIE
+from .drtv import DRTVIE
+from .dvtv import DVTVIE
+from .dumpert import DumpertIE
+from .defense import DefenseGouvFrIE
+from .discovery import DiscoveryIE
+from .discoverygo import DiscoveryGoIE
+from .dispeak import DigitallySpeakingIE
+from .dropbox import DropboxIE
+from .dw import (
+    DWIE,
+    DWArticleIE,
+)
+from .eagleplatform import EaglePlatformIE
+from .ebaumsworld import EbaumsWorldIE
+from .echomsk import EchoMskIE
+from .ehow import EHowIE
+from .eighttracks import EightTracksIE
+from .einthusan import EinthusanIE
+from .eitb import EitbIE
+from .ellentv import (
+    EllenTVIE,
+    EllenTVClipsIE,
+)
+from .elpais import ElPaisIE
+from .embedly import EmbedlyIE
+from .engadget import EngadgetIE
+from .eporner import EpornerIE
+from .eroprofile import EroProfileIE
+from .escapist import EscapistIE
+from .espn import (
+    ESPNIE,
+    ESPNArticleIE,
+)
+from .esri import EsriVideoIE
+from .europa import EuropaIE
+from .everyonesmixtape import EveryonesMixtapeIE
+from .expotv import ExpoTVIE
+from .extremetube import ExtremeTubeIE
+from .eyedotv import EyedoTVIE
+from .facebook import (
+    FacebookIE,
+    FacebookPluginsVideoIE,
+)
+from .faz import FazIE
+from .fc2 import (
+    FC2IE,
+    FC2EmbedIE,
+)
+from .fczenit import FczenitIE
+from .firstpost import FirstpostIE
+from .firsttv import FirstTVIE
+from .fivemin import FiveMinIE
+from .fivetv import FiveTVIE
+from .fktv import FKTVIE
+from .flickr import FlickrIE
+from .flipagram import FlipagramIE
+from .folketinget import FolketingetIE
+from .footyroom import FootyRoomIE
+from .formula1 import Formula1IE
+from .fourtube import FourTubeIE
+from .fox import FOXIE
+from .fox9 import FOX9IE
+from .foxgay import FoxgayIE
+from .foxnews import (
+    FoxNewsIE,
+    FoxNewsArticleIE,
+    FoxNewsInsiderIE,
+)
+from .foxsports import FoxSportsIE
+from .franceculture import FranceCultureIE
+from .franceinter import FranceInterIE
+from .francetv import (
+    PluzzIE,
+    FranceTvInfoIE,
+    FranceTVIE,
+    GenerationQuoiIE,
+    CultureboxIE,
+)
+from .freesound import FreesoundIE
+from .freespeech import FreespeechIE
+from .freevideo import FreeVideoIE
+from .funimation import FunimationIE
+from .funnyordie import FunnyOrDieIE
+from .fusion import FusionIE
+from .fxnetworks import FXNetworksIE
+from .gameinformer import GameInformerIE
+from .gameone import (
+    GameOneIE,
+    GameOnePlaylistIE,
+)
+from .gamersyde import GamersydeIE
+from .gamespot import GameSpotIE
+from .gamestar import GameStarIE
+from .gazeta import GazetaIE
+from .gdcvault import GDCVaultIE
+from .generic import GenericIE
+from .gfycat import GfycatIE
+from .giantbomb import GiantBombIE
+from .giga import GigaIE
+from .glide import GlideIE
+from .globo import (
+    GloboIE,
+    GloboArticleIE,
+)
+from .go import GoIE
+from .godtube import GodTubeIE
+from .godtv import GodTVIE
+from .golem import GolemIE
+from .googledrive import GoogleDriveIE
+from .googleplus import GooglePlusIE
+from .googlesearch import GoogleSearchIE
+from .goshgay import GoshgayIE
+from .gputechconf import GPUTechConfIE
+from .groupon import GrouponIE
+from .hark import HarkIE
+from .hbo import (
+    HBOIE,
+    HBOEpisodeIE,
+)
+from .hearthisat import HearThisAtIE
+from .heise import HeiseIE
+from .hellporno import HellPornoIE
+from .helsinki import HelsinkiIE
+from .hentaistigma import HentaiStigmaIE
+from .hgtv import (
+    HGTVIE,
+    HGTVComShowIE,
+)
+from .historicfilms import HistoricFilmsIE
+from .hitbox import HitboxIE, HitboxLiveIE
+from .hornbunny import HornBunnyIE
+from .hotnewhiphop import HotNewHipHopIE
+from .hotstar import HotStarIE
+from .howcast import HowcastIE
+from .howstuffworks import HowStuffWorksIE
+from .hrti import (
+    HRTiIE,
+    HRTiPlaylistIE,
+)
+from .huajiao import HuajiaoIE
+from .huffpost import HuffPostIE
+from .hypem import HypemIE
+from .iconosquare import IconosquareIE
+from .ign import (
+    IGNIE,
+    OneUPIE,
+    PCMagIE,
+)
+from .imdb import (
+    ImdbIE,
+    ImdbListIE
+)
+from .imgur import (
+    ImgurIE,
+    ImgurAlbumIE,
+)
+from .ina import InaIE
+from .indavideo import (
+    IndavideoIE,
+    IndavideoEmbedIE,
+)
+from .infoq import InfoQIE
+from .instagram import InstagramIE, InstagramUserIE
+from .internetvideoarchive import InternetVideoArchiveIE
+from .iprima import IPrimaIE
+from .iqiyi import IqiyiIE
+from .ir90tv import Ir90TvIE
+from .ivi import (
+    IviIE,
+    IviCompilationIE
+)
+from .ivideon import IvideonIE
+from .iwara import IwaraIE
+from .izlesene import IzleseneIE
+from .jamendo import (
+    JamendoIE,
+    JamendoAlbumIE,
+)
+from .jeuxvideo import JeuxVideoIE
+from .jove import JoveIE
+from .jwplatform import JWPlatformIE
+from .jpopsukitv import JpopsukiIE
+from .kaltura import KalturaIE
+from .kamcord import KamcordIE
+from .kanalplay import KanalPlayIE
+from .kankan import KankanIE
+from .karaoketv import KaraoketvIE
+from .karrierevideos import KarriereVideosIE
+from .keezmovies import KeezMoviesIE
+from .ketnet import KetnetIE
+from .khanacademy import KhanAcademyIE
+from .kickstarter import KickStarterIE
+from .keek import KeekIE
+from .konserthusetplay import KonserthusetPlayIE
+from .kontrtube import KontrTubeIE
+from .krasview import KrasViewIE
+from .ku6 import Ku6IE
+from .kusi import KUSIIE
+from .kuwo import (
+    KuwoIE,
+    KuwoAlbumIE,
+    KuwoChartIE,
+    KuwoSingerIE,
+    KuwoCategoryIE,
+    KuwoMvIE,
+)
+from .la7 import LA7IE
+from .laola1tv import Laola1TvIE
+from .lci import LCIIE
+from .lcp import (
+    LcpPlayIE,
+    LcpIE,
+)
+from .learnr import LearnrIE
+from .lecture2go import Lecture2GoIE
+from .lego import LEGOIE
+from .lemonde import LemondeIE
+from .leeco import (
+    LeIE,
+    LePlaylistIE,
+    LetvCloudIE,
+)
+from .libraryofcongress import LibraryOfCongressIE
+from .libsyn import LibsynIE
+from .lifenews import (
+    LifeNewsIE,
+    LifeEmbedIE,
+)
+from .limelight import (
+    LimelightMediaIE,
+    LimelightChannelIE,
+    LimelightChannelListIE,
+)
+from .litv import LiTVIE
+from .liveleak import LiveLeakIE
+from .livestream import (
+    LivestreamIE,
+    LivestreamOriginalIE,
+    LivestreamShortenerIE,
+)
+from .lnkgo import LnkGoIE
+from .localnews8 import LocalNews8IE
+from .lovehomeporn import LoveHomePornIE
+from .lrt import LRTIE
+from .lynda import (
+    LyndaIE,
+    LyndaCourseIE
+)
+from .m6 import M6IE
+from .macgamestore import MacGameStoreIE
+from .mailru import MailRuIE
+from .makerschannel import MakersChannelIE
+from .makertv import MakerTVIE
+from .mangomolo import (
+    MangomoloVideoIE,
+    MangomoloLiveIE,
+)
+from .matchtv import MatchTVIE
+from .mdr import MDRIE
+from .meta import METAIE
+from .metacafe import MetacafeIE
+from .metacritic import MetacriticIE
+from .mgoon import MgoonIE
+from .mgtv import MGTVIE
+from .miaopai import MiaoPaiIE
+from .microsoftvirtualacademy import (
+    MicrosoftVirtualAcademyIE,
+    MicrosoftVirtualAcademyCourseIE,
+)
+from .minhateca import MinhatecaIE
+from .ministrygrid import MinistryGridIE
+from .minoto import MinotoIE
+from .miomio import MioMioIE
+from .mit import TechTVMITIE, MITIE, OCWMITIE
+from .mitele import MiTeleIE
+from .mixcloud import (
+    MixcloudIE,
+    MixcloudUserIE,
+    MixcloudPlaylistIE,
+    MixcloudStreamIE,
+)
+from .mlb import MLBIE
+from .mnet import MnetIE
+from .mpora import MporaIE
+from .moevideo import MoeVideoIE
+from .mofosex import MofosexIE
+from .mojvideo import MojvideoIE
+from .moniker import MonikerIE
+from .morningstar import MorningstarIE
+from .motherless import MotherlessIE
+from .motorsport import MotorsportIE
+from .movieclips import MovieClipsIE
+from .moviezine import MoviezineIE
+from .movingimage import MovingImageIE
+from .msn import MSNIE
+from .mtv import (
+    MTVIE,
+    MTVVideoIE,
+    MTVServicesEmbeddedIE,
+    MTVDEIE,
+)
+from .muenchentv import MuenchenTVIE
+from .musicplayon import MusicPlayOnIE
+from .mwave import MwaveIE, MwaveMeetGreetIE
+from .myspace import MySpaceIE, MySpaceAlbumIE
+from .myspass import MySpassIE
+from .myvi import MyviIE
+from .myvideo import MyVideoIE
+from .myvidster import MyVidsterIE
+from .nationalgeographic import (
+    NationalGeographicVideoIE,
+    NationalGeographicIE,
+    NationalGeographicEpisodeGuideIE,
+)
+from .naver import NaverIE
+from .nba import NBAIE
+from .nbc import (
+    CSNNEIE,
+    NBCIE,
+    NBCNewsIE,
+    NBCOlympicsIE,
+    NBCSportsIE,
+    NBCSportsVPlayerIE,
+)
+from .ndr import (
+    NDRIE,
+    NJoyIE,
+    NDREmbedBaseIE,
+    NDREmbedIE,
+    NJoyEmbedIE,
+)
+from .ndtv import NDTVIE
+from .netzkino import NetzkinoIE
+from .nerdcubed import NerdCubedFeedIE
+from .neteasemusic import (
+    NetEaseMusicIE,
+    NetEaseMusicAlbumIE,
+    NetEaseMusicSingerIE,
+    NetEaseMusicListIE,
+    NetEaseMusicMvIE,
+    NetEaseMusicProgramIE,
+    NetEaseMusicDjRadioIE,
+)
+from .newgrounds import NewgroundsIE
+from .newstube import NewstubeIE
+from .nextmedia import (
+    NextMediaIE,
+    NextMediaActionNewsIE,
+    AppleDailyIE,
+)
+from .nfb import NFBIE
+from .nfl import NFLIE
+from .nhk import NhkVodIE
+from .nhl import (
+    NHLVideocenterIE,
+    NHLNewsIE,
+    NHLVideocenterCategoryIE,
+    NHLIE,
+)
+from .nick import (
+    NickIE,
+    NickDeIE,
+    NickNightIE,
+)
+from .niconico import NiconicoIE, NiconicoPlaylistIE
+from .ninecninemedia import (
+    NineCNineMediaStackIE,
+    NineCNineMediaIE,
+)
+from .ninegag import NineGagIE
+from .ninenow import NineNowIE
+from .nintendo import NintendoIE
+from .nobelprize import NobelPrizeIE
+from .noco import NocoIE
+from .normalboots import NormalbootsIE
+from .nosvideo import NosVideoIE
+from .nova import NovaIE
+from .novamov import (
+    AuroraVidIE,
+    CloudTimeIE,
+    NowVideoIE,
+    VideoWeedIE,
+    WholeCloudIE,
+)
+from .nowness import (
+    NownessIE,
+    NownessPlaylistIE,
+    NownessSeriesIE,
+)
+from .nowtv import (
+    NowTVIE,
+    NowTVListIE,
+)
+from .noz import NozIE
+from .npo import (
+    AndereTijdenIE,
+    NPOIE,
+    NPOLiveIE,
+    NPORadioIE,
+    NPORadioFragmentIE,
+    SchoolTVIE,
+    VPROIE,
+    WNLIE,
+)
+from .npr import NprIE
+from .nrk import (
+    NRKIE,
+    NRKPlaylistIE,
+    NRKSkoleIE,
+    NRKTVIE,
+)
+from .ntvde import NTVDeIE
+from .ntvru import NTVRuIE
+from .nytimes import (
+    NYTimesIE,
+    NYTimesArticleIE,
+)
+from .nuvid import NuvidIE
+from .nzz import NZZIE
+from .odatv import OdaTVIE
+from .odnoklassniki import OdnoklassnikiIE
+from .oktoberfesttv import OktoberfestTVIE
+from .onet import (
+    OnetIE,
+    OnetChannelIE,
+)
+from .onionstudios import OnionStudiosIE
+from .ooyala import (
+    OoyalaIE,
+    OoyalaExternalIE,
+)
+from .openload import OpenloadIE
+from .ora import OraTVIE
+from .orf import (
+    ORFTVthekIE,
+    ORFOE1IE,
+    ORFFM4IE,
+    ORFIPTVIE,
+)
+from .pandatv import PandaTVIE
+from .pandoratv import PandoraTVIE
+from .parliamentliveuk import ParliamentLiveUKIE
+from .patreon import PatreonIE
+from .pbs import PBSIE
+from .people import PeopleIE
+from .periscope import (
+    PeriscopeIE,
+    PeriscopeUserIE,
+)
+from .philharmoniedeparis import PhilharmonieDeParisIE
+from .phoenix import PhoenixIE
+from .photobucket import PhotobucketIE
+from .pinkbike import PinkbikeIE
+from .pladform import PladformIE
+from .playfm import PlayFMIE
+from .plays import PlaysTVIE
+from .playtvak import PlaytvakIE
+from .playvid import PlayvidIE
+from .playwire import PlaywireIE
+from .pluralsight import (
+    PluralsightIE,
+    PluralsightCourseIE,
+)
+from .podomatic import PodomaticIE
+from .pokemon import PokemonIE
+from .polskieradio import (
+    PolskieRadioIE,
+    PolskieRadioCategoryIE,
+)
+from .porn91 import Porn91IE
+from .porncom import PornComIE
+from .pornhd import PornHdIE
+from .pornhub import (
+    PornHubIE,
+    PornHubPlaylistIE,
+    PornHubUserVideosIE,
+)
+from .pornotube import PornotubeIE
+from .pornovoisines import PornoVoisinesIE
+from .pornoxo import PornoXOIE
+from .presstv import PressTVIE
+from .primesharetv import PrimeShareTVIE
+from .promptfile import PromptFileIE
+from .prosiebensat1 import ProSiebenSat1IE
+from .puls4 import Puls4IE
+from .pyvideo import PyvideoIE
+from .qqmusic import (
+    QQMusicIE,
+    QQMusicSingerIE,
+    QQMusicAlbumIE,
+    QQMusicToplistIE,
+    QQMusicPlaylistIE,
+)
+from .r7 import (
+    R7IE,
+    R7ArticleIE,
+)
+from .radiocanada import (
+    RadioCanadaIE,
+    RadioCanadaAudioVideoIE,
+)
+from .radiode import RadioDeIE
+from .radiojavan import RadioJavanIE
+from .radiobremen import RadioBremenIE
+from .radiofrance import RadioFranceIE
+from .rai import (
+    RaiTVIE,
+    RaiIE,
+)
+from .rbmaradio import RBMARadioIE
+from .rds import RDSIE
+from .redtube import RedTubeIE
+from .regiotv import RegioTVIE
+from .rentv import (
+    RENTVIE,
+    RENTVArticleIE,
+)
+from .restudy import RestudyIE
+from .reuters import ReutersIE
+from .reverbnation import ReverbNationIE
+from .revision3 import (
+    Revision3EmbedIE,
+    Revision3IE,
+)
+from .rice import RICEIE
+from .ringtv import RingTVIE
+from .rmcdecouverte import RMCDecouverteIE
+from .ro220 import Ro220IE
+from .rockstargames import RockstarGamesIE
+from .roosterteeth import RoosterTeethIE
+from .rottentomatoes import RottenTomatoesIE
+from .roxwel import RoxwelIE
+from .rozhlas import RozhlasIE
+from .rtbf import RTBFIE
+from .rte import RteIE, RteRadioIE
+from .rtlnl import RtlNlIE
+from .rtl2 import RTL2IE
+from .rtp import RTPIE
+from .rts import RTSIE
+from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVETelevisionIE
+from .rtvnh import RTVNHIE
+from .rudo import RudoIE
+from .ruhd import RUHDIE
+from .ruleporn import RulePornIE
+from .rutube import (
+    RutubeIE,
+    RutubeChannelIE,
+    RutubeEmbedIE,
+    RutubeMovieIE,
+    RutubePersonIE,
+)
+from .rutv import RUTVIE
+from .ruutu import RuutuIE
+from .sandia import SandiaIE
+from .safari import (
+    SafariIE,
+    SafariApiIE,
+    SafariCourseIE,
+)
+from .sapo import SapoIE
+from .savefrom import SaveFromIE
+from .sbs import SBSIE
+from .scivee import SciVeeIE
+from .screencast import ScreencastIE
+from .screencastomatic import ScreencastOMaticIE
+from .screenjunkies import ScreenJunkiesIE
+from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
+from .seeker import SeekerIE
+from .senateisvp import SenateISVPIE
+from .sendtonews import SendtoNewsIE
+from .servingsys import ServingSysIE
+from .sexu import SexuIE
+from .shahid import ShahidIE
+from .shared import (
+    SharedIE,
+    VivoIE,
+)
+from .sharesix import ShareSixIE
+from .sina import SinaIE
+from .sixplay import SixPlayIE
+from .skynewsarabia import (
+    SkyNewsArabiaIE,
+    SkyNewsArabiaArticleIE,
+)
+from .skysports import SkySportsIE
+from .slideshare import SlideshareIE
+from .slutload import SlutloadIE
+from .smotri import (
+    SmotriIE,
+    SmotriCommunityIE,
+    SmotriUserIE,
+    SmotriBroadcastIE,
+)
+from .snotr import SnotrIE
+from .sohu import SohuIE
+from .sonyliv import SonyLIVIE
+from .soundcloud import (
+    SoundcloudIE,
+    SoundcloudSetIE,
+    SoundcloudUserIE,
+    SoundcloudPlaylistIE,
+    SoundcloudSearchIE
+)
+from .soundgasm import (
+    SoundgasmIE,
+    SoundgasmProfileIE
+)
+from .southpark import (
+    SouthParkIE,
+    SouthParkDeIE,
+    SouthParkDkIE,
+    SouthParkEsIE,
+    SouthParkNlIE
+)
+from .spankbang import SpankBangIE
+from .spankwire import SpankwireIE
+from .spiegel import SpiegelIE, SpiegelArticleIE
+from .spiegeltv import SpiegeltvIE
+from .spike import SpikeIE
+from .stitcher import StitcherIE
+from .sport5 import Sport5IE
+from .sportbox import (
+    SportBoxIE,
+    SportBoxEmbedIE,
+)
+from .sportdeutschland import SportDeutschlandIE
+from .sportschau import SportschauIE
+from .srgssr import (
+    SRGSSRIE,
+    SRGSSRPlayIE,
+)
+from .srmediathek import SRMediathekIE
+from .stanfordoc import StanfordOpenClassroomIE
+from .steam import SteamIE
+from .streamable import StreamableIE
+from .streamcloud import StreamcloudIE
+from .streamcz import StreamCZIE
+from .streetvoice import StreetVoiceIE
+from .sunporno import SunPornoIE
+from .svt import (
+    SVTIE,
+    SVTPlayIE,
+)
+from .swrmediathek import SWRMediathekIE
+from .syfy import SyfyIE
+from .sztvhu import SztvHuIE
+from .tagesschau import (
+    TagesschauPlayerIE,
+    TagesschauIE,
+)
+from .tass import TassIE
+from .tbs import TBSIE
+from .tdslifeway import TDSLifewayIE
+from .teachertube import (
+    TeacherTubeIE,
+    TeacherTubeUserIE,
+)
+from .teachingchannel import TeachingChannelIE
+from .teamcoco import TeamcocoIE
+from .techtalks import TechTalksIE
+from .ted import TEDIE
+from .tele13 import Tele13IE
+from .telebruxelles import TeleBruxellesIE
+from .telecinco import TelecincoIE
+from .telegraaf import TelegraafIE
+from .telemb import TeleMBIE
+from .telequebec import TeleQuebecIE
+from .teletask import TeleTaskIE
+from .telewebion import TelewebionIE
+from .testurl import TestURLIE
+from .tf1 import TF1IE
+from .tfo import TFOIE
+from .theintercept import TheInterceptIE
+from .theplatform import (
+    ThePlatformIE,
+    ThePlatformFeedIE,
+)
+from .thescene import TheSceneIE
+from .thesixtyone import TheSixtyOneIE
+from .thestar import TheStarIE
+from .theweatherchannel import TheWeatherChannelIE
+from .thisamericanlife import ThisAmericanLifeIE
+from .thisav import ThisAVIE
+from .thisoldhouse import ThisOldHouseIE
+from .threeqsdn import ThreeQSDNIE
+from .tinypic import TinyPicIE
+from .tlc import TlcDeIE
+from .tmz import (
+    TMZIE,
+    TMZArticleIE,
+)
+from .tnaflix import (
+    TNAFlixNetworkEmbedIE,
+    TNAFlixIE,
+    EMPFlixIE,
+    MovieFapIE,
+)
+from .toggle import ToggleIE
+from .tonline import TOnlineIE
+from .toutv import TouTvIE
+from .toypics import ToypicsUserIE, ToypicsIE
+from .traileraddict import TrailerAddictIE
+from .trilulilu import TriluliluIE
+from .trutv import TruTVIE
+from .tube8 import Tube8IE
+from .tubitv import TubiTvIE
+from .tudou import (
+    TudouIE,
+    TudouPlaylistIE,
+    TudouAlbumIE,
+)
+from .tumblr import TumblrIE
+from .tunein import (
+    TuneInClipIE,
+    TuneInStationIE,
+    TuneInProgramIE,
+    TuneInTopicIE,
+    TuneInShortenerIE,
+)
+from .turbo import TurboIE
+from .tutv import TutvIE
+from .tv2 import (
+    TV2IE,
+    TV2ArticleIE,
+)
+from .tv3 import TV3IE
+from .tv4 import TV4IE
+from .tvanouvelles import (
+    TVANouvellesIE,
+    TVANouvellesArticleIE,
+)
+from .tvc import (
+    TVCIE,
+    TVCArticleIE,
+)
+from .tvigle import TvigleIE
+from .tvland import TVLandIE
+from .tvnoe import TVNoeIE
+from .tvp import (
+    TVPEmbedIE,
+    TVPIE,
+    TVPSeriesIE,
+)
+from .tvplay import (
+    TVPlayIE,
+    ViafreeIE,
+)
+from .tweakers import TweakersIE
+from .twentyfourvideo import TwentyFourVideoIE
+from .twentymin import TwentyMinutenIE
+from .twentytwotracks import (
+    TwentyTwoTracksIE,
+    TwentyTwoTracksGenreIE
+)
+from .twitch import (
+    TwitchVideoIE,
+    TwitchChapterIE,
+    TwitchVodIE,
+    TwitchProfileIE,
+    TwitchPastBroadcastsIE,
+    TwitchStreamIE,
+    TwitchClipsIE,
+)
+from .twitter import (
+    TwitterCardIE,
+    TwitterIE,
+    TwitterAmplifyIE,
+)
+from .udemy import (
+    UdemyIE,
+    UdemyCourseIE
+)
+from .udn import UDNEmbedIE
+from .digiteka import DigitekaIE
+from .unistra import UnistraIE
+from .uol import UOLIE
+from .uplynk import (
+    UplynkIE,
+    UplynkPreplayIE,
+)
+from .urort import UrortIE
+from .urplay import URPlayIE
+from .usanetwork import USANetworkIE
+from .usatoday import USATodayIE
+from .ustream import UstreamIE, UstreamChannelIE
+from .ustudio import (
+    UstudioIE,
+    UstudioEmbedIE,
+)
+from .varzesh3 import Varzesh3IE
+from .vbox7 import Vbox7IE
+from .veehd import VeeHDIE
+from .veoh import VeohIE
+from .vessel import VesselIE
+from .vesti import VestiIE
+from .vevo import (
+    VevoIE,
+    VevoPlaylistIE,
+)
+from .vgtv import (
+    BTArticleIE,
+    BTVestlendingenIE,
+    VGTVIE,
+)
+from .vh1 import VH1IE
+from .vice import (
+    ViceIE,
+    ViceShowIE,
+)
+from .viceland import VicelandIE
+from .vidbit import VidbitIE
+from .viddler import ViddlerIE
+from .videodetective import VideoDetectiveIE
+from .videofyme import VideofyMeIE
+from .videomega import VideoMegaIE
+from .videomore import (
+    VideomoreIE,
+    VideomoreVideoIE,
+    VideomoreSeasonIE,
+)
+from .videopremium import VideoPremiumIE
+from .videott import VideoTtIE
+from .vidio import VidioIE
+from .vidme import (
+    VidmeIE,
+    VidmeUserIE,
+    VidmeUserLikesIE,
+)
+from .vidzi import VidziIE
+from .vier import VierIE, VierVideosIE
+from .viewlift import (
+    ViewLiftIE,
+    ViewLiftEmbedIE,
+)
+from .viewster import ViewsterIE
+from .viidea import ViideaIE
+from .vimeo import (
+    VimeoIE,
+    VimeoAlbumIE,
+    VimeoChannelIE,
+    VimeoGroupsIE,
+    VimeoLikesIE,
+    VimeoOndemandIE,
+    VimeoReviewIE,
+    VimeoUserIE,
+    VimeoWatchLaterIE,
+)
+from .vimple import VimpleIE
+from .vine import (
+    VineIE,
+    VineUserIE,
+)
+from .viki import (
+    VikiIE,
+    VikiChannelIE,
+)
+from .vk import (
+    VKIE,
+    VKUserVideosIE,
+    VKWallPostIE,
+)
+from .vlive import VLiveIE
+from .vodlocker import VodlockerIE
+from .vodplatform import VODPlatformIE
+from .voicerepublic import VoiceRepublicIE
+from .voxmedia import VoxMediaIE
+from .vporn import VpornIE
+from .vrt import VRTIE
+from .vube import VubeIE
+from .vuclip import VuClipIE
+from .vyborymos import VyboryMosIE
+from .vzaar import VzaarIE
+from .walla import WallaIE
+from .washingtonpost import (
+    WashingtonPostIE,
+    WashingtonPostArticleIE,
+)
+from .wat import WatIE
+from .watchindianporn import WatchIndianPornIE
+from .wdr import (
+    WDRIE,
+    WDRMobileIE,
+)
+from .webofstories import (
+    WebOfStoriesIE,
+    WebOfStoriesPlaylistIE,
+)
+from .weiqitv import WeiqiTVIE
+from .wimp import WimpIE
+from .wistia import WistiaIE
+from .worldstarhiphop import WorldStarHipHopIE
+from .wrzuta import (
+    WrzutaIE,
+    WrzutaPlaylistIE,
+)
+from .wsj import WSJIE
+from .xbef import XBefIE
+from .xboxclips import XboxClipsIE
+from .xfileshare import XFileShareIE
+from .xhamster import (
+    XHamsterIE,
+    XHamsterEmbedIE,
+)
+from .xiami import (
+    XiamiSongIE,
+    XiamiAlbumIE,
+    XiamiArtistIE,
+    XiamiCollectionIE
+)
+from .xminus import XMinusIE
+from .xnxx import XNXXIE
+from .xstream import XstreamIE
+from .xtube import XTubeUserIE, XTubeIE
+from .xuite import XuiteIE
+from .xvideos import XVideosIE
+from .xxxymovies import XXXYMoviesIE
+from .yahoo import (
+    YahooIE,
+    YahooSearchIE,
+)
+from .yam import YamIE
+from .yandexmusic import (
+    YandexMusicTrackIE,
+    YandexMusicAlbumIE,
+    YandexMusicPlaylistIE,
+)
+from .yesjapan import YesJapanIE
+from .yinyuetai import YinYueTaiIE
+from .ynet import YnetIE
+from .youjizz import YouJizzIE
+from .youku import (
+    YoukuIE,
+    YoukuShowIE,
+)
+from .youporn import YouPornIE
+from .yourupload import YourUploadIE
+from .youtube import (
+    YoutubeIE,
+    YoutubeChannelIE,
+    YoutubeFavouritesIE,
+    YoutubeHistoryIE,
+    YoutubeLiveIE,
+    YoutubePlaylistIE,
+    YoutubePlaylistsIE,
+    YoutubeRecommendedIE,
+    YoutubeSearchDateIE,
+    YoutubeSearchIE,
+    YoutubeSearchURLIE,
+    YoutubeSharedVideoIE,
+    YoutubeShowIE,
+    YoutubeSubscriptionsIE,
+    YoutubeTruncatedIDIE,
+    YoutubeTruncatedURLIE,
+    YoutubeUserIE,
+    YoutubeWatchLaterIE,
+)
+from .zapiks import ZapiksIE
+from .zdf import ZDFIE, ZDFChannelIE
+from .zingmp3 import ZingMp3IE
diff --git a/youtube_dl/extractor/extremetube.py b/youtube_dl/extractor/extremetube.py

index 3403581fddf08a0928a8e4c5b22e740117646bd2..445f9438db182d0ced6d48233306a53e56271f9d 100644 (file)
--- a/youtube_dl/extractor/extremetube.py
+++ b/youtube_dl/extractor/extremetube.py
@@ -1,20 +1,14 @@
  from __future__ import unicode_literals
  
-import re
+from ..utils import str_to_int
+from .keezmovies import KeezMoviesIE
  
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    sanitized_Request,
-    str_to_int,
-)
  
-
-class ExtremeTubeIE(InfoExtractor):
+class ExtremeTubeIE(KeezMoviesIE):
      _VALID_URL = r'https?://(?:www\.)?extremetube\.com/(?:[^/]+/)?video/(?P<id>[^/#?&]+)'
      _TESTS = [{
          'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
-        'md5': '344d0c6d50e2f16b06e49ca011d8ac69',
+        'md5': '1fb9228f5e3332ec8c057d6ac36f33e0',
          'info_dict': {
              'id': 'music-video-14-british-euro-brit-european-cumshots-swallow-652431',
              'ext': 'mp4',
@@ -35,58 +29,22 @@ class ExtremeTubeIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        webpage, info = self._extract_info(url)
  
-        req = sanitized_Request(url)
-        req.add_header('Cookie', 'age_verified=1')
-        webpage = self._download_webpage(req, video_id)
+        if not info['title']:
+            info['title'] = self._search_regex(
+                r'<h1[^>]+title="([^"]+)"[^>]*>', webpage, 'title')
  
-        video_title = self._html_search_regex(
-            r'<h1 [^>]*?title="([^"]+)"[^>]*>', webpage, 'title')
          uploader = self._html_search_regex(
              r'Uploaded by:\s*</strong>\s*(.+?)\s*</div>',
              webpage, 'uploader', fatal=False)
-        view_count = str_to_int(self._html_search_regex(
+        view_count = str_to_int(self._search_regex(
              r'Views:\s*</strong>\s*<span>([\d,\.]+)</span>',
              webpage, 'view count', fatal=False))
  
-        flash_vars = self._parse_json(
-            self._search_regex(
-                r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flash vars'),
-            video_id)
-
-        formats = []
-        for quality_key, video_url in flash_vars.items():
-            height = int_or_none(self._search_regex(
-                r'quality_(\d+)[pP]$', quality_key, 'height', default=None))
-            if not height:
-                continue
-            f = {
-                'url': video_url,
-            }
-            mobj = re.search(
-                r'/(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+', video_url)
-            if mobj:
-                height = int(mobj.group('height'))
-                bitrate = int(mobj.group('bitrate'))
-                f.update({
-                    'format_id': '%dp-%dk' % (height, bitrate),
-                    'height': height,
-                    'tbr': bitrate,
-                })
-            else:
-                f.update({
-                    'format_id': '%dp' % height,
-                    'height': height,
-                })
-            formats.append(f)
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': video_title,
-            'formats': formats,
+        info.update({
              'uploader': uploader,
              'view_count': view_count,
-            'age_limit': 18,
-        }
+        })
+
+        return info
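
Reviewer note: the view-count handling kept in `_real_extract` above can be read as the minimal standalone sketch below. `parse_view_count` is a hypothetical helper, not part of the commit, and the digit-cleaning step only approximates what `str_to_int` from `..utils` is assumed to do (drop thousands separators, then call `int`).

```python
import re

# Same pattern the extractor passes to _search_regex for the view count.
VIEWS_RE = r'Views:\s*</strong>\s*<span>([\d,\.]+)</span>'


def parse_view_count(webpage):
    """Return the view count as an int, or None if the markup is absent.

    Hypothetical stand-in for str_to_int(self._search_regex(..., fatal=False)).
    """
    mobj = re.search(VIEWS_RE, webpage)
    if mobj is None:
        return None
    # Assumption: ',' and '.' are only thousands separators here.
    return int(re.sub(r'[,\.]', '', mobj.group(1)))
```

Returning `None` on a miss mirrors the `fatal=False` behaviour in the diff, where a missing counter should not abort extraction.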
diff --git a/youtube_dl/extractor/eyedotv.py b/youtube_dl/extractor/eyedotv.py

new file mode 100644 (file)

index 0000000..2f30351
--- /dev/null
+++ b/youtube_dl/extractor/eyedotv.py
@@ -0,0 +1,64 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    xpath_text,
+    parse_duration,
+    ExtractorError,
+)
+
+
+class EyedoTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?eyedo\.tv/[^/]+/(?:#!/)?Live/Detail/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'https://www.eyedo.tv/en-US/#!/Live/Detail/16301',
+        'md5': 'ba14f17995cdfc20c36ba40e21bf73f7',
+        'info_dict': {
+            'id': '16301',
+            'ext': 'mp4',
+            'title': 'Journée du conseil scientifique de l\'Afnic 2015',
+            'description': 'md5:4abe07293b2f73efc6e1c37028d58c98',
+            'uploader': 'Afnic Live',
+            'uploader_id': '8023',
+        }
+    }
+    _ROOT_URL = 'http://live.eyedo.net:1935/'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_xml('http://eyedo.tv/api/live/GetLive/%s' % video_id, video_id)
+
+        def _add_ns(path):
+            return self._xpath_ns(path, 'http://schemas.datacontract.org/2004/07/EyeDo.Core.Implementation.Web.ViewModels.Api')
+
+        title = xpath_text(video_data, _add_ns('Titre'), 'title', True)
+        state_live_code = xpath_text(video_data, _add_ns('StateLiveCode'), 'state live code', True)
+        if state_live_code == 'avenir':
+            raise ExtractorError(
+                '%s said: We\'re sorry, but this video is not yet available.' % self.IE_NAME,
+                expected=True)
+
+        is_live = state_live_code == 'live'
+        m3u8_url = None
+        # http://eyedo.tv/Content/Html5/Scripts/html5view.js
+        if is_live:
+            if xpath_text(video_data, 'Cdn') == 'true':
+                m3u8_url = 'http://rrr.sz.xlcdn.com/?account=eyedo&file=A%s&type=live&service=wowza&protocol=http&output=playlist.m3u8' % video_id
+            else:
+                m3u8_url = self._ROOT_URL + 'w/%s/eyedo_720p/playlist.m3u8' % video_id
+        else:
+            m3u8_url = self._ROOT_URL + 'replay-w/%s/mp4:%s.mp4/playlist.m3u8' % (video_id, video_id)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8' if is_live else 'm3u8_native'),
+            'description': xpath_text(video_data, _add_ns('Description')),
+            'duration': parse_duration(xpath_text(video_data, _add_ns('Duration'))),
+            'uploader': xpath_text(video_data, _add_ns('Createur')),
+            'uploader_id': xpath_text(video_data, _add_ns('CreateurId')),
+            'chapter': xpath_text(video_data, _add_ns('ChapitreTitre')),
+            'chapter_id': xpath_text(video_data, _add_ns('ChapitreId')),
+        }
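
Reviewer note: the playlist selection in `EyedoTVIE._real_extract` depends only on the video id, the `StateLiveCode` value and the `Cdn` flag. A rough sketch of that decision as a pure function follows; `choose_m3u8_url` and its parameter names are hypothetical, and `ValueError` stands in for `ExtractorError` so the snippet runs without youtube-dl.

```python
ROOT_URL = 'http://live.eyedo.net:1935/'


def choose_m3u8_url(video_id, state_live_code, cdn_flag=None):
    """Pick the HLS playlist URL the way the extractor above does."""
    if state_live_code == 'avenir':
        # Upcoming broadcast: nothing to download yet.
        raise ValueError('video is not yet available')
    if state_live_code == 'live':
        if cdn_flag == 'true':
            # Live stream served through the CDN.
            return ('http://rrr.sz.xlcdn.com/?account=eyedo&file=A%s'
                    '&type=live&service=wowza&protocol=http'
                    '&output=playlist.m3u8' % video_id)
        # Live stream served directly from the origin server.
        return ROOT_URL + 'w/%s/eyedo_720p/playlist.m3u8' % video_id
    # Any other state is treated as a replay.
    return ROOT_URL + 'replay-w/%s/mp4:%s.mp4/playlist.m3u8' % (video_id, video_id)
```

Keeping the branch logic free of network calls like this would make the three cases easy to unit test, since the commit's own `_TEST` can only exercise the replay path.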
diff --git a/youtube_dl/extractor/facebook.py b/youtube_dl/extractor/facebook.py

index f5bbd39d2d0e90996c118e3fae325034fc2bbb6d..b4d38e5c258b830e192bcfa2639f2074d9217434 100644 (file)
--- a/youtube_dl/extractor/facebook.py
+++ b/youtube_dl/extractor/facebook.py
@@ -1,6 +1,5 @@
  from __future__ import unicode_literals
  
-import json
  import re
  import socket
  
@@ -15,6 +14,7 @@ from ..compat import (
  from ..utils import (
      error_to_compat_str,
      ExtractorError,
+    int_or_none,
      limit_length,
      sanitized_Request,
      urlencode_postdata,
@@ -27,7 +27,7 @@ class FacebookIE(InfoExtractor):
      _VALID_URL = r'''(?x)
                  (?:
                      https?://
-                        (?:\w+\.)?facebook\.com/
+                        (?:[\w-]+\.)?facebook\.com/
                          (?:[^#]*?\#!/)?
                          (?:
                              (?:
@@ -62,6 +62,8 @@ class FacebookIE(InfoExtractor):
              'ext': 'mp4',
              'title': 're:Did you know Kei Nishikori is the first Asian man to ever reach a Grand Slam',
              'uploader': 'Tennis on Facebook',
+            'upload_date': '20140908',
+            'timestamp': 1410199200,
          }
      }, {
          'note': 'Video without discernible title',
@@ -71,6 +73,8 @@ class FacebookIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Facebook video #274175099429670',
              'uploader': 'Asif Nawab Butt',
+            'upload_date': '20140506',
+            'timestamp': 1399398998,
          },
          'expected_warnings': [
              'title'
@@ -78,12 +82,14 @@ class FacebookIE(InfoExtractor):
      }, {
          'note': 'Video with DASH manifest',
          'url': 'https://www.facebook.com/video.php?v=957955867617029',
-        'md5': '54706e4db4f5ad58fbad82dde1f1213f',
+        'md5': 'b2c28d528273b323abe5c6ab59f0f030',
          'info_dict': {
              'id': '957955867617029',
              'ext': 'mp4',
              'title': 'When you post epic content on instagram.com/433 8 million followers, this is ...',
              'uploader': 'Demy de Zeeuw',
+            'upload_date': '20160110',
+            'timestamp': 1452431627,
          },
      }, {
          'url': 'https://www.facebook.com/maxlayn/posts/10153807558977570',
@@ -93,7 +99,8 @@ class FacebookIE(InfoExtractor):
              'ext': 'mp4',
              'title': '"What are you doing running in the snow?"',
              'uploader': 'FailArmy',
-        }
+        },
+        'skip': 'Video gone',
      }, {
          'url': 'https://m.facebook.com/story.php?story_fbid=1035862816472149&id=116132035111903',
          'md5': '1deb90b6ac27f7efcf6d747c8a27f5e3',
@@ -103,6 +110,7 @@ class FacebookIE(InfoExtractor):
              'title': 'What the Flock Is Going On In New Zealand  Credit: ViralHog',
              'uploader': 'S. Saint',
          },
+        'skip': 'Video gone',
      }, {
          'note': 'swf params escaped',
          'url': 'https://www.facebook.com/barackobama/posts/10153664894881749',
@@ -112,6 +120,18 @@ class FacebookIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Facebook video #10153664894881749',
          },
+    }, {
+        # have 1080P, but only up to 720p in swf params
+        'url': 'https://www.facebook.com/cnn/videos/10155529876156509/',
+        'md5': '0d9813160b146b3bc8744e006027fcc6',
+        'info_dict': {
+            'id': '10155529876156509',
+            'ext': 'mp4',
+            'title': 'Holocaust survivor becomes US citizen',
+            'timestamp': 1477818095,
+            'upload_date': '20161030',
+            'uploader': 'CNN',
+        },
      }, {
          'url': 'https://www.facebook.com/video.php?v=10204634152394104',
          'only_matching': True,
@@ -127,8 +147,26 @@ class FacebookIE(InfoExtractor):
      }, {
          'url': 'https://www.facebook.com/groups/164828000315060/permalink/764967300301124/',
          'only_matching': True,
+    }, {
+        'url': 'https://zh-hk.facebook.com/peoplespower/videos/1135894589806027/',
+        'only_matching': True,
      }]
  
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
+        if mobj is not None:
+            return mobj.group('url')
+
+        # Facebook API embed
+        # see https://developers.facebook.com/docs/plugins/embedded-video-player
+        mobj = re.search(r'''(?x)<div[^>]+
+                class=(?P<q1>[\'"])[^\'"]*\bfb-(?:video|post)\b[^\'"]*(?P=q1)[^>]+
+                data-href=(?P<q2>[\'"])(?P<url>(?:https?:)?//(?:www\.)?facebook.com/.+?)(?P=q2)''', webpage)
+        if mobj is not None:
+            return mobj.group('url')
+
      def _login(self):
          (useremail, password) = self._get_login_info()
          if useremail is None:
@@ -202,29 +240,12 @@ class FacebookIE(InfoExtractor):
  
          video_data = None
  
-        BEFORE = '{swf.addParam(param[0], param[1]);});'
-        AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
-        m = re.search(re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER), webpage)
-        if m:
-            swf_params = m.group(1).replace('\\\\', '\\').replace('\\"', '"')
-            data = dict(json.loads(swf_params))
-            params_raw = compat_urllib_parse_unquote(data['params'])
-            video_data = json.loads(params_raw)['video_data']
-
-        def video_data_list2dict(video_data):
-            ret = {}
-            for item in video_data:
-                format_id = item['stream_type']
-                ret.setdefault(format_id, []).append(item)
-            return ret
-
-        if not video_data:
-            server_js_data = self._parse_json(self._search_regex(
-                r'handleServerJS\(({.+})\);', webpage, 'server js data', default='{}'), video_id)
-            for item in server_js_data.get('instances', []):
-                if item[1][0] == 'VideoConfig':
-                    video_data = video_data_list2dict(item[2][0]['videoData'])
-                    break
+        server_js_data = self._parse_json(self._search_regex(
+            r'handleServerJS\(({.+})(?:\);|,")', webpage, 'server js data', default='{}'), video_id)
+        for item in server_js_data.get('instances', []):
+            if item[1][0] == 'VideoConfig':
+                video_data = item[2][0]['videoData']
+                break
  
          if not video_data:
              if not fatal_if_no_video:
@@ -238,7 +259,10 @@ class FacebookIE(InfoExtractor):
                  raise ExtractorError('Cannot parse data')
  
          formats = []
-        for format_id, f in video_data.items():
+        for f in video_data:
+            format_id = f['stream_type']
+            if f and isinstance(f, dict):
+                f = [f]
              if not f or not isinstance(f, list):
                  continue
              for quality in ('sd', 'hd'):
@@ -273,12 +297,16 @@ class FacebookIE(InfoExtractor):
          if not video_title:
              video_title = 'Facebook video #%s' % video_id
          uploader = clean_html(get_element_by_id('fbPhotoPageAuthorName', webpage))
+        timestamp = int_or_none(self._search_regex(
+            r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
+            'timestamp', default=None))
  
          info_dict = {
              'id': video_id,
              'title': video_title,
              'formats': formats,
              'uploader': uploader,
+            'timestamp': timestamp,
          }
  
          return webpage, info_dict
@@ -307,3 +335,32 @@ class FacebookIE(InfoExtractor):
                  self._VIDEO_PAGE_TEMPLATE % video_id,
                  video_id, fatal_if_no_video=True)
              return info_dict
+
+
+class FacebookPluginsVideoIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:[\w-]+\.)?facebook\.com/plugins/video\.php\?.*?\bhref=(?P<id>https.+)'
+
+    _TESTS = [{
+        'url': 'https://www.facebook.com/plugins/video.php?href=https%3A%2F%2Fwww.facebook.com%2Fgov.sg%2Fvideos%2F10154383743583686%2F&show_text=0&width=560',
+        'md5': '5954e92cdfe51fe5782ae9bda7058a07',
+        'info_dict': {
+            'id': '10154383743583686',
+            'ext': 'mp4',
+            'title': 'What to do during the haze?',
+            'uploader': 'Gov.sg',
+            'upload_date': '20160826',
+            'timestamp': 1472184808,
+        },
+        'add_ie': [FacebookIE.ie_key()],
+    }, {
+        'url': 'https://www.facebook.com/plugins/video.php?href=https%3A%2F%2Fwww.facebook.com%2Fvideo.php%3Fv%3D10204634152394104',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.facebook.com/plugins/video.php?href=https://www.facebook.com/gov.sg/videos/10154383743583686/&show_text=0&width=560',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        return self.url_result(
+            compat_urllib_parse_unquote(self._match_id(url)),
+            FacebookIE.ie_key())
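
Reviewer note: `FacebookPluginsVideoIE` only unwraps the `href` parameter and hands the result to `FacebookIE`. The sketch below reproduces that step outside youtube-dl, under the assumption that `urllib.parse.unquote` (Python 3) behaves like `compat_urllib_parse_unquote` for these URLs; `unwrap_plugin_url` is a hypothetical name.

```python
import re
from urllib.parse import unquote

# _VALID_URL of FacebookPluginsVideoIE, copied from the diff above.
PLUGIN_URL_RE = (
    r'https?://(?:[\w-]+\.)?facebook\.com/plugins/video\.php'
    r'\?.*?\bhref=(?P<id>https.+)')


def unwrap_plugin_url(url):
    """Return the percent-decoded target of a plugin URL, or None."""
    mobj = re.match(PLUGIN_URL_RE, url)
    if mobj is None:
        return None
    # The group is greedy, so trailing plugin options stay attached.
    return unquote(mobj.group('id'))
```

Because the `id` group runs to the end of the string, options such as `&show_text=0` remain on the decoded URL; this appears harmless here because `FacebookIE._VALID_URL` only needs to find the numeric video id.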
diff --git a/youtube_dl/extractor/faz.py b/youtube_dl/extractor/faz.py

index fd535457dc56a589eaf9e062dc40fe5374735020..4bc8fc5127010e1b3ced207da04f8926716cc94d 100644 (file)
--- a/youtube_dl/extractor/faz.py
+++ b/youtube_dl/extractor/faz.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/fc2.py b/youtube_dl/extractor/fc2.py

index c7d69ff1f980de46bd4ecce96e4ac301b1f1be59..c032d4d0282cc7907b08ec42de9ac842dd4a34c2 100644 (file)
--- a/youtube_dl/extractor/fc2.py
+++ b/youtube_dl/extractor/fc2.py
@@ -1,10 +1,12 @@
-#! -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import hashlib
+import re
  
  from .common import InfoExtractor
  from ..compat import (
+    compat_parse_qs,
      compat_urllib_request,
      compat_urlparse,
  )
@@ -16,7 +18,7 @@ from ..utils import (
  
  
  class FC2IE(InfoExtractor):
-    _VALID_URL = r'^https?://video\.fc2\.com/(?:[^/]+/)*content/(?P<id>[^/]+)'
+    _VALID_URL = r'^(?:https?://video\.fc2\.com/(?:[^/]+/)*content/|fc2:)(?P<id>[^/]+)'
      IE_NAME = 'fc2'
      _NETRC_MACHINE = 'fc2'
      _TESTS = [{
@@ -75,12 +77,17 @@ class FC2IE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
          self._login()
-        webpage = self._download_webpage(url, video_id)
-        self._downloader.cookiejar.clear_session_cookies()  # must clear
-        self._login()
-
-        title = self._og_search_title(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
+        webpage = None
+        if not url.startswith('fc2:'):
+            webpage = self._download_webpage(url, video_id)
+            self._downloader.cookiejar.clear_session_cookies()  # must clear
+            self._login()
+
+        title = 'FC2 video %s' % video_id
+        thumbnail = None
+        if webpage is not None:
+            title = self._og_search_title(webpage)
+            thumbnail = self._og_search_thumbnail(webpage)
          refer = url.replace('/content/', '/a/content/') if '/a/content/' not in url else url
  
          mimi = hashlib.md5((video_id + '_gGddgPfeaf_gzyr').encode('utf-8')).hexdigest()
@@ -113,3 +120,41 @@ class FC2IE(InfoExtractor):
              'ext': 'flv',
              'thumbnail': thumbnail,
          }
+
+
+class FC2EmbedIE(InfoExtractor):
+    _VALID_URL = r'https?://video\.fc2\.com/flv2\.swf\?(?P<query>.+)'
+    IE_NAME = 'fc2:embed'
+
+    _TEST = {
+        'url': 'http://video.fc2.com/flv2.swf?t=201404182936758512407645&i=20130316kwishtfitaknmcgd76kjd864hso93htfjcnaogz629mcgfs6rbfk0hsycma7shkf85937cbchfygd74&i=201403223kCqB3Ez&d=2625&sj=11&lang=ja&rel=1&from=11&cmt=1&tk=TlRBM09EQTNNekU9&tl=プリズン･ブレイク%20S1-01%20マイケル%20【吹替】',
+        'md5': 'b8aae5334cb691bdb1193a88a6ab5d5a',
+        'info_dict': {
+            'id': '201403223kCqB3Ez',
+            'ext': 'flv',
+            'title': 'プリズン･ブレイク S1-01 マイケル 【吹替】',
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        query = compat_parse_qs(mobj.group('query'))
+
+        video_id = query['i'][-1]
+        title = query.get('tl', ['FC2 video %s' % video_id])[0]
+
+        sj = query.get('sj', [None])[0]
+        thumbnail = None
+        if sj:
+            # See thumbnailImagePath() in ServerConst.as of flv2.swf
+            thumbnail = 'http://video%s-thumbnail.fc2.com/up/pic/%s.jpg' % (
+                sj, '/'.join((video_id[:6], video_id[6:8], video_id[-2], video_id[-1], video_id)))
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': FC2IE.ie_key(),
+            'url': 'fc2:%s' % video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+        }
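
The query handling in `FC2EmbedIE._real_extract` above can be sketched in isolation as follows. This is a minimal Python 3 sketch that uses `urllib.parse.parse_qs` in place of `compat_parse_qs`; the function name `parse_fc2_embed_query` and the sample query string are illustrative assumptions, not part of the commit.

```python
from urllib.parse import parse_qs


def parse_fc2_embed_query(query_string):
    """Return (video_id, title, thumbnail) from a flv2.swf query string.

    Hypothetical helper that mirrors the logic in the diff above.
    """
    query = parse_qs(query_string)
    # 'i' can appear more than once; the extractor takes the last value.
    video_id = query['i'][-1]
    # Fall back to a generic title when 'tl' is absent.
    title = query.get('tl', ['FC2 video %s' % video_id])[0]
    sj = query.get('sj', [None])[0]
    thumbnail = None
    if sj:
        # Path layout: first 6 chars / next 2 chars / second-to-last char / last char / id
        path = '/'.join((
            video_id[:6], video_id[6:8], video_id[-2], video_id[-1], video_id))
        thumbnail = 'http://video%s-thumbnail.fc2.com/up/pic/%s.jpg' % (sj, path)
    return video_id, title, thumbnail


# Assumed sample input: two 'i' parameters, as in the test URL above.
example = parse_fc2_embed_query(
    'i=placeholder&i=201403223kCqB3Ez&sj=11&tl=Some%20title')
no_thumb = parse_fc2_embed_query('i=201403223kCqB3Ez')
```

Taking `query['i'][-1]` matters because the test URL carries two `i` parameters and only the last one is the video ID that `FC2IE` accepts via the `fc2:` scheme.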
diff --git a/youtube_dl/extractor/fczenit.py b/youtube_dl/extractor/fczenit.py
index f1f150ef2ce41defbcee841d86fe4f9ada34d25d..8d1010b88c83dcbfd3e71e9f20275bf6fb9c9d21 100644
--- a/youtube_dl/extractor/fczenit.py
+++ b/youtube_dl/extractor/fczenit.py
@@ -1,20 +1,19 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
+from ..compat import compat_urlparse
  
  
  class FczenitIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?fc-zenit\.ru/video/gl(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?fc-zenit\.ru/video/(?P<id>[0-9]+)'
      _TEST = {
-        'url': 'http://fc-zenit.ru/video/gl6785/',
-        'md5': '458bacc24549173fe5a5aa29174a5606',
+        'url': 'http://fc-zenit.ru/video/41044/',
+        'md5': '0e3fab421b455e970fa1aa3891e57df0',
          'info_dict': {
-            'id': '6785',
+            'id': '41044',
              'ext': 'mp4',
-            'title': '«Зенит-ТВ»: как Олег Шатов играл против «Урала»',
+            'title': 'Так пишется история: казанский разгром ЦСКА на «Зенит-ТВ»',
          },
      }
  
@@ -22,15 +21,23 @@ class FczenitIE(InfoExtractor):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
-        video_title = self._html_search_regex(r'<div class=\"photoalbum__title\">([^<]+)', webpage, 'title')
+        video_title = self._html_search_regex(
+            r'<[^>]+class=\"photoalbum__title\">([^<]+)', webpage, 'title')
+
+        video_items = self._parse_json(self._search_regex(
+            r'arrPath\s*=\s*JSON\.parse\(\'(.+)\'\)', webpage, 'video items'),
+            video_id)
  
-        bitrates_raw = self._html_search_regex(r'bitrates:.*\n(.*)\]', webpage, 'video URL')
-        bitrates = re.findall(r'url:.?\'(.+?)\'.*?bitrate:.?([0-9]{3}?)', bitrates_raw)
+        def merge_dicts(*dicts):
+            ret = {}
+            for a_dict in dicts:
+                ret.update(a_dict)
+            return ret
  
          formats = [{
-            'url': furl,
-            'tbr': tbr,
-        } for furl, tbr in bitrates]
+            'url': compat_urlparse.urljoin(url, video_url),
+            'tbr': int(tbr),
+        } for tbr, video_url in merge_dicts(*video_items).items()]
  
          self._sort_formats(formats)
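
A rough illustration of how the new `merge_dicts` step turns the parsed `arrPath` items into format entries. The shape of `video_items` (a list of dicts that map a bitrate string to a relative path) is an assumption inferred from the code above, and the sample paths are made up. `build_formats` is a hypothetical name, and the final sort only approximates what `_sort_formats` does.

```python
from urllib.parse import urljoin


def merge_dicts(*dicts):
    """Merge dicts left to right; later keys override earlier ones."""
    ret = {}
    for a_dict in dicts:
        ret.update(a_dict)
    return ret


def build_formats(page_url, video_items):
    """Hypothetical helper: one format per (bitrate, path) pair."""
    formats = [{
        'url': urljoin(page_url, video_url),  # resolve relative paths
        'tbr': int(tbr),
    } for tbr, video_url in merge_dicts(*video_items).items()]
    # Lowest bitrate first, similar in spirit to _sort_formats.
    return sorted(formats, key=lambda f: f['tbr'])


# Assumed sample data, not taken from the real site.
sample_items = [{'1000': '/uploads/video_1000.mp4'}, {'500': '/uploads/video_500.mp4'}]
sample_formats = build_formats('http://fc-zenit.ru/video/41044/', sample_items)
```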
  
diff --git a/youtube_dl/extractor/firsttv.py b/youtube_dl/extractor/firsttv.py
index 98b165143fe8b3f3e970ad602856b4266c59701c..6b662cc3cd78e4acf661af473f2374b5ec2af05c 100644
--- a/youtube_dl/extractor/firsttv.py
+++ b/youtube_dl/extractor/firsttv.py
@@ -1,79 +1,93 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import int_or_none
+from ..compat import compat_urlparse
+from ..utils import (
+    int_or_none,
+    qualities,
+    unified_strdate,
+)
  
  
  class FirstTVIE(InfoExtractor):
      IE_NAME = '1tv'
      IE_DESC = 'Первый канал'
-    _VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
+    _VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>[^/?#]+)'
  
      _TESTS = [{
-        'url': 'http://www.1tv.ru/videoarchive/73390',
-        'md5': '777f525feeec4806130f4f764bc18a4f',
+        # single format
+        'url': 'http://www.1tv.ru/shows/naedine-so-vsemi/vypuski/gost-lyudmila-senchina-naedine-so-vsemi-vypusk-ot-12-02-2015',
+        'md5': 'a1b6b60d530ebcf8daacf4565762bbaf',
          'info_dict': {
-            'id': '73390',
+            'id': '40049',
              'ext': 'mp4',
-            'title': 'Ð\9eÐ»Ð¸Ð¼Ð¿Ð¸Ð¹Ñ\81ÐºÐ¸Ðµ ÐºÐ°Ð½Ð°Ñ\82Ð½Ñ\8bÐµ Ð´Ð¾Ñ\80Ð¾Ð³Ð¸',
-            'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
+            'title': 'Ð\93Ð¾Ñ\81Ñ\82Ñ\8c Ð\9bÑ\8eÐ´Ð¼Ð¸Ð»Ð° Ð¡ÐµÐ½Ñ\87Ð¸Ð½Ð°. Ð\9dÐ°ÐµÐ´Ð¸Ð½Ðµ Ñ\81Ð¾ Ð²Ñ\81ÐµÐ¼Ð¸. Ð\92Ñ\8bÐ¿Ñ\83Ñ\81Ðº Ð¾Ñ\82 12.02.2015',
+            'description': 'md5:36a39c1d19618fec57d12efe212a8370',
              'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
-            'duration': 149,
-            'like_count': int,
-            'dislike_count': int,
+            'upload_date': '20150212',
+            'duration': 2694,
          },
-        'skip': 'Only works from Russia',
      }, {
-        'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930',
-        'md5': 'a1b6b60d530ebcf8daacf4565762bbaf',
+        # multiple formats
+        'url': 'http://www.1tv.ru/shows/dobroe-utro/pro-zdorove/vesennyaya-allergiya-dobroe-utro-fragment-vypuska-ot-07042016',
          'info_dict': {
-            'id': '35930',
+            'id': '364746',
              'ext': 'mp4',
-            'title': 'Ð\9dÐ°ÐµÐ´Ð¸Ð½Ðµ Ñ\81Ð¾ Ð²Ñ\81ÐµÐ¼Ð¸. Ð\9bÑ\8eÐ´Ð¼Ð¸Ð»Ð° Ð¡ÐµÐ½Ñ\87Ð¸Ð½Ð°',
-            'description': 'md5:89553aed1d641416001fe8d450f06cb9',
+            'title': 'Ð\92ÐµÑ\81ÐµÐ½Ð½Ñ\8fÑ\8f Ð°Ð»Ð»ÐµÑ\80Ð³Ð¸Ñ\8f. Ð\94Ð¾Ð±Ñ\80Ð¾Ðµ Ñ\83Ñ\82Ñ\80Ð¾. Ð¤Ñ\80Ð°Ð³Ð¼ÐµÐ½Ñ\82 Ð²Ñ\8bÐ¿Ñ\83Ñ\81ÐºÐ° Ð¾Ñ\82 07.04.2016',
+            'description': 'md5:a242eea0031fd180a4497d52640a9572',
              'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
-            'duration': 2694,
+            'upload_date': '20160407',
+            'duration': 179,
+            'formats': 'mincount:3',
+        },
+        'params': {
+            'skip_download': True,
          },
-        'skip': 'Only works from Russia',
      }]
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        display_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, video_id, 'Downloading page')
+        webpage = self._download_webpage(url, display_id)
+        playlist_url = compat_urlparse.urljoin(url, self._search_regex(
+            r'data-playlist-url="([^"]+)', webpage, 'playlist url'))
  
-        video_url = self._html_search_regex(
-            r'''(?s)(?:jwplayer\('flashvideoportal_1'\)\.setup\({|var\s+playlistObj\s*=).*?'file'\s*:\s*'([^']+)'.*?}\);''',
-            webpage, 'video URL')
+        item = self._download_json(playlist_url, display_id)[0]
+        video_id = item['id']
+        quality = qualities(('ld', 'sd', 'hd', ))
+        formats = []
+        for f in item.get('mbr', []):
+            src = f.get('src')
+            if not src:
+                continue
+            fname = f.get('name')
+            formats.append({
+                'url': src,
+                'format_id': fname,
+                'quality': quality(fname),
+            })
+        self._sort_formats(formats)
  
          title = self._html_search_regex(
-            [r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
-             r"'title'\s*:\s*'([^']+)'"], webpage, 'title')
+            (r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
+             r"'title'\s*:\s*'([^']+)'"),
+            webpage, 'title', default=None) or item['title']
          description = self._html_search_regex(
              r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
              webpage, 'description', default=None) or self._html_search_meta(
-                'description', webpage, 'description')
-
-        thumbnail = self._og_search_thumbnail(webpage)
-        duration = self._og_search_property(
-            'video:duration', webpage,
-            'video duration', fatal=False)
-
-        like_count = self._html_search_regex(
-            r'title="Понравилось".*?/></label> \[(\d+)\]',
-            webpage, 'like count', default=None)
-        dislike_count = self._html_search_regex(
-            r'title="Не понравилось".*?/></label> \[(\d+)\]',
-            webpage, 'dislike count', default=None)
+            'description', webpage, 'description')
+        duration = int_or_none(self._html_search_meta(
+            'video:duration', webpage, 'video duration', fatal=False))
+        upload_date = unified_strdate(self._html_search_meta(
+            'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
  
          return {
              'id': video_id,
-            'url': video_url,
-            'thumbnail': thumbnail,
+            'thumbnail': item.get('poster') or self._og_search_thumbnail(webpage),
              'title': title,
              'description': description,
+            'upload_date': upload_date,
              'duration': int_or_none(duration),
-            'like_count': int_or_none(like_count),
-            'dislike_count': int_or_none(dislike_count),
+            'formats': formats
          }
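
The format loop above ranks each `mbr` entry with `qualities(('ld', 'sd', 'hd', ))`. The sketch below uses a stand-in for the `qualities` helper from `..utils` (rewritten here so the example runs on its own) and an assumed `mbr` list with placeholder URLs.

```python
def qualities(quality_ids):
    """Stand-in for the utils helper: rank an id by its position in quality_ids."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1  # unknown ids sort below every known one
    return q


quality = qualities(('ld', 'sd', 'hd'))

# Assumed playlist item data; URLs are placeholders.
mbr = [
    {'name': 'hd', 'src': 'http://example.invalid/hd.mp4'},
    {'name': 'sd', 'src': None},  # skipped: no source URL
    {'name': 'ld', 'src': 'http://example.invalid/ld.mp4'},
]

formats = []
for f in mbr:
    src = f.get('src')
    if not src:
        continue
    fname = f.get('name')
    formats.append({
        'url': src,
        'format_id': fname,
        'quality': quality(fname),
    })
# Worst to best, which is the order youtube-dl expects after sorting.
formats.sort(key=lambda fmt: fmt['quality'])
```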
diff --git a/youtube_dl/extractor/fivemin.py b/youtube_dl/extractor/fivemin.py
index 6b834541636533d808ce396ae456f980f989c731..f3f876ecda7fa3776f21013352833548f9be42c6 100644
--- a/youtube_dl/extractor/fivemin.py
+++ b/youtube_dl/extractor/fivemin.py
@@ -1,24 +1,11 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..compat import (
-    compat_parse_qs,
-    compat_urllib_parse_urlencode,
-    compat_urllib_parse_urlparse,
-    compat_urlparse,
-)
-from ..utils import (
-    ExtractorError,
-    parse_duration,
-    replace_extension,
-)
  
  
  class FiveMinIE(InfoExtractor):
      IE_NAME = '5min'
-    _VALID_URL = r'(?:5min:(?P<id>\d+)(?::(?P<sid>\d+))?|https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(?P<query>.*))'
+    _VALID_URL = r'(?:5min:|https?://(?:[^/]*?5min\.com/|delivery\.vidible\.tv/aol)(?:(?:Scripts/PlayerSeed\.js|playerseed/?)?\?.*?playList=)?)(?P<id>\d+)'
  
      _TESTS = [
          {
@@ -29,8 +16,16 @@ class FiveMinIE(InfoExtractor):
                  'id': '518013791',
                  'ext': 'mp4',
                  'title': 'iPad Mini with Retina Display Review',
+                'description': 'iPad mini with Retina Display review',
                  'duration': 177,
+                'uploader': 'engadget',
+                'upload_date': '20131115',
+                'timestamp': 1384515288,
              },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            }
          },
          {
              # From http://on.aol.com/video/how-to-make-a-next-level-fruit-salad-518086247
@@ -44,108 +39,16 @@ class FiveMinIE(InfoExtractor):
              },
              'skip': 'no longer available',
          },
-    ]
-    _ERRORS = {
-        'ErrorVideoNotExist': 'We\'re sorry, but the video you are trying to watch does not exist.',
-        'ErrorVideoNoLongerAvailable': 'We\'re sorry, but the video you are trying to watch is no longer available.',
-        'ErrorVideoRejected': 'We\'re sorry, but the video you are trying to watch has been removed.',
-        'ErrorVideoUserNotGeo': 'We\'re sorry, but the video you are trying to watch cannot be viewed from your current location.',
-        'ErrorVideoLibraryRestriction': 'We\'re sorry, but the video you are trying to watch is currently unavailable for viewing at this domain.',
-        'ErrorExposurePermission': 'We\'re sorry, but the video you are trying to watch is currently unavailable for viewing at this domain.',
-    }
-    _QUALITIES = {
-        1: {
-            'width': 640,
-            'height': 360,
-        },
-        2: {
-            'width': 854,
-            'height': 480,
-        },
-        4: {
-            'width': 1280,
-            'height': 720,
-        },
-        8: {
-            'width': 1920,
-            'height': 1080,
-        },
-        16: {
-            'width': 640,
-            'height': 360,
-        },
-        32: {
-            'width': 854,
-            'height': 480,
-        },
-        64: {
-            'width': 1280,
-            'height': 720,
-        },
-        128: {
-            'width': 640,
-            'height': 360,
+        {
+            'url': 'http://embed.5min.com/518726732/',
+            'only_matching': True,
          },
-    }
+        {
+            'url': 'http://delivery.vidible.tv/aol?playList=518013791',
+            'only_matching': True,
+        }
+    ]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        sid = mobj.group('sid')
-
-        if mobj.group('query'):
-            qs = compat_parse_qs(mobj.group('query'))
-            if not qs.get('playList'):
-                raise ExtractorError('Invalid URL', expected=True)
-            video_id = qs['playList'][0]
-            if qs.get('sid'):
-                sid = qs['sid'][0]
-
-        embed_url = 'https://embed.5min.com/playerseed/?playList=%s' % video_id
-        if not sid:
-            embed_page = self._download_webpage(embed_url, video_id,
-                                                'Downloading embed page')
-            sid = self._search_regex(r'sid=(\d+)', embed_page, 'sid')
-
-        response = self._download_json(
-            'https://syn.5min.com/handlers/SenseHandler.ashx?' +
-            compat_urllib_parse_urlencode({
-                'func': 'GetResults',
-                'playlist': video_id,
-                'sid': sid,
-                'isPlayerSeed': 'true',
-                'url': embed_url,
-            }),
-            video_id)
-        if not response['success']:
-            raise ExtractorError(
-                '%s said: %s' % (
-                    self.IE_NAME,
-                    self._ERRORS.get(response['errorMessage'], response['errorMessage'])),
-                expected=True)
-        info = response['binding'][0]
-
-        formats = []
-        parsed_video_url = compat_urllib_parse_urlparse(compat_parse_qs(
-            compat_urllib_parse_urlparse(info['EmbededURL']).query)['videoUrl'][0])
-        for rendition in info['Renditions']:
-            if rendition['RenditionType'] == 'aac' or rendition['RenditionType'] == 'm3u8':
-                continue
-            else:
-                rendition_url = compat_urlparse.urlunparse(parsed_video_url._replace(path=replace_extension(parsed_video_url.path.replace('//', '/%s/' % rendition['ID']), rendition['RenditionType'])))
-                quality = self._QUALITIES.get(rendition['ID'], {})
-                formats.append({
-                    'format_id': '%s-%d' % (rendition['RenditionType'], rendition['ID']),
-                    'url': rendition_url,
-                    'width': quality.get('width'),
-                    'height': quality.get('height'),
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': info['Title'],
-            'thumbnail': info.get('ThumbURL'),
-            'duration': parse_duration(info.get('Duration')),
-            'formats': formats,
-        }
+        video_id = self._match_id(url)
+        return self.url_result('aol-video:%s' % video_id)
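
With the extractor reduced to a redirect, the new `_VALID_URL` does all the work of pulling the ID out of each URL form. A small check of that pattern against the URL shapes named in the tests, using only the standard `re` module (the helper name `match_id` is illustrative):

```python
import re

# Pattern copied from the new FiveMinIE._VALID_URL above.
VALID_URL = (
    r'(?:5min:|https?://(?:[^/]*?5min\.com/|delivery\.vidible\.tv/aol)'
    r'(?:(?:Scripts/PlayerSeed\.js|playerseed/?)?\?.*?playList=)?)(?P<id>\d+)'
)


def match_id(url):
    """Return the numeric video ID, or None if the URL does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


ids = [match_id(u) for u in (
    '5min:518013791',
    'http://embed.5min.com/518726732/',
    'http://delivery.vidible.tv/aol?playList=518013791',
    'https://embed.5min.com/playerseed/?playList=518013791',
)]
```

Each matched ID is then handed off as `aol-video:<id>`, so the format extraction lives in the AOL extractor rather than here.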
diff --git a/youtube_dl/extractor/flickr.py b/youtube_dl/extractor/flickr.py
index 0a3de14988dc06e92a7a27e52c4c7838caf69b2b..a8e1bf42a433fd87f638e8b34ce5ab68464a9252 100644
--- a/youtube_dl/extractor/flickr.py
+++ b/youtube_dl/extractor/flickr.py
@@ -24,13 +24,28 @@ class FlickrIE(InfoExtractor):
              'upload_date': '20110423',
              'uploader_id': '10922353@N03',
              'uploader': 'Forest Wander',
+            'uploader_url': 'https://www.flickr.com/photos/forestwander-nature-pictures/',
              'comment_count': int,
              'view_count': int,
              'tags': list,
+            'license': 'Attribution-ShareAlike',
          }
      }
-
      _API_BASE_URL = 'https://api.flickr.com/services/rest?'
+    # https://help.yahoo.com/kb/flickr/SLN25525.html
+    _LICENSES = {
+        '0': 'All Rights Reserved',
+        '1': 'Attribution-NonCommercial-ShareAlike',
+        '2': 'Attribution-NonCommercial',
+        '3': 'Attribution-NonCommercial-NoDerivs',
+        '4': 'Attribution',
+        '5': 'Attribution-ShareAlike',
+        '6': 'Attribution-NoDerivs',
+        '7': 'No known copyright restrictions',
+        '8': 'United States government work',
+        '9': 'Public Domain Dedication (CC0)',
+        '10': 'Public Domain Work',
+    }
  
      def _call_api(self, method, video_id, api_key, note, secret=None):
          query = {
@@ -75,6 +90,9 @@ class FlickrIE(InfoExtractor):
              self._sort_formats(formats)
  
              owner = video_info.get('owner', {})
+            uploader_id = owner.get('nsid')
+            uploader_path = owner.get('path_alias') or uploader_id
+            uploader_url = 'https://www.flickr.com/photos/%s/' % uploader_path if uploader_path else None
  
              return {
                  'id': video_id,
@@ -83,11 +101,13 @@ class FlickrIE(InfoExtractor):
                  'formats': formats,
                  'timestamp': int_or_none(video_info.get('dateuploaded')),
                  'duration': int_or_none(video_info.get('video', {}).get('duration')),
-                'uploader_id': owner.get('nsid'),
+                'uploader_id': uploader_id,
                  'uploader': owner.get('realname'),
+                'uploader_url': uploader_url,
                  'comment_count': int_or_none(video_info.get('comments', {}).get('_content')),
                  'view_count': int_or_none(video_info.get('views')),
-                'tags': [tag.get('_content') for tag in video_info.get('tags', {}).get('tag', [])]
+                'tags': [tag.get('_content') for tag in video_info.get('tags', {}).get('tag', [])],
+                'license': self._LICENSES.get(video_info.get('license')),
              }
          else:
              raise ExtractorError('not a video', expected=True)
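
The two fields added to the Flickr result can be sketched outside the extractor like this. The `owner` and `license` values below are assumed sample API output chosen to match the test's expected `info_dict`, and `uploader_fields` is a hypothetical helper name. Only part of the `_LICENSES` table is reproduced.

```python
# Subset of the _LICENSES table from the diff above; keys are strings.
LICENSES = {
    '0': 'All Rights Reserved',
    '4': 'Attribution',
    '5': 'Attribution-ShareAlike',
    '10': 'Public Domain Work',
}


def uploader_fields(video_info):
    """Hypothetical helper mirroring the uploader_url and license logic."""
    owner = video_info.get('owner', {})
    uploader_id = owner.get('nsid')
    # Prefer the human-readable alias, fall back to the NSID.
    uploader_path = owner.get('path_alias') or uploader_id
    uploader_url = (
        'https://www.flickr.com/photos/%s/' % uploader_path
        if uploader_path else None)
    return {
        'uploader_id': uploader_id,
        'uploader_url': uploader_url,
        # Unknown or missing license ids map to None.
        'license': LICENSES.get(video_info.get('license')),
    }


sample = uploader_fields({
    'owner': {'nsid': '10922353@N03', 'path_alias': 'forestwander-nature-pictures'},
    'license': '5',
})
empty = uploader_fields({})
```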
diff --git a/youtube_dl/extractor/flipagram.py b/youtube_dl/extractor/flipagram.py
new file mode 100644
index 0000000..1902a23
--- /dev/null
+++ b/youtube_dl/extractor/flipagram.py
@@ -0,0 +1,115 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    float_or_none,
+    try_get,
+    unified_timestamp,
+)
+
+
+class FlipagramIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?flipagram\.com/f/(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'https://flipagram.com/f/nyvTSJMKId',
+        'md5': '888dcf08b7ea671381f00fab74692755',
+        'info_dict': {
+            'id': 'nyvTSJMKId',
+            'ext': 'mp4',
+            'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
+            'description': 'md5:d55e32edc55261cae96a41fa85ff630e',
+            'duration': 35.571,
+            'timestamp': 1461244995,
+            'upload_date': '20160421',
+            'uploader': 'kitty juria',
+            'uploader_id': 'sjuria101',
+            'creator': 'kitty juria',
+            'view_count': int,
+            'like_count': int,
+            'repost_count': int,
+            'comment_count': int,
+            'comments': list,
+            'formats': 'mincount:2',
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        video_data = self._parse_json(
+            self._search_regex(
+                r'window\.reactH2O\s*=\s*({.+});', webpage, 'video data'),
+            video_id)
+
+        flipagram = video_data['flipagram']
+        video = flipagram['video']
+
+        json_ld = self._search_json_ld(webpage, video_id, default={})
+        title = json_ld.get('title') or flipagram['captionText']
+        description = json_ld.get('description') or flipagram.get('captionText')
+
+        formats = [{
+            'url': video['url'],
+            'width': int_or_none(video.get('width')),
+            'height': int_or_none(video.get('height')),
+            'filesize': int_or_none(video_data.get('size')),
+        }]
+
+        preview_url = try_get(
+            flipagram, lambda x: x['music']['track']['previewUrl'], compat_str)
+        if preview_url:
+            formats.append({
+                'url': preview_url,
+                'ext': 'm4a',
+                'vcodec': 'none',
+            })
+
+        self._sort_formats(formats)
+
+        counts = flipagram.get('counts', {})
+        user = flipagram.get('user', {})
+        video_data = flipagram.get('video', {})
+
+        thumbnails = [{
+            'url': self._proto_relative_url(cover['url']),
+            'width': int_or_none(cover.get('width')),
+            'height': int_or_none(cover.get('height')),
+            'filesize': int_or_none(cover.get('size')),
+        } for cover in flipagram.get('covers', []) if cover.get('url')]
+
+        # Note that this only retrieves comments that are initially loaded.
+        # For videos with large amounts of comments, most won't be retrieved.
+        comments = []
+        for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
+            text = comment.get('comment')
+            if not text or not isinstance(text, list):
+                continue
+            comments.append({
+                'author': comment.get('user', {}).get('name'),
+                'author_id': comment.get('user', {}).get('username'),
+                'id': comment.get('id'),
+                'text': text[0],
+                'timestamp': unified_timestamp(comment.get('created')),
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'duration': float_or_none(flipagram.get('duration'), 1000),
+            'thumbnails': thumbnails,
+            'timestamp': unified_timestamp(flipagram.get('iso8601Created')),
+            'uploader': user.get('name'),
+            'uploader_id': user.get('username'),
+            'creator': user.get('name'),
+            'view_count': int_or_none(counts.get('plays')),
+            'like_count': int_or_none(counts.get('likes')),
+            'repost_count': int_or_none(counts.get('reflips')),
+            'comment_count': int_or_none(counts.get('comments')),
+            'comments': comments,
+            'formats': formats,
+        }
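
The Flipagram extractor relies on `try_get` and `float_or_none` from `..utils` to read optional nested data safely. The versions below are simplified stand-ins written so the example runs without youtube-dl (with `str` in place of `compat_str`), and the `flipagram` dict is assumed sample data.

```python
def try_get(src, getter, expected_type=None):
    """Stand-in: apply getter, returning None on lookup errors or type mismatch."""
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v
    return None


def float_or_none(v, scale=1, default=None):
    """Stand-in: convert to float and divide by scale, or return default."""
    if v is None:
        return default
    try:
        return float(v) / scale
    except (TypeError, ValueError):
        return default


# Assumed sample data; the real JSON comes from window.reactH2O.
flipagram = {'duration': 35571, 'music': {'track': {}}}

preview_url = try_get(
    flipagram, lambda x: x['music']['track']['previewUrl'], str)
duration = float_or_none(flipagram.get('duration'), 1000)
```

Here `preview_url` comes back as `None` because the nested key is missing, so no audio-only format would be appended, and the millisecond duration is scaled to seconds.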
diff --git a/youtube_dl/extractor/folketinget.py b/youtube_dl/extractor/folketinget.py
index 75399fa7d2a3164c67f2d72c24628a861ed77806..b3df93f28fc6471b1c5fe7303415c223042261bc 100644
--- a/youtube_dl/extractor/folketinget.py
+++ b/youtube_dl/extractor/folketinget.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/footyroom.py b/youtube_dl/extractor/footyroom.py
index d2503ae2eff3d2e46497bbcba356af11db665452..118325b6d5cd6f29645f94c0d5cc6c719715e400 100644
--- a/youtube_dl/extractor/footyroom.py
+++ b/youtube_dl/extractor/footyroom.py
@@ -2,25 +2,27 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from .streamable import StreamableIE
  
  
  class FootyRoomIE(InfoExtractor):
-    _VALID_URL = r'https?://footyroom\.com/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://footyroom\.com/matches/(?P<id>\d+)'
      _TESTS = [{
-        'url': 'http://footyroom.com/schalke-04-0-2-real-madrid-2015-02/',
+        'url': 'http://footyroom.com/matches/79922154/hull-city-vs-chelsea/review',
          'info_dict': {
-            'id': 'schalke-04-0-2-real-madrid-2015-02',
-            'title': 'Schalke 04 0 – 2 Real Madrid',
+            'id': '79922154',
+            'title': 'VIDEO Hull City 0 - 2 Chelsea',
          },
-        'playlist_count': 3,
-        'skip': 'Video for this match is not available',
+        'playlist_count': 2,
+        'add_ie': [StreamableIE.ie_key()],
      }, {
-        'url': 'http://footyroom.com/georgia-0-2-germany-2015-03/',
+        'url': 'http://footyroom.com/matches/75817984/georgia-vs-germany/review',
          'info_dict': {
-            'id': 'georgia-0-2-germany-2015-03',
-            'title': 'Georgia 0 – 2 Germany',
+            'id': '75817984',
+            'title': 'VIDEO Georgia 0 - 2 Germany',
          },
          'playlist_count': 1,
+        'add_ie': ['Playwire']
      }]
  
      def _real_extract(self, url):
@@ -28,9 +30,8 @@ class FootyRoomIE(InfoExtractor):
  
          webpage = self._download_webpage(url, playlist_id)
  
-        playlist = self._parse_json(
-            self._search_regex(
-                r'VideoSelector\.load\((\[.+?\])\);', webpage, 'video selector'),
+        playlist = self._parse_json(self._search_regex(
+            r'DataStore\.media\s*=\s*([^;]+)', webpage, 'media data'),
              playlist_id)
  
          playlist_title = self._og_search_title(webpage)
@@ -40,11 +41,16 @@ class FootyRoomIE(InfoExtractor):
              payload = video.get('payload')
              if not payload:
                  continue
-            playwire_url = self._search_regex(
+            playwire_url = self._html_search_regex(
                  r'data-config="([^"]+)"', payload,
                  'playwire url', default=None)
              if playwire_url:
                  entries.append(self.url_result(self._proto_relative_url(
                      playwire_url, 'http:'), 'Playwire'))
  
+            streamable_url = StreamableIE._extract_url(payload)
+            if streamable_url:
+                entries.append(self.url_result(
+                    streamable_url, StreamableIE.ie_key()))
+
          return self.playlist_result(entries, playlist_id, playlist_title)
diff --git a/youtube_dl/extractor/formula1.py b/youtube_dl/extractor/formula1.py
new file mode 100644
index 0000000..fecfc28
--- /dev/null
+++ b/youtube_dl/extractor/formula1.py
@@ -0,0 +1,33 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class Formula1IE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?formula1\.com/(?:content/fom-website/)?en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html'
+    _TESTS = [{
+        'url': 'http://www.formula1.com/content/fom-website/en/video/2016/5/Race_highlights_-_Spain_2016.html',
+        'md5': '8c79e54be72078b26b89e0e111c0502b',
+        'info_dict': {
+            'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV',
+            'ext': 'mp4',
+            'title': 'Race highlights - Spain 2016',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'add_ie': ['Ooyala'],
+    }, {
+        'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        ooyala_embed_code = self._search_regex(
+            r'data-videoid="([^"]+)"', webpage, 'ooyala embed code')
+        return self.url_result(
+            'ooyala:%s' % ooyala_embed_code, 'Ooyala', ooyala_embed_code)
diff --git a/youtube_dl/extractor/fourtube.py b/youtube_dl/extractor/fourtube.py
index fc4a5a0fbf01801d598e20a9addd29ebef4a298e..9776c8422228f1f44b5b0a3bf0c40a125e1fa50a 100644
--- a/youtube_dl/extractor/fourtube.py
+++ b/youtube_dl/extractor/fourtube.py
@@ -43,14 +43,14 @@ class FourTubeIE(InfoExtractor):
              'uploadDate', webpage))
          thumbnail = self._html_search_meta('thumbnailUrl', webpage)
          uploader_id = self._html_search_regex(
-            r'<a class="img-avatar" href="[^"]+/channels/([^/"]+)" title="Go to [^"]+ page">',
+            r'<a class="item-to-subscribe" href="[^"]+/channels/([^/"]+)" title="Go to [^"]+ page">',
              webpage, 'uploader id', fatal=False)
          uploader = self._html_search_regex(
-            r'<a class="img-avatar" href="[^"]+/channels/[^/"]+" title="Go to ([^"]+) page">',
+            r'<a class="item-to-subscribe" href="[^"]+/channels/[^/"]+" title="Go to ([^"]+) page">',
              webpage, 'uploader', fatal=False)
  
          categories_html = self._search_regex(
-            r'(?s)><i class="icon icon-tag"></i>\s*Categories / Tags\s*.*?<ul class="list">(.*?)</ul>',
+            r'(?s)><i class="icon icon-tag"></i>\s*Categories / Tags\s*.*?<ul class="[^"]*?list[^"]*?">(.*?)</ul>',
              webpage, 'categories', fatal=False)
          categories = None
          if categories_html:
@@ -59,10 +59,10 @@ class FourTubeIE(InfoExtractor):
                      r'(?s)<li><a.*?>(.*?)</a>', categories_html)]
  
          view_count = str_to_int(self._search_regex(
-            r'<meta itemprop="interactionCount" content="UserPlays:([0-9,]+)">',
+            r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([0-9,]+)">',
              webpage, 'view count', fatal=False))
          like_count = str_to_int(self._search_regex(
-            r'<meta itemprop="interactionCount" content="UserLikes:([0-9,]+)">',
+            r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserLikes:([0-9,]+)">',
              webpage, 'like count', fatal=False))
          duration = parse_duration(self._html_search_meta('duration', webpage))
  
diff --git a/youtube_dl/extractor/fox.py b/youtube_dl/extractor/fox.py
index fa05af50d99ba1e580ba631717515f184bc838de..9f2e5d0652a3266c08e83567a3b0f650ec624720 100644
--- a/youtube_dl/extractor/fox.py
+++ b/youtube_dl/extractor/fox.py
@@ -1,11 +1,14 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-from .common import InfoExtractor
-from ..utils import smuggle_url
+from .adobepass import AdobePassIE
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+)
  
  
-class FOXIE(InfoExtractor):
+class FOXIE(AdobePassIE):
      _VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)'
      _TEST = {
          'url': 'http://www.fox.com/watch/255180355939/7684182528',
@@ -16,6 +19,9 @@ class FOXIE(InfoExtractor):
              'title': 'Official Trailer: Gotham',
              'description': 'Tracing the rise of the great DC Comics Super-Villains and vigilantes, Gotham reveals an entirely new chapter that has never been told.',
              'duration': 129,
+            'timestamp': 1400020798,
+            'upload_date': '20140513',
+            'uploader': 'NEWA-FNG-FOXCOM',
          },
          'add_ie': ['ThePlatform'],
      }
@@ -24,13 +30,26 @@ class FOXIE(InfoExtractor):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
-        release_url = self._parse_json(self._search_regex(
-            r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
-            video_id)['release_url'] + '&switch=http'
+        settings = self._parse_json(self._search_regex(
+            r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
+            webpage, 'drupal settings'), video_id)
+        fox_pdk_player = settings['fox_pdk_player']
+        release_url = fox_pdk_player['release_url']
+        query = {
+            'mbr': 'true',
+            'switch': 'http'
+        }
+        if fox_pdk_player.get('access') == 'locked':
+            ap_p = settings['foxAdobePassProvider']
+            rating = ap_p.get('videoRating')
+            if rating == 'n/a':
+                rating = None
+            resource = self._get_mvpd_resource('fbc-fox', None, ap_p['videoGUID'], rating)
+            query['auth'] = self._extract_mvpd_auth(url, video_id, 'fbc-fox', resource)
  
          return {
              '_type': 'url_transparent',
              'ie_key': 'ThePlatform',
-            'url': smuggle_url(release_url, {'force_smil_url': True}),
+            'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
              'id': video_id,
          }
diff --git a/youtube_dl/extractor/fox9.py b/youtube_dl/extractor/fox9.py
new file mode 100644
index 0000000..56d9975
--- /dev/null
+++ b/youtube_dl/extractor/fox9.py
@@ -0,0 +1,43 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .anvato import AnvatoIE
+from ..utils import js_to_json
+
+
+class FOX9IE(AnvatoIE):
+    _VALID_URL = r'https?://(?:www\.)?fox9\.com/(?:[^/]+/)+(?P<id>\d+)-story'
+    _TESTS = [{
+        'url': 'http://www.fox9.com/news/215123287-story',
+        'md5': 'd6e1b2572c3bab8a849c9103615dd243',
+        'info_dict': {
+            'id': '314473',
+            'ext': 'mp4',
+            'title': 'Bear climbs tree in downtown Duluth',
+            'description': 'md5:6a36bfb5073a411758a752455408ac90',
+            'duration': 51,
+            'timestamp': 1478123580,
+            'upload_date': '20161102',
+            'uploader': 'EPFOX',
+            'categories': ['News', 'Sports'],
+            'tags': ['news', 'video'],
+        },
+    }, {
+        'url': 'http://www.fox9.com/news/investigators/214070684-story',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        video_id = self._parse_json(
+            self._search_regex(
+                r'AnvatoPlaylist\s*\(\s*(\[.+?\])\s*\)\s*;',
+                webpage, 'anvato playlist'),
+            video_id, transform_source=js_to_json)[0]['video']
+
+        return self._get_anvato_videos(
+            'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b',
+            video_id)
diff --git a/youtube_dl/extractor/foxgay.py b/youtube_dl/extractor/foxgay.py
index 70c1a815d3121bf048da9510a00abf10dc516126..39174fcecca44b54ce42a174f59f3d14fbec2592 100644
--- a/youtube_dl/extractor/foxgay.py
+++ b/youtube_dl/extractor/foxgay.py
@@ -1,18 +1,24 @@
  from __future__ import unicode_literals
  
+import itertools
+
  from .common import InfoExtractor
+from ..utils import (
+    get_element_by_id,
+    remove_end,
+)
  
  
  class FoxgayIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
      _TEST = {
          'url': 'http://foxgay.com/videos/fuck-turkish-style-2582.shtml',
-        'md5': '80d72beab5d04e1655a56ad37afe6841',
+        'md5': '344558ccfea74d33b7adbce22e577f54',
          'info_dict': {
              'id': '2582',
              'ext': 'mp4',
-            'title': 'md5:6122f7ae0fc6b21ebdf59c5e083ce25a',
-            'description': 'md5:5e51dc4405f1fd315f7927daed2ce5cf',
+            'title': 'Fuck Turkish-style',
+            'description': 'md5:6ae2d9486921891efe89231ace13ffdf',
              'age_limit': 18,
              'thumbnail': 're:https?://.*\.jpg$',
          },
@@ -22,27 +28,35 @@ class FoxgayIE(InfoExtractor):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
-        title = self._html_search_regex(
-            r'<title>(?P<title>.*?)</title>',
-            webpage, 'title', fatal=False)
-        description = self._html_search_regex(
-            r'<div class="ico_desc"><h2>(?P<description>.*?)</h2>',
-            webpage, 'description', fatal=False)
+        title = remove_end(self._html_search_regex(
+            r'<title>([^<]+)</title>', webpage, 'title'), ' - Foxgay.com')
+        description = get_element_by_id('inf_tit', webpage)
  
+        # The default user-agent with foxgay cookies leads to pages without videos
+        self._downloader.cookiejar.clear('.foxgay.com')
          # Find the URL for the iFrame which contains the actual video.
+        iframe_url = self._html_search_regex(
+            r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1', webpage,
+            'video frame', group='url')
          iframe = self._download_webpage(
-            self._html_search_regex(r'iframe src="(?P<frame>.*?)"', webpage, 'video frame'),
-            video_id)
-        video_url = self._html_search_regex(
-            r"v_path = '(?P<vid>http://.*?)'", iframe, 'url')
-        thumb_url = self._html_search_regex(
-            r"t_path = '(?P<thumb>http://.*?)'", iframe, 'thumbnail', fatal=False)
+            iframe_url, video_id, headers={'User-Agent': 'curl/7.50.1'},
+            note='Downloading video frame')
+        video_data = self._parse_json(self._search_regex(
+            r'video_data\s*=\s*([^;]+);', iframe, 'video data'), video_id)
+
+        formats = [{
+            'url': source,
+            'height': resolution,
+        } for source, resolution in zip(
+            video_data['sources'], video_data.get('resolutions', itertools.repeat(None)))]
+
+        self._sort_formats(formats)
  
          return {
              'id': video_id,
              'title': title,
-            'url': video_url,
+            'formats': formats,
              'description': description,
-            'thumbnail': thumb_url,
+            'thumbnail': video_data.get('act_vid', {}).get('thumb'),
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/foxnews.py b/youtube_dl/extractor/foxnews.py
index b04da2415246974c4959c6baa7745e550c0c9fa4..229bcb175789ee78b12ae71dbcca811de69d9b65 100644
--- a/youtube_dl/extractor/foxnews.py
+++ b/youtube_dl/extractor/foxnews.py
@@ -3,11 +3,13 @@ from __future__ import unicode_literals
  import re
  
  from .amp import AMPIE
+from .common import InfoExtractor
  
  
  class FoxNewsIE(AMPIE):
+    IE_NAME = 'foxnews'
      IE_DESC = 'Fox News and Fox Business Video'
-    _VALID_URL = r'https?://(?P<host>video\.fox(?:news|business)\.com)/v/(?:video-embed\.html\?video_id=)?(?P<id>\d+)'
+    _VALID_URL = r'https?://(?P<host>video\.(?:insider\.)?fox(?:news|business)\.com)/v/(?:video-embed\.html\?video_id=)?(?P<id>\d+)'
      _TESTS = [
          {
              'url': 'http://video.foxnews.com/v/3937480/frozen-in-time/#sp=show-clips',
@@ -49,6 +51,11 @@ class FoxNewsIE(AMPIE):
              'url': 'http://video.foxbusiness.com/v/4442309889001',
              'only_matching': True,
          },
+        {
+            # From http://insider.foxnews.com/2016/08/25/univ-wisconsin-student-group-pushing-silence-certain-words
+            'url': 'http://video.insider.foxnews.com/v/video-embed.html?video_id=5099377331001&autoplay=true&share_url=http://insider.foxnews.com/2016/08/25/univ-wisconsin-student-group-pushing-silence-certain-words&share_title=Student%20Group:%20Saying%20%27Politically%20Correct,%27%20%27Trash%27%20and%20%27Lame%27%20Is%20Offensive&share=true',
+            'only_matching': True,
+        },
      ]
  
      def _real_extract(self, url):
@@ -58,3 +65,76 @@ class FoxNewsIE(AMPIE):
              'http://%s/v/feed/video/%s.js?template=fox' % (host, video_id))
          info['id'] = video_id
          return info
+
+
+class FoxNewsArticleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?foxnews\.com/(?!v)([^/]+/)+(?P<id>[a-z-]+)'
+    IE_NAME = 'foxnews:article'
+
+    _TEST = {
+        'url': 'http://www.foxnews.com/politics/2016/09/08/buzz-about-bud-clinton-camp-denies-claims-wore-earpiece-at-forum.html',
+        'md5': '62aa5a781b308fdee212ebb6f33ae7ef',
+        'info_dict': {
+            'id': '5116295019001',
+            'ext': 'mp4',
+            'title': 'Trump and Clinton asked to defend positions on Iraq War',
+            'description': 'Veterans react on \'The Kelly File\'',
+            'timestamp': 1473299755,
+            'upload_date': '20160908',
+        },
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        video_id = self._html_search_regex(
+            r'data-video-id=([\'"])(?P<id>[^\'"]+)\1',
+            webpage, 'video ID', group='id')
+        return self.url_result(
+            'http://video.foxnews.com/v/' + video_id,
+            FoxNewsIE.ie_key())
+
+
+class FoxNewsInsiderIE(InfoExtractor):
+    _VALID_URL = r'https?://insider\.foxnews\.com/([^/]+/)+(?P<id>[a-z-]+)'
+    IE_NAME = 'foxnews:insider'
+
+    _TEST = {
+        'url': 'http://insider.foxnews.com/2016/08/25/univ-wisconsin-student-group-pushing-silence-certain-words',
+        'md5': 'a10c755e582d28120c62749b4feb4c0c',
+        'info_dict': {
+            'id': '5099377331001',
+            'display_id': 'univ-wisconsin-student-group-pushing-silence-certain-words',
+            'ext': 'mp4',
+            'title': 'Student Group: Saying \'Politically Correct,\' \'Trash\' and \'Lame\' Is Offensive',
+            'description': 'Is campus censorship getting out of control?',
+            'timestamp': 1472168725,
+            'upload_date': '20160825',
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'add_ie': [FoxNewsIE.ie_key()],
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        embed_url = self._html_search_meta('embedUrl', webpage, 'embed URL')
+
+        title = self._og_search_title(webpage)
+        description = self._og_search_description(webpage)
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': FoxNewsIE.ie_key(),
+            'url': embed_url,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+        }
diff --git a/youtube_dl/extractor/foxsports.py b/youtube_dl/extractor/foxsports.py
index df7665176ec3827f836e18b8ca46e3fec7c97c3b..a3bb98377cf4feb769d89769c40fe7098ae20743 100644
--- a/youtube_dl/extractor/foxsports.py
+++ b/youtube_dl/extractor/foxsports.py
@@ -1,7 +1,10 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import smuggle_url
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+)
  
  
  class FoxSportsIE(InfoExtractor):
@@ -9,11 +12,15 @@ class FoxSportsIE(InfoExtractor):
  
      _TEST = {
          'url': 'http://www.foxsports.com/video?vid=432609859715',
+        'md5': 'b49050e955bebe32c301972e4012ac17',
          'info_dict': {
-            'id': 'gA0bHB3Ladz3',
-            'ext': 'flv',
+            'id': 'i0qKWsk3qJaM',
+            'ext': 'mp4',
              'title': 'Courtney Lee on going up 2-0 in series vs. Blazers',
              'description': 'Courtney Lee talks about Memphis being focused.',
+            'upload_date': '20150423',
+            'timestamp': 1429761109,
+            'uploader': 'NEWA-FNG-FOXSPORTS',
          },
          'add_ie': ['ThePlatform'],
      }
@@ -28,5 +35,8 @@ class FoxSportsIE(InfoExtractor):
                  r"data-player-config='([^']+)'", webpage, 'data player config'),
              video_id)
  
-        return self.url_result(smuggle_url(
-            config['releaseURL'] + '&manifest=f4m', {'force_smil_url': True}))
+        return self.url_result(smuggle_url(update_url_query(
+            config['releaseURL'], {
+                'mbr': 'true',
+                'switch': 'http',
+            }), {'force_smil_url': True}))
diff --git a/youtube_dl/extractor/franceculture.py b/youtube_dl/extractor/franceculture.py
index e2ca962838932f682f0ac833bd64169ccaba8fc5..56048ffc21e8de8810b7e6b10122cc621927fbba 100644
--- a/youtube_dl/extractor/franceculture.py
+++ b/youtube_dl/extractor/franceculture.py
@@ -2,104 +2,56 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_urlparse,
-)
  from ..utils import (
      determine_ext,
-    int_or_none,
-    ExtractorError,
+    unified_strdate,
  )
  
  
  class FranceCultureIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?franceculture\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?franceculture\.fr/emissions/(?:[^/]+/)*(?P<id>[^/?#&]+)'
      _TEST = {
-        'url': 'http://www.franceculture.fr/player/reecouter?play=4795174',
+        'url': 'http://www.franceculture.fr/emissions/carnet-nomade/rendez-vous-au-pays-des-geeks',
          'info_dict': {
-            'id': '4795174',
+            'id': 'rendez-vous-au-pays-des-geeks',
+            'display_id': 'rendez-vous-au-pays-des-geeks',
              'ext': 'mp3',
              'title': 'Rendez-vous au pays des geeks',
-            'alt_title': 'Carnet nomade | 13-14',
-            'vcodec': 'none',
+            'thumbnail': 're:^https?://.*\\.jpg$',
              'upload_date': '20140301',
-            'thumbnail': r're:^http://static\.franceculture\.fr/.*/images/player/Carnet-nomade\.jpg$',
-            'description': 'startswith:Avec :Jean-Baptiste Péretié pour son documentaire sur Arte "La revanche',
-            'timestamp': 1393700400,
+            'vcodec': 'none',
          }
      }
  
-    def _extract_from_player(self, url, video_id):
-        webpage = self._download_webpage(url, video_id)
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
  
-        video_path = self._search_regex(
-            r'<a id="player".*?href="([^"]+)"', webpage, 'video path')
-        video_url = compat_urlparse.urljoin(url, video_path)
-        timestamp = int_or_none(self._search_regex(
-            r'<a id="player".*?data-date="([0-9]+)"',
-            webpage, 'upload date', fatal=False))
-        thumbnail = self._search_regex(
-            r'<a id="player".*?>\s+<img src="([^"]+)"',
-            webpage, 'thumbnail', fatal=False)
+        webpage = self._download_webpage(url, display_id)
  
-        display_id = self._search_regex(
-            r'<span class="path-diffusion">emission-(.*?)</span>', webpage, 'display_id')
+        video_url = self._search_regex(
+            r'(?s)<div[^>]+class="[^"]*?title-zone-diffusion[^"]*?"[^>]*>.*?<button[^>]+data-asset-source="([^"]+)"',
+            webpage, 'video path')
  
-        title = self._html_search_regex(
-            r'<span class="title-diffusion">(.*?)</span>', webpage, 'title')
-        alt_title = self._html_search_regex(
-            r'<span class="title">(.*?)</span>',
-            webpage, 'alt_title', fatal=False)
-        description = self._html_search_regex(
-            r'<span class="description">(.*?)</span>',
-            webpage, 'description', fatal=False)
+        title = self._og_search_title(webpage)
  
+        upload_date = unified_strdate(self._search_regex(
+            '(?s)<div[^>]+class="date"[^>]*>.*?<span[^>]+class="inner"[^>]*>([^<]+)<',
+            webpage, 'upload date', fatal=False))
+        thumbnail = self._search_regex(
+            r'(?s)<figure[^>]+itemtype="https://schema.org/ImageObject"[^>]*>.*?<img[^>]+data-dejavu-src="([^"]+)"',
+            webpage, 'thumbnail', fatal=False)
          uploader = self._html_search_regex(
              r'(?s)<div id="emission".*?<span class="author">(.*?)</span>',
              webpage, 'uploader', default=None)
          vcodec = 'none' if determine_ext(video_url.lower()) == 'mp3' else None
  
          return {
-            'id': video_id,
+            'id': display_id,
+            'display_id': display_id,
              'url': video_url,
-            'vcodec': vcodec,
-            'uploader': uploader,
-            'timestamp': timestamp,
              'title': title,
-            'alt_title': alt_title,
              'thumbnail': thumbnail,
-            'description': description,
-            'display_id': display_id,
+            'vcodec': vcodec,
+            'uploader': uploader,
+            'upload_date': upload_date,
          }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        return self._extract_from_player(url, video_id)
-
-
-class FranceCultureEmissionIE(FranceCultureIE):
-    _VALID_URL = r'https?://(?:www\.)?franceculture\.fr/emission-(?P<id>[^?#]+)'
-    _TEST = {
-        'url': 'http://www.franceculture.fr/emission-les-carnets-de-la-creation-jean-gabriel-periot-cineaste-2015-10-13',
-        'info_dict': {
-            'title': 'Jean-Gabriel Périot, cinéaste',
-            'alt_title': 'Les Carnets de la création',
-            'id': '5093239',
-            'display_id': 'les-carnets-de-la-creation-jean-gabriel-periot-cineaste-2015-10-13',
-            'ext': 'mp3',
-            'timestamp': 1444762500,
-            'upload_date': '20151013',
-            'description': 'startswith:Aujourd\'hui dans "Les carnets de la création", le cinéaste',
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        video_path = self._html_search_regex(
-            r'<a class="rf-player-open".*?href="([^"]+)"', webpage, 'video path', 'no_path_player')
-        if video_path == 'no_path_player':
-            raise ExtractorError('no player : no sound in this page.', expected=True)
-        new_id = self._search_regex('play=(?P<id>[0-9]+)', video_path, 'new_id', group='id')
-        video_url = compat_urlparse.urljoin(url, video_path)
-        return self._extract_from_player(video_url, new_id)
diff --git a/youtube_dl/extractor/franceinter.py b/youtube_dl/extractor/franceinter.py
index 2369f868da4a39b1cf84c7cee6a5830859484082..707b9e00db02104a43e65ed8e0e94a3b2c7211c7 100644
--- a/youtube_dl/extractor/franceinter.py
+++ b/youtube_dl/extractor/franceinter.py
@@ -2,21 +2,21 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import int_or_none
+from ..utils import month_by_name
  
  
  class FranceInterIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?franceinter\.fr/emissions/(?P<id>[^?#]+)'
+
      _TEST = {
-        'url': 'http://www.franceinter.fr/player/reecouter?play=793962',
-        'md5': '4764932e466e6f6c79c317d2e74f6884',
+        'url': 'https://www.franceinter.fr/emissions/affaires-sensibles/affaires-sensibles-07-septembre-2016',
+        'md5': '9e54d7bdb6fdc02a841007f8a975c094',
          'info_dict': {
-            'id': '793962',
+            'id': 'affaires-sensibles/affaires-sensibles-07-septembre-2016',
              'ext': 'mp3',
-            'title': 'L’Histoire dans les jeux vidéo',
-            'description': 'md5:7e93ddb4451e7530022792240a3049c7',
-            'timestamp': 1387369800,
-            'upload_date': '20131218',
+            'title': 'Affaire Cahuzac : le contentieux du compte en Suisse',
+            'description': 'md5:401969c5d318c061f86bda1fa359292b',
+            'upload_date': '20160907',
          },
      }
  
@@ -25,23 +25,30 @@ class FranceInterIE(InfoExtractor):
  
          webpage = self._download_webpage(url, video_id)
  
-        path = self._search_regex(
-            r'<a id="player".+?href="([^"]+)"', webpage, 'video url')
-        video_url = 'http://www.franceinter.fr/' + path
-
-        title = self._html_search_regex(
-            r'<span class="title-diffusion">(.+?)</span>', webpage, 'title')
-        description = self._html_search_regex(
-            r'<span class="description">(.*?)</span>',
-            webpage, 'description', fatal=False)
-        timestamp = int_or_none(self._search_regex(
-            r'data-date="(\d+)"', webpage, 'upload date', fatal=False))
+        video_url = self._search_regex(
+            r'(?s)<div[^>]+class=["\']page-diffusion["\'][^>]*>.*?<button[^>]+data-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
+            webpage, 'video url', group='url')
+
+        title = self._og_search_title(webpage)
+        description = self._og_search_description(webpage)
+
+        upload_date_str = self._search_regex(
+            r'class=["\']cover-emission-period["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
+            webpage, 'upload date', fatal=False)
+        if upload_date_str:
+            upload_date_list = upload_date_str.split()
+            upload_date_list.reverse()
+            upload_date_list[1] = '%02d' % (month_by_name(upload_date_list[1], lang='fr') or 0)
+            upload_date_list[2] = '%02d' % int(upload_date_list[2])
+            upload_date = ''.join(upload_date_list)
+        else:
+            upload_date = None
  
          return {
              'id': video_id,
              'title': title,
              'description': description,
-            'timestamp': timestamp,
+            'upload_date': upload_date,
              'formats': [{
                  'url': video_url,
                  'vcodec': 'none',
diff --git a/youtube_dl/extractor/francetv.py b/youtube_dl/extractor/francetv.py
index ad94e31f346cc97cd71ad1be9f6983a16b6df209..e7068d1aed9573199211a29a91486bd72e9aecd0 100644
--- a/youtube_dl/extractor/francetv.py
+++ b/youtube_dl/extractor/francetv.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  
  from __future__ import unicode_literals
  
@@ -14,7 +14,10 @@ from ..utils import (
      parse_duration,
      determine_ext,
  )
-from .dailymotion import DailymotionCloudIE
+from .dailymotion import (
+    DailymotionIE,
+    DailymotionCloudIE,
+)
  
  
  class FranceTVBaseInfoExtractor(InfoExtractor):
@@ -128,7 +131,7 @@ class PluzzIE(FranceTVBaseInfoExtractor):
  
  class FranceTvInfoIE(FranceTVBaseInfoExtractor):
      IE_NAME = 'francetvinfo.fr'
-    _VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
+    _VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/(?:[^/]+/)*(?P<title>[^/?#&.]+)'
  
      _TESTS = [{
          'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
@@ -188,6 +191,24 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
          'params': {
              'skip_download': True,
          },
+    }, {
+        # Dailymotion embed
+        'url': 'http://www.francetvinfo.fr/politique/notre-dame-des-landes/video-sur-france-inter-cecile-duflot-denonce-le-regard-meprisant-de-patrick-cohen_1520091.html',
+        'md5': 'ee7f1828f25a648addc90cb2687b1f12',
+        'info_dict': {
+            'id': 'x4iiko0',
+            'ext': 'mp4',
+            'title': 'NDDL, référendum, Brexit : Cécile Duflot répond à Patrick Cohen',
+            'description': 'Au lendemain de la victoire du "oui" au référendum sur l\'aéroport de Notre-Dame-des-Landes, l\'ancienne ministre écologiste est l\'invitée de Patrick Cohen. Plus d\'info : https://www.franceinter.fr/emissions/le-7-9/le-7-9-27-juin-2016',
+            'timestamp': 1467011958,
+            'upload_date': '20160627',
+            'uploader': 'France Inter',
+            'uploader_id': 'x2q2ez',
+        },
+        'add_ie': ['Dailymotion'],
+    }, {
+        'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -197,7 +218,13 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
  
          dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
          if dmcloud_url:
-            return self.url_result(dmcloud_url, 'DailymotionCloud')
+            return self.url_result(dmcloud_url, DailymotionCloudIE.ie_key())
+
+        dailymotion_urls = DailymotionIE._extract_urls(webpage)
+        if dailymotion_urls:
+            return self.playlist_result([
+                self.url_result(dailymotion_url, DailymotionIE.ie_key())
+                for dailymotion_url in dailymotion_urls])
  
          video_id, catalogue = self._search_regex(
              (r'id-video=([^@]+@[^"]+)',
diff --git a/youtube_dl/extractor/freespeech.py b/youtube_dl/extractor/freespeech.py
index 1477708bbec14c38bf0db7801d09d68a22ff1546..0a70ca76351ab310ba394959b717973ec772f52d 100644
--- a/youtube_dl/extractor/freespeech.py
+++ b/youtube_dl/extractor/freespeech.py
@@ -8,7 +8,7 @@ from .common import InfoExtractor
  
  class FreespeechIE(InfoExtractor):
      IE_NAME = 'freespeech.org'
-    _VALID_URL = r'https://www\.freespeech\.org/video/(?P<title>.+)'
+    _VALID_URL = r'https?://(?:www\.)?freespeech\.org/video/(?P<title>.+)'
      _TEST = {
          'add_ie': ['Youtube'],
          'url': 'https://www.freespeech.org/video/obama-romney-campaign-colorado-ahead-debate-0',
diff --git a/youtube_dl/extractor/funimation.py b/youtube_dl/extractor/funimation.py
index 1eb528f31f4b908b8d832cfe1fd4e1647ef74058..0ad0d9b6a9fe789228487e861139fa2166d88767 100644
--- a/youtube_dl/extractor/funimation.py
+++ b/youtube_dl/extractor/funimation.py
@@ -2,6 +2,10 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..compat import (
+    compat_HTTPError,
+    compat_urllib_parse_unquote_plus,
+)
  from ..utils import (
      clean_html,
      determine_ext,
@@ -27,6 +31,7 @@ class FunimationIE(InfoExtractor):
              'description': 'md5:1769f43cd5fc130ace8fd87232207892',
              'thumbnail': 're:https?://.*\.jpg',
          },
+        'skip': 'Access without user interaction is forbidden by CloudFlare, and video removed',
      }, {
          'url': 'http://www.funimation.com/shows/hacksign/videos/official/role-play',
          'info_dict': {
@@ -37,6 +42,7 @@ class FunimationIE(InfoExtractor):
              'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd',
              'thumbnail': 're:https?://.*\.jpg',
          },
+        'skip': 'Access without user interaction is forbidden by CloudFlare',
      }, {
          'url': 'http://www.funimation.com/shows/attack-on-titan-junior-high/videos/promotional/broadcast-dub-preview',
          'info_dict': {
@@ -47,8 +53,36 @@ class FunimationIE(InfoExtractor):
              'description': 'md5:f8ec49c0aff702a7832cd81b8a44f803',
              'thumbnail': 're:https?://.*\.(?:jpg|png)',
          },
+        'skip': 'Access without user interaction is forbidden by CloudFlare',
      }]
  
+    _LOGIN_URL = 'http://www.funimation.com/login'
+
+    def _download_webpage(self, *args, **kwargs):
+        try:
+            return super(FunimationIE, self)._download_webpage(*args, **kwargs)
+        except ExtractorError as ee:
+            if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
+                response = ee.cause.read()
+                if b'>Please complete the security check to access<' in response:
+                    raise ExtractorError(
+                        'Access to funimation.com is blocked by CloudFlare. '
+                        'Please browse to http://www.funimation.com/, solve '
+                        'the reCAPTCHA, export browser cookies to a text file,'
+                        ' and then try again with --cookies YOUR_COOKIE_FILE.',
+                        expected=True)
+            raise
+
+    def _extract_cloudflare_session_ua(self, url):
+        ci_session_cookie = self._get_cookies(url).get('ci_session')
+        if ci_session_cookie:
+            ci_session = compat_urllib_parse_unquote_plus(ci_session_cookie.value)
+            # ci_session is a string serialized by PHP function serialize()
+            # This case is simple enough to use regular expressions only
+            return self._search_regex(
+                r'"user_agent";s:\d+:"([^"]+)"', ci_session, 'user agent',
+                default=None)
+
      def _login(self):
          (username, password) = self._get_login_info()
          if username is None:
@@ -57,8 +91,11 @@ class FunimationIE(InfoExtractor):
              'email_field': username,
              'password_field': password,
          })
-        login_request = sanitized_Request('http://www.funimation.com/login', data, headers={
-            'User-Agent': 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0',
+        user_agent = self._extract_cloudflare_session_ua(self._LOGIN_URL)
+        if not user_agent:
+            user_agent = 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0'
+        login_request = sanitized_Request(self._LOGIN_URL, data, headers={
+            'User-Agent': user_agent,
              'Content-Type': 'application/x-www-form-urlencoded'
          })
          login_page = self._download_webpage(
@@ -103,11 +140,16 @@ class FunimationIE(InfoExtractor):
              ('mobile', 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'),
          )
  
+        user_agent = self._extract_cloudflare_session_ua(url)
+        if user_agent:
+            USER_AGENTS = ((None, user_agent),)
+
          for kind, user_agent in USER_AGENTS:
              request = sanitized_Request(url)
              request.add_header('User-Agent', user_agent)
              webpage = self._download_webpage(
-                request, display_id, 'Downloading %s webpage' % kind)
+                request, display_id,
+                'Downloading %s webpage' % kind if kind else 'Downloading webpage')
  
              playlist = self._parse_json(
                  self._search_regex(
diff --git a/youtube_dl/extractor/funnyordie.py b/youtube_dl/extractor/funnyordie.py
index 4c4a87e2a3337bfb5de955a329c7edba2c338ad2..8c5ffc9e84cec305e9fc813a6366b360b7e36230 100644
--- a/youtube_dl/extractor/funnyordie.py
+++ b/youtube_dl/extractor/funnyordie.py
@@ -46,8 +46,8 @@ class FunnyOrDieIE(InfoExtractor):
          links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
  
          m3u8_url = self._search_regex(
-            r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8)\1',
-            webpage, 'm3u8 url', default=None, group='url')
+            r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
+            webpage, 'm3u8 url', group='url')
  
          formats = []
  
diff --git a/youtube_dl/extractor/fusion.py b/youtube_dl/extractor/fusion.py
new file mode 100644
index 0000000..b4ab4cb
--- /dev/null
+++ b/youtube_dl/extractor/fusion.py
@@ -0,0 +1,35 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .ooyala import OoyalaIE
+
+
+class FusionIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?fusion\.net/video/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://fusion.net/video/201781/u-s-and-panamanian-forces-work-together-to-stop-a-vessel-smuggling-drugs/',
+        'info_dict': {
+            'id': 'ZpcWNoMTE6x6uVIIWYpHh0qQDjxBuq5P',
+            'ext': 'mp4',
+            'title': 'U.S. and Panamanian forces work together to stop a vessel smuggling drugs',
+            'description': 'md5:0cc84a9943c064c0f46b128b41b1b0d7',
+            'duration': 140.0,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': ['Ooyala'],
+    }, {
+        'url': 'http://fusion.net/video/201781',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        ooyala_code = self._search_regex(
+            r'data-video-id=(["\'])(?P<code>.+?)\1',
+            webpage, 'ooyala code', group='code')
+
+        return OoyalaIE._build_url_result(ooyala_code)
diff --git a/youtube_dl/extractor/fxnetworks.py b/youtube_dl/extractor/fxnetworks.py

new file mode 100644 (file)

index 0000000..6298973
--- /dev/null
+++ b/youtube_dl/extractor/fxnetworks.py
@@ -0,0 +1,70 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .adobepass import AdobePassIE
+from ..utils import (
+    update_url_query,
+    extract_attributes,
+    parse_age_limit,
+    smuggle_url,
+)
+
+
+class FXNetworksIE(AdobePassIE):
+    _VALID_URL = r'https?://(?:www\.)?(?:fxnetworks|simpsonsworld)\.com/video/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.fxnetworks.com/video/719841347694',
+        'md5': '1447d4722e42ebca19e5232ab93abb22',
+        'info_dict': {
+            'id': '719841347694',
+            'ext': 'mp4',
+            'title': 'Vanpage',
+            'description': 'F*ck settling down. You\'re the Worst returns for an all new season August 31st on FXX.',
+            'age_limit': 14,
+            'uploader': 'NEWA-FNG-FX',
+            'upload_date': '20160706',
+            'timestamp': 1467844741,
+        },
+        'add_ie': ['ThePlatform'],
+    }, {
+        'url': 'http://www.simpsonsworld.com/video/716094019682',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        if 'The content you are trying to access is not available in your region.' in webpage:
+            self.raise_geo_restricted()
+        video_data = extract_attributes(self._search_regex(
+            r'(<a.+?rel="http://link\.theplatform\.com/s/.+?</a>)', webpage, 'video data'))
+        player_type = self._search_regex(r'playerType\s*=\s*[\'"]([^\'"]+)', webpage, 'player type', default=None)
+        release_url = video_data['rel']
+        title = video_data['data-title']
+        rating = video_data.get('data-rating')
+        query = {
+            'mbr': 'true',
+        }
+        if player_type == 'movies':
+            query.update({
+                'manifest': 'm3u',
+            })
+        else:
+            query.update({
+                'switch': 'http',
+            })
+        if video_data.get('data-req-auth') == '1':
+            resource = self._get_mvpd_resource(
+                video_data['data-channel'], title,
+                video_data.get('data-guid'), rating)
+            query['auth'] = self._extract_mvpd_auth(url, video_id, 'fx', resource)
+
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'title': title,
+            'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
+            'thumbnail': video_data.get('data-large-thumb'),
+            'age_limit': parse_age_limit(rating),
+            'ie_key': 'ThePlatform',
+        }
diff --git a/youtube_dl/extractor/gamekings.py b/youtube_dl/extractor/gamekings.py

deleted file mode 100644 (file)

index cbcddcb..0000000
--- a/youtube_dl/extractor/gamekings.py
+++ /dev/null
@@ -1,76 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    xpath_text,
-    xpath_with_ns,
-)
-from .youtube import YoutubeIE
-
-
-class GamekingsIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.gamekings\.nl/(?:videos|nieuws)/(?P<id>[^/]+)'
-    _TESTS = [{
-        # YouTube embed video
-        'url': 'http://www.gamekings.nl/videos/phoenix-wright-ace-attorney-dual-destinies-review/',
-        'md5': '5208d3a17adeaef829a7861887cb9029',
-        'info_dict': {
-            'id': 'HkSQKetlGOU',
-            'ext': 'mp4',
-            'title': 'Phoenix Wright: Ace Attorney - Dual Destinies Review',
-            'description': 'md5:db88c0e7f47e9ea50df3271b9dc72e1d',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'uploader_id': 'UCJugRGo4STYMeFr5RoOShtQ',
-            'uploader': 'Gamekings Vault',
-            'upload_date': '20151123',
-        },
-        'add_ie': ['Youtube'],
-    }, {
-        # vimeo video
-        'url': 'http://www.gamekings.nl/videos/the-legend-of-zelda-majoras-mask/',
-        'md5': '12bf04dfd238e70058046937657ea68d',
-        'info_dict': {
-            'id': 'the-legend-of-zelda-majoras-mask',
-            'ext': 'mp4',
-            'title': 'The Legend of Zelda: Majora’s Mask',
-            'description': 'md5:9917825fe0e9f4057601fe1e38860de3',
-            'thumbnail': 're:^https?://.*\.jpg$',
-        },
-    }, {
-        'url': 'http://www.gamekings.nl/nieuws/gamekings-extra-shelly-en-david-bereiden-zich-voor-op-de-livestream/',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        playlist_id = self._search_regex(
-            r'gogoVideo\([^,]+,\s*"([^"]+)', webpage, 'playlist id')
-
-        # Check if a YouTube embed is used
-        if YoutubeIE.suitable(playlist_id):
-            return self.url_result(playlist_id, ie='Youtube')
-
-        playlist = self._download_xml(
-            'http://www.gamekings.tv/wp-content/themes/gk2010/rss_playlist.php?id=%s' % playlist_id,
-            video_id)
-
-        NS_MAP = {
-            'jwplayer': 'http://rss.jwpcdn.com/'
-        }
-
-        item = playlist.find('./channel/item')
-
-        thumbnail = xpath_text(item, xpath_with_ns('./jwplayer:image', NS_MAP), 'thumbnail')
-        video_url = item.find(xpath_with_ns('./jwplayer:source', NS_MAP)).get('file')
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'title': self._og_search_title(webpage),
-            'description': self._og_search_description(webpage),
-            'thumbnail': thumbnail,
-        }
diff --git a/youtube_dl/extractor/gamespot.py b/youtube_dl/extractor/gamespot.py

index 4ffdd75157486957810f718cb1019cdc5dd80f4f..4e859e09aa16b7608ee103e851f0d0928bbfdb30 100644 (file)
--- a/youtube_dl/extractor/gamespot.py
+++ b/youtube_dl/extractor/gamespot.py
@@ -1,19 +1,19 @@
  from __future__ import unicode_literals
  
  import re
-import json
  
-from .common import InfoExtractor
+from .once import OnceIE
  from ..compat import (
      compat_urllib_parse_unquote,
-    compat_urlparse,
  )
  from ..utils import (
      unescapeHTML,
+    url_basename,
+    dict_get,
  )
  
  
-class GameSpotIE(InfoExtractor):
+class GameSpotIE(OnceIE):
      _VALID_URL = r'https?://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
      _TESTS = [{
          'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
@@ -28,10 +28,13 @@ class GameSpotIE(InfoExtractor):
          'url': 'http://www.gamespot.com/videos/the-witcher-3-wild-hunt-xbox-one-now-playing/2300-6424837/',
          'info_dict': {
              'id': 'gs-2300-6424837',
-            'ext': 'flv',
-            'title': 'The Witcher 3: Wild Hunt [Xbox ONE]  - Now Playing',
+            'ext': 'mp4',
+            'title': 'Now Playing - The Witcher 3: Wild Hunt',
              'description': 'Join us as we take a look at the early hours of The Witcher 3: Wild Hunt and more.',
          },
+        'params': {
+            'skip_download': True,  # m3u8 downloads
+        },
      }]
  
      def _real_extract(self, url):
@@ -39,29 +42,73 @@ class GameSpotIE(InfoExtractor):
          webpage = self._download_webpage(url, page_id)
          data_video_json = self._search_regex(
              r'data-video=["\'](.*?)["\']', webpage, 'data video')
-        data_video = json.loads(unescapeHTML(data_video_json))
+        data_video = self._parse_json(unescapeHTML(data_video_json), page_id)
          streams = data_video['videoStreams']
  
+        manifest_url, m3u8_formats = None, []
          formats = []
          f4m_url = streams.get('f4m_stream')
-        if f4m_url is not None:
-            # Transform the manifest url to a link to the mp4 files
-            # they are used in mobile devices.
-            f4m_path = compat_urlparse.urlparse(f4m_url).path
-            QUALITIES_RE = r'((,\d+)+,?)'
-            qualities = self._search_regex(QUALITIES_RE, f4m_path, 'qualities').strip(',').split(',')
-            http_path = f4m_path[1:].split('/', 1)[1]
-            http_template = re.sub(QUALITIES_RE, r'%s', http_path)
-            http_template = http_template.replace('.csmil/manifest.f4m', '')
-            http_template = compat_urlparse.urljoin(
-                'http://video.gamespotcdn.com/', http_template)
-            for q in qualities:
-                formats.append({
-                    'url': http_template % q,
-                    'ext': 'mp4',
-                    'format_id': q,
-                })
-        else:
+        if f4m_url:
+            manifest_url = f4m_url
+            formats.extend(self._extract_f4m_formats(
+                f4m_url + '?hdcore=3.7.0', page_id, f4m_id='hds', fatal=False))
+        m3u8_url = streams.get('m3u8_stream')
+        if m3u8_url:
+            manifest_url = m3u8_url
+            m3u8_formats = self._extract_m3u8_formats(
+                m3u8_url, page_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False)
+            formats.extend(m3u8_formats)
+        progressive_url = dict_get(
+            streams, ('progressive_hd', 'progressive_high', 'progressive_low'))
+        if progressive_url and manifest_url:
+            qualities_basename = self._search_regex(
+                r'/([^/]+)\.csmil/',
+                manifest_url, 'qualities basename', default=None)
+            if qualities_basename:
+                QUALITIES_RE = r'((,\d+)+,?)'
+                qualities = self._search_regex(
+                    QUALITIES_RE, qualities_basename,
+                    'qualities', default=None)
+                if qualities:
+                    qualities = list(map(int, qualities.strip(',').split(',')))
+                    qualities.sort()
+                    http_template = re.sub(QUALITIES_RE, r'%d', qualities_basename)
+                    http_url_basename = url_basename(progressive_url)
+                    if m3u8_formats:
+                        self._sort_formats(m3u8_formats)
+                        m3u8_formats = list(filter(
+                            lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+                            m3u8_formats))
+                    if len(qualities) == len(m3u8_formats):
+                        for q, m3u8_format in zip(qualities, m3u8_formats):
+                            f = m3u8_format.copy()
+                            f.update({
+                                'url': progressive_url.replace(
+                                    http_url_basename, http_template % q),
+                                'format_id': f['format_id'].replace('hls', 'http'),
+                                'protocol': 'http',
+                            })
+                            formats.append(f)
+                    else:
+                        for q in qualities:
+                            formats.append({
+                                'url': progressive_url.replace(
+                                    http_url_basename, http_template % q),
+                                'ext': 'mp4',
+                                'format_id': 'http-%d' % q,
+                                'tbr': q,
+                            })
+
+        onceux_json = self._search_regex(
+            r'data-onceux-options=["\'](.*?)["\']', webpage, 'data video', default=None)
+        if onceux_json:
+            onceux_url = self._parse_json(unescapeHTML(onceux_json), page_id).get('metadataUri')
+            if onceux_url:
+                formats.extend(self._extract_once_formats(re.sub(
+                    r'https?://[^/]+', 'http://once.unicornmedia.com', onceux_url).replace('ads/vmap/', '')))
+
+        if not formats:
              for quality in ['sd', 'hd']:
                  # It's actually a link to a flv file
                  flv_url = streams.get('f4m_{0}'.format(quality))
@@ -71,6 +118,7 @@ class GameSpotIE(InfoExtractor):
                          'ext': 'flv',
                          'format_id': quality,
                      })
+        self._sort_formats(formats)
  
          return {
              'id': data_video['guid'],
diff --git a/youtube_dl/extractor/gamestar.py b/youtube_dl/extractor/gamestar.py

index 69058a5835f2bac0d1e56ce0917909df0fb9a92b..55a34604af2cd2bca83ebc2c7957f1f4eb7401f1 100644 (file)
--- a/youtube_dl/extractor/gamestar.py
+++ b/youtube_dl/extractor/gamestar.py
@@ -1,19 +1,15 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  from ..utils import (
      int_or_none,
-    parse_duration,
-    str_to_int,
-    unified_strdate,
+    remove_end,
  )
  
  
  class GameStarIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
      _TEST = {
          'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
          'md5': '96974ecbb7fd8d0d20fca5a00810cea7',
@@ -21,8 +17,9 @@ class GameStarIE(InfoExtractor):
              'id': '76110',
              'ext': 'mp4',
              'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
-            'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den vollständigen Trailer an.',
-            'thumbnail': 'http://images.gamestar.de/images/idgwpgsgp/bdb/2494525/600x.jpg',
+            'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'timestamp': 1406542020,
              'upload_date': '20140728',
              'duration': 17
          }
@@ -32,41 +29,27 @@ class GameStarIE(InfoExtractor):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
-        og_title = self._og_search_title(webpage)
-        title = re.sub(r'\s*- Video (bei|-) GameStar\.de$', '', og_title)
-
          url = 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id
  
-        description = self._og_search_description(webpage).strip()
-
-        thumbnail = self._proto_relative_url(
-            self._og_search_thumbnail(webpage), scheme='http:')
-
-        upload_date = unified_strdate(self._html_search_regex(
-            r'<span style="float:left;font-size:11px;">Datum: ([0-9]+\.[0-9]+\.[0-9]+)&nbsp;&nbsp;',
-            webpage, 'upload_date', fatal=False))
-
-        duration = parse_duration(self._html_search_regex(
-            r'&nbsp;&nbsp;Länge: ([0-9]+:[0-9]+)</span>', webpage, 'duration',
-            fatal=False))
-
-        view_count = str_to_int(self._html_search_regex(
-            r'&nbsp;&nbsp;Zuschauer: ([0-9\.]+)&nbsp;&nbsp;', webpage,
-            'view_count', fatal=False))
+        # TODO: there are multiple ld+json objects in the webpage,
+        # while _search_json_ld finds only the first one
+        json_ld = self._parse_json(self._search_regex(
+            r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>[^<]+VideoObject[^<]+)</script>',
+            webpage, 'JSON-LD', group='json_ld'), video_id)
+        info_dict = self._json_ld(json_ld, video_id)
+        info_dict['title'] = remove_end(info_dict['title'], ' - GameStar')
  
+        view_count = json_ld.get('interactionCount')
          comment_count = int_or_none(self._html_search_regex(
-            r'>Kommentieren \(([0-9]+)\)</a>', webpage, 'comment_count',
+            r'([0-9]+) Kommentare</span>', webpage, 'comment_count',
              fatal=False))
  
-        return {
+        info_dict.update({
              'id': video_id,
-            'title': title,
              'url': url,
              'ext': 'mp4',
-            'thumbnail': thumbnail,
-            'description': description,
-            'upload_date': upload_date,
-            'duration': duration,
              'view_count': view_count,
              'comment_count': comment_count
-        }
+        })
+
+        return info_dict
diff --git a/youtube_dl/extractor/gametrailers.py b/youtube_dl/extractor/gametrailers.py

deleted file mode 100644 (file)

index 1e7948a..0000000
--- a/youtube_dl/extractor/gametrailers.py
+++ /dev/null
@@ -1,62 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    parse_age_limit,
-    url_basename,
-)
-
-
-class GametrailersIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.gametrailers\.com/videos/view/[^/]+/(?P<id>.+)'
-
-    _TEST = {
-        'url': 'http://www.gametrailers.com/videos/view/gametrailers-com/116437-Just-Cause-3-Review',
-        'md5': 'f28c4efa0bdfaf9b760f6507955b6a6a',
-        'info_dict': {
-            'id': '2983958',
-            'ext': 'mp4',
-            'display_id': '116437-Just-Cause-3-Review',
-            'title': 'Just Cause 3 - Review',
-            'description': 'It\'s a lot of fun to shoot at things and then watch them explode in Just Cause 3, but should there be more to the experience than that?',
-        },
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-        title = self._html_search_regex(
-            r'<title>(.+?)\|', webpage, 'title').strip()
-        embed_url = self._proto_relative_url(
-            self._search_regex(
-                r'src=\'(//embed.gametrailers.com/embed/[^\']+)\'', webpage,
-                'embed url'),
-            scheme='http:')
-        video_id = url_basename(embed_url)
-        embed_page = self._download_webpage(embed_url, video_id)
-        embed_vars_json = self._search_regex(
-            r'(?s)var embedVars = (\{.*?\})\s*</script>', embed_page,
-            'embed vars')
-        info = self._parse_json(embed_vars_json, video_id)
-
-        formats = []
-        for media in info['media']:
-            if media['mediaPurpose'] == 'play':
-                formats.append({
-                    'url': media['uri'],
-                    'height': media['height'],
-                    'width:': media['width'],
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'formats': formats,
-            'thumbnail': info.get('thumbUri'),
-            'description': self._og_search_description(webpage),
-            'duration': int_or_none(info.get('videoLengthInSeconds')),
-            'age_limit': parse_age_limit(info.get('audienceRating')),
-        }
diff --git a/youtube_dl/extractor/gazeta.py b/youtube_dl/extractor/gazeta.py

index ea32b621c390c390e22ebf8a6010304466700a4a..18ef5c252a9adc0ac2a1e6ae6806d2ea9b5b2546 100644 (file)
--- a/youtube_dl/extractor/gazeta.py
+++ b/youtube_dl/extractor/gazeta.py
@@ -7,7 +7,7 @@ from .common import InfoExtractor
  
  
  class GazetaIE(InfoExtractor):
-    _VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
+    _VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
      _TESTS = [{
          'url': 'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml',
          'md5': 'd49c9bdc6e5a7888f27475dc215ee789',
@@ -18,9 +18,19 @@ class GazetaIE(InfoExtractor):
              'description': 'md5:38617526050bd17b234728e7f9620a71',
              'thumbnail': 're:^https?://.*\.jpg',
          },
+        'skip': 'video not found',
      }, {
          'url': 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml',
          'only_matching': True,
+    }, {
+        'url': 'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml',
+        'md5': '37f19f78355eb2f4256ee1688359f24c',
+        'info_dict': {
+            'id': '252048',
+            'ext': 'mp4',
+            'title': '"Если по иску ЮКОСа придется платить, это будет большой удар по бюджету"',
+        },
+        'add_ie': ['EaglePlatform'],
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/gdcvault.py b/youtube_dl/extractor/gdcvault.py

index 59ed4c38f654f75c2217ce04e1f350295c871de7..3136427db39a2f1739fa0a791bd2cc85f1eedd02 100644 (file)
--- a/youtube_dl/extractor/gdcvault.py
+++ b/youtube_dl/extractor/gdcvault.py
@@ -4,7 +4,6 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
-    remove_end,
      HEADRequest,
      sanitized_Request,
      urlencode_postdata,
@@ -51,63 +50,33 @@ class GDCVaultIE(InfoExtractor):
          {
              'url': 'http://gdcvault.com/play/1020791/',
              'only_matching': True,
-        }
+        },
+        {
+            # Hard-coded hostname
+            'url': 'http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface',
+            'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
+            'info_dict': {
+                'id': '1023460',
+                'ext': 'mp4',
+                'display_id': 'Tenacious-Design-and-The-Interface',
+                'title': 'Tenacious Design and The Interface of \'Destiny\'',
+            },
+        },
+        {
+            # Multiple audios
+            'url': 'http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC',
+            'info_dict': {
+                'id': '1014631',
+                'ext': 'flv',
+                'title': 'How to Create a Good Game - From My Experience of Designing Pac-Man',
+            },
+            'params': {
+                'skip_download': True,  # Requires rtmpdump
+                'format': 'jp',  # The japanese audio
+            }
+        },
      ]
  
-    def _parse_mp4(self, xml_description):
-        video_formats = []
-        mp4_video = xml_description.find('./metadata/mp4video')
-        if mp4_video is None:
-            return None
-
-        mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video.text)
-        video_root = mobj.group('root')
-        formats = xml_description.findall('./metadata/MBRVideos/MBRVideo')
-        for format in formats:
-            mobj = re.match(r'mp4\:(?P<path>.*)', format.find('streamName').text)
-            url = video_root + mobj.group('path')
-            vbr = format.find('bitrate').text
-            video_formats.append({
-                'url': url,
-                'vbr': int(vbr),
-            })
-        return video_formats
-
-    def _parse_flv(self, xml_description):
-        formats = []
-        akamai_url = xml_description.find('./metadata/akamaiHost').text
-        audios = xml_description.find('./metadata/audios')
-        if audios is not None:
-            for audio in audios:
-                formats.append({
-                    'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
-                    'play_path': remove_end(audio.get('url'), '.flv'),
-                    'ext': 'flv',
-                    'vcodec': 'none',
-                    'format_id': audio.get('code'),
-                })
-        slide_video_path = xml_description.find('./metadata/slideVideo').text
-        formats.append({
-            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
-            'play_path': remove_end(slide_video_path, '.flv'),
-            'ext': 'flv',
-            'format_note': 'slide deck video',
-            'quality': -2,
-            'preference': -2,
-            'format_id': 'slides',
-        })
-        speaker_video_path = xml_description.find('./metadata/speakerVideo').text
-        formats.append({
-            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
-            'play_path': remove_end(speaker_video_path, '.flv'),
-            'ext': 'flv',
-            'format_note': 'speaker video',
-            'quality': -1,
-            'preference': -1,
-            'format_id': 'speaker',
-        })
-        return formats
-
      def _login(self, webpage_url, display_id):
          (username, password) = self._get_login_info()
          if username is None or password is None:
@@ -159,9 +128,10 @@ class GDCVaultIE(InfoExtractor):
                  'title': title,
              }
  
+        PLAYER_REGEX = r'<iframe src="(?P<xml_root>.+?)/player.*?\.html.*?".*?</iframe>'
+
          xml_root = self._html_search_regex(
-            r'<iframe src="(?P<xml_root>.*?)player.html.*?".*?</iframe>',
-            start_page, 'xml root', default=None)
+            PLAYER_REGEX, start_page, 'xml root', default=None)
          if xml_root is None:
              # Probably need to authenticate
              login_res = self._login(webpage_url, display_id)
@@ -171,27 +141,21 @@ class GDCVaultIE(InfoExtractor):
                  start_page = login_res
                  # Grab the url from the authenticated page
                  xml_root = self._html_search_regex(
-                    r'<iframe src="(.*?)player.html.*?".*?</iframe>',
-                    start_page, 'xml root')
+                    PLAYER_REGEX, start_page, 'xml root')
  
          xml_name = self._html_search_regex(
              r'<iframe src=".*?\?xml=(.+?\.xml).*?".*?</iframe>',
              start_page, 'xml filename', default=None)
          if xml_name is None:
              # Fallback to the older format
-            xml_name = self._html_search_regex(r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>', start_page, 'xml filename')
-
-        xml_description_url = xml_root + 'xml/' + xml_name
-        xml_description = self._download_xml(xml_description_url, display_id)
-
-        video_title = xml_description.find('./metadata/title').text
-        video_formats = self._parse_mp4(xml_description)
-        if video_formats is None:
-            video_formats = self._parse_flv(xml_description)
+            xml_name = self._html_search_regex(
+                r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>',
+                start_page, 'xml filename')
  
          return {
+            '_type': 'url_transparent',
              'id': video_id,
              'display_id': display_id,
-            'title': video_title,
-            'formats': video_formats,
+            'url': '%s/xml/%s' % (xml_root, xml_name),
+            'ie_key': 'DigitallySpeaking',
          }
diff --git a/youtube_dl/extractor/generic.py b/youtube_dl/extractor/generic.py

index f3de738f765819da7cbda1332e9b6b6dcada5052..bde65fa270fb399140e85ac63395060bd7007d2e 100644 (file)
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  
  from __future__ import unicode_literals
  
@@ -27,7 +27,6 @@ from ..utils import (
      unified_strdate,
      unsmuggle_url,
      UnsupportedError,
-    url_basename,
      xpath_text,
  )
  from .brightcove import (
@@ -48,10 +47,15 @@ from .svt import SVTIE
  from .pornhub import PornHubIE
  from .xhamster import XHamsterEmbedIE
  from .tnaflix import TNAFlixNetworkEmbedIE
+from .drtuber import DrTuberIE
+from .redtube import RedTubeIE
  from .vimeo import VimeoIE
-from .dailymotion import DailymotionCloudIE
+from .dailymotion import (
+    DailymotionIE,
+    DailymotionCloudIE,
+)
  from .onionstudios import OnionStudiosIE
-from .snagfilms import SnagFilmsEmbedIE
+from .viewlift import ViewLiftEmbedIE
  from .screenwavemedia import ScreenwaveMediaIE
  from .mtv import MTVServicesEmbeddedIE
  from .pladform import PladformIE
@@ -59,7 +63,18 @@ from .videomore import VideomoreIE
  from .googledrive import GoogleDriveIE
  from .jwplatform import JWPlatformIE
  from .digiteka import DigitekaIE
+from .arkena import ArkenaIE
  from .instagram import InstagramIE
+from .liveleak import LiveLeakIE
+from .threeqsdn import ThreeQSDNIE
+from .theplatform import ThePlatformIE
+from .vessel import VesselIE
+from .kaltura import KalturaIE
+from .eagleplatform import EaglePlatformIE
+from .facebook import FacebookIE
+from .soundcloud import SoundcloudIE
+from .vbox7 import Vbox7IE
+from .dbtv import DBTVIE
  
  
  class GenericIE(InfoExtractor):
@@ -90,7 +105,8 @@ class GenericIE(InfoExtractor):
              },
              'expected_warnings': [
                  'URL could be a direct video link, returning it as such.'
-            ]
+            ],
+            'skip': 'URL invalid',
          },
          # Direct download with broken HEAD
          {
@@ -104,7 +120,8 @@ class GenericIE(InfoExtractor):
                  'skip_download': True,  # infinite live stream
              },
              'expected_warnings': [
-                r'501.*Not Implemented'
+                r'501.*Not Implemented',
+                r'400.*Bad Request',
              ],
          },
          # Direct link with incorrect MIME type
@@ -235,6 +252,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'car-20120827-manifest',
                  'formats': 'mincount:9',
+                'upload_date': '20130904',
              },
              'params': {
                  'format': 'bestvideo',
@@ -252,7 +270,8 @@ class GenericIE(InfoExtractor):
              'params': {
                  # m3u8 downloads
                  'skip_download': True,
-            }
+            },
+            'skip': 'video gone',
          },
          # m3u8 served with Content-Type: text/plain
          {
@@ -267,7 +286,8 @@ class GenericIE(InfoExtractor):
              'params': {
                  # m3u8 downloads
                  'skip_download': True,
-            }
+            },
+            'skip': 'video gone',
          },
          # google redirect
          {
@@ -352,6 +372,7 @@ class GenericIE(InfoExtractor):
                  'description': 'Mississauga resident David Farmer is still out of power as a result of the ice storm a month ago. To keep the house warm, Farmer cuts wood from his property for a wood burning stove downstairs.',
              },
              'add_ie': ['BrightcoveLegacy'],
+            'skip': 'video gone',
          },
          {
              'url': 'http://www.championat.com/video/football/v/87/87499.html',
@@ -405,19 +426,7 @@ class GenericIE(InfoExtractor):
              'params': {
                  'skip_download': True,
              },
-        },
-        # multiple ooyala embeds on SBN network websites
-        {
-            'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
-            'info_dict': {
-                'id': 'national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
-                'title': '25 lies you will tell yourself on National Signing Day - SBNation.com',
-            },
-            'playlist_mincount': 3,
-            'params': {
-                'skip_download': True,
-            },
-            'add_ie': ['Ooyala'],
+            'skip': 'movie expired',
          },
          # embed.ly video
          {
@@ -445,6 +454,8 @@ class GenericIE(InfoExtractor):
                  'title': 'Between Two Ferns with Zach Galifianakis: President Barack Obama',
                  'description': 'Episode 18: President Barack Obama sits down with Zach Galifianakis for his most memorable interview yet.',
              },
+            # HEAD requests lead to endless 301, while GET is OK
+            'expected_warnings': ['301'],
          },
          # RUTV embed
          {
@@ -474,7 +485,7 @@ class GenericIE(InfoExtractor):
              'url': 'http://www.vestifinance.ru/articles/25753',
              'info_dict': {
                  'id': '25753',
-                'title': 'Ð\92ÐµÑ\81Ñ\82Ð¸ ÐÐºÐ¾Ð½Ð¾Ð¼Ð¸ÐºÐ° â\80\95 Ð\9fÑ\80Ñ\8fÐ¼Ñ\8bÐµ Ñ\82Ñ\80Ð°Ð½Ñ\81Ð»Ñ\8fÑ\86Ð¸Ð¸ Ñ\81 Ð¤Ð¾Ñ\80Ñ\83Ð¼Ð°-Ð²Ñ\8bÑ\81Ñ\82Ð°Ð²ÐºÐ¸ "Ð\93Ð¾Ñ\81Ð·Ð°ÐºÐ°Ð·-2013"',
+                'title': 'Прямые трансляции с Форума-выставки "Госзаказ-2013"',
              },
              'playlist': [{
                  'info_dict': {
@@ -519,6 +530,9 @@ class GenericIE(InfoExtractor):
                  'title': '[NSFL] [FM15] which pumiscer was this ( vid ) ( alfa as fuck srx )',
              },
              'playlist_mincount': 7,
+            # This forum does not allow <iframe> syntaxes anymore
+            # Now HTML tags are displayed as-is
+            'skip': 'No videos on this page',
          },
          # Embedded TED video
          {
@@ -567,7 +581,8 @@ class GenericIE(InfoExtractor):
              },
              'params': {
                  'skip_download': 'Requires rtmpdump'
-            }
+            },
+            'skip': 'video gone',
          },
          # francetv embed
          {
@@ -607,7 +622,11 @@ class GenericIE(InfoExtractor):
                  'id': 'k2mm4bCdJ6CQ2i7c8o2',
                  'ext': 'mp4',
                  'title': 'Le Zap de Spi0n n°216 - Zapping du Web',
+                'description': 'md5:faf028e48a461b8b7fad38f1e104b119',
                  'uploader': 'Spi0n',
+                'uploader_id': 'xgditw',
+                'upload_date': '20140425',
+                'timestamp': 1398441542,
              },
              'add_ie': ['Dailymotion'],
          },
@@ -630,13 +649,15 @@ class GenericIE(InfoExtractor):
          },
          # MTVSercices embed
          {
-            'url': 'http://www.gametrailers.com/news-post/76093/north-america-europe-is-getting-that-mario-kart-8-mercedes-dlc-too',
-            'md5': '35727f82f58c76d996fc188f9755b0d5',
+            'url': 'http://www.vulture.com/2016/06/new-key-peele-sketches-released.html',
+            'md5': 'ca1aef97695ef2c1d6973256a57e5252',
              'info_dict': {
-                'id': '0306a69b-8adf-4fb5-aace-75f8e8cbfca9',
+                'id': '769f7ec0-0692-4d62-9b45-0d88074bffc1',
                  'ext': 'mp4',
-                'title': 'Review',
-                'description': 'Mario\'s life in the fast lane has never looked so good.',
+                'title': 'Key and Peele|October 10, 2012|2|203|Liam Neesons - Uncensored',
+                'description': 'Two valets share their love for movie star Liam Neesons.',
+                'timestamp': 1349922600,
+                'upload_date': '20121011',
              },
          },
          # YouTube embed via <data-embed-url="">
@@ -722,15 +743,18 @@ class GenericIE(InfoExtractor):
          },
          # Wistia embed
          {
-            'url': 'http://education-portal.com/academy/lesson/north-american-exploration-failed-colonies-of-spain-france-england.html#lesson',
-            'md5': '8788b683c777a5cf25621eaf286d0c23',
+            'url': 'http://study.com/academy/lesson/north-american-exploration-failed-colonies-of-spain-france-england.html#lesson',
+            'md5': '1953f3a698ab51cfc948ed3992a0b7ff',
              'info_dict': {
-                'id': '1cfaf6b7ea',
+                'id': '6e2wtrbdaf',
                  'ext': 'mov',
-                'title': 'md5:51364a8d3d009997ba99656004b5e20d',
-                'duration': 643.0,
-                'filesize': 182808282,
-                'uploader': 'education-portal.com',
+                'title': 'paywall_north-american-exploration-failed-colonies-of-spain-france-england',
+                'description': 'a Paywall Videos video from Remilon',
+                'duration': 644.072,
+                'uploader': 'study.com',
+                'timestamp': 1459678540,
+                'upload_date': '20160403',
+                'filesize': 24687186,
              },
          },
          {
@@ -739,11 +763,30 @@ class GenericIE(InfoExtractor):
              'info_dict': {
                  'id': 'uxjb0lwrcz',
                  'ext': 'mp4',
-                'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks',
+                'title': 'Conversation about Hexagonal Rails Part 1',
+                'description': 'a Martin Fowler video from ThoughtWorks',
                  'duration': 1715.0,
                  'uploader': 'thoughtworks.wistia.com',
+                'timestamp': 1401832161,
+                'upload_date': '20140603',
              },
          },
+        # Wistia standard embed (async)
+        {
+            'url': 'https://www.getdrip.com/university/brennan-dunn-drip-workshop/',
+            'info_dict': {
+                'id': '807fafadvk',
+                'ext': 'mp4',
+                'title': 'Drip Brennan Dunn Workshop',
+                'description': 'a JV Webinars video from getdrip-1',
+                'duration': 4986.95,
+                'timestamp': 1463607249,
+                'upload_date': '20160518',
+            },
+            'params': {
+                'skip_download': True,
+            }
+        },
          # Soundcloud embed
          {
              'url': 'http://nakedsecurity.sophos.com/2014/10/29/sscc-171-are-you-sure-that-1234-is-a-bad-password-podcast/',
@@ -756,6 +799,15 @@ class GenericIE(InfoExtractor):
                  'upload_date': '20141029',
              }
          },
+        # Soundcloud multiple embeds
+        {
+            'url': 'http://www.guitarplayer.com/lessons/1014/legato-workout-one-hour-to-more-fluid-performance---tab/52809',
+            'info_dict': {
+                'id': '52809',
+                'title': 'Guitar Essentials: Legato Workout—One-Hour to Fluid Performance  | TAB + AUDIO',
+            },
+            'playlist_mincount': 7,
+        },
          # Livestream embed
          {
              'url': 'http://www.esa.int/Our_Activities/Space_Science/Rosetta/Philae_comet_touch-down_webcast',
@@ -766,6 +818,19 @@ class GenericIE(InfoExtractor):
                  'title': 'Rosetta #CometLanding webcast HL 10',
              }
          },
+        # Another Livestream embed, without 'new.' in URL
+        {
+            'url': 'https://www.freespeech.org/',
+            'info_dict': {
+                'id': '123537347',
+                'ext': 'mp4',
+                'title': 're:^FSTV [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+            },
+            'params': {
+                # Live stream
+                'skip_download': True,
+            },
+        },
          # LazyYT
          {
              'url': 'http://discourse.ubuntu.com/t/unity-8-desktop-mode-windows-on-mir/1986',
@@ -818,6 +883,7 @@ class GenericIE(InfoExtractor):
                  'description': 'md5:601cb790edd05908957dae8aaa866465',
                  'upload_date': '20150220',
              },
+            'skip': 'All The Daily Show URLs now redirect to http://www.cc.com/shows/',
          },
          # jwplayer YouTube
          {
@@ -850,18 +916,6 @@ class GenericIE(InfoExtractor):
                  'title': 'EP3S5 - Bon Appétit - Baqueira Mi Corazon !',
              }
          },
-        # Kaltura embed
-        {
-            'url': 'http://www.monumentalnetwork.com/videos/john-carlson-postgame-2-25-15',
-            'info_dict': {
-                'id': '1_eergr3h1',
-                'ext': 'mp4',
-                'upload_date': '20150226',
-                'uploader_id': 'MonumentalSports-Kaltura@perfectsensedigital.com',
-                'timestamp': int,
-                'title': 'John Carlson Postgame 2/25/15',
-            },
-        },
          # Kaltura embed (different embed code)
          {
              'url': 'http://www.premierchristianradio.com/Shows/Saturday/Unbelievable/Conference-Videos/Os-Guinness-Is-It-Fools-Talk-Unbelievable-Conference-2014',
@@ -887,9 +941,41 @@ class GenericIE(InfoExtractor):
                  'uploader_id': 'echojecka',
              },
          },
+        # Kaltura embed with single quotes
+        {
+            'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY',
+            'info_dict': {
+                'id': '0_izeg5utt',
+                'ext': 'mp4',
+                'title': '35871',
+                'timestamp': 1355743100,
+                'upload_date': '20121217',
+                'uploader_id': 'batchUser',
+            },
+            'add_ie': ['Kaltura'],
+        },
+        {
+            # Kaltura embedded via quoted entry_id
+            'url': 'https://www.oreilly.com/ideas/my-cloud-makes-pretty-pictures',
+            'info_dict': {
+                'id': '0_utuok90b',
+                'ext': 'mp4',
+                'title': '06_matthew_brender_raj_dutt',
+                'timestamp': 1466638791,
+                'upload_date': '20160622',
+            },
+            'add_ie': ['Kaltura'],
+            'expected_warnings': [
+                'Could not send HEAD request'
+            ],
+            'params': {
+                'skip_download': True,
+            }
+        },
          # Eagle.Platform embed (generic URL)
          {
              'url': 'http://lenta.ru/news/2015/03/06/navalny/',
+            # Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
              'info_dict': {
                  'id': '227304',
                  'ext': 'mp4',
@@ -904,6 +990,7 @@ class GenericIE(InfoExtractor):
          # ClipYou (Eagle.Platform) embed (custom URL)
          {
              'url': 'http://muz-tv.ru/play/7129/',
+            # Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
              'info_dict': {
                  'id': '12820',
                  'ext': 'mp4',
@@ -992,18 +1079,36 @@ class GenericIE(InfoExtractor):
                  'ext': 'flv',
                  'title': "PFT Live: New leader in the 'new-look' defense",
                  'description': 'md5:65a19b4bbfb3b0c0c5768bed1dfad74e',
+                'uploader': 'NBCU-SPORTS',
+                'upload_date': '20140107',
+                'timestamp': 1389118457,
+            },
+        },
+        # NBC News embed
+        {
+            'url': 'http://www.vulture.com/2016/06/letterman-couldnt-care-less-about-late-night.html',
+            'md5': '1aa589c675898ae6d37a17913cf68d66',
+            'info_dict': {
+                'id': '701714499682',
+                'ext': 'mp4',
+                'title': 'PREVIEW: On Assignment: David Letterman',
+                'description': 'A preview of Tom Brokaw\'s interview with David Letterman as part of the On Assignment series powered by Dateline. Airs Sunday June 12 at 7/6c.',
              },
          },
          # UDN embed
          {
-            'url': 'http://www.udn.com/news/story/7314/822787',
+            'url': 'https://video.udn.com/news/300346',
              'md5': 'fd2060e988c326991037b9aff9df21a6',
              'info_dict': {
                  'id': '300346',
                  'ext': 'mp4',
                  'title': '中一中男師變性 全校師生力挺',
                  'thumbnail': 're:^https?://.*\.jpg$',
-            }
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
          },
          # Ooyala embed
          {
@@ -1020,20 +1125,6 @@ class GenericIE(InfoExtractor):
                  'skip_download': True,
              }
          },
-        # Contains a SMIL manifest
-        {
-            'url': 'http://www.telewebion.com/fa/1263668/%D9%82%D8%B1%D8%B9%D9%87%E2%80%8C%DA%A9%D8%B4%DB%8C-%D9%84%DB%8C%DA%AF-%D9%82%D9%87%D8%B1%D9%85%D8%A7%D9%86%D8%A7%D9%86-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7/%2B-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84.html',
-            'info_dict': {
-                'id': 'file',
-                'ext': 'flv',
-                'title': '+ Football: Lottery Champions League Europe',
-                'uploader': 'www.telewebion.com',
-            },
-            'params': {
-                # rtmpe downloads
-                'skip_download': True,
-            }
-        },
          # Brightcove URL in single quotes
          {
              'url': 'http://www.sportsnet.ca/baseball/mlb/sn-presents-russell-martin-world-citizen/',
@@ -1044,17 +1135,25 @@ class GenericIE(InfoExtractor):
                  'title': 'SN Presents: Russell Martin, World Citizen',
                  'description': 'To understand why he was the Toronto Blue Jays’ top off-season priority is to appreciate his background and upbringing in Montreal, where he first developed his baseball skills. Written and narrated by Stephen Brunt.',
                  'uploader': 'Rogers Sportsnet',
+                'uploader_id': '1704050871',
+                'upload_date': '20150525',
+                'timestamp': 1432570283,
              },
          },
          # Dailymotion Cloud video
          {
              'url': 'http://replay.publicsenat.fr/vod/le-debat/florent-kolandjian,dominique-cena,axel-decourtye,laurence-abeille,bruno-parmentier/175910',
-            'md5': '49444254273501a64675a7e68c502681',
+            'md5': 'dcaf23ad0c67a256f4278bce6e0bae38',
              'info_dict': {
-                'id': '5585de919473990de4bee11b',
+                'id': 'x2uy8t3',
                  'ext': 'mp4',
-                'title': 'Le débat',
+                'title': 'Sauvons les abeilles ! - Le débat',
+                'description': 'md5:d9082128b1c5277987825d684939ca26',
                  'thumbnail': 're:^https?://.*\.jpe?g$',
+                'timestamp': 1434970506,
+                'upload_date': '20150622',
+                'uploader': 'Public Sénat',
+                'uploader_id': 'xa9gza',
              }
          },
          # OnionStudios embed
@@ -1111,36 +1210,212 @@ class GenericIE(InfoExtractor):
                  'duration': 51690,
              },
          },
-        # JWPlayer with M3U8
+        # Brightcove embed, with no valid 'renditions' but valid 'IOSRenditions'
+        # This video can't be played in browsers if Flash disabled and UA set to iPhone, which is actually a false alarm
          {
-            'url': 'http://ren.tv/novosti/2015-09-25/sluchaynyy-prohozhiy-poymal-avtougonshchika-v-murmanske-video',
+            'url': 'https://dl.dropboxusercontent.com/u/29092637/interview.html',
              'info_dict': {
-                'id': 'playlist',
+                'id': '4785848093001',
                  'ext': 'mp4',
-                'title': 'Случайный прохожий поймал автоугонщика в Мурманске. ВИДЕО | РЕН ТВ',
-                'uploader': 'ren.tv',
+                'title': 'The Cardinal Pell Interview',
+                'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ',
+                'uploader': 'GlobeCast Australia - GlobeStream',
+                'uploader_id': '2733773828001',
+                'upload_date': '20160304',
+                'timestamp': 1457083087,
              },
              'params': {
                  # m3u8 downloads
                  'skip_download': True,
+            },
+        },
+        # Another form of arte.tv embed
+        {
+            'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
+            'md5': '850bfe45417ddf221288c88a0cffe2e2',
+            'info_dict': {
+                'id': '030273-562_PLUS7-F',
+                'ext': 'mp4',
+                'title': 'ARTE Reportage - Nulle part, en France',
+                'description': 'md5:e3a0e8868ed7303ed509b9e3af2b870d',
+                'upload_date': '20160409',
+            },
+        },
+        # LiveLeak embed
+        {
+            'url': 'http://www.wykop.pl/link/3088787/',
+            'md5': 'ace83b9ed19b21f68e1b50e844fdf95d',
+            'info_dict': {
+                'id': '874_1459135191',
+                'ext': 'mp4',
+                'title': 'Man shows poor quality of new apartment building',
+                'description': 'The wall is like a sand pile.',
+                'uploader': 'Lake8737',
              }
          },
-        # Brightcove embed, with no valid 'renditions' but valid 'IOSRenditions'
-        # This video can't be played in browsers if Flash disabled and UA set to iPhone, which is actually a false alarm
+        # Duplicated embedded video URLs
          {
-            'url': 'https://dl.dropboxusercontent.com/u/29092637/interview.html',
+            'url': 'http://www.hudl.com/athlete/2538180/highlights/149298443',
              'info_dict': {
-                'id': '4785848093001',
+                'id': '149298443_480_16c25b74_2',
                  'ext': 'mp4',
-                'title': 'The Cardinal Pell Interview',
-                'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ',
-                'uploader': 'GlobeCast Australia - GlobeStream',
+                'title': 'vs. Blue Orange Spring Game',
+                'uploader': 'www.hudl.com',
+            },
+        },
+        # twitter:player:stream embed
+        {
+            'url': 'http://www.rtl.be/info/video/589263.aspx?CategoryID=288',
+            'info_dict': {
+                'id': 'master',
+                'ext': 'mp4',
+                'title': 'Une nouvelle espèce de dinosaure découverte en Argentine',
+                'uploader': 'www.rtl.be',
              },
              'params': {
                  # m3u8 downloads
                  'skip_download': True,
              },
          },
+        # twitter:player embed
+        {
+            'url': 'http://www.theatlantic.com/video/index/484130/what-do-black-holes-sound-like/',
+            'md5': 'a3e0df96369831de324f0778e126653c',
+            'info_dict': {
+                'id': '4909620399001',
+                'ext': 'mp4',
+                'title': 'What Do Black Holes Sound Like?',
+                'description': 'what do black holes sound like',
+                'upload_date': '20160524',
+                'uploader_id': '29913724001',
+                'timestamp': 1464107587,
+                'uploader': 'TheAtlantic',
+            },
+            'add_ie': ['BrightcoveLegacy'],
+        },
+        # Facebook <iframe> embed
+        {
+            'url': 'https://www.hostblogger.de/blog/archives/6181-Auto-jagt-Betonmischer.html',
+            'md5': 'fbcde74f534176ecb015849146dd3aee',
+            'info_dict': {
+                'id': '599637780109885',
+                'ext': 'mp4',
+                'title': 'Facebook video #599637780109885',
+            },
+        },
+        # Facebook API embed
+        {
+            'url': 'http://www.lothype.com/blue-stars-2016-preview-standstill-full-show/',
+            'md5': 'a47372ee61b39a7b90287094d447d94e',
+            'info_dict': {
+                'id': '10153467542406923',
+                'ext': 'mp4',
+                'title': 'Facebook video #10153467542406923',
+            },
+        },
+        # Wordpress "YouTube Video Importer" plugin
+        {
+            'url': 'http://www.lothype.com/blue-devils-drumline-stanford-lot-2016/',
+            'md5': 'd16797741b560b485194eddda8121b48',
+            'info_dict': {
+                'id': 'HNTXWDXV9Is',
+                'ext': 'mp4',
+                'title': 'Blue Devils Drumline Stanford lot 2016',
+                'upload_date': '20160627',
+                'uploader_id': 'GENOCIDE8GENERAL10',
+                'uploader': 'cylus cyrus',
+            },
+        },
+        {
+            # video stored on custom kaltura server
+            'url': 'http://www.expansion.com/multimedia/videos.html?media=EQcM30NHIPv',
+            'md5': '537617d06e64dfed891fa1593c4b30cc',
+            'info_dict': {
+                'id': '0_1iotm5bh',
+                'ext': 'mp4',
+                'title': 'Elecciones británicas: 5 lecciones para Rajoy',
+                'description': 'md5:435a89d68b9760b92ce67ed227055f16',
+                'uploader_id': 'videos.expansion@el-mundo.net',
+                'upload_date': '20150429',
+                'timestamp': 1430303472,
+            },
+            'add_ie': ['Kaltura'],
+        },
+        {
+            # Non-standard Vimeo embed
+            'url': 'https://openclassrooms.com/courses/understanding-the-web',
+            'md5': '64d86f1c7d369afd9a78b38cbb88d80a',
+            'info_dict': {
+                'id': '148867247',
+                'ext': 'mp4',
+                'title': 'Understanding the web - Teaser',
+                'description': 'This is "Understanding the web - Teaser" by openclassrooms on Vimeo, the home for high quality videos and the people who love them.',
+                'upload_date': '20151214',
+                'uploader': 'OpenClassrooms',
+                'uploader_id': 'openclassrooms',
+            },
+            'add_ie': ['Vimeo'],
+        },
+        {
+            # generic vimeo embed that requires original URL passed as Referer
+            'url': 'http://racing4everyone.eu/2016/07/30/formula-1-2016-round12-germany/',
+            'only_matching': True,
+        },
+        {
+            'url': 'https://support.arkena.com/display/PLAY/Ways+to+embed+your+video',
+            'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
+            'info_dict': {
+                'id': 'b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe',
+                'ext': 'mp4',
+                'title': 'Big Buck Bunny',
+                'description': 'Royalty free test video',
+                'timestamp': 1432816365,
+                'upload_date': '20150528',
+                'is_live': False,
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'add_ie': [ArkenaIE.ie_key()],
+        },
+        {
+            'url': 'http://nova.bg/news/view/2016/08/16/156543/%D0%BD%D0%B0-%D0%BA%D0%BE%D1%81%D1%8A%D0%BC-%D0%BE%D1%82-%D0%B2%D0%B7%D1%80%D0%B8%D0%B2-%D0%BE%D1%82%D1%86%D0%B5%D0%BF%D0%B8%D1%85%D0%B0-%D1%86%D1%8F%D0%BB-%D0%BA%D0%B2%D0%B0%D1%80%D1%82%D0%B0%D0%BB-%D0%B7%D0%B0%D1%80%D0%B0%D0%B4%D0%B8-%D0%B8%D0%B7%D1%82%D0%B8%D1%87%D0%B0%D0%BD%D0%B5-%D0%BD%D0%B0-%D0%B3%D0%B0%D0%B7-%D0%B2-%D0%BF%D0%BB%D0%BE%D0%B2%D0%B4%D0%B8%D0%B2/',
+            'info_dict': {
+                'id': '1c7141f46c',
+                'ext': 'mp4',
+                'title': 'НА КОСЪМ ОТ ВЗРИВ: Изтичане на газ на бензиностанция в Пловдив',
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'add_ie': [Vbox7IE.ie_key()],
+        },
+        {
+            # DBTV embeds
+            'url': 'http://www.dagbladet.no/2016/02/23/nyheter/nordlys/ski/troms/ver/43254897/',
+            'info_dict': {
+                'id': '43254897',
+                'title': 'Etter ett års planlegging, klaffet endelig alt: - Jeg måtte ta en liten dans',
+            },
+            'playlist_mincount': 3,
+        },
+        # {
+        #     # TODO: find another test
+        #     # http://schema.org/VideoObject
+        #     'url': 'https://flipagram.com/f/nyvTSJMKId',
+        #     'md5': '888dcf08b7ea671381f00fab74692755',
+        #     'info_dict': {
+        #         'id': 'nyvTSJMKId',
+        #         'ext': 'mp4',
+        #         'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
+        #         'description': '#love for cats.',
+        #         'timestamp': 1461244995,
+        #         'upload_date': '20160421',
+        #     },
+        #     'params': {
+        #         'force_generic_extractor': True,
+        #     },
+        # }
      ]
  
      def report_following_redirect(self, new_url):
@@ -1261,7 +1536,7 @@ class GenericIE(InfoExtractor):
              force_videoid = smuggled_data['force_videoid']
              video_id = force_videoid
          else:
-            video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
+            video_id = self._generic_id(url)
  
          self.to_screen('%s: Requesting header' % video_id)
  
@@ -1290,7 +1565,7 @@ class GenericIE(InfoExtractor):
  
          info_dict = {
              'id': video_id,
-            'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
+            'title': self._generic_title(url),
              'upload_date': unified_strdate(head_response.headers.get('Last-Modified'))
          }
  
@@ -1361,6 +1636,10 @@ class GenericIE(InfoExtractor):
              doc = compat_etree_fromstring(webpage.encode('utf-8'))
              if doc.tag == 'rss':
                  return self._extract_rss(url, video_id, doc)
+            elif doc.tag == 'SmoothStreamingMedia':
+                info_dict['formats'] = self._parse_ism_formats(doc, url)
+                self._sort_formats(info_dict['formats'])
+                return info_dict
              elif re.match(r'^(?:{[^}]+})?smil$', doc.tag):
                  smil = self._parse_smil(doc, url, video_id)
                  self._sort_formats(smil['formats'])
@@ -1369,7 +1648,9 @@ class GenericIE(InfoExtractor):
                  return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
              elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
                  info_dict['formats'] = self._parse_mpd_formats(
-                    doc, video_id, mpd_base_url=url.rpartition('/')[0])
+                    doc, video_id,
+                    mpd_base_url=full_response.geturl().rpartition('/')[0],
+                    mpd_url=url)
                  self._sort_formats(info_dict['formats'])
                  return info_dict
              elif re.match(r'^{http://ns\.adobe\.com/f4m/[12]\.0}manifest$', doc.tag):
@@ -1395,7 +1676,8 @@ class GenericIE(InfoExtractor):
          #   Site Name | Video Title
          #   Video Title - Tagline | Site Name
          # and so on and so forth; it's just not practical
-        video_title = self._html_search_regex(
+        video_title = self._og_search_title(
+            webpage, default=None) or self._html_search_regex(
              r'(?s)<title>(.*?)</title>', webpage, 'video title',
              default='video')
  
@@ -1413,6 +1695,9 @@ class GenericIE(InfoExtractor):
          video_uploader = self._search_regex(
              r'^(?:https?://)?([^/]*)/.*', url, 'video uploader')
  
+        video_description = self._og_search_description(webpage, default=None)
+        video_thumbnail = self._og_search_thumbnail(webpage, default=None)
+
          # Helper method
          def _playlist_from_matches(matches, getter=None, ie=None):
              urlrs = orderedSet(
@@ -1443,6 +1728,16 @@ class GenericIE(InfoExtractor):
          if bc_urls:
              return _playlist_from_matches(bc_urls, ie='BrightcoveNew')
  
+        # Look for ThePlatform embeds
+        tp_urls = ThePlatformIE._extract_urls(webpage)
+        if tp_urls:
+            return _playlist_from_matches(tp_urls, ie='ThePlatform')
+
+        # Look for Vessel embeds
+        vessel_urls = VesselIE._extract_urls(webpage)
+        if vessel_urls:
+            return _playlist_from_matches(vessel_urls, ie=VesselIE.ie_key())
+
          # Look for embedded rtl.nl player
          matches = re.findall(
              r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"',
@@ -1450,9 +1745,9 @@ class GenericIE(InfoExtractor):
          if matches:
              return _playlist_from_matches(matches, ie='RtlNl')
  
-        vimeo_url = VimeoIE._extract_vimeo_url(url, webpage)
-        if vimeo_url is not None:
-            return self.url_result(vimeo_url)
+        vimeo_urls = VimeoIE._extract_urls(url, webpage)
+        if vimeo_urls:
+            return _playlist_from_matches(vimeo_urls, ie=VimeoIE.ie_key())
  
          vid_me_embed_url = self._search_regex(
              r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]',
@@ -1483,12 +1778,16 @@ class GenericIE(InfoExtractor):
          if matches:
              return _playlist_from_matches(matches, lambda m: unescapeHTML(m))
  
-        # Look for embedded Dailymotion player
-        matches = re.findall(
-            r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
+        # Look for Wordpress "YouTube Video Importer" plugin
+        matches = re.findall(r'''(?x)<div[^>]+
+            class=(?P<q1>[\'"])[^\'"]*\byvii_single_video_player\b[^\'"]*(?P=q1)[^>]+
+            data-video_id=(?P<q2>[\'"])([^\'"]+)(?P=q2)''', webpage)
          if matches:
-            return _playlist_from_matches(
-                matches, lambda m: unescapeHTML(m[1]))
+            return _playlist_from_matches(matches, lambda m: m[-1])
+
+        matches = DailymotionIE._extract_urls(webpage)
+        if matches:
+            return _playlist_from_matches(matches)
  
          # Look for embedded Dailymotion playlist player (#3822)
          m = re.search(
@@ -1511,21 +1810,26 @@ class GenericIE(InfoExtractor):
                  'url': embed_url,
                  'ie_key': 'Wistia',
                  'uploader': video_uploader,
-                'title': video_title,
-                'id': video_id,
              }
  
          match = re.search(r'(?:id=["\']wistia_|data-wistia-?id=["\']|Wistia\.embed\(["\'])(?P<id>[^"\']+)', webpage)
          if match:
              return {
                  '_type': 'url_transparent',
-                'url': 'http://fast.wistia.net/embed/iframe/{0:}'.format(match.group('id')),
+                'url': 'wistia:%s' % match.group('id'),
                  'ie_key': 'Wistia',
                  'uploader': video_uploader,
-                'title': video_title,
-                'id': match.group('id')
              }
  
+        match = re.search(
+            r'''(?sx)
+                <script[^>]+src=(["'])(?:https?:)?//fast\.wistia\.com/assets/external/E-v1\.js\1[^>]*>.*?
+                <div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]+)\b.*?\2
+            ''', webpage)
+        if match:
+            return self.url_result(self._proto_relative_url(
+                'wistia:%s' % match.group('id')), 'Wistia')
+
          # Look for SVT player
          svt_url = SVTIE._extract_url(webpage)
          if svt_url:
@@ -1620,10 +1924,9 @@ class GenericIE(InfoExtractor):
              return self.url_result(mobj.group('url'))
  
          # Look for embedded Facebook player
-        mobj = re.search(
-            r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
-        if mobj is not None:
-            return self.url_result(mobj.group('url'), 'Facebook')
+        facebook_url = FacebookIE._extract_url(webpage)
+        if facebook_url is not None:
+            return self.url_result(facebook_url, 'Facebook')
  
          # Look for embedded VK player
          mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk\.com/video_ext\.php.+?)\1', webpage)
@@ -1680,11 +1983,6 @@ class GenericIE(InfoExtractor):
          if sportbox_urls:
              return _playlist_from_matches(sportbox_urls, ie='SportBoxEmbed')
  
-        # Look for embedded PornHub player
-        pornhub_url = PornHubIE._extract_url(webpage)
-        if pornhub_url:
-            return self.url_result(pornhub_url, 'PornHub')
-
          # Look for embedded XHamster player
          xhamster_urls = XHamsterEmbedIE._extract_urls(webpage)
          if xhamster_urls:
@@ -1695,6 +1993,21 @@ class GenericIE(InfoExtractor):
          if tnaflix_urls:
              return _playlist_from_matches(tnaflix_urls, ie=TNAFlixNetworkEmbedIE.ie_key())
  
+        # Look for embedded PornHub player
+        pornhub_urls = PornHubIE._extract_urls(webpage)
+        if pornhub_urls:
+            return _playlist_from_matches(pornhub_urls, ie=PornHubIE.ie_key())
+
+        # Look for embedded DrTuber player
+        drtuber_urls = DrTuberIE._extract_urls(webpage)
+        if drtuber_urls:
+            return _playlist_from_matches(drtuber_urls, ie=DrTuberIE.ie_key())
+
+        # Look for embedded RedTube player
+        redtube_urls = RedTubeIE._extract_urls(webpage)
+        if redtube_urls:
+            return _playlist_from_matches(redtube_urls, ie=RedTubeIE.ie_key())
+
          # Look for embedded Tvigle player
          mobj = re.search(
              r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@@ -1715,7 +2028,7 @@ class GenericIE(InfoExtractor):
  
          # Look for embedded arte.tv player
          mobj = re.search(
-            r'<script [^>]*?src="(?P<url>http://www\.arte\.tv/playerv2/embed[^"]+)"',
+            r'<(?:script|iframe) [^>]*?src="(?P<url>http://www\.arte\.tv/(?:playerv2/embed|arte_vp/index)[^"]+)"',
              webpage)
          if mobj is not None:
              return self.url_result(mobj.group('url'), 'ArteTVEmbed')
@@ -1738,20 +2051,9 @@ class GenericIE(InfoExtractor):
              return self.url_result(myvi_url)
  
          # Look for embedded soundcloud player
-        mobj = re.search(
-            r'<iframe\s+(?:[a-zA-Z0-9_-]+="[^"]+"\s+)*src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
-            webpage)
-        if mobj is not None:
-            url = unescapeHTML(mobj.group('url'))
-            return self.url_result(url)
-
-        # Look for embedded vulture.com player
-        mobj = re.search(
-            r'<iframe src="(?P<url>https?://video\.vulture\.com/[^"]+)"',
-            webpage)
-        if mobj is not None:
-            url = unescapeHTML(mobj.group('url'))
-            return self.url_result(url, ie='Vulture')
+        soundcloud_urls = SoundcloudIE._extract_urls(webpage)
+        if soundcloud_urls:
+            return _playlist_from_matches(soundcloud_urls, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
  
          # Look for embedded mtvservices player
          mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage)
@@ -1801,7 +2103,7 @@ class GenericIE(InfoExtractor):
              return self.url_result(self._proto_relative_url(mobj.group('url'), scheme='http:'), 'CondeNast')
  
          mobj = re.search(
-            r'<iframe[^>]+src="(?P<url>https?://new\.livestream\.com/[^"]+/player[^"]+)"',
+            r'<iframe[^>]+src="(?P<url>https?://(?:new\.)?livestream\.com/[^"]+/player[^"]+)"',
              webpage)
          if mobj is not None:
              return self.url_result(mobj.group('url'), 'Livestream')
@@ -1813,18 +2115,14 @@ class GenericIE(InfoExtractor):
              return self.url_result(mobj.group('url'), 'Zapiks')
  
          # Look for Kaltura embeds
-        mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_?[Ii]d'\s*:\s*'(?P<id>[^']+)',", webpage) or
-                re.search(r'(?s)(?P<q1>["\'])(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?(?P=q1).*?entry_?[Ii]d\s*:\s*(?P<q2>["\'])(?P<id>.+?)(?P=q2)', webpage))
-        if mobj is not None:
-            return self.url_result(smuggle_url(
-                'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(),
-                {'source_url': url}), 'Kaltura')
+        kaltura_url = KalturaIE._extract_url(webpage)
+        if kaltura_url:
+            return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())
  
          # Look for Eagle.Platform embeds
-        mobj = re.search(
-            r'<iframe[^>]+src="(?P<url>https?://.+?\.media\.eagleplatform\.com/index/player\?.+?)"', webpage)
-        if mobj is not None:
-            return self.url_result(mobj.group('url'), 'EaglePlatform')
+        eagleplatform_url = EaglePlatformIE._extract_url(webpage)
+        if eagleplatform_url:
+            return self.url_result(eagleplatform_url, EaglePlatformIE.ie_key())
  
          # Look for ClipYou (uses Eagle.Platform) embeds
          mobj = re.search(
@@ -1865,6 +2163,12 @@ class GenericIE(InfoExtractor):
          if nbc_sports_url:
              return self.url_result(nbc_sports_url, 'NBCSportsVPlayer')
  
+        # Look for NBC News embeds
+        nbc_news_embed_url = re.search(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//www\.nbcnews\.com/widget/video-embed/[^"\']+)\1', webpage)
+        if nbc_news_embed_url:
+            return self.url_result(nbc_news_embed_url.group('url'), 'NBCNews')
+
          # Look for Google Drive embeds
          google_drive_url = GoogleDriveIE._extract_url(webpage)
          if google_drive_url:
@@ -1892,10 +2196,10 @@ class GenericIE(InfoExtractor):
          if onionstudios_url:
              return self.url_result(onionstudios_url)
  
-        # Look for SnagFilms embeds
-        snagfilms_url = SnagFilmsEmbedIE._extract_url(webpage)
-        if snagfilms_url:
-            return self.url_result(snagfilms_url)
+        # Look for ViewLift embeds
+        viewlift_url = ViewLiftEmbedIE._extract_url(webpage)
+        if viewlift_url:
+            return self.url_result(viewlift_url)
  
          # Look for JWPlatform embeds
          jwplatform_url = JWPlatformIE._extract_url(webpage)
@@ -1912,6 +2216,11 @@ class GenericIE(InfoExtractor):
          if digiteka_url:
              return self.url_result(self._proto_relative_url(digiteka_url), DigitekaIE.ie_key())
  
+        # Look for Arkena embeds
+        arkena_url = ArkenaIE._extract_url(webpage)
+        if arkena_url:
+            return self.url_result(arkena_url, ArkenaIE.ie_key())
+
          # Look for Limelight embeds
          mobj = re.search(r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})', webpage)
          if mobj:
@@ -1940,17 +2249,107 @@ class GenericIE(InfoExtractor):
              return self.url_result(
                  self._proto_relative_url(unescapeHTML(mobj.group(1))), 'Vine')
  
+        # Look for VODPlatform embeds
+        mobj = re.search(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vod-platform\.net/[eE]mbed/.+?)\1',
+            webpage)
+        if mobj is not None:
+            return self.url_result(
+                self._proto_relative_url(unescapeHTML(mobj.group('url'))), 'VODPlatform')
+
+        # Look for Mangomolo embeds
+        mobj = re.search(
+            r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?admin\.mangomolo\.com/analytics/index\.php/customers/embed/
+                (?:
+                    video\?.*?\bid=(?P<video_id>\d+)|
+                    index\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
+                ).+?)\1''', webpage)
+        if mobj is not None:
+            info = {
+                '_type': 'url_transparent',
+                'url': self._proto_relative_url(unescapeHTML(mobj.group('url'))),
+                'title': video_title,
+                'description': video_description,
+                'thumbnail': video_thumbnail,
+                'uploader': video_uploader,
+            }
+            video_id = mobj.group('video_id')
+            if video_id:
+                info.update({
+                    'ie_key': 'MangomoloVideo',
+                    'id': video_id,
+                })
+            else:
+                info.update({
+                    'ie_key': 'MangomoloLive',
+                    'id': mobj.group('channel_id'),
+                })
+            return info
+
          # Look for Instagram embeds
          instagram_embed_url = InstagramIE._extract_embed_url(webpage)
          if instagram_embed_url is not None:
-            return self.url_result(instagram_embed_url, InstagramIE.ie_key())
+            return self.url_result(
+                self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
+
+        # Look for LiveLeak embeds
+        liveleak_url = LiveLeakIE._extract_url(webpage)
+        if liveleak_url:
+            return self.url_result(liveleak_url, 'LiveLeak')
+
+        # Look for 3Q SDN embeds
+        threeqsdn_url = ThreeQSDNIE._extract_url(webpage)
+        if threeqsdn_url:
+            return {
+                '_type': 'url_transparent',
+                'ie_key': ThreeQSDNIE.ie_key(),
+                'url': self._proto_relative_url(threeqsdn_url),
+                'title': video_title,
+                'description': video_description,
+                'thumbnail': video_thumbnail,
+                'uploader': video_uploader,
+            }
+
+        # Look for VBOX7 embeds
+        vbox7_url = Vbox7IE._extract_url(webpage)
+        if vbox7_url:
+            return self.url_result(vbox7_url, Vbox7IE.ie_key())
+
+        # Look for DBTV embeds
+        dbtv_urls = DBTVIE._extract_urls(webpage)
+        if dbtv_urls:
+            return _playlist_from_matches(dbtv_urls, ie=DBTVIE.ie_key())
+
+        # Looking for http://schema.org/VideoObject
+        json_ld = self._search_json_ld(
+            webpage, video_id, default={}, expected_type='VideoObject')
+        if json_ld.get('url'):
+            info_dict.update({
+                'title': video_title or info_dict['title'],
+                'description': video_description,
+                'thumbnail': video_thumbnail,
+                'age_limit': age_limit
+            })
+            info_dict.update(json_ld)
+            return info_dict
+
+        # Look for HTML5 media
+        entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
+        if entries:
+            for entry in entries:
+                entry.update({
+                    'id': video_id,
+                    'title': video_title,
+                })
+                self._sort_formats(entry['formats'])
+            return self.playlist_result(entries)
  
          def check_video(vurl):
              if YoutubeIE.suitable(vurl):
                  return True
              vpath = compat_urlparse.urlparse(vurl).path
              vext = determine_ext(vpath)
-            return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml')
+            return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml', 'js')
  
          def filter_video(urls):
              return list(filter(check_video, urls))
@@ -1988,6 +2387,9 @@ class GenericIE(InfoExtractor):
                  r"cinerama\.embedPlayer\(\s*\'[^']+\',\s*'([^']+)'", webpage)
          if not found:
              # Try to find twitter cards info
+            # twitter:player:stream should be checked before twitter:player since
+            # it is expected to contain a raw stream (see
+            # https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser)
              found = filter_video(re.findall(
                  r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage))
          if not found:
@@ -1997,9 +2399,6 @@ class GenericIE(InfoExtractor):
              # We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
              if m_video_type is not None:
                  found = filter_video(re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage))
-        if not found:
-            # HTML5 video
-            found = re.findall(r'(?s)<(?:video|audio)[^<]*(?:>.*?<source[^>]*)?\s+src=["\'](.*?)["\']', webpage)
          if not found:
              REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
              found = re.search(
@@ -2021,11 +2420,21 @@ class GenericIE(InfoExtractor):
                      '_type': 'url',
                      'url': new_url,
                  }
+
+        if not found:
+            # twitter:player is a https URL to iframe player that may or may not
+            # be supported by youtube-dl thus this is checked the very last (see
+            # https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser)
+            embed_url = self._html_search_meta('twitter:player', webpage, default=None)
+            if embed_url:
+                return self.url_result(embed_url)
+
          if not found:
              raise UnsupportedError(url)
  
          entries = []
-        for video_url in found:
+        for video_url in orderedSet(found):
+            video_url = unescapeHTML(video_url)
              video_url = video_url.replace('\\/', '/')
              video_url = compat_urlparse.urljoin(url, video_url)
              video_id = compat_urllib_parse_unquote(os.path.basename(video_url))
@@ -2056,6 +2465,21 @@ class GenericIE(InfoExtractor):
                  entry_info_dict['formats'] = self._extract_mpd_formats(video_url, video_id)
              elif ext == 'f4m':
                  entry_info_dict['formats'] = self._extract_f4m_formats(video_url, video_id)
+            elif re.search(r'(?i)\.(?:ism|smil)/manifest', video_url) and video_url != url:
+                # Just matching .ism/manifest is not enough to be reliably sure
+                # whether it's actually an ISM manifest or some other streaming
+                # manifest since there are various streaming URL formats
+                # possible (see [1]) as well as some other shenanigans like
+                # .smil/manifest URLs that actually serve an ISM (see [2]) and
+                # so on.
+                # Thus the most reasonable way to solve this is to delegate
+                # to generic extractor in order to look into the contents of
+                # the manifest itself.
+                # 1. https://azure.microsoft.com/en-us/documentation/articles/media-services-deliver-content-overview/#streaming-url-formats
+                # 2. https://svs.itworkscdn.net/lbcivod/smil:itwfcdn/lbci/170976.smil/Manifest
+                entry_info_dict = self.url_result(
+                    smuggle_url(video_url, {'to_generic': True}),
+                    GenericIE.ie_key())
              else:
                  entry_info_dict['url'] = video_url
  
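Review note on the generic.py hunks above: the fallback path now de-duplicates the candidate URLs with `orderedSet`, unescapes HTML entities before joining against the page URL, and `check_video` additionally rejects `.js` URLs. The following is only a minimal standalone sketch of that flow using the standard library. `ordered_unique`, `looks_like_video` and `normalize_candidates` are hypothetical stand-ins, not youtube-dl functions, and the extension check is a simplification of `determine_ext` (the `YoutubeIE.suitable` shortcut is omitted).

```python
import html
import posixpath
from urllib.parse import urljoin, urlparse

# Extensions rejected by the patched check_video(); 'js' is the new entry.
SKIPPED_EXTS = ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml', 'js')


def ordered_unique(items):
    """Drop duplicates while keeping first-seen order (stand-in for orderedSet)."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def looks_like_video(video_url):
    """Simplified check_video(): the path needs an extension that is not skipped."""
    path = urlparse(video_url).path
    ext = posixpath.splitext(path)[1].lstrip('.').lower()
    return '.' in path and ext not in SKIPPED_EXTS


def normalize_candidates(page_url, found):
    """De-duplicate, unescape entities, undo '\\/' escaping, resolve relative URLs."""
    result = []
    for video_url in ordered_unique(found):
        video_url = html.unescape(video_url)
        video_url = video_url.replace('\\/', '/')
        result.append(urljoin(page_url, video_url))
    return result
```

Because de-duplication runs on the raw strings, two candidates that differ only in escaping would still both survive in this sketch; the loop in the patch has the same ordering of steps.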
diff --git a/youtube_dl/extractor/glide.py b/youtube_dl/extractor/glide.py
index 9561ed5fbaa25404654303956a676b000da2af67..f0d951396fdba4f74027e81af629f7c27c253f9a 100644
--- a/youtube_dl/extractor/glide.py
+++ b/youtube_dl/extractor/glide.py
@@ -13,24 +13,27 @@ class GlideIE(InfoExtractor):
          'info_dict': {
              'id': 'UZF8zlmuQbe4mr+7dCiQ0w==',
              'ext': 'mp4',
-            'title': 'Damon Timm\'s Glide message',
+            'title': "Damon's Glide message",
              'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
          }
      }
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
+
          webpage = self._download_webpage(url, video_id)
+
          title = self._html_search_regex(
-            r'<title>(.*?)</title>', webpage, 'title')
-        video_url = self.http_scheme() + self._search_regex(
-            r'<source src="(.*?)" type="video/mp4">', webpage, 'video URL')
-        thumbnail_url = self._search_regex(
-            r'<img id="video-thumbnail" src="(.*?)"',
-            webpage, 'thumbnail url', fatal=False)
-        thumbnail = (
-            thumbnail_url if thumbnail_url is None
-            else self.http_scheme() + thumbnail_url)
+            r'<title>(.+?)</title>', webpage,
+            'title', default=None) or self._og_search_title(webpage)
+        video_url = self._proto_relative_url(self._search_regex(
+            r'<source[^>]+src=(["\'])(?P<url>.+?)\1',
+            webpage, 'video URL', default=None,
+            group='url')) or self._og_search_video_url(webpage)
+        thumbnail = self._proto_relative_url(self._search_regex(
+            r'<img[^>]+id=["\']video-thumbnail["\'][^>]+src=(["\'])(?P<url>.+?)\1',
+            webpage, 'thumbnail url', default=None,
+            group='url')) or self._og_search_thumbnail(webpage)
  
          return {
              'id': video_id,
diff --git a/youtube_dl/extractor/globo.py b/youtube_dl/extractor/globo.py
index 3de8356f68ef67e0913fd958995ad1d3e48ac62f..dc7b2661c58a0b35053ea50c7a2c1fa7b093f642 100644
--- a/youtube_dl/extractor/globo.py
+++ b/youtube_dl/extractor/globo.py
@@ -2,6 +2,7 @@
  from __future__ import unicode_literals
  
  import random
+import re
  import math
  
  from .common import InfoExtractor
@@ -14,12 +15,13 @@ from ..utils import (
      ExtractorError,
      float_or_none,
      int_or_none,
+    orderedSet,
      str_or_none,
  )
  
  
  class GloboIE(InfoExtractor):
-    _VALID_URL = '(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
+    _VALID_URL = r'(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
  
      _API_URL_TEMPLATE = 'http://api.globovideos.com/videos/%s/playlist'
      _SECURITY_URL_TEMPLATE = 'http://security.video.globo.com/videos/%s/hash?player=flash&version=17.0.0.132&resource_id=%s'
@@ -63,6 +65,9 @@ class GloboIE(InfoExtractor):
      }, {
          'url': 'http://canaloff.globo.com/programas/desejar-profundo/videos/4518560.html',
          'only_matching': True,
+    }, {
+        'url': 'globo:3607726',
+        'only_matching': True,
      }]
  
      class MD5(object):
@@ -396,33 +401,41 @@ class GloboIE(InfoExtractor):
  
  
  class GloboArticleIE(InfoExtractor):
-    _VALID_URL = 'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)\.html'
+    _VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/.]+)(?:\.html)?'
  
      _VIDEOID_REGEXES = [
          r'\bdata-video-id=["\'](\d{7,})',
          r'\bdata-player-videosids=["\'](\d{7,})',
-        r'\bvideosIDs\s*:\s*["\'](\d{7,})',
+        r'\bvideosIDs\s*:\s*["\']?(\d{7,})',
          r'\bdata-id=["\'](\d{7,})',
          r'<div[^>]+\bid=["\'](\d{7,})',
      ]
  
      _TESTS = [{
          'url': 'http://g1.globo.com/jornal-nacional/noticia/2014/09/novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes.html',
-        'md5': '307fdeae4390ccfe6ba1aa198cf6e72b',
          'info_dict': {
-            'id': '3652183',
-            'ext': 'mp4',
-            'title': 'Receita Federal explica como vai fiscalizar bagagens de quem retorna ao Brasil de avião',
-            'duration': 110.711,
-            'uploader': 'Rede Globo',
-            'uploader_id': '196',
-        }
+            'id': 'novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes',
+            'title': 'Novidade na fiscalização de bagagem pela Receita provoca discussões',
+            'description': 'md5:c3c4b4d4c30c32fce460040b1ac46b12',
+        },
+        'playlist_count': 1,
+    }, {
+        'url': 'http://g1.globo.com/pr/parana/noticia/2016/09/mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato.html',
+        'info_dict': {
+            'id': 'mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato',
+            'title': "Lula era o 'comandante máximo' do esquema da Lava Jato, diz MPF",
+            'description': 'md5:8aa7cc8beda4dc71cc8553e00b77c54c',
+        },
+        'playlist_count': 6,
      }, {
          'url': 'http://gq.globo.com/Prazeres/Poder/noticia/2015/10/all-o-desafio-assista-ao-segundo-capitulo-da-serie.html',
          'only_matching': True,
      }, {
          'url': 'http://gshow.globo.com/programas/tv-xuxa/O-Programa/noticia/2014/01/xuxa-e-junno-namoram-muuuito-em-luau-de-zeze-di-camargo-e-luciano.html',
          'only_matching': True,
+    }, {
+        'url': 'http://oglobo.globo.com/rio/a-amizade-entre-um-entregador-de-farmacia-um-piano-19946271',
+        'only_matching': True,
      }]
  
      @classmethod
@@ -432,5 +445,12 @@ class GloboArticleIE(InfoExtractor):
      def _real_extract(self, url):
          display_id = self._match_id(url)
          webpage = self._download_webpage(url, display_id)
-        video_id = self._search_regex(self._VIDEOID_REGEXES, webpage, 'video id')
-        return self.url_result('globo:%s' % video_id, 'Globo')
+        video_ids = []
+        for video_regex in self._VIDEOID_REGEXES:
+            video_ids.extend(re.findall(video_regex, webpage))
+        entries = [
+            self.url_result('globo:%s' % video_id, GloboIE.ie_key())
+            for video_id in orderedSet(video_ids)]
+        title = self._og_search_title(webpage, fatal=False)
+        description = self._html_search_meta('description', webpage)
+        return self.playlist_result(entries, display_id, title, description)
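Review note on the GloboArticleIE change above: instead of stopping at the first matching regex, the extractor now collects every ID matched by every pattern, de-duplicates them in first-seen order, and returns a playlist of `globo:<id>` entries. A minimal standalone sketch of that collection step, assuming plain `re` and a hand-rolled de-duplication in place of `orderedSet` (`collect_video_ids` and `build_entry_urls` are hypothetical names):

```python
import re

# Patterns copied from _VIDEOID_REGEXES in the patched file.
VIDEO_ID_REGEXES = [
    r'\bdata-video-id=["\'](\d{7,})',
    r'\bdata-player-videosids=["\'](\d{7,})',
    r'\bvideosIDs\s*:\s*["\']?(\d{7,})',
    r'\bdata-id=["\'](\d{7,})',
    r'<div[^>]+\bid=["\'](\d{7,})',
]


def collect_video_ids(webpage):
    """Gather IDs from all patterns, then keep the first occurrence of each."""
    video_ids = []
    for regex in VIDEO_ID_REGEXES:
        video_ids.extend(re.findall(regex, webpage))
    unique = []
    for video_id in video_ids:
        if video_id not in unique:
            unique.append(video_id)
    return unique


def build_entry_urls(webpage):
    """Return the internal globo: URLs that the playlist entries would point at."""
    return ['globo:%s' % video_id for video_id in collect_video_ids(webpage)]
```

Note that the resulting order follows the order of the regex list first and page position second, so entries are not necessarily in page order when several patterns match.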
diff --git a/youtube_dl/extractor/go.py b/youtube_dl/extractor/go.py
new file mode 100644
index 0000000..c7776b1
--- /dev/null
+++ b/youtube_dl/extractor/go.py
@@ -0,0 +1,122 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    determine_ext,
+    parse_age_limit,
+    urlencode_postdata,
+    ExtractorError,
+)
+
+
+class GoIE(InfoExtractor):
+    _BRANDS = {
+        'abc': '001',
+        'freeform': '002',
+        'watchdisneychannel': '004',
+        'watchdisneyjunior': '008',
+        'watchdisneyxd': '009',
+    }
+    _VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:[^/]+/)*(?:vdka(?P<id>\w+)|season-\d+/\d+-(?P<display_id>[^/?#]+))' % '|'.join(_BRANDS.keys())
+    _TESTS = [{
+        'url': 'http://abc.go.com/shows/castle/video/most-recent/vdka0_g86w5onx',
+        'info_dict': {
+            'id': '0_g86w5onx',
+            'ext': 'mp4',
+            'title': 'Sneak Peek: Language Arts',
+            'description': 'md5:7dcdab3b2d17e5217c953256af964e9c',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://abc.go.com/shows/after-paradise/video/most-recent/vdka3335601',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
+        if not video_id:
+            webpage = self._download_webpage(url, display_id)
+            video_id = self._search_regex(r'data-video-id=["\']VDKA(\w+)', webpage, 'video id')
+        brand = self._BRANDS[sub_domain]
+        video_data = self._download_json(
+            'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (brand, video_id),
+            video_id)['video'][0]
+        title = video_data['title']
+
+        formats = []
+        for asset in video_data.get('assets', {}).get('asset', []):
+            asset_url = asset.get('value')
+            if not asset_url:
+                continue
+            format_id = asset.get('format')
+            ext = determine_ext(asset_url)
+            if ext == 'm3u8':
+                video_type = video_data.get('type')
+                if video_type == 'lf':
+                    entitlement = self._download_json(
+                        'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json',
+                        video_id, data=urlencode_postdata({
+                            'video_id': video_data['id'],
+                            'video_type': video_type,
+                            'brand': brand,
+                            'device': '001',
+                        }))
+                    errors = entitlement.get('errors', {}).get('errors', [])
+                    if errors:
+                        error_message = ', '.join([error['message'] for error in errors])
+                        raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
+                    asset_url += '?' + entitlement['uplynkData']['sessionKey']
+                formats.extend(self._extract_m3u8_formats(
+                    asset_url, video_id, 'mp4', m3u8_id=format_id or 'hls', fatal=False))
+            else:
+                formats.append({
+                    'format_id': format_id,
+                    'url': asset_url,
+                    'ext': ext,
+                })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for cc in video_data.get('closedcaption', {}).get('src', []):
+            cc_url = cc.get('value')
+            if not cc_url:
+                continue
+            ext = determine_ext(cc_url)
+            if ext == 'xml':
+                ext = 'ttml'
+            subtitles.setdefault(cc.get('lang'), []).append({
+                'url': cc_url,
+                'ext': ext,
+            })
+
+        thumbnails = []
+        for thumbnail in video_data.get('thumbnails', {}).get('thumbnail', []):
+            thumbnail_url = thumbnail.get('value')
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'url': thumbnail_url,
+                'width': int_or_none(thumbnail.get('width')),
+                'height': int_or_none(thumbnail.get('height')),
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('longdescription') or video_data.get('description'),
+            'duration': int_or_none(video_data.get('duration', {}).get('value'), 1000),
+            'age_limit': parse_age_limit(video_data.get('tvrating', {}).get('rating')),
+            'episode_number': int_or_none(video_data.get('episodenumber')),
+            'series': video_data.get('show', {}).get('title'),
+            'season_number': int_or_none(video_data.get('season', {}).get('num')),
+            'thumbnails': thumbnails,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
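Review note on the new go.py above: `_VALID_URL` is assembled from the keys of `_BRANDS`, and the captured subdomain is then mapped back to a brand code. The sketch below reproduces just that parsing step outside the extractor. `parse_go_url` is a hypothetical helper, and the second URL in the usage note is made up for illustration.

```python
import re

# Brand codes copied from GoIE._BRANDS.
BRANDS = {
    'abc': '001',
    'freeform': '002',
    'watchdisneychannel': '004',
    'watchdisneyjunior': '008',
    'watchdisneyxd': '009',
}

# Same pattern as GoIE._VALID_URL, built from the brand keys.
VALID_URL = (
    r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:[^/]+/)*'
    r'(?:vdka(?P<id>\w+)|season-\d+/\d+-(?P<display_id>[^/?#]+))'
) % '|'.join(BRANDS.keys())


def parse_go_url(url):
    """Split a go.com URL into brand code, video id and display id."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    sub_domain, video_id, display_id = mobj.groups()
    return {
        # .get() returns None when the optional subdomain group did not match.
        'brand': BRANDS.get(sub_domain),
        'video_id': video_id,
        'display_id': display_id,
    }
```

For `http://abc.go.com/shows/castle/video/most-recent/vdka0_g86w5onx` this yields brand `001` and video id `0_g86w5onx`; a `season-N/NN-title` URL yields only a display id, which is the case where the extractor downloads the page to find the id. Since the subdomain group is optional in the pattern, a URL without one would appear to reach `self._BRANDS[sub_domain]` with `None`; the sketch uses `.get()` to make that case visible rather than raise.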
diff --git a/youtube_dl/extractor/godtv.py b/youtube_dl/extractor/godtv.py
new file mode 100644
index 0000000..c5d3b4e
--- /dev/null
+++ b/youtube_dl/extractor/godtv.py
@@ -0,0 +1,66 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .ooyala import OoyalaIE
+from ..utils import js_to_json
+
+
+class GodTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?god\.tv(?:/[^/]+)*/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://god.tv/jesus-image/video/jesus-conference-2016/randy-needham',
+        'info_dict': {
+            'id': 'lpd3g2MzE6D1g8zFAKz8AGpxWcpu6o_3',
+            'ext': 'mp4',
+            'title': 'Randy Needham',
+            'duration': 3615.08,
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        'url': 'http://god.tv/playlist/bible-study',
+        'info_dict': {
+            'id': 'bible-study',
+        },
+        'playlist_mincount': 37,
+    }, {
+        'url': 'http://god.tv/node/15097',
+        'only_matching': True,
+    }, {
+        'url': 'http://god.tv/live/africa',
+        'only_matching': True,
+    }, {
+        'url': 'http://god.tv/liveevents',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        settings = self._parse_json(
+            self._search_regex(
+                r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
+                webpage, 'settings', default='{}'),
+            display_id, transform_source=js_to_json, fatal=False)
+
+        ooyala_id = None
+
+        if settings:
+            playlist = settings.get('playlist')
+            if playlist and isinstance(playlist, list):
+                entries = [
+                    OoyalaIE._build_url_result(video['content_id'])
+                    for video in playlist if video.get('content_id')]
+                if entries:
+                    return self.playlist_result(entries, display_id)
+            ooyala_id = settings.get('ooyala', {}).get('content_id')
+
+        if not ooyala_id:
+            ooyala_id = self._search_regex(
+                r'["\']content_id["\']\s*:\s*(["\'])(?P<id>[\w-]+)\1',
+                webpage, 'ooyala id', group='id')
+
+        return OoyalaIE._build_url_result(ooyala_id)
diff --git a/youtube_dl/extractor/goldenmoustache.py b/youtube_dl/extractor/goldenmoustache.py
deleted file mode 100644
index 0fb5097..0000000
--- a/youtube_dl/extractor/goldenmoustache.py
+++ /dev/null
@@ -1,48 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-
-
-class GoldenMoustacheIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?goldenmoustache\.com/(?P<display_id>[\w-]+)-(?P<id>\d+)'
-    _TESTS = [{
-        'url': 'http://www.goldenmoustache.com/suricate-le-poker-3700/',
-        'md5': '0f904432fa07da5054d6c8beb5efb51a',
-        'info_dict': {
-            'id': '3700',
-            'ext': 'mp4',
-            'title': 'Suricate - Le Poker',
-            'description': 'md5:3d1f242f44f8c8cb0a106f1fd08e5dc9',
-            'thumbnail': 're:^https?://.*\.jpg$',
-        }
-    }, {
-        'url': 'http://www.goldenmoustache.com/le-lab-tout-effacer-mc-fly-et-carlito-55249/',
-        'md5': '27f0c50fb4dd5f01dc9082fc67cd5700',
-        'info_dict': {
-            'id': '55249',
-            'ext': 'mp4',
-            'title': 'Le LAB - Tout Effacer (Mc Fly et Carlito)',
-            'description': 'md5:9b7fbf11023fb2250bd4b185e3de3b2a',
-            'thumbnail': 're:^https?://.*\.(?:png|jpg)$',
-        }
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        video_url = self._html_search_regex(
-            r'data-src-type="mp4" data-src="([^"]+)"', webpage, 'video URL')
-        title = self._html_search_regex(
-            r'<title>(.*?)(?: - Golden Moustache)?</title>', webpage, 'title')
-        thumbnail = self._og_search_thumbnail(webpage)
-        description = self._og_search_description(webpage)
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'ext': 'mp4',
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-        }
diff --git a/youtube_dl/extractor/googleplus.py b/youtube_dl/extractor/googleplus.py
index 731bacd673bd57fe82411268c5920a3e9c7447ac..427499b11286f00a8e10e09a8de1d9f84611b5c9 100644
--- a/youtube_dl/extractor/googleplus.py
+++ b/youtube_dl/extractor/googleplus.py
@@ -10,7 +10,7 @@ from ..utils import unified_strdate
  
  class GooglePlusIE(InfoExtractor):
      IE_DESC = 'Google Plus'
-    _VALID_URL = r'https://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
+    _VALID_URL = r'https?://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
      IE_NAME = 'plus.google'
      _TEST = {
          'url': 'https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH',
diff --git a/youtube_dl/extractor/googlesearch.py b/youtube_dl/extractor/googlesearch.py
index 498304cb2bd9b605d44e67291a2f38bf4481a6f8..5279fa807f6903fa757c552b3e9ad3e013e5b494 100644
--- a/youtube_dl/extractor/googlesearch.py
+++ b/youtube_dl/extractor/googlesearch.py
@@ -4,9 +4,6 @@ import itertools
  import re
  
  from .common import SearchInfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-)
  
  
  class GoogleSearchIE(SearchInfoExtractor):
@@ -34,13 +31,16 @@ class GoogleSearchIE(SearchInfoExtractor):
          }
  
          for pagenum in itertools.count():
-            result_url = (
-                'http://www.google.com/search?tbm=vid&q=%s&start=%s&hl=en'
-                % (compat_urllib_parse.quote_plus(query), pagenum * 10))
-
              webpage = self._download_webpage(
-                result_url, 'gvsearch:' + query,
-                note='Downloading result page ' + str(pagenum + 1))
+                'http://www.google.com/search',
+                'gvsearch:' + query,
+                note='Downloading result page %s' % (pagenum + 1),
+                query={
+                    'tbm': 'vid',
+                    'q': query,
+                    'start': pagenum * 10,
+                    'hl': 'en',
+                })
  
              for hit_idx, mobj in enumerate(re.finditer(
                      r'<h3 class="r"><a href="([^"]+)"', webpage)):
diff --git a/youtube_dl/extractor/goshgay.py b/youtube_dl/extractor/goshgay.py

index 1d9166455aae935f1eb51777d170e0f6259ffd4e..74e1720ee325da8fb4c011eddec342fe2de62d9b 100644 (file)
--- a/youtube_dl/extractor/goshgay.py
+++ b/youtube_dl/extractor/goshgay.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -11,16 +11,16 @@ from ..utils import (
  
  
  class GoshgayIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
+    _VALID_URL = r'https?://(?:www\.)?goshgay\.com/video(?P<id>\d+?)($|/)'
      _TEST = {
          'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
-        'md5': '027fcc54459dff0feb0bc06a7aeda680',
+        'md5': '4b6db9a0a333142eb9f15913142b0ed1',
          'info_dict': {
              'id': '299069',
              'ext': 'flv',
              'title': 'DIESEL SFW XXX Video',
              'thumbnail': 're:^http://.*\.jpg$',
-            'duration': 79,
+            'duration': 80,
              'age_limit': 18,
          }
      }
@@ -47,5 +47,5 @@ class GoshgayIE(InfoExtractor):
              'title': title,
              'thumbnail': thumbnail,
              'duration': duration,
-            'age_limit': self._family_friendly_search(webpage),
+            'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/gputechconf.py b/youtube_dl/extractor/gputechconf.py

index 145b55bf3e019277d1e8ef958aacb90015cad737..73dc62c494e4a69f5499841164dce0c90aaa3656 100644 (file)
--- a/youtube_dl/extractor/gputechconf.py
+++ b/youtube_dl/extractor/gputechconf.py
@@ -2,12 +2,6 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import (
-    xpath_element,
-    xpath_text,
-    int_or_none,
-    parse_duration,
-)
  
  
  class GPUTechConfIE(InfoExtractor):
@@ -27,29 +21,15 @@ class GPUTechConfIE(InfoExtractor):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
-        root_path = self._search_regex(r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path', 'http://evt.dispeak.com/nvidia/events/gtc15/')
-        xml_file_id = self._search_regex(r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')
-
-        doc = self._download_xml('%sxml/%s.xml' % (root_path, xml_file_id), video_id)
-
-        metadata = xpath_element(doc, 'metadata')
-        http_host = xpath_text(metadata, 'httpHost', 'http host', True)
-        mbr_videos = xpath_element(metadata, 'MBRVideos')
-
-        formats = []
-        for mbr_video in mbr_videos.findall('MBRVideo'):
-            stream_name = xpath_text(mbr_video, 'streamName')
-            if stream_name:
-                formats.append({
-                    'url': 'http://%s/%s' % (http_host, stream_name.replace('mp4:', '')),
-                    'tbr': int_or_none(xpath_text(mbr_video, 'bitrate')),
-                })
-        self._sort_formats(formats)
+        root_path = self._search_regex(
+            r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path',
+            default='http://evt.dispeak.com/nvidia/events/gtc15/')
+        xml_file_id = self._search_regex(
+            r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')
  
          return {
+            '_type': 'url_transparent',
              'id': video_id,
-            'title': xpath_text(metadata, 'title'),
-            'duration': parse_duration(xpath_text(metadata, 'endTime')),
-            'creator': xpath_text(metadata, 'speaker'),
-            'formats': formats,
+            'url': '%sxml/%s.xml' % (root_path, xml_file_id),
+            'ie_key': 'DigitallySpeaking',
          }
diff --git a/youtube_dl/extractor/groupon.py b/youtube_dl/extractor/groupon.py

index 63c05b6a6f96dfa4437f15cd77524ab3d89e1018..a6da909310a5591fe39a68244142a46fb24ce65d 100644 (file)
--- a/youtube_dl/extractor/groupon.py
+++ b/youtube_dl/extractor/groupon.py
@@ -4,7 +4,7 @@ from .common import InfoExtractor
  
  
  class GrouponIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.groupon\.com/deals/(?P<id>[^?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?groupon\.com/deals/(?P<id>[^/?#&]+)'
  
      _TEST = {
          'url': 'https://www.groupon.com/deals/bikram-yoga-huntington-beach-2#ooid=tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
@@ -14,17 +14,27 @@ class GrouponIE(InfoExtractor):
              'description': 'Studio kept at 105 degrees and 40% humidity with anti-microbial and anti-slip Flotex flooring; certified instructors',
          },
          'playlist': [{
+            'md5': '42428ce8a00585f9bc36e49226eae7a1',
              'info_dict': {
-                'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
+                'id': 'fk6OhWpXgIQ',
                  'ext': 'mp4',
-                'title': 'Bikram Yoga Huntington Beach | Orange County',
+                'title': 'Bikram Yoga Huntington Beach | Orange County !tubGNycTo@9Uxg82uESj4i61EYX8nyuf',
                  'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
-                'duration': 44.961,
+                'duration': 45,
+                'upload_date': '20160405',
+                'uploader_id': 'groupon',
+                'uploader': 'Groupon',
              },
+            'add_ie': ['Youtube'],
          }],
          'params': {
-            'skip_download': 'HLS',
-        }
+            'skip_download': True,
+        },
+    }
+
+    _PROVIDERS = {
+        'ooyala': ('ooyala:%s', 'Ooyala'),
+        'youtube': ('%s', 'Youtube'),
      }
  
      def _real_extract(self, url):
@@ -32,16 +42,21 @@ class GrouponIE(InfoExtractor):
          webpage = self._download_webpage(url, playlist_id)
  
          payload = self._parse_json(self._search_regex(
-            r'var\s+payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
+            r'(?:var\s+|window\.)payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
          videos = payload['carousel'].get('dealVideos', [])
          entries = []
          for v in videos:
-            if v.get('provider') != 'OOYALA':
+            provider = v.get('provider')
+            video_id = v.get('media') or v.get('id') or v.get('baseURL')
+            if not provider or not video_id:
+                continue
+            url_pattern, ie_key = self._PROVIDERS.get(provider.lower(), (None, None))
+            if not url_pattern:
                  self.report_warning(
                      '%s: Unsupported video provider %s, skipping video' %
-                    (playlist_id, v.get('provider')))
+                    (playlist_id, provider))
                  continue
-            entries.append(self.url_result('ooyala:%s' % v['media']))
+            entries.append(self.url_result(url_pattern % video_id, ie_key))
  
          return {
              '_type': 'playlist',
diff --git a/youtube_dl/extractor/hark.py b/youtube_dl/extractor/hark.py

index b6cc15b6fbad25c43fe0699668bd3ec452ed944d..342a6130ea10325d4b7e7ecee7ee86b130e90173 100644 (file)
--- a/youtube_dl/extractor/hark.py
+++ b/youtube_dl/extractor/hark.py
@@ -1,11 +1,11 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
  
  
  class HarkIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.hark\.com/clips/(?P<id>.+?)-.+'
+    _VALID_URL = r'https?://(?:www\.)?hark\.com/clips/(?P<id>.+?)-.+'
      _TEST = {
          'url': 'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
          'md5': '6783a58491b47b92c7c1af5a77d4cbee',
diff --git a/youtube_dl/extractor/hbo.py b/youtube_dl/extractor/hbo.py

index dad0f3994c93cd0a38a2be52741d7c55ce0e6749..cbf774377b7261c326bd71f5db2d5de8216be5f4 100644 (file)
--- a/youtube_dl/extractor/hbo.py
+++ b/youtube_dl/extractor/hbo.py
@@ -12,17 +12,7 @@ from ..utils import (
  )
  
  
-class HBOIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?hbo\.com/video/video\.html\?.*vid=(?P<id>[0-9]+)'
-    _TEST = {
-        'url': 'http://www.hbo.com/video/video.html?autoplay=true&g=u&vid=1437839',
-        'md5': '1c33253f0c7782142c993c0ba62a8753',
-        'info_dict': {
-            'id': '1437839',
-            'ext': 'mp4',
-            'title': 'Ep. 64 Clip: Encryption',
-        }
-    }
+class HBOBaseIE(InfoExtractor):
      _FORMATS_INFO = {
          '1920': {
              'width': 1280,
@@ -50,8 +40,7 @@ class HBOIE(InfoExtractor):
          },
      }
  
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
+    def _extract_from_id(self, video_id):
          video_data = self._download_xml(
              'http://render.lv3.hbo.com/data/content/global/videos/data/%s.xml' % video_id, video_id)
          title = xpath_text(video_data, 'title', 'title', True)
@@ -116,7 +105,60 @@ class HBOIE(InfoExtractor):
          return {
              'id': video_id,
              'title': title,
-            'duration': parse_duration(xpath_element(video_data, 'duration/tv14')),
+            'duration': parse_duration(xpath_text(video_data, 'duration/tv14')),
              'formats': formats,
              'thumbnails': thumbnails,
          }
+
+
+class HBOIE(HBOBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?hbo\.com/video/video\.html\?.*vid=(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.hbo.com/video/video.html?autoplay=true&g=u&vid=1437839',
+        'md5': '1c33253f0c7782142c993c0ba62a8753',
+        'info_dict': {
+            'id': '1437839',
+            'ext': 'mp4',
+            'title': 'Ep. 64 Clip: Encryption',
+            'thumbnail': 're:https?://.*\.jpg$',
+            'duration': 1072,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self._extract_from_id(video_id)
+
+
+class HBOEpisodeIE(HBOBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?hbo\.com/(?!video)([^/]+/)+video/(?P<id>[0-9a-z-]+)\.html'
+
+    _TESTS = [{
+        'url': 'http://www.hbo.com/girls/episodes/5/52-i-love-you-baby/video/ep-52-inside-the-episode.html?autoplay=true',
+        'md5': '689132b253cc0ab7434237fc3a293210',
+        'info_dict': {
+            'id': '1439518',
+            'display_id': 'ep-52-inside-the-episode',
+            'ext': 'mp4',
+            'title': 'Ep. 52: Inside the Episode',
+            'thumbnail': 're:https?://.*\.jpg$',
+            'duration': 240,
+        },
+    }, {
+        'url': 'http://www.hbo.com/game-of-thrones/about/video/season-5-invitation-to-the-set.html?autoplay=true',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        video_id = self._search_regex(
+            r'(?P<q1>[\'"])videoId(?P=q1)\s*:\s*(?P<q2>[\'"])(?P<video_id>\d+)(?P=q2)',
+            webpage, 'video ID', group='video_id')
+
+        info_dict = self._extract_from_id(video_id)
+        info_dict['display_id'] = display_id
+
+        return info_dict
diff --git a/youtube_dl/extractor/hearthisat.py b/youtube_dl/extractor/hearthisat.py

index 7d8698655666f8de4e8850ac2684a16dd28810af..2564538820e7d534adc24fd8c967ee44490e0dc3 100644 (file)
--- a/youtube_dl/extractor/hearthisat.py
+++ b/youtube_dl/extractor/hearthisat.py
@@ -7,6 +7,7 @@ from .common import InfoExtractor
  from ..compat import compat_urlparse
  from ..utils import (
      HEADRequest,
+    KNOWN_EXTENSIONS,
      sanitized_Request,
      str_to_int,
      urlencode_postdata,
@@ -17,7 +18,7 @@ from ..utils import (
  class HearThisAtIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?hearthis\.at/(?P<artist>[^/]+)/(?P<title>[A-Za-z0-9\-]+)/?$'
      _PLAYLIST_URL = 'https://hearthis.at/playlist.php'
-    _TEST = {
+    _TESTS = [{
          'url': 'https://hearthis.at/moofi/dr-kreep',
          'md5': 'ab6ec33c8fed6556029337c7885eb4e0',
          'info_dict': {
@@ -26,7 +27,7 @@ class HearThisAtIE(InfoExtractor):
              'title': 'Moofi - Dr. Kreep',
              'thumbnail': 're:^https?://.*\.jpg$',
              'timestamp': 1421564134,
-            'description': 'Creepy Patch. Mutable Instruments Braids Vowel + Formant Mode.',
+            'description': 'Listen to Dr. Kreep by Moofi on hearthis.at - Modular, Eurorack, Mutable Intruments Braids, Valhalla-DSP',
              'upload_date': '20150118',
              'comment_count': int,
              'view_count': int,
@@ -34,7 +35,25 @@ class HearThisAtIE(InfoExtractor):
              'duration': 71,
              'categories': ['Experimental'],
          }
-    }
+    }, {
+        # 'download' link redirects to the original webpage
+        'url': 'https://hearthis.at/twitchsf/dj-jim-hopkins-totally-bitchin-80s-dance-mix/',
+        'md5': '5980ceb7c461605d30f1f039df160c6e',
+        'info_dict': {
+            'id': '811296',
+            'ext': 'mp3',
+            'title': 'TwitchSF - DJ Jim Hopkins -  Totally Bitchin\' 80\'s Dance Mix!',
+            'description': 'Listen to DJ Jim Hopkins -  Totally Bitchin\' 80\'s Dance Mix! by TwitchSF on hearthis.at - Dance',
+            'upload_date': '20160328',
+            'timestamp': 1459186146,
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'comment_count': int,
+            'view_count': int,
+            'like_count': int,
+            'duration': 4360,
+            'categories': ['Dance'],
+        },
+    }]
  
      def _real_extract(self, url):
          m = re.match(self._VALID_URL, url)
@@ -90,13 +109,14 @@ class HearThisAtIE(InfoExtractor):
              ext_handle = self._request_webpage(
                  ext_req, display_id, note='Determining extension')
              ext = urlhandle_detect_ext(ext_handle)
-            formats.append({
-                'format_id': 'download',
-                'vcodec': 'none',
-                'ext': ext,
-                'url': download_url,
-                'preference': 2,  # Usually better quality
-            })
+            if ext in KNOWN_EXTENSIONS:
+                formats.append({
+                    'format_id': 'download',
+                    'vcodec': 'none',
+                    'ext': ext,
+                    'url': download_url,
+                    'preference': 2,  # Usually better quality
+                })
          self._sort_formats(formats)
  
          return {
diff --git a/youtube_dl/extractor/helsinki.py b/youtube_dl/extractor/helsinki.py

index 93107b3064ebfba513b3aa208556b5822f6cf979..575fb332a055465446fc5db9448313ec793d3258 100644 (file)
--- a/youtube_dl/extractor/helsinki.py
+++ b/youtube_dl/extractor/helsinki.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  
  from __future__ import unicode_literals
  
diff --git a/youtube_dl/extractor/hgtv.py b/youtube_dl/extractor/hgtv.py

new file mode 100644 (file)

index 0000000..69543bf
--- /dev/null
+++ b/youtube_dl/extractor/hgtv.py
@@ -0,0 +1,79 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    js_to_json,
+    smuggle_url,
+)
+
+
+class HGTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?hgtv\.ca/[^/]+/video/(?P<id>[^/]+)/video\.html'
+    _TEST = {
+        'url': 'http://www.hgtv.ca/homefree/video/overnight-success/video.html?v=738081859718&p=1&s=da#video',
+        'md5': '',
+        'info_dict': {
+            'id': 'aFH__I_5FBOX',
+            'ext': 'mp4',
+            'title': 'Overnight Success',
+            'description': 'After weeks of hard work, high stakes, breakdowns and pep talks, the final 2 contestants compete to win the ultimate dream.',
+            'uploader': 'SHWM-NEW',
+            'timestamp': 1470320034,
+            'upload_date': '20160804',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        embed_vars = self._parse_json(self._search_regex(
+            r'(?s)embed_vars\s*=\s*({.*?});',
+            webpage, 'embed vars'), display_id, js_to_json)
+        return {
+            '_type': 'url_transparent',
+            'url': smuggle_url(
+                'http://link.theplatform.com/s/dtjsEC/%s?mbr=true&manifest=m3u' % embed_vars['pid'], {
+                    'force_smil_url': True
+                }),
+            'series': embed_vars.get('show'),
+            'season_number': int_or_none(embed_vars.get('season')),
+            'episode_number': int_or_none(embed_vars.get('episode')),
+            'ie_key': 'ThePlatform',
+        }
+
+
+class HGTVComShowIE(InfoExtractor):
+    IE_NAME = 'hgtv.com:show'
+    _VALID_URL = r'https?://(?:www\.)?hgtv\.com/shows/[^/]+/(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'http://www.hgtv.com/shows/flip-or-flop/flip-or-flop-full-episodes-videos',
+        'info_dict': {
+            'id': 'flip-or-flop-full-episodes-videos',
+            'title': 'Flip or Flop Full Episodes',
+        },
+        'playlist_mincount': 15,
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        config = self._parse_json(
+            self._search_regex(
+                r'(?s)data-module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script',
+                webpage, 'video config'),
+            display_id)['channels'][0]
+
+        entries = [
+            self.url_result(video['releaseUrl'])
+            for video in config['videos'] if video.get('releaseUrl')]
+
+        return self.playlist_result(
+            entries, display_id, config.get('title'), config.get('description'))
diff --git a/youtube_dl/extractor/hornbunny.py b/youtube_dl/extractor/hornbunny.py

index 5b6efb27eedfe0097cc47d96f3f287ab1858e9e8..0615f06af4139acbd3164f5aaac1ab2ede4cdc27 100644 (file)
--- a/youtube_dl/extractor/hornbunny.py
+++ b/youtube_dl/extractor/hornbunny.py
@@ -1,8 +1,6 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  from ..utils import (
      int_or_none,
@@ -14,29 +12,24 @@ class HornBunnyIE(InfoExtractor):
      _VALID_URL = r'http?://(?:www\.)?hornbunny\.com/videos/(?P<title_dash>[a-z-]+)-(?P<id>\d+)\.html'
      _TEST = {
          'url': 'http://hornbunny.com/videos/panty-slut-jerk-off-instruction-5227.html',
-        'md5': '95e40865aedd08eff60272b704852ad7',
+        'md5': 'e20fd862d1894b67564c96f180f43924',
          'info_dict': {
              'id': '5227',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'panty slut jerk off instruction',
              'duration': 550,
              'age_limit': 18,
+            'view_count': int,
+            'thumbnail': 're:^https?://.*\.jpg$',
          }
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        webpage = self._download_webpage(
-            url, video_id, note='Downloading initial webpage')
-        title = self._html_search_regex(
-            r'class="title">(.*?)</h2>', webpage, 'title')
-        redirect_url = self._html_search_regex(
-            r'pg&settings=(.*?)\|0"\);', webpage, 'title')
-        webpage2 = self._download_webpage(redirect_url, video_id)
-        video_url = self._html_search_regex(
-            r'flvMask:(.*?);', webpage2, 'video_url')
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+        title = self._og_search_title(webpage)
+        info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
  
          duration = parse_duration(self._search_regex(
              r'<strong>Runtime:</strong>\s*([0-9:]+)</div>',
@@ -45,12 +38,12 @@ class HornBunnyIE(InfoExtractor):
              r'<strong>Views:</strong>\s*(\d+)</div>',
              webpage, 'view count', fatal=False))
  
-        return {
+        info_dict.update({
              'id': video_id,
-            'url': video_url,
              'title': title,
-            'ext': 'flv',
              'duration': duration,
              'view_count': view_count,
              'age_limit': 18,
-        }
+        })
+
+        return info_dict
diff --git a/youtube_dl/extractor/hotnewhiphop.py b/youtube_dl/extractor/hotnewhiphop.py

index 9db5652096acc5ead0cb926791d731d0f6f35565..34163725f8c9562380a3ea30a17780e599f3b0a7 100644 (file)
--- a/youtube_dl/extractor/hotnewhiphop.py
+++ b/youtube_dl/extractor/hotnewhiphop.py
@@ -12,7 +12,7 @@ from ..utils import (
  
  
  class HotNewHipHopIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
+    _VALID_URL = r'https?://(?:www\.)?hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
      _TEST = {
          'url': 'http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html',
          'md5': '2c2cd2f76ef11a9b3b581e8b232f3d96',
diff --git a/youtube_dl/extractor/howcast.py b/youtube_dl/extractor/howcast.py

index e8f51e545bfd2b89a251e1a4fbbeefe80aa371f9..7e36b85ad586984dfb761e4518b23d2b4a074bf7 100644 (file)
--- a/youtube_dl/extractor/howcast.py
+++ b/youtube_dl/extractor/howcast.py
@@ -8,7 +8,7 @@ class HowcastIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?howcast\.com/videos/(?P<id>\d+)'
      _TEST = {
          'url': 'http://www.howcast.com/videos/390161-How-to-Tie-a-Square-Knot-Properly',
-        'md5': '8b743df908c42f60cf6496586c7f12c3',
+        'md5': '7d45932269a288149483144f01b99789',
          'info_dict': {
              'id': '390161',
              'ext': 'mp4',
@@ -19,9 +19,9 @@ class HowcastIE(InfoExtractor):
              'duration': 56.823,
          },
          'params': {
-            # m3u8 download
              'skip_download': True,
          },
+        'add_ie': ['Ooyala'],
      }
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/howstuffworks.py b/youtube_dl/extractor/howstuffworks.py

index 663e6632a194d8ee271a0c031a921d7eed139005..65ba2a48b069bd67d2b3382f2d87bc1160145612 100644 (file)
--- a/youtube_dl/extractor/howstuffworks.py
+++ b/youtube_dl/extractor/howstuffworks.py
@@ -6,6 +6,7 @@ from ..utils import (
      int_or_none,
      js_to_json,
      unescapeHTML,
+    determine_ext,
  )
  
  
@@ -23,6 +24,7 @@ class HowStuffWorksIE(InfoExtractor):
                  'thumbnail': 're:^https?://.*\.jpg$',
                  'duration': 161,
              },
+            'skip': 'Video broken',
          },
          {
              'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm',
@@ -39,7 +41,7 @@ class HowStuffWorksIE(InfoExtractor):
              'url': 'http://entertainment.howstuffworks.com/arts/2706-sword-swallowing-1-by-dan-meyer-video.htm',
              'info_dict': {
                  'id': '440011',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Sword Swallowing #1 by Dan Meyer',
                  'description': 'Video footage (1 of 3) used by permission of the owner Dan Meyer through Sword Swallowers Association International <www.swordswallow.org>',
                  'display_id': 'sword-swallowing-1-by-dan-meyer',
@@ -63,13 +65,19 @@ class HowStuffWorksIE(InfoExtractor):
          video_id = clip_info['content_id']
          formats = []
          m3u8_url = clip_info.get('m3u8')
-        if m3u8_url:
-            formats += self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
+        if m3u8_url and determine_ext(m3u8_url) == 'm3u8':
+            formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=True))
+        flv_url = clip_info.get('flv_url')
+        if flv_url:
+            formats.append({
+                'url': flv_url,
+                'format_id': 'flv',
+            })
          for video in clip_info.get('mp4', []):
              formats.append({
                  'url': video['src'],
-                'format_id': video['bitrate'],
-                'vbr': int(video['bitrate'].rstrip('k')),
+                'format_id': 'mp4-%s' % video['bitrate'],
+                'vbr': int_or_none(video['bitrate'].rstrip('k')),
              })
  
          if not formats:
@@ -102,6 +110,6 @@ class HowStuffWorksIE(InfoExtractor):
              'title': unescapeHTML(clip_info['clip_title']),
              'description': unescapeHTML(clip_info.get('caption')),
              'thumbnail': clip_info.get('video_still_url'),
-            'duration': clip_info.get('duration'),
+            'duration': int_or_none(clip_info.get('duration')),
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/hrti.py b/youtube_dl/extractor/hrti.py

new file mode 100644 (file)

index 0000000..656ce6d
--- /dev/null
+++ b/youtube_dl/extractor/hrti.py
@@ -0,0 +1,202 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_HTTPError
+from ..utils import (
+    clean_html,
+    ExtractorError,
+    int_or_none,
+    parse_age_limit,
+    sanitized_Request,
+    try_get,
+)
+
+
+class HRTiBaseIE(InfoExtractor):
+    """
+        Base Information Extractor for Croatian Radiotelevision
+        video on demand site https://hrti.hrt.hr
+        Reverse engineered from the JavaScript app in app.min.js
+    """
+    _NETRC_MACHINE = 'hrti'
+
+    _APP_LANGUAGE = 'hr'
+    _APP_VERSION = '1.1'
+    _APP_PUBLICATION_ID = 'all_in_one'
+    _API_URL = 'http://clientapi.hrt.hr/client_api.php/config/identify/format/json'
+
+    def _initialize_api(self):
+        init_data = {
+            'application_publication_id': self._APP_PUBLICATION_ID
+        }
+
+        uuid = self._download_json(
+            self._API_URL, None, note='Downloading uuid',
+            errnote='Unable to download uuid',
+            data=json.dumps(init_data).encode('utf-8'))['uuid']
+
+        app_data = {
+            'uuid': uuid,
+            'application_publication_id': self._APP_PUBLICATION_ID,
+            'application_version': self._APP_VERSION
+        }
+
+        req = sanitized_Request(self._API_URL, data=json.dumps(app_data).encode('utf-8'))
+        req.get_method = lambda: 'PUT'
+
+        resources = self._download_json(
+            req, None, note='Downloading session information',
+            errnote='Unable to download session information')
+
+        self._session_id = resources['session_id']
+
+        modules = resources['modules']
+
+        self._search_url = modules['vod_catalog']['resources']['search']['uri'].format(
+            language=self._APP_LANGUAGE,
+            application_id=self._APP_PUBLICATION_ID)
+
+        self._login_url = (modules['user']['resources']['login']['uri'] +
+                           '/format/json').format(session_id=self._session_id)
+
+        self._logout_url = modules['user']['resources']['logout']['uri']
+
+    def _login(self):
+        (username, password) = self._get_login_info()
+        # TODO: figure out authentication with cookies
+        if username is None or password is None:
+            self.raise_login_required()
+
+        auth_data = {
+            'username': username,
+            'password': password,
+        }
+
+        try:
+            auth_info = self._download_json(
+                self._login_url, None, note='Logging in', errnote='Unable to log in',
+                data=json.dumps(auth_data).encode('utf-8'))
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 406:
+                auth_info = self._parse_json(e.cause.read().decode('utf-8'), None)
+            else:
+                raise
+
+        error_message = auth_info.get('error', {}).get('message')
+        if error_message:
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, error_message),
+                expected=True)
+
+        self._token = auth_info['secure_streaming_token']
+
+    def _real_initialize(self):
+        self._initialize_api()
+        self._login()
+
+
+class HRTiIE(HRTiBaseIE):
+    _VALID_URL = r'''(?x)
+                        (?:
+                            hrti:(?P<short_id>[0-9]+)|
+                            https?://
+                                hrti\.hrt\.hr/\#/video/show/(?P<id>[0-9]+)/(?P<display_id>[^/]+)?
+                        )
+                    '''
+    _TESTS = [{
+        'url': 'https://hrti.hrt.hr/#/video/show/2181385/republika-dokumentarna-serija-16-hd',
+        'info_dict': {
+            'id': '2181385',
+            'display_id': 'republika-dokumentarna-serija-16-hd',
+            'ext': 'mp4',
+            'title': 'REPUBLIKA, dokumentarna serija (1/6) (HD)',
+            'description': 'md5:48af85f620e8e0e1df4096270568544f',
+            'duration': 2922,
+            'view_count': int,
+            'average_rating': int,
+            'episode_number': int,
+            'season_number': int,
+            'age_limit': 12,
+        },
+        'skip': 'Requires account credentials',
+    }, {
+        'url': 'https://hrti.hrt.hr/#/video/show/2181385/',
+        'only_matching': True,
+    }, {
+        'url': 'hrti:2181385',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('short_id') or mobj.group('id')
+        display_id = mobj.group('display_id') or video_id
+
+        video = self._download_json(
+            '%s/video_id/%s/format/json' % (self._search_url, video_id),
+            display_id, 'Downloading video metadata JSON')['video'][0]
+
+        title_info = video['title']
+        title = title_info['title_long']
+
+        movie = video['video_assets']['movie'][0]
+        m3u8_url = movie['url'].format(TOKEN=self._token)
+        formats = self._extract_m3u8_formats(
+            m3u8_url, display_id, 'mp4', entry_protocol='m3u8_native',
+            m3u8_id='hls')
+        self._sort_formats(formats)
+
+        description = clean_html(title_info.get('summary_long'))
+        age_limit = parse_age_limit(video.get('parental_control', {}).get('rating'))
+        view_count = int_or_none(video.get('views'))
+        average_rating = int_or_none(video.get('user_rating'))
+        duration = int_or_none(movie.get('duration'))
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'view_count': view_count,
+            'average_rating': average_rating,
+            'age_limit': age_limit,
+            'formats': formats,
+        }
+
+
+class HRTiPlaylistIE(HRTiBaseIE):
+    _VALID_URL = r'https?://hrti\.hrt\.hr/#/video/list/category/(?P<id>[0-9]+)/(?P<display_id>[^/]+)?'
+    _TESTS = [{
+        'url': 'https://hrti.hrt.hr/#/video/list/category/212/ekumena',
+        'info_dict': {
+            'id': '212',
+            'title': 'ekumena',
+        },
+        'playlist_mincount': 8,
+        'skip': 'Requires account credentials',
+    }, {
+        'url': 'https://hrti.hrt.hr/#/video/list/category/212/',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        category_id = mobj.group('id')
+        display_id = mobj.group('display_id') or category_id
+
+        response = self._download_json(
+            '%s/category_id/%s/format/json' % (self._search_url, category_id),
+            display_id, 'Downloading video metadata JSON')
+
+        video_ids = try_get(
+            response, lambda x: x['video_listings'][0]['alternatives'][0]['list'],
+            list) or [video['id'] for video in response.get('videos', []) if video.get('id')]
+
+        entries = [self.url_result('hrti:%s' % video_id) for video_id in video_ids]
+
+        return self.playlist_result(entries, category_id, display_id)
diff --git a/youtube_dl/extractor/huajiao.py b/youtube_dl/extractor/huajiao.py

new file mode 100644

index 0000000..cec0df0
--- /dev/null
+++ b/youtube_dl/extractor/huajiao.py
@@ -0,0 +1,56 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    parse_duration,
+    parse_iso8601,
+)
+
+
+class HuajiaoIE(InfoExtractor):
+    IE_DESC = '花椒直播'
+    _VALID_URL = r'https?://(?:www\.)?huajiao\.com/l/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.huajiao.com/l/38941232',
+        'md5': 'd08bf9ac98787d24d1e4c0283f2d372d',
+        'info_dict': {
+            'id': '38941232',
+            'ext': 'mp4',
+            'title': '#新人求关注#',
+            'description': 're:.*',
+            'duration': 2424.0,
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'timestamp': 1475866459,
+            'upload_date': '20161007',
+            'uploader': 'Penny_余姿昀',
+            'uploader_id': '75206005',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        feed_json = self._search_regex(
+            r'var\s+feed\s*=\s*({.+})', webpage, 'feed json')
+        feed = self._parse_json(feed_json, video_id)
+
+        description = self._html_search_meta(
+            'description', webpage, 'description', fatal=False)
+
+        def get(section, field):
+            return feed.get(section, {}).get(field)
+
+        return {
+            'id': video_id,
+            'title': feed['feed']['formated_title'],
+            'description': description,
+            'duration': parse_duration(get('feed', 'duration')),
+            'thumbnail': get('feed', 'image'),
+            'timestamp': parse_iso8601(feed.get('creatime'), ' '),
+            'uploader': get('author', 'nickname'),
+            'uploader_id': get('author', 'uid'),
+            'formats': self._extract_m3u8_formats(
+                feed['feed']['m3u8'], video_id, 'mp4', 'm3u8_native'),
+        }
diff --git a/youtube_dl/extractor/huffpost.py b/youtube_dl/extractor/huffpost.py

index a38eae421a9199b578b3a724d205b13e6367c67a..059073749e67605464b6159b9391f71eb5a6052d 100644
--- a/youtube_dl/extractor/huffpost.py
+++ b/youtube_dl/extractor/huffpost.py
@@ -4,6 +4,7 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
+    determine_ext,
      parse_duration,
      unified_strdate,
  )
@@ -29,7 +30,12 @@ class HuffPostIE(InfoExtractor):
              'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more.  ',
              'duration': 1549,
              'upload_date': '20140124',
-        }
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'expected_warnings': ['HTTP Error 404: Not Found'],
      }
  
      def _real_extract(self, url):
@@ -45,7 +51,7 @@ class HuffPostIE(InfoExtractor):
          description = data.get('description')
  
          thumbnails = []
-        for url in data['images'].values():
+        for url in filter(None, data['images'].values()):
              m = re.match('.*-([0-9]+x[0-9]+)\.', url)
              if not m:
                  continue
@@ -54,13 +60,25 @@ class HuffPostIE(InfoExtractor):
                  'resolution': m.group(1),
              })
  
-        formats = [{
-            'format': key,
-            'format_id': key.replace('/', '.'),
-            'ext': 'mp4',
-            'url': url,
-            'vcodec': 'none' if key.startswith('audio/') else None,
-        } for key, url in data.get('sources', {}).get('live', {}).items()]
+        formats = []
+        sources = data.get('sources', {})
+        live_sources = list(sources.get('live', {}).items()) + list(sources.get('live_again', {}).items())
+        for key, url in live_sources:
+            ext = determine_ext(url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    url + '?hdcore=2.9.5', video_id, f4m_id='hds', fatal=False))
+            else:
+                formats.append({
+                    'format': key,
+                    'format_id': key.replace('/', '.'),
+                    'ext': 'mp4',
+                    'url': url,
+                    'vcodec': 'none' if key.startswith('audio/') else None,
+                })
  
          if not formats and data.get('fivemin_id'):
              return self.url_result('5min:%s' % data['fivemin_id'])
diff --git a/youtube_dl/extractor/imdb.py b/youtube_dl/extractor/imdb.py

index 8bed8ccd06e2eeb64eba69f3407c9271c0643731..f0fc8d49a4ad50c128d124534fc37141cb510ba6 100644
--- a/youtube_dl/extractor/imdb.py
+++ b/youtube_dl/extractor/imdb.py
@@ -1,28 +1,38 @@
  from __future__ import unicode_literals
  
  import re
-import json
  
  from .common import InfoExtractor
  from ..utils import (
+    mimetype2ext,
      qualities,
+    remove_end,
  )
  
  
  class ImdbIE(InfoExtractor):
      IE_NAME = 'imdb'
      IE_DESC = 'Internet Movie Database trailers'
-    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/video/imdb/vi(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-)vi(?P<id>\d+)'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.imdb.com/video/imdb/vi2524815897',
          'info_dict': {
              'id': '2524815897',
              'ext': 'mp4',
-            'title': 'Ice Age: Continental Drift Trailer (No. 2) - IMDb',
+            'title': 'Ice Age: Continental Drift Trailer (No. 2)',
              'description': 'md5:9061c2219254e5d14e03c25c98e96a81',
          }
-    }
+    }, {
+        'url': 'http://www.imdb.com/video/_/vi2524815897',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.imdb.com/title/tt1667889/?ref_=ext_shr_eml_vi#lb-vi2524815897',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.imdb.com/title/tt1667889/#lb-vi2524815897',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -48,29 +58,43 @@ class ImdbIE(InfoExtractor):
              json_data = self._search_regex(
                  r'<script[^>]+class="imdb-player-data"[^>]*?>(.*?)</script>',
                  format_page, 'json data', flags=re.DOTALL)
-            info = json.loads(json_data)
-            format_info = info['videoPlayerObject']['video']
-            f_id = format_info['ffname']
+            info = self._parse_json(json_data, video_id, fatal=False)
+            if not info:
+                continue
+            format_info = info.get('videoPlayerObject', {}).get('video', {})
+            if not format_info:
+                continue
+            video_info_list = format_info.get('videoInfoList')
+            if not video_info_list or not isinstance(video_info_list, list):
+                continue
+            video_info = video_info_list[0]
+            if not video_info or not isinstance(video_info, dict):
+                continue
+            video_url = video_info.get('videoUrl')
+            if not video_url:
+                continue
+            format_id = format_info.get('ffname')
              formats.append({
-                'format_id': f_id,
-                'url': format_info['videoInfoList'][0]['videoUrl'],
-                'quality': quality(f_id),
+                'format_id': format_id,
+                'url': video_url,
+                'ext': mimetype2ext(video_info.get('videoMimeType')),
+                'quality': quality(format_id),
              })
          self._sort_formats(formats)
  
          return {
              'id': video_id,
-            'title': self._og_search_title(webpage),
+            'title': remove_end(self._og_search_title(webpage), ' - IMDb'),
              'formats': formats,
              'description': descr,
-            'thumbnail': format_info['slate'],
+            'thumbnail': format_info.get('slate'),
          }
  
  
  class ImdbListIE(InfoExtractor):
      IE_NAME = 'imdb:list'
      IE_DESC = 'Internet Movie Database lists'
-    _VALID_URL = r'https?://www\.imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
+    _VALID_URL = r'https?://(?:www\.)?imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
      _TEST = {
          'url': 'http://www.imdb.com/list/JFs9NWw6XI0',
          'info_dict': {
diff --git a/youtube_dl/extractor/imgur.py b/youtube_dl/extractor/imgur.py

index 85e9344aab18e22be204134ca846318367b54329..67c24a51c861f4dd9a1da8f790d61469c8e2220c 100644
--- a/youtube_dl/extractor/imgur.py
+++ b/youtube_dl/extractor/imgur.py
@@ -13,7 +13,7 @@ from ..utils import (
  
  
  class ImgurIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|topic/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z]+)?$'
+    _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|(?:topic|r)/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z]+)?$'
  
      _TESTS = [{
          'url': 'https://i.imgur.com/A61SaA1.gifv',
@@ -43,6 +43,9 @@ class ImgurIE(InfoExtractor):
      }, {
          'url': 'http://imgur.com/topic/Funny/N8rOudd',
          'only_matching': True,
+    }, {
+        'url': 'http://imgur.com/r/aww/VQcQPhM',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -50,12 +53,10 @@ class ImgurIE(InfoExtractor):
          webpage = self._download_webpage(
              compat_urlparse.urljoin(url, video_id), video_id)
  
-        width = int_or_none(self._search_regex(
-            r'<param name="width" value="([0-9]+)"',
-            webpage, 'width', fatal=False))
-        height = int_or_none(self._search_regex(
-            r'<param name="height" value="([0-9]+)"',
-            webpage, 'height', fatal=False))
+        width = int_or_none(self._og_search_property(
+            'video:width', webpage, default=None))
+        height = int_or_none(self._og_search_property(
+            'video:height', webpage, default=None))
  
          video_elements = self._search_regex(
              r'(?s)<div class="video-elements">(.*?)</div>',
diff --git a/youtube_dl/extractor/ina.py b/youtube_dl/extractor/ina.py

index 65712abc28c3cc68cab7052ab709b2c1e6500cb5..9544ff9d469c52a932cc3a8fed0dafbed9f4ae83 100644
--- a/youtube_dl/extractor/ina.py
+++ b/youtube_dl/extractor/ina.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/indavideo.py b/youtube_dl/extractor/indavideo.py

index 9622f198aa6aaf99094a9b85c5a914d4f0c07d46..c6f080484a99f43614f104ead8023e8e57609cda 100644
--- a/youtube_dl/extractor/indavideo.py
+++ b/youtube_dl/extractor/indavideo.py
@@ -60,7 +60,8 @@ class IndavideoEmbedIE(InfoExtractor):
  
          formats = [{
              'url': video_url,
-            'height': self._search_regex(r'\.(\d{3,4})\.mp4$', video_url, 'height', default=None),
+            'height': int_or_none(self._search_regex(
+                r'\.(\d{3,4})\.mp4(?:\?|$)', video_url, 'height', default=None)),
          } for video_url in video_urls]
          self._sort_formats(formats)
  
diff --git a/youtube_dl/extractor/instagram.py b/youtube_dl/extractor/instagram.py

index 4e62098b05fa1ec753f240adc5a27b7da57e2e95..196407b063a9393b94c759be6c8080de9a494277 100644
--- a/youtube_dl/extractor/instagram.py
+++ b/youtube_dl/extractor/instagram.py
@@ -8,29 +8,44 @@ from ..utils import (
      int_or_none,
      limit_length,
      lowercase_escape,
+    try_get,
  )
  
  
  class InstagramIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+)'
+    _VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'
      _TESTS = [{
          'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
          'md5': '0d2da106a9d2631273e192b372806516',
          'info_dict': {
              'id': 'aye83DjauH',
              'ext': 'mp4',
-            'uploader_id': 'naomipq',
              'title': 'Video by naomipq',
              'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
-        }
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1371748545,
+            'upload_date': '20130620',
+            'uploader_id': 'naomipq',
+            'uploader': 'Naomi Leonor Phan-Quang',
+            'like_count': int,
+            'comment_count': int,
+            'comments': list,
+        },
      }, {
          # missing description
          'url': 'https://www.instagram.com/p/BA-pQFBG8HZ/?taken-by=britneyspears',
          'info_dict': {
              'id': 'BA-pQFBG8HZ',
              'ext': 'mp4',
-            'uploader_id': 'britneyspears',
              'title': 'Video by britneyspears',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1453760977,
+            'upload_date': '20160125',
+            'uploader_id': 'britneyspears',
+            'uploader': 'Britney Spears',
+            'like_count': int,
+            'comment_count': int,
+            'comments': list,
          },
          'params': {
              'skip_download': True,
@@ -38,10 +53,19 @@ class InstagramIE(InfoExtractor):
      }, {
          'url': 'https://instagram.com/p/-Cmh1cukG2/',
          'only_matching': True,
+    }, {
+        'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
+        'only_matching': True,
      }]
  
      @staticmethod
      def _extract_embed_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
          blockquote_el = get_element_by_attribute(
              'class', 'instagram-media', webpage)
          if blockquote_el is None:
@@ -53,24 +77,79 @@ class InstagramIE(InfoExtractor):
              return mobj.group('link')
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        url = mobj.group('url')
  
          webpage = self._download_webpage(url, video_id)
-        uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
-                                         webpage, 'uploader id', fatal=False)
-        desc = self._search_regex(
-            r'"caption":"(.+?)"', webpage, 'description', default=None)
-        if desc is not None:
-            desc = lowercase_escape(desc)
+
+        (video_url, description, thumbnail, timestamp, uploader,
+         uploader_id, like_count, comment_count, comments, height, width) = [None] * 11
+
+        shared_data = self._parse_json(
+            self._search_regex(
+                r'window\._sharedData\s*=\s*({.+?});',
+                webpage, 'shared data', default='{}'),
+            video_id, fatal=False)
+        if shared_data:
+            media = try_get(
+                shared_data, lambda x: x['entry_data']['PostPage'][0]['media'], dict)
+            if media:
+                video_url = media.get('video_url')
+                height = int_or_none(media.get('dimensions', {}).get('height'))
+                width = int_or_none(media.get('dimensions', {}).get('width'))
+                description = media.get('caption')
+                thumbnail = media.get('display_src')
+                timestamp = int_or_none(media.get('date'))
+                uploader = media.get('owner', {}).get('full_name')
+                uploader_id = media.get('owner', {}).get('username')
+                like_count = int_or_none(media.get('likes', {}).get('count'))
+                comment_count = int_or_none(media.get('comments', {}).get('count'))
+                comments = [{
+                    'author': comment.get('user', {}).get('username'),
+                    'author_id': comment.get('user', {}).get('id'),
+                    'id': comment.get('id'),
+                    'text': comment.get('text'),
+                    'timestamp': int_or_none(comment.get('created_at')),
+                } for comment in media.get(
+                    'comments', {}).get('nodes', []) if comment.get('text')]
+
+        if not video_url:
+            video_url = self._og_search_video_url(webpage, secure=False)
+
+        formats = [{
+            'url': video_url,
+            'width': width,
+            'height': height,
+        }]
+
+        if not uploader_id:
+            uploader_id = self._search_regex(
+                r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"',
+                webpage, 'uploader id', fatal=False)
+
+        if not description:
+            description = self._search_regex(
+                r'"caption"\s*:\s*"(.+?)"', webpage, 'description', default=None)
+            if description is not None:
+                description = lowercase_escape(description)
+
+        if not thumbnail:
+            thumbnail = self._og_search_thumbnail(webpage)
  
          return {
              'id': video_id,
-            'url': self._og_search_video_url(webpage, secure=False),
+            'formats': formats,
              'ext': 'mp4',
              'title': 'Video by %s' % uploader_id,
-            'thumbnail': self._og_search_thumbnail(webpage),
+            'description': description,
+            'thumbnail': thumbnail,
+            'timestamp': timestamp,
              'uploader_id': uploader_id,
-            'description': desc,
+            'uploader': uploader,
+            'like_count': like_count,
+            'comment_count': comment_count,
+            'comments': comments,
          }
  
  
@@ -152,7 +231,7 @@ class InstagramUserIE(InfoExtractor):
  
              if not page['items']:
                  break
-            max_id = page['items'][-1]['id']
+            max_id = page['items'][-1]['id'].split('_')[0]
              media_url = (
                  'http://instagram.com/%s/media?max_id=%s' % (
                      uploader_id, max_id))
diff --git a/youtube_dl/extractor/internetvideoarchive.py b/youtube_dl/extractor/internetvideoarchive.py

index e60145b3dc5dc80f921c86a3b03a59cf5844b60e..76cc5ec3ee21450f724564ef0c75f9c08931d2f7 100644
--- a/youtube_dl/extractor/internetvideoarchive.py
+++ b/youtube_dl/extractor/internetvideoarchive.py
@@ -1,93 +1,100 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  from ..compat import (
+    compat_parse_qs,
      compat_urlparse,
-    compat_urllib_parse_urlencode,
  )
  from ..utils import (
-    xpath_with_ns,
+    determine_ext,
+    int_or_none,
+    xpath_text,
  )
  
  
  class InternetVideoArchiveIE(InfoExtractor):
-    _VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?'
+    _VALID_URL = r'https?://video\.internetvideoarchive\.net/(?:player|flash/players)/.*?\?.*?publishedid.*?'
  
      _TEST = {
-        'url': 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247',
+        'url': 'http://video.internetvideoarchive.net/player/6/configuration.ashx?customerid=69249&publishedid=194487&reporttag=vdbetatitle&playerid=641&autolist=0&domain=www.videodetective.com&maxrate=high&minrate=low&socialplayer=false',
          'info_dict': {
-            'id': '452693',
+            'id': '194487',
              'ext': 'mp4',
-            'title': 'SKYFALL',
-            'description': 'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.',
-            'duration': 152,
+            'title': 'KICK-ASS 2',
+            'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
          },
      }
  
      @staticmethod
-    def _build_url(query):
-        return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
+    def _build_json_url(query):
+        return 'http://video.internetvideoarchive.net/player/6/configuration.ashx?' + query
  
      @staticmethod
-    def _clean_query(query):
-        NEEDED_ARGS = ['publishedid', 'customerid']
-        query_dic = compat_urlparse.parse_qs(query)
-        cleaned_dic = dict((k, v[0]) for (k, v) in query_dic.items() if k in NEEDED_ARGS)
-        # Other player ids return m3u8 urls
-        cleaned_dic['playerid'] = '247'
-        cleaned_dic['videokbrate'] = '100000'
-        return compat_urllib_parse_urlencode(cleaned_dic)
+    def _build_xml_url(query):
+        return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
  
      def _real_extract(self, url):
          query = compat_urlparse.urlparse(url).query
-        query_dic = compat_urlparse.parse_qs(query)
+        query_dic = compat_parse_qs(query)
          video_id = query_dic['publishedid'][0]
-        url = self._build_url(query)
  
-        flashconfiguration = self._download_xml(url, video_id,
-                                                'Downloading flash configuration')
-        file_url = flashconfiguration.find('file').text
-        file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx')
-        # Replace some of the parameters in the query to get the best quality
-        # and http links (no m3u8 manifests)
-        file_url = re.sub(r'(?<=\?)(.+)$',
-                          lambda m: self._clean_query(m.group()),
-                          file_url)
-        info = self._download_xml(file_url, video_id,
-                                  'Downloading video info')
-        item = info.find('channel/item')
+        if '/player/' in url:
+            configuration = self._download_json(url, video_id)
+
+            # There are multiple videos in the playlist while only the first one
+            # matches the video played in browsers
+            video_info = configuration['playlist'][0]
+            title = video_info['title']
+
+            formats = []
+            for source in video_info['sources']:
+                file_url = source['file']
+                if determine_ext(file_url) == 'm3u8':
+                    m3u8_formats = self._extract_m3u8_formats(
+                        file_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
+                    if m3u8_formats:
+                        formats.extend(m3u8_formats)
+                        file_url = m3u8_formats[0]['url']
+                        formats.extend(self._extract_f4m_formats(
+                            file_url.replace('.m3u8', '.f4m'),
+                            video_id, f4m_id='hds', fatal=False))
+                        formats.extend(self._extract_mpd_formats(
+                            file_url.replace('.m3u8', '.mpd'),
+                            video_id, mpd_id='dash', fatal=False))
+                else:
+                    a_format = {
+                        'url': file_url,
+                    }
+
+                    if source.get('label') and source['label'][-4:] == ' kbs':
+                        tbr = int_or_none(source['label'][:-4])
+                        a_format.update({
+                            'tbr': tbr,
+                            'format_id': 'http-%d' % tbr,
+                        })
+                        formats.append(a_format)
  
-        def _bp(p):
-            return xpath_with_ns(
-                p,
-                {
-                    'media': 'http://search.yahoo.com/mrss/',
-                    'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats',
-                }
-            )
-        formats = []
-        for content in item.findall(_bp('media:group/media:content')):
-            attr = content.attrib
-            f_url = attr['url']
-            width = int(attr['width'])
-            bitrate = int(attr['bitrate'])
-            format_id = '%d-%dk' % (width, bitrate)
-            formats.append({
-                'format_id': format_id,
-                'url': f_url,
-                'width': width,
-                'tbr': bitrate,
-            })
+            self._sort_formats(formats)
  
-        self._sort_formats(formats)
+            description = video_info.get('description')
+            thumbnail = video_info.get('image')
+        else:
+            configuration = self._download_xml(url, video_id)
+            formats = [{
+                'url': xpath_text(configuration, './file', 'file URL', fatal=True),
+            }]
+            thumbnail = xpath_text(configuration, './image', 'thumbnail')
+            title = 'InternetVideoArchive video %s' % video_id
+            description = None
  
          return {
              'id': video_id,
-            'title': item.find('title').text,
+            'title': title,
              'formats': formats,
-            'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'],
-            'description': item.find('description').text,
-            'duration': int(attr['duration']),
+            'thumbnail': thumbnail,
+            'description': description,
          }
diff --git a/youtube_dl/extractor/iprima.py b/youtube_dl/extractor/iprima.py

index 788bbe0d5c44177b5a943da9f9c3c3adf46a77b1..da2cdc656ac90f15a575eceabf33309b084c8f28 100644
--- a/youtube_dl/extractor/iprima.py
+++ b/youtube_dl/extractor/iprima.py
@@ -81,6 +81,9 @@ class IPrimaIE(InfoExtractor):
              for _, src in re.findall(r'src["\']\s*:\s*(["\'])(.+?)\1', playerpage):
                  extract_formats(src)
  
+        if not formats and '>GEO_IP_NOT_ALLOWED<' in playerpage:
+            self.raise_geo_restricted()
+
          self._sort_formats(formats)
  
          return {
diff --git a/youtube_dl/extractor/iqiyi.py b/youtube_dl/extractor/iqiyi.py

index 9e8c9432a6947ad2ad1866e257e577c98c3ac38b..01c7b30428f8a750c9932b0ea734f795c09866d6 100644
--- a/youtube_dl/extractor/iqiyi.py
+++ b/youtube_dl/extractor/iqiyi.py
@@ -3,28 +3,22 @@ from __future__ import unicode_literals
  
  import hashlib
  import itertools
-import math
-import os
-import random
  import re
  import time
-import uuid
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_parse_qs,
      compat_str,
      compat_urllib_parse_urlencode,
-    compat_urllib_parse_urlparse,
  )
  from ..utils import (
+    clean_html,
      decode_packed_codes,
+    get_element_by_id,
+    get_element_by_attribute,
      ExtractorError,
      ohdave_rsa_encrypt,
      remove_start,
-    sanitized_Request,
-    urlencode_postdata,
-    url_basename,
  )
  
  
@@ -165,76 +159,27 @@ class IqiyiIE(InfoExtractor):
      IE_NAME = 'iqiyi'
      IE_DESC = '爱奇艺'
  
-    _VALID_URL = r'https?://(?:[^.]+\.)?iqiyi\.com/.+\.html'
+    _VALID_URL = r'https?://(?:(?:[^.]+\.)?iqiyi\.com|www\.pps\.tv)/.+\.html'
  
      _NETRC_MACHINE = 'iqiyi'
  
      _TESTS = [{
          'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
-        'md5': '2cb594dc2781e6c941a110d8f358118b',
+        # MD5 checksum differs on my machine and Travis CI
          'info_dict': {
              'id': '9c1fb1b99d192b21c559e5a1a2cb3c73',
+            'ext': 'mp4',
              'title': '美国德州空中惊现奇异云团 酷似UFO',
-            'ext': 'f4v',
          }
      }, {
          'url': 'http://www.iqiyi.com/v_19rrhnnclk.html',
+        'md5': '667171934041350c5de3f5015f7f1152',
          'info_dict': {
              'id': 'e3f585b550a280af23c98b6cb2be19fb',
-            'title': '名侦探柯南第752集',
-        },
-        'playlist': [{
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part1',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part2',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part3',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part4',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part5',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part6',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part7',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part8',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }],
-        'params': {
-            'skip_download': True,
+            'ext': 'mp4',
+            'title': '名侦探柯南 国语版：第752集 迫近灰原秘密的黑影 下篇',
          },
+        'skip': 'Geo-restricted to China',
      }, {
          'url': 'http://www.iqiyi.com/w_19rt6o8t9p.html',
          'only_matching': True,
@@ -250,22 +195,10 @@ class IqiyiIE(InfoExtractor):
          'url': 'http://www.iqiyi.com/v_19rrny4w8w.html',
          'info_dict': {
              'id': 'f3cf468b39dddb30d676f89a91200dc1',
+            'ext': 'mp4',
              'title': '泰坦尼克号',
          },
-        'playlist': [{
-            'info_dict': {
-                'id': 'f3cf468b39dddb30d676f89a91200dc1_part1',
-                'ext': 'f4v',
-                'title': '泰坦尼克号',
-            },
-        }, {
-            'info_dict': {
-                'id': 'f3cf468b39dddb30d676f89a91200dc1_part2',
-                'ext': 'f4v',
-                'title': '泰坦尼克号',
-            },
-        }],
-        'expected_warnings': ['Needs a VIP account for full video'],
+        'skip': 'Geo-restricted to China',
      }, {
          'url': 'http://www.iqiyi.com/a_19rrhb8ce1.html',
          'info_dict': {
@@ -273,16 +206,21 @@ class IqiyiIE(InfoExtractor):
              'title': '灌篮高手 国语版',
          },
          'playlist_count': 101,
+    }, {
+        'url': 'http://www.pps.tv/w_19rrbav0ph.html',
+        'only_matching': True,
      }]
  
-    _FORMATS_MAP = [
-        ('1', 'h6'),
-        ('2', 'h5'),
-        ('3', 'h4'),
-        ('4', 'h3'),
-        ('5', 'h2'),
-        ('10', 'h1'),
-    ]
+    _FORMATS_MAP = {
+        '96': 1,    # 216p, 240p
+        '1': 2,     # 336p, 360p
+        '2': 3,     # 480p, 504p
+        '21': 4,    # 504p
+        '4': 5,     # 720p
+        '17': 5,    # 720p
+        '5': 6,     # 1072p, 1080p
+        '18': 7,    # 1080p
+    }
  
      def _real_initialize(self):
          self._login()
@@ -342,167 +280,23 @@ class IqiyiIE(InfoExtractor):
  
          return True
  
-    def _authenticate_vip_video(self, api_video_url, video_id, tvid, _uuid, do_report_warning):
-        auth_params = {
-            # version and platform hard-coded in com/qiyi/player/core/model/remote/AuthenticationRemote.as
-            'version': '2.0',
-            'platform': 'b6c13e26323c537d',
-            'aid': tvid,
-            'tvid': tvid,
-            'uid': '',
-            'deviceId': _uuid,
-            'playType': 'main',  # XXX: always main?
-            'filename': os.path.splitext(url_basename(api_video_url))[0],
-        }
-
-        qd_items = compat_parse_qs(compat_urllib_parse_urlparse(api_video_url).query)
-        for key, val in qd_items.items():
-            auth_params[key] = val[0]
-
-        auth_req = sanitized_Request(
-            'http://api.vip.iqiyi.com/services/ckn.action',
-            urlencode_postdata(auth_params))
-        # iQiyi server throws HTTP 405 error without the following header
-        auth_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        auth_result = self._download_json(
-            auth_req, video_id,
-            note='Downloading video authentication JSON',
-            errnote='Unable to download video authentication JSON')
-        if auth_result['code'] == 'Q00506':  # requires a VIP account
-            if do_report_warning:
-                self.report_warning('Needs a VIP account for full video')
-            return False
+    def get_raw_data(self, tvid, video_id):
+        tm = int(time.time() * 1000)
  
-        return auth_result
-
-    def construct_video_urls(self, data, video_id, _uuid, tvid):
-        def do_xor(x, y):
-            a = y % 3
-            if a == 1:
-                return x ^ 121
-            if a == 2:
-                return x ^ 72
-            return x ^ 103
-
-        def get_encode_code(l):
-            a = 0
-            b = l.split('-')
-            c = len(b)
-            s = ''
-            for i in range(c - 1, -1, -1):
-                a = do_xor(int(b[c - i - 1], 16), i)
-                s += chr(a)
-            return s[::-1]
-
-        def get_path_key(x, format_id, segment_index):
-            mg = ')(*&^flash@#$%a'
-            tm = self._download_json(
-                'http://data.video.qiyi.com/t?tn=' + str(random.random()), video_id,
-                note='Download path key of segment %d for format %s' % (segment_index + 1, format_id)
-            )['t']
-            t = str(int(math.floor(int(tm) / (600.0))))
-            return md5_text(t + mg + x)
-
-        video_urls_dict = {}
-        need_vip_warning_report = True
-        for format_item in data['vp']['tkl'][0]['vs']:
-            if 0 < int(format_item['bid']) <= 10:
-                format_id = self.get_format(format_item['bid'])
-            else:
-                continue
-
-            video_urls = []
-
-            video_urls_info = format_item['fs']
-            if not format_item['fs'][0]['l'].startswith('/'):
-                t = get_encode_code(format_item['fs'][0]['l'])
-                if t.endswith('mp4'):
-                    video_urls_info = format_item['flvs']
-
-            for segment_index, segment in enumerate(video_urls_info):
-                vl = segment['l']
-                if not vl.startswith('/'):
-                    vl = get_encode_code(vl)
-                is_vip_video = '/vip/' in vl
-                filesize = segment['b']
-                base_url = data['vp']['du'].split('/')
-                if not is_vip_video:
-                    key = get_path_key(
-                        vl.split('/')[-1].split('.')[0], format_id, segment_index)
-                    base_url.insert(-1, key)
-                base_url = '/'.join(base_url)
-                param = {
-                    'su': _uuid,
-                    'qyid': uuid.uuid4().hex,
-                    'client': '',
-                    'z': '',
-                    'bt': '',
-                    'ct': '',
-                    'tn': str(int(time.time()))
-                }
-                api_video_url = base_url + vl
-                if is_vip_video:
-                    api_video_url = api_video_url.replace('.f4v', '.hml')
-                    auth_result = self._authenticate_vip_video(
-                        api_video_url, video_id, tvid, _uuid, need_vip_warning_report)
-                    if auth_result is False:
-                        need_vip_warning_report = False
-                        break
-                    param.update({
-                        't': auth_result['data']['t'],
-                        # cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
-                        'cid': 'afbe8fd3d73448c9',
-                        'vid': video_id,
-                        'QY00001': auth_result['data']['u'],
-                    })
-                api_video_url += '?' if '?' not in api_video_url else '&'
-                api_video_url += compat_urllib_parse_urlencode(param)
-                js = self._download_json(
-                    api_video_url, video_id,
-                    note='Download video info of segment %d for format %s' % (segment_index + 1, format_id))
-                video_url = js['l']
-                video_urls.append(
-                    (video_url, filesize))
-
-            video_urls_dict[format_id] = video_urls
-        return video_urls_dict
-
-    def get_format(self, bid):
-        matched_format_ids = [_format_id for _bid, _format_id in self._FORMATS_MAP if _bid == str(bid)]
-        return matched_format_ids[0] if len(matched_format_ids) else None
-
-    def get_bid(self, format_id):
-        matched_bids = [_bid for _bid, _format_id in self._FORMATS_MAP if _format_id == format_id]
-        return matched_bids[0] if len(matched_bids) else None
-
-    def get_raw_data(self, tvid, video_id, enc_key, _uuid):
-        tm = str(int(time.time()))
-        tail = tm + tvid
-        param = {
-            'key': 'fvip',
-            'src': md5_text('youtube-dl'),
-            'tvId': tvid,
+        key = 'd5fb4bd9d50c4be6948c97edd7254b0e'
+        sc = md5_text(compat_str(tm) + key + tvid)
+        params = {
+            'tvid': tvid,
              'vid': video_id,
-            'vinfo': 1,
-            'tm': tm,
-            'enc': md5_text(enc_key + tail),
-            'qyid': _uuid,
-            'tn': random.random(),
-            'um': 0,
-            'authkey': md5_text(md5_text('') + tail),
-            'k_tag': 1,
+            'src': '76f90cbd92f94a2e925d83e8ccd22cb7',
+            'sc': sc,
+            't': tm,
          }
  
-        api_url = 'http://cache.video.qiyi.com/vms' + '?' + \
-            compat_urllib_parse_urlencode(param)
-        raw_data = self._download_json(api_url, video_id)
-        return raw_data
-
-    def get_enc_key(self, video_id):
-        # TODO: automatic key extraction
-        # last update at 2016-01-22 for Zombie::bite
-        enc_key = '4a1caba4b4465345366f28da7c117d20'
-        return enc_key
+        return self._download_json(
+            'http://cache.m.iqiyi.com/jp/tmts/%s/%s/' % (tvid, video_id),
+            video_id, transform_source=lambda s: remove_start(s, 'var tvInfoJs='),
+            query=params, headers=self.geo_verification_headers())
  
      def _extract_playlist(self, webpage):
          PAGE_SIZE = 50
@@ -551,58 +345,41 @@ class IqiyiIE(InfoExtractor):
              r'data-player-tvid\s*=\s*[\'"](\d+)', webpage, 'tvid')
          video_id = self._search_regex(
              r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id')
-        _uuid = uuid.uuid4().hex
-
-        enc_key = self.get_enc_key(video_id)
-
-        raw_data = self.get_raw_data(tvid, video_id, enc_key, _uuid)
-
-        if raw_data['code'] != 'A000000':
-            raise ExtractorError('Unable to load data. Error code: ' + raw_data['code'])
-
-        data = raw_data['data']
-
-        title = data['vi']['vn']
-
-        # generate video_urls_dict
-        video_urls_dict = self.construct_video_urls(
-            data, video_id, _uuid, tvid)
-
-        # construct info
-        entries = []
-        for format_id in video_urls_dict:
-            video_urls = video_urls_dict[format_id]
-            for i, video_url_info in enumerate(video_urls):
-                if len(entries) < i + 1:
-                    entries.append({'formats': []})
-                entries[i]['formats'].append(
-                    {
-                        'url': video_url_info[0],
-                        'filesize': video_url_info[-1],
-                        'format_id': format_id,
-                        'preference': int(self.get_bid(format_id))
-                    }
-                )
-
-        for i in range(len(entries)):
-            self._sort_formats(entries[i]['formats'])
-            entries[i].update(
-                {
-                    'id': '%s_part%d' % (video_id, i + 1),
-                    'title': title,
-                }
-            )
-
-        if len(entries) > 1:
-            info = {
-                '_type': 'multi_video',
-                'id': video_id,
-                'title': title,
-                'entries': entries,
-            }
-        else:
-            info = entries[0]
-            info['id'] = video_id
-            info['title'] = title
-
-        return info
+
+        formats = []
+        for _ in range(5):
+            raw_data = self.get_raw_data(tvid, video_id)
+
+            if raw_data['code'] != 'A00000':
+                if raw_data['code'] == 'A00111':
+                    self.raise_geo_restricted()
+                raise ExtractorError('Unable to load data. Error code: ' + raw_data['code'])
+
+            data = raw_data['data']
+
+            for stream in data['vidl']:
+                if 'm3utx' not in stream:
+                    continue
+                vd = compat_str(stream['vd'])
+                formats.append({
+                    'url': stream['m3utx'],
+                    'format_id': vd,
+                    'ext': 'mp4',
+                    'preference': self._FORMATS_MAP.get(vd, -1),
+                    'protocol': 'm3u8_native',
+                })
+
+            if formats:
+                break
+
+            self._sleep(5, video_id)
+
+        self._sort_formats(formats)
+        title = (get_element_by_id('widget-videotitle', webpage) or
+                 clean_html(get_element_by_attribute('class', 'mod-play-tit', webpage)))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+        }
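
Note on the iqiyi change: the new `get_raw_data` signs each request with an MD5 over the millisecond timestamp, a fixed key, and the tvid. The snippet below is a minimal standalone sketch of that step, not the extractor itself. It assumes `md5_text` is a hex MD5 of the UTF-8 encoded string (its definition is not part of this diff); `sign_request` and the sample tvid are hypothetical names used only for illustration.

```python
import hashlib
import time


def md5_text(text):
    # Assumed behaviour of the md5_text helper: hex digest of the UTF-8 bytes.
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def sign_request(tvid, video_id, key, tm=None):
    # tm is a timestamp in milliseconds, mirroring int(time.time() * 1000).
    if tm is None:
        tm = int(time.time() * 1000)
    # The signature concatenates timestamp, key and tvid, in that order.
    sc = md5_text(str(tm) + key + tvid)
    return {
        'tvid': tvid,
        'vid': video_id,
        'src': '76f90cbd92f94a2e925d83e8ccd22cb7',  # copied from the diff above
        'sc': sc,
        't': tm,
    }


# Hypothetical tvid and a fixed timestamp so the result is reproducible.
params = sign_request(
    '123456', 'f3cf468b39dddb30d676f89a91200dc1',
    key='d5fb4bd9d50c4be6948c97edd7254b0e', tm=1479571511000)
```

Passing `tm` explicitly keeps the sketch deterministic; the extractor always uses the current time.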
diff --git a/youtube_dl/extractor/ivi.py b/youtube_dl/extractor/ivi.py

index 472d72b4c34fa3305b6b2808be1e45c6da25a60e..7c8cb21c2c5619b4809f5daf8605958a808eccb9 100644 (file)
--- a/youtube_dl/extractor/ivi.py
+++ b/youtube_dl/extractor/ivi.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -8,7 +8,7 @@ from .common import InfoExtractor
  from ..utils import (
      ExtractorError,
      int_or_none,
-    sanitized_Request,
+    qualities,
  )
  
  
@@ -49,11 +49,27 @@ class IviIE(InfoExtractor):
                  'thumbnail': 're:^https?://.*\.jpg$',
              },
              'skip': 'Only works from Russia',
+        },
+        {
+            # with MP4-HD720 format
+            'url': 'http://www.ivi.ru/watch/146500',
+            'md5': 'd63d35cdbfa1ea61a5eafec7cc523e1e',
+            'info_dict': {
+                'id': '146500',
+                'ext': 'mp4',
+                'title': 'Кукла',
+                'description': 'md5:ffca9372399976a2d260a407cc74cce6',
+                'duration': 5599,
+                'thumbnail': 're:^https?://.*\.jpg$',
+            },
+            'skip': 'Only works from Russia',
          }
      ]
  
      # Sorted by quality
-    _KNOWN_FORMATS = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi', 'MP4-SHQ']
+    _KNOWN_FORMATS = (
+        'MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi',
+        'MP4-SHQ', 'MP4-HD720', 'MP4-HD1080')
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -69,10 +85,9 @@ class IviIE(InfoExtractor):
              ]
          }
  
-        request = sanitized_Request(
-            'http://api.digitalaccess.ru/api/json/', json.dumps(data))
          video_json = self._download_json(
-            request, video_id, 'Downloading video JSON')
+            'http://api.digitalaccess.ru/api/json/', video_id,
+            'Downloading video JSON', data=json.dumps(data))
  
          if 'error' in video_json:
              error = video_json['error']
@@ -84,11 +99,13 @@ class IviIE(InfoExtractor):
  
          result = video_json['result']
  
+        quality = qualities(self._KNOWN_FORMATS)
+
          formats = [{
              'url': x['url'],
-            'format_id': x['content_format'],
-            'preference': self._KNOWN_FORMATS.index(x['content_format']),
-        } for x in result['files'] if x['content_format'] in self._KNOWN_FORMATS]
+            'format_id': x.get('content_format'),
+            'quality': quality(x.get('content_format')),
+        } for x in result['files'] if x.get('url')]
  
          self._sort_formats(formats)
  
@@ -115,7 +132,7 @@ class IviIE(InfoExtractor):
              webpage, 'season number', default=None))
  
          episode_number = int_or_none(self._search_regex(
-            r'<meta[^>]+itemprop="episode"[^>]*>\s*<meta[^>]+itemprop="episodeNumber"[^>]+content="(\d+)',
+            r'[^>]+itemprop="episode"[^>]*>\s*<meta[^>]+itemprop="episodeNumber"[^>]+content="(\d+)',
              webpage, 'episode number', default=None))
  
          description = self._og_search_description(webpage, default=None) or self._html_search_meta(
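
Note on the ivi change: formats are no longer dropped when `content_format` is unknown; they are kept and ranked through `qualities(self._KNOWN_FORMATS)`. The sketch below illustrates that ranking under an assumption: the local `qualities` is a stand-in written for this note, assumed to return the position of a known id and -1 otherwise. The sample `files` list and its URLs are made up.

```python
def qualities(quality_ids):
    # Stand-in for the helper imported from ..utils (assumed behaviour):
    # position in the ordered sequence, or -1 for unknown ids.
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


KNOWN_FORMATS = (
    'MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi',
    'MP4-SHQ', 'MP4-HD720', 'MP4-HD1080')

quality = qualities(KNOWN_FORMATS)

# Hypothetical API result: one known format, one unknown, one without a
# content_format, and one without a URL (which is skipped).
files = [
    {'url': 'http://example.com/a.mp4', 'content_format': 'MP4-HD720'},
    {'url': 'http://example.com/b.mp4', 'content_format': 'MP4-unknown'},
    {'url': 'http://example.com/c.mp4'},
    {'content_format': 'MP4-hi'},
]

formats = [{
    'url': x['url'],
    'format_id': x.get('content_format'),
    'quality': quality(x.get('content_format')),
} for x in files if x.get('url')]
```

Compared with the removed `self._KNOWN_FORMATS.index(...)` call, an unknown id no longer has to be filtered out beforehand to avoid a `ValueError`.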
diff --git a/youtube_dl/extractor/iwara.py b/youtube_dl/extractor/iwara.py

new file mode 100644 (file)

index 0000000..8d7e7f4
--- /dev/null
+++ b/youtube_dl/extractor/iwara.py
@@ -0,0 +1,77 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_urlparse
+from ..utils import remove_end
+
+
+class IwaraIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.|ecchi\.)?iwara\.tv/videos/(?P<id>[a-zA-Z0-9]+)'
+    _TESTS = [{
+        'url': 'http://iwara.tv/videos/amVwUl1EHpAD9RD',
+        'md5': '1d53866b2c514b23ed69e4352fdc9839',
+        'info_dict': {
+            'id': 'amVwUl1EHpAD9RD',
+            'ext': 'mp4',
+            'title': '【MMD R-18】ガールフレンド carry_me_off',
+            'age_limit': 18,
+        },
+    }, {
+        'url': 'http://ecchi.iwara.tv/videos/Vb4yf2yZspkzkBO',
+        'md5': '7e5f1f359cd51a027ba4a7b7710a50f0',
+        'info_dict': {
+            'id': '0B1LvuHnL-sRFNXB1WHNqbGw4SXc',
+            'ext': 'mp4',
+            'title': '[3D Hentai] Kyonyu Ã\x97 Genkai Ã\x97 Emaki Shinobi Girls.mp4',
+            'age_limit': 18,
+        },
+        'add_ie': ['GoogleDrive'],
+    }, {
+        'url': 'http://www.iwara.tv/videos/nawkaumd6ilezzgq',
+        'md5': '1d85f1e5217d2791626cff5ec83bb189',
+        'info_dict': {
+            'id': '6liAP9s2Ojc',
+            'ext': 'mp4',
+            'age_limit': 0,
+            'title': '[MMD] Do It Again Ver.2 [1080p 60FPS] (Motion,Camera,Wav+DL)',
+            'description': 'md5:590c12c0df1443d833fbebe05da8c47a',
+            'upload_date': '20160910',
+            'uploader': 'aMMDsork',
+            'uploader_id': 'UCVOFyOSCyFkXTYYHITtqB7A',
+        },
+        'add_ie': ['Youtube'],
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage, urlh = self._download_webpage_handle(url, video_id)
+
+        hostname = compat_urllib_parse_urlparse(urlh.geturl()).hostname
+        # ecchi is 'sexy' in Japanese
+        age_limit = 18 if hostname.split('.')[0] == 'ecchi' else 0
+
+        entries = self._parse_html5_media_entries(url, webpage, video_id)
+
+        if not entries:
+            iframe_url = self._html_search_regex(
+                r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1',
+                webpage, 'iframe URL', group='url')
+            return {
+                '_type': 'url_transparent',
+                'url': iframe_url,
+                'age_limit': age_limit,
+            }
+
+        title = remove_end(self._html_search_regex(
+            r'<title>([^<]+)</title>', webpage, 'title'), ' | Iwara')
+
+        info_dict = entries[0]
+        info_dict.update({
+            'id': video_id,
+            'title': title,
+            'age_limit': age_limit,
+        })
+
+        return info_dict
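
Note on the iwara extractor: the age limit is derived from the hostname of the final URL (`urlh.geturl()`), not from the URL the user passed in, which presumably is why the first test uses a plain `iwara.tv` URL yet expects `age_limit` 18. A minimal sketch of the hostname check follows; `age_limit_for` is a hypothetical helper, and Python 3's `urllib.parse.urlparse` is used in place of `compat_urllib_parse_urlparse`.

```python
from urllib.parse import urlparse


def age_limit_for(final_url):
    # final_url is assumed to be the URL after redirects have been followed.
    hostname = urlparse(final_url).hostname
    # Only the ecchi subdomain is treated as adult content.
    return 18 if hostname.split('.')[0] == 'ecchi' else 0
```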
diff --git a/youtube_dl/extractor/izlesene.py b/youtube_dl/extractor/izlesene.py

index bc226fa67c064b991674a510b1eba54d40dc67e0..aa0728abc0155fa6abbe8e2a88de18dd89d85138 100644 (file)
--- a/youtube_dl/extractor/izlesene.py
+++ b/youtube_dl/extractor/izlesene.py
@@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi',
                  'description': 'md5:253753e2655dde93f59f74b572454f6d',
-                'thumbnail': 're:^http://.*\.jpg',
+                'thumbnail': 're:^https?://.*\.jpg',
                  'uploader_id': 'pelikzzle',
                  'timestamp': int,
                  'upload_date': '20140702',
@@ -44,8 +44,7 @@ class IzleseneIE(InfoExtractor):
                  'id': '17997',
                  'ext': 'mp4',
                  'title': 'Tarkan Dortmund 2006 Konseri',
-                'description': 'Tarkan Dortmund 2006 Konseri',
-                'thumbnail': 're:^http://.*\.jpg',
+                'thumbnail': 're:^https://.*\.jpg',
                  'uploader_id': 'parlayankiz',
                  'timestamp': int,
                  'upload_date': '20061112',
@@ -62,7 +61,7 @@ class IzleseneIE(InfoExtractor):
          webpage = self._download_webpage(url, video_id)
  
          title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage)
+        description = self._og_search_description(webpage, default=None)
          thumbnail = self._proto_relative_url(
              self._og_search_thumbnail(webpage), scheme='http:')
  
diff --git a/youtube_dl/extractor/jadorecettepub.py b/youtube_dl/extractor/jadorecettepub.py

deleted file mode 100644 (file)

index 158c09a..0000000
--- a/youtube_dl/extractor/jadorecettepub.py
+++ /dev/null
@@ -1,47 +0,0 @@
-# coding: utf-8
-
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from .youtube import YoutubeIE
-
-
-class JadoreCettePubIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'
-
-    _TEST = {
-        'url': 'http://www.jadorecettepub.com/2010/12/star-wars-massacre-par-les-japonais.html',
-        'md5': '401286a06067c70b44076044b66515de',
-        'info_dict': {
-            'id': 'jLMja3tr7a4',
-            'ext': 'mp4',
-            'title': 'La pire utilisation de Star Wars',
-            'description': "Jadorecettepub.com vous a gratifié de plusieurs pubs géniales utilisant Star Wars et Dark Vador plus particulièrement... Mais l'heure est venue de vous proposer une version totalement massacrée, venue du Japon.  Quand les Japonais détruisent l'image de Star Wars pour vendre du thon en boite, ça promet...",
-        },
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('id')
-
-        webpage = self._download_webpage(url, display_id)
-
-        title = self._html_search_regex(
-            r'<span style="font-size: x-large;"><b>(.*?)</b></span>',
-            webpage, 'title')
-        description = self._html_search_regex(
-            r'(?s)<div id="fb-root">(.*?)<script>', webpage, 'description',
-            fatal=False)
-        real_url = self._search_regex(
-            r'\[/postlink\](.*)endofvid', webpage, 'video URL')
-        video_id = YoutubeIE.extract_id(real_url)
-
-        return {
-            '_type': 'url_transparent',
-            'url': real_url,
-            'id': video_id,
-            'title': title,
-            'description': description,
-        }
diff --git a/youtube_dl/extractor/jamendo.py b/youtube_dl/extractor/jamendo.py

new file mode 100644 (file)

index 0000000..ee9acac
--- /dev/null
+++ b/youtube_dl/extractor/jamendo.py
@@ -0,0 +1,107 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from ..compat import compat_urlparse
+from .common import InfoExtractor
+
+
+class JamendoIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?jamendo\.com/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)'
+    _TEST = {
+        'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
+        'md5': '6e9e82ed6db98678f171c25a8ed09ffd',
+        'info_dict': {
+            'id': '196219',
+            'display_id': 'stories-from-emona-i',
+            'ext': 'flac',
+            'title': 'Stories from Emona I',
+            'thumbnail': 're:^https?://.*\.jpg'
+        }
+    }
+
+    def _real_extract(self, url):
+        mobj = self._VALID_URL_RE.match(url)
+        track_id = mobj.group('id')
+        display_id = mobj.group('display_id')
+
+        webpage = self._download_webpage(url, display_id)
+
+        title = self._html_search_meta('name', webpage, 'title')
+
+        formats = [{
+            'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
+                   % (sub_domain, track_id, format_id),
+            'format_id': format_id,
+            'ext': ext,
+            'quality': quality,
+        } for quality, (format_id, sub_domain, ext) in enumerate((
+            ('mp31', 'mp3l', 'mp3'),
+            ('mp32', 'mp3d', 'mp3'),
+            ('ogg1', 'ogg', 'ogg'),
+            ('flac', 'flac', 'flac'),
+        ))]
+        self._sort_formats(formats)
+
+        thumbnail = self._html_search_meta(
+            'image', webpage, 'thumbnail', fatal=False)
+
+        return {
+            'id': track_id,
+            'display_id': display_id,
+            'thumbnail': thumbnail,
+            'title': title,
+            'formats': formats
+        }
+
+
+class JamendoAlbumIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)/(?P<display_id>[\w-]+)'
+    _TEST = {
+        'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
+        'info_dict': {
+            'id': '121486',
+            'title': 'Duck On Cover'
+        },
+        'playlist': [{
+            'md5': 'e1a2fcb42bda30dfac990212924149a8',
+            'info_dict': {
+                'id': '1032333',
+                'ext': 'flac',
+                'title': 'Warmachine'
+            }
+        }, {
+            'md5': '1f358d7b2f98edfe90fd55dac0799d50',
+            'info_dict': {
+                'id': '1032330',
+                'ext': 'flac',
+                'title': 'Without Your Ghost'
+            }
+        }],
+        'params': {
+            'playlistend': 2
+        }
+    }
+
+    def _real_extract(self, url):
+        mobj = self._VALID_URL_RE.match(url)
+        album_id = mobj.group('id')
+
+        webpage = self._download_webpage(url, mobj.group('display_id'))
+
+        title = self._html_search_meta('name', webpage, 'title')
+
+        entries = [
+            self.url_result(
+                compat_urlparse.urljoin(url, m.group('path')),
+                ie=JamendoIE.ie_key(),
+                video_id=self._search_regex(
+                    r'/track/(\d+)', m.group('path'),
+                    'track id', default=None))
+            for m in re.finditer(
+                r'<a[^>]+href=(["\'])(?P<path>(?:(?!\1).)+)\1[^>]+class=["\'][^>]*js-trackrow-albumpage-link',
+                webpage)
+        ]
+
+        return self.playlist_result(entries, album_id, title)
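
Note on the jamendo extractor: the format list is built from a fixed table, and `enumerate` turns the table order into the `quality` value, so later rows rank higher. The sketch below isolates that construction; `FORMAT_TABLE` and `build_formats` are hypothetical names, and the URL template is copied from the diff above.

```python
# (format_id, sub_domain, ext), ordered from lowest to highest quality.
FORMAT_TABLE = (
    ('mp31', 'mp3l', 'mp3'),
    ('mp32', 'mp3d', 'mp3'),
    ('ogg1', 'ogg', 'ogg'),
    ('flac', 'flac', 'flac'),
)


def build_formats(track_id):
    # The index of each row doubles as its quality rank.
    return [{
        'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
               % (sub_domain, track_id, format_id),
        'format_id': format_id,
        'ext': ext,
        'quality': quality,
    } for quality, (format_id, sub_domain, ext) in enumerate(FORMAT_TABLE)]


formats = build_formats('196219')
```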
diff --git a/youtube_dl/extractor/jpopsukitv.py b/youtube_dl/extractor/jpopsukitv.py

index 122e2dd8cad8c9fba6d861a80d77752e1b508301..4b5f346d1ef909e286b5c0555ab07a6e20bc11d4 100644 (file)
--- a/youtube_dl/extractor/jpopsukitv.py
+++ b/youtube_dl/extractor/jpopsukitv.py
@@ -1,4 +1,4 @@
-# coding=utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/jwplatform.py b/youtube_dl/extractor/jwplatform.py

index 6770685d7027c3738fba35f3e057f6be2a3a512c..5d56e0a28bd55b93153a92446834ba440ad59572 100644 (file)
--- a/youtube_dl/extractor/jwplatform.py
+++ b/youtube_dl/extractor/jwplatform.py
@@ -4,46 +4,131 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import int_or_none
+from ..compat import compat_urlparse
+from ..utils import (
+    determine_ext,
+    float_or_none,
+    int_or_none,
+    js_to_json,
+    mimetype2ext,
+)
  
  
  class JWPlatformBaseIE(InfoExtractor):
-    def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True):
-        video_data = jwplayer_data['playlist'][0]
-        subtitles = {}
-        for track in video_data['tracks']:
-            if track['kind'] == 'captions':
-                subtitles[track['label']] = [{'url': self._proto_relative_url(track['file'])}]
-
-        formats = []
-        for source in video_data['sources']:
-            source_url = self._proto_relative_url(source['file'])
-            source_type = source.get('type') or ''
-            if source_type in ('application/vnd.apple.mpegurl', 'hls'):
-                formats.extend(self._extract_m3u8_formats(
-                    source_url, video_id, 'mp4', 'm3u8_native', fatal=False))
-            elif source_type.startswith('audio'):
-                formats.append({
-                    'url': source_url,
-                    'vcodec': 'none',
-                })
-            else:
-                formats.append({
-                    'url': source_url,
-                    'width': int_or_none(source.get('width')),
-                    'height': int_or_none(source.get('height')),
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': video_data['title'] if require_title else video_data.get('title'),
-            'description': video_data.get('description'),
-            'thumbnail': self._proto_relative_url(video_data.get('image')),
-            'timestamp': int_or_none(video_data.get('pubdate')),
-            'subtitles': subtitles,
-            'formats': formats,
-        }
+    @staticmethod
+    def _find_jwplayer_data(webpage):
+        # TODO: Merge this with the JWPlayer-related code in generic.py
+
+        mobj = re.search(
+            r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
+            webpage)
+        if mobj:
+            return mobj.group('options')
+
+    def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
+        jwplayer_data = self._parse_json(
+            self._find_jwplayer_data(webpage), video_id,
+            transform_source=js_to_json)
+        return self._parse_jwplayer_data(
+            jwplayer_data, video_id, *args, **kwargs)
+
+    def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
+                             m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
+        # JWPlayer backward compatibility: flattened playlists
+        # https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
+        if 'playlist' not in jwplayer_data:
+            jwplayer_data = {'playlist': [jwplayer_data]}
+
+        entries = []
+
+        # JWPlayer backward compatibility: single playlist item
+        # https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
+        if not isinstance(jwplayer_data['playlist'], list):
+            jwplayer_data['playlist'] = [jwplayer_data['playlist']]
+
+        for video_data in jwplayer_data['playlist']:
+            # JWPlayer backward compatibility: flattened sources
+            # https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
+            if 'sources' not in video_data:
+                video_data['sources'] = [video_data]
+
+            this_video_id = video_id or video_data['mediaid']
+
+            formats = []
+            for source in video_data['sources']:
+                source_url = self._proto_relative_url(source['file'])
+                if base_url:
+                    source_url = compat_urlparse.urljoin(base_url, source_url)
+                source_type = source.get('type') or ''
+                ext = mimetype2ext(source_type) or determine_ext(source_url)
+                if source_type == 'hls' or ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
+                elif ext == 'mpd':
+                    formats.extend(self._extract_mpd_formats(
+                        source_url, this_video_id, mpd_id=mpd_id, fatal=False))
+                # https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
+                elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
+                    formats.append({
+                        'url': source_url,
+                        'vcodec': 'none',
+                        'ext': ext,
+                    })
+                else:
+                    height = int_or_none(source.get('height'))
+                    if height is None:
+                        # Often no height is provided, but there is a label in a
+                        # format like 1080p.
+                        height = int_or_none(self._search_regex(
+                            r'^(\d{3,})[pP]$', source.get('label') or '',
+                            'height', default=None))
+                    a_format = {
+                        'url': source_url,
+                        'width': int_or_none(source.get('width')),
+                        'height': height,
+                        'ext': ext,
+                    }
+                    if source_url.startswith('rtmp'):
+                        a_format['ext'] = 'flv'
+
+                        # See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
+                        # of jwplayer.flash.swf
+                        rtmp_url_parts = re.split(
+                            r'((?:mp4|mp3|flv):)', source_url, 1)
+                        if len(rtmp_url_parts) == 3:
+                            rtmp_url, prefix, play_path = rtmp_url_parts
+                            a_format.update({
+                                'url': rtmp_url,
+                                'play_path': prefix + play_path,
+                            })
+                        if rtmp_params:
+                            a_format.update(rtmp_params)
+                    formats.append(a_format)
+            self._sort_formats(formats)
+
+            subtitles = {}
+            tracks = video_data.get('tracks')
+            if tracks and isinstance(tracks, list):
+                for track in tracks:
+                    if track.get('file') and track.get('kind') == 'captions':
+                        subtitles.setdefault(track.get('label') or 'en', []).append({
+                            'url': self._proto_relative_url(track['file'])
+                        })
+
+            entries.append({
+                'id': this_video_id,
+                'title': video_data['title'] if require_title else video_data.get('title'),
+                'description': video_data.get('description'),
+                'thumbnail': self._proto_relative_url(video_data.get('image')),
+                'timestamp': int_or_none(video_data.get('pubdate')),
+                'duration': float_or_none(jwplayer_data.get('duration')),
+                'subtitles': subtitles,
+                'formats': formats,
+            })
+        if len(entries) == 1:
+            return entries[0]
+        else:
+            return self.playlist_result(entries)
  
  
  class JWPlatformIE(JWPlatformBaseIE):
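
Review note: the RTMP and label handling added to the JW Player parsing above can be tried in isolation. The following is a minimal sketch that reuses the two regular expressions from the diff; the function names `split_rtmp_url` and `height_from_label` are hypothetical and do not exist in the patched module.

```python
import re


def split_rtmp_url(source_url):
    """Split a JW Player RTMP source into a connection URL and a play path.

    Returns (url, play_path). play_path is None when the URL carries no
    mp4:/mp3:/flv: prefix, in which case the URL is returned unchanged.
    """
    # The capturing group keeps the prefix in the result, so a successful
    # split always yields exactly three parts.
    parts = re.split(r'((?:mp4|mp3|flv):)', source_url, maxsplit=1)
    if len(parts) == 3:
        rtmp_url, prefix, play_path = parts
        return rtmp_url, prefix + play_path
    return source_url, None


def height_from_label(label):
    """Parse a height from a label such as '1080p'; return None otherwise."""
    mobj = re.match(r'^(\d{3,})[pP]$', label or '')
    return int(mobj.group(1)) if mobj else None
```

Unlike the diff, which passes the split limit positionally, the sketch passes `maxsplit` by keyword; the behaviour is the same.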
diff --git a/youtube_dl/extractor/kaltura.py b/youtube_dl/extractor/kaltura.py

index a65697ff558864f36cc5e8b8f82f959b19ea16fc..91bc3a0a7c0af4690cf1a16713de1e76bccaa67a 100644 (file)
--- a/youtube_dl/extractor/kaltura.py
+++ b/youtube_dl/extractor/kaltura.py
@@ -6,7 +6,6 @@ import base64
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_urllib_parse_urlencode,
      compat_urlparse,
      compat_parse_qs,
  )
@@ -15,6 +14,7 @@ from ..utils import (
      ExtractorError,
      int_or_none,
      unsmuggle_url,
+    smuggle_url,
  )
  
  
@@ -34,7 +34,14 @@ class KalturaIE(InfoExtractor):
                          )(?:/(?P<path>[^?]+))?(?:\?(?P<query>.*))?
                  )
                  '''
-    _API_BASE = 'http://cdnapi.kaltura.com/api_v3/index.php?'
+    _SERVICE_URL = 'http://cdnapi.kaltura.com'
+    _SERVICE_BASE = '/api_v3/index.php'
+    # See https://github.com/kaltura/server/blob/master/plugins/content/caption/base/lib/model/enums/CaptionType.php
+    _CAPTION_TYPES = {
+        1: 'srt',
+        2: 'ttml',
+        3: 'vtt',
+    }
      _TESTS = [
          {
              'url': 'kaltura:269692:1_1jc2y3e4',
@@ -61,19 +68,79 @@ class KalturaIE(InfoExtractor):
          {
              'url': 'https://cdnapisec.kaltura.com/html5/html5lib/v2.30.2/mwEmbedFrame.php/p/1337/uiconf_id/20540612/entry_id/1_sf5ovm7u?wid=_243342',
              'only_matching': True,
+        },
+        {
+            # video with subtitles
+            'url': 'kaltura:111032:1_cw786r8q',
+            'only_matching': True,
+        },
+        {
+            # video with ttml subtitles (no fileExt)
+            'url': 'kaltura:1926081:0_l5ye1133',
+            'info_dict': {
+                'id': '0_l5ye1133',
+                'ext': 'mp4',
+                'title': 'What Can You Do With Python?',
+                'upload_date': '20160221',
+                'uploader_id': 'stork',
+                'thumbnail': 're:^https?://.*/thumbnail/.*',
+                'timestamp': int,
+                'subtitles': {
+                    'en': [{
+                        'ext': 'ttml',
+                    }],
+                },
+            },
+            'params': {
+                'skip_download': True,
+            },
          }
      ]
  
-    def _kaltura_api_call(self, video_id, actions, *args, **kwargs):
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = (
+            re.search(
+                r"""(?xs)
+                    kWidget\.(?:thumb)?[Ee]mbed\(
+                    \{.*?
+                        (?P<q1>['\"])wid(?P=q1)\s*:\s*
+                        (?P<q2>['\"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
+                        (?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*
+                        (?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4),
+                """, webpage) or
+            re.search(
+                r'''(?xs)
+                    (?P<q1>["\'])
+                        (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/(?:(?!(?P=q1)).)*(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
+                    (?P=q1).*?
+                    (?:
+                        entry_?[Ii]d|
+                        (?P<q2>["\'])entry_?[Ii]d(?P=q2)
+                    )\s*:\s*
+                    (?P<q3>["\'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
+                ''', webpage))
+        if mobj:
+            embed_info = mobj.groupdict()
+            url = 'kaltura:%(partner_id)s:%(id)s' % embed_info
+            escaped_pid = re.escape(embed_info['partner_id'])
+            service_url = re.search(
+                r'<script[^>]+src=["\']((?:https?:)?//.+?)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
+                webpage)
+            if service_url:
+                url = smuggle_url(url, {'service_url': service_url.group(1)})
+            return url
+
+    def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
          params = actions[0]
          if len(actions) > 1:
              for i, a in enumerate(actions[1:], start=1):
                  for k, v in a.items():
                      params['%d:%s' % (i, k)] = v
  
-        query = compat_urllib_parse_urlencode(params)
-        url = self._API_BASE + query
-        data = self._download_json(url, video_id, *args, **kwargs)
+        data = self._download_json(
+            (service_url or self._SERVICE_URL) + self._SERVICE_BASE,
+            video_id, query=params, *args, **kwargs)
  
          status = data if len(actions) == 1 else data[0]
          if status.get('objectType') == 'KalturaAPIException':
@@ -82,20 +149,7 @@ class KalturaIE(InfoExtractor):
  
          return data
  
-    def _get_kaltura_signature(self, video_id, partner_id):
-        actions = [{
-            'apiVersion': '3.1',
-            'expiry': 86400,
-            'format': 1,
-            'service': 'session',
-            'action': 'startWidgetSession',
-            'widgetId': '_%s' % partner_id,
-        }]
-        return self._kaltura_api_call(
-            video_id, actions, note='Downloading Kaltura signature')['ks']
-
-    def _get_video_info(self, video_id, partner_id):
-        signature = self._get_kaltura_signature(video_id, partner_id)
+    def _get_video_info(self, video_id, partner_id, service_url=None):
          actions = [
              {
                  'action': 'null',
@@ -103,22 +157,34 @@ class KalturaIE(InfoExtractor):
                  'clientTag': 'kdp:v3.8.5',
                  'format': 1,  # JSON, 2 = XML, 3 = PHP
                  'service': 'multirequest',
-                'ks': signature,
+            },
+            {
+                'expiry': 86400,
+                'service': 'session',
+                'action': 'startWidgetSession',
+                'widgetId': '_%s' % partner_id,
              },
              {
                  'action': 'get',
                  'entryId': video_id,
                  'service': 'baseentry',
-                'version': '-1',
+                'ks': '{1:result:ks}',
              },
              {
                  'action': 'getbyentryid',
                  'entryId': video_id,
                  'service': 'flavorAsset',
+                'ks': '{1:result:ks}',
+            },
+            {
+                'action': 'list',
+                'filter:entryIdEqual': video_id,
+                'service': 'caption_captionasset',
+                'ks': '{1:result:ks}',
              },
          ]
          return self._kaltura_api_call(
-            video_id, actions, note='Downloading video info JSON')
+            video_id, actions, service_url, note='Downloading video info JSON')
  
      def _real_extract(self, url):
          url, smuggled_data = unsmuggle_url(url, {})
@@ -126,8 +192,9 @@ class KalturaIE(InfoExtractor):
          mobj = re.match(self._VALID_URL, url)
          partner_id, entry_id = mobj.group('partner_id', 'id')
          ks = None
+        captions = None
          if partner_id and entry_id:
-            info, flavor_assets = self._get_video_info(entry_id, partner_id)
+            _, info, flavor_assets, captions = self._get_video_info(entry_id, partner_id, smuggled_data.get('service_url'))
          else:
              path, query = mobj.group('path', 'query')
              if not path and not query:
@@ -146,7 +213,7 @@ class KalturaIE(InfoExtractor):
                  raise ExtractorError('Invalid URL', expected=True)
              if 'entry_id' in params:
                  entry_id = params['entry_id'][0]
-                info, flavor_assets = self._get_video_info(entry_id, partner_id)
+                _, info, flavor_assets, captions = self._get_video_info(entry_id, partner_id)
              elif 'uiconf_id' in params and 'flashvars[referenceId]' in params:
                  reference_id = params['flashvars[referenceId]'][0]
                  webpage = self._download_webpage(url, reference_id)
@@ -156,6 +223,17 @@ class KalturaIE(InfoExtractor):
                      reference_id)['entryResult']
                  info, flavor_assets = entry_data['meta'], entry_data['contextData']['flavorAssets']
                  entry_id = info['id']
+                # Unfortunately, the data returned in kalturaIframePackageData
+                # lacks captions, so request the complete data using the
+                # regular approach now that we know the entry_id
+                try:
+                    _, info, flavor_assets, captions = self._get_video_info(
+                        entry_id, partner_id)
+                except ExtractorError:
+                    # The regular request failed, but we already have
+                    # everything apart from captions, so proceed with what
+                    # was extracted from kalturaIframePackageData
+                    pass
              else:
                  raise ExtractorError('Invalid URL', expected=True)
              ks = params.get('flashvars[ks]', [None])[0]
@@ -175,12 +253,25 @@ class KalturaIE(InfoExtractor):
                  unsigned_url += '?referrer=%s' % referrer
              return unsigned_url
  
+        data_url = info['dataUrl']
+        if '/flvclipper/' in data_url:
+            data_url = re.sub(r'/flvclipper/.*', '/serveFlavor', data_url)
+
          formats = []
          for f in flavor_assets:
              # Continue if asset is not ready
-            if f['status'] != 2:
+            if f.get('status') != 2:
                  continue
-            video_url = sign_url('%s/flavorId/%s' % (info['dataUrl'], f['id']))
+            # Skip the original format when it is not available
+            # (e.g. kaltura:1926081:0_c03e1b5g).
+            if f.get('fileExt') == 'chun':
+                continue
+            video_url = sign_url(
+                '%s/flavorId/%s' % (data_url, f['id']))
+            # audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
+            # -f mp4-56)
+            vcodec = 'none' if 'videoCodecId' not in f and f.get(
+                'frameRate') == 0 else f.get('videoCodecId')
              formats.append({
                  'format_id': '%(fileExt)s-%(bitrate)s' % f,
                  'ext': f.get('fileExt'),
@@ -188,22 +279,39 @@ class KalturaIE(InfoExtractor):
                  'fps': int_or_none(f.get('frameRate')),
                  'filesize_approx': int_or_none(f.get('size'), invscale=1024),
                  'container': f.get('containerFormat'),
-                'vcodec': f.get('videoCodecId'),
+                'vcodec': vcodec,
                  'height': int_or_none(f.get('height')),
                  'width': int_or_none(f.get('width')),
                  'url': video_url,
              })
-        m3u8_url = sign_url(info['dataUrl'].replace('format/url', 'format/applehttp'))
-        formats.extend(self._extract_m3u8_formats(
-            m3u8_url, entry_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+        if '/playManifest/' in data_url:
+            m3u8_url = sign_url(data_url.replace(
+                'format/url', 'format/applehttp'))
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, entry_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False))
  
-        self._check_formats(formats, entry_id)
          self._sort_formats(formats)
  
+        subtitles = {}
+        if captions:
+            for caption in captions.get('objects', []):
+                # Continue if caption is not ready
+                if caption.get('status') != 2:
+                    continue
+                if not caption.get('id'):
+                    continue
+                caption_format = int_or_none(caption.get('format'))
+                subtitles.setdefault(caption.get('languageCode') or caption.get('language'), []).append({
+                    'url': '%s/api_v3/service/caption_captionasset/action/serve/captionAssetId/%s' % (self._SERVICE_URL, caption['id']),
+                    'ext': caption.get('fileExt') or self._CAPTION_TYPES.get(caption_format) or 'ttml',
+                })
+
          return {
              'id': entry_id,
              'title': info['name'],
              'formats': formats,
+            'subtitles': subtitles,
              'description': clean_html(info.get('description')),
              'thumbnail': info.get('thumbnailUrl'),
              'duration': info.get('duration'),
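
Review note: the Kaltura change folds the session request into a single multirequest, with later sub-requests referring to the session key through the `{1:result:ks}` placeholder. The sketch below shows only the parameter flattening done in `_kaltura_api_call`; the name `flatten_multirequest` is hypothetical, and it copies the first action instead of mutating it, which is an assumption made for the example rather than what the diff does.

```python
def flatten_multirequest(actions):
    """Flatten a list of action dicts into one query-parameter dict.

    The first dict supplies the top-level parameters; every later dict is
    namespaced as '<index>:<key>', with the index starting at 1.
    """
    params = dict(actions[0])  # copy, so the caller's list is left untouched
    for i, action in enumerate(actions[1:], start=1):
        for key, value in action.items():
            params['%d:%s' % (i, key)] = value
    return params


# Example input mirroring the shape used in the diff (ids are made up).
example_actions = [
    {'service': 'multirequest', 'format': 1},
    {'service': 'session', 'action': 'startWidgetSession', 'widgetId': '_123'},
    {'service': 'baseentry', 'action': 'get', 'entryId': '0_abc',
     'ks': '{1:result:ks}'},
]
example_params = flatten_multirequest(example_actions)
```

Because the session request is sub-request 1, the response list gains a leading element, which is why the callers in the diff unpack `_, info, flavor_assets, captions`.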
diff --git a/youtube_dl/extractor/kamcord.py b/youtube_dl/extractor/kamcord.py

new file mode 100644 (file)

index 0000000..b50120d
--- /dev/null
+++ b/youtube_dl/extractor/kamcord.py
@@ -0,0 +1,71 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    qualities,
+)
+
+
+class KamcordIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?kamcord\.com/v/(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'https://www.kamcord.com/v/hNYRduDgWb4',
+        'md5': 'c3180e8a9cfac2e86e1b88cb8751b54c',
+        'info_dict': {
+            'id': 'hNYRduDgWb4',
+            'ext': 'mp4',
+            'title': 'Drinking Madness',
+            'uploader': 'jacksfilms',
+            'uploader_id': '3044562',
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        video = self._parse_json(
+            self._search_regex(
+                r'window\.__props\s*=\s*({.+?});?(?:\n|\s*</script)',
+                webpage, 'video'),
+            video_id)['video']
+
+        title = video['title']
+
+        formats = self._extract_m3u8_formats(
+            video['play']['hls'], video_id, 'mp4', entry_protocol='m3u8_native')
+        self._sort_formats(formats)
+
+        uploader = video.get('user', {}).get('username')
+        uploader_id = video.get('user', {}).get('id')
+
+        view_count = int_or_none(video.get('viewCount'))
+        like_count = int_or_none(video.get('heartCount'))
+        comment_count = int_or_none(video.get('messageCount'))
+
+        preference_key = qualities(('small', 'medium', 'large'))
+
+        thumbnails = [{
+            'url': thumbnail_url,
+            'id': thumbnail_id,
+            'preference': preference_key(thumbnail_id),
+        } for thumbnail_id, thumbnail_url in (video.get('thumbnail') or {}).items()
+            if isinstance(thumbnail_id, compat_str) and isinstance(thumbnail_url, compat_str)]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'view_count': view_count,
+            'like_count': like_count,
+            'comment_count': comment_count,
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
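
Review note: the thumbnail ranking in the new Kamcord extractor depends on the `qualities` helper imported from `..utils`. The sketch below uses a local stand-in with the behaviour assumed here (rank equals position in the tuple, unknown ids rank -1) so that it runs without the package; `build_thumbnails` is a hypothetical name, and `str` replaces `compat_str` on the assumption of Python 3.

```python
def qualities(quality_ids):
    """Stand-in for the helper imported in the diff (assumed behaviour):
    rank an id by its position in quality_ids, or -1 when it is unknown."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


def build_thumbnails(thumbnail_map):
    """Turn an {id: url} mapping into a list of ranked thumbnail dicts."""
    preference_key = qualities(('small', 'medium', 'large'))
    return [{
        'url': thumbnail_url,
        'id': thumbnail_id,
        'preference': preference_key(thumbnail_id),
    } for thumbnail_id, thumbnail_url in (thumbnail_map or {}).items()
        # Skip entries whose id or URL is not a string (e.g. a null URL).
        if isinstance(thumbnail_id, str) and isinstance(thumbnail_url, str)]
```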
diff --git a/youtube_dl/extractor/karaoketv.py b/youtube_dl/extractor/karaoketv.py

index b4c30b7f3145fef78ec107d402c97927f1a8ad2e..bfccf89b0fda0be1100764290681a53e022947e0 100644 (file)
--- a/youtube_dl/extractor/karaoketv.py
+++ b/youtube_dl/extractor/karaoketv.py
@@ -2,39 +2,63 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote_plus
-from ..utils import (
-    js_to_json,
-)
  
  
  class KaraoketvIE(InfoExtractor):
-    _VALID_URL = r'https?://karaoketv\.co\.il/\?container=songs&id=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?karaoketv\.co\.il/[^/]+/(?P<id>\d+)'
      _TEST = {
-        'url': 'http://karaoketv.co.il/?container=songs&id=171568',
+        'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
          'info_dict': {
-            'id': '171568',
-            'ext': 'mp4',
-            'title': 'אל העולם שלך - רותם כהן - שרים קריוקי',
+            'id': '58356',
+            'ext': 'flv',
+            'title': 'קריוקי של איזון',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
          }
      }
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
+
          webpage = self._download_webpage(url, video_id)
+        api_page_url = self._search_regex(
+            r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.karaoke\.co\.il/api_play\.php\?.+?)\1',
+            webpage, 'API play URL', group='url')
+
+        api_page = self._download_webpage(api_page_url, video_id)
+        video_cdn_url = self._search_regex(
+            r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.video-cdn\.com/embed/iframe/.+?)\1',
+            api_page, 'video cdn URL', group='url')
+
+        video_cdn = self._download_webpage(video_cdn_url, video_id)
+        play_path = self._parse_json(
+            self._search_regex(
+                r'var\s+options\s*=\s*({.+?});', video_cdn, 'options'),
+            video_id)['clip']['url']
  
-        page_video_url = self._og_search_video_url(webpage, video_id)
-        config_json = compat_urllib_parse_unquote_plus(self._search_regex(
-            r'config=(.*)', page_video_url, 'configuration'))
+        settings = self._parse_json(
+            self._search_regex(
+                r'var\s+settings\s*=\s*({.+?});', video_cdn, 'servers', default='{}'),
+            video_id, fatal=False) or {}
  
-        urls_info_json = self._download_json(
-            config_json, video_id, 'Downloading configuration',
-            transform_source=js_to_json)
+        servers = settings.get('servers')
+        if not servers or not isinstance(servers, list):
+            servers = ('wowzail.video-cdn.com:80/vodcdn', )
  
-        url = urls_info_json['playlist'][0]['url']
+        formats = [{
+            'url': 'rtmp://%s' % server if not server.startswith('rtmp') else server,
+            'play_path': play_path,
+            'app': 'vodcdn',
+            'page_url': video_cdn_url,
+            'player_url': 'http://www.video-cdn.com/assets/flowplayer/flowplayer.commercial-3.2.18.swf',
+            'rtmp_real_time': True,
+            'ext': 'flv',
+        } for server in servers]
  
          return {
              'id': video_id,
              'title': self._og_search_title(webpage),
-            'url': url,
+            'formats': formats,
          }
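
Review note: the rewritten Karaoketv extractor builds one RTMP format per server and falls back to a single default server when the page settings list none. A minimal sketch of that step follows; `build_rtmp_formats` and `DEFAULT_SERVERS` are hypothetical names, and the page and player URLs set in the diff are left out for brevity.

```python
DEFAULT_SERVERS = ('wowzail.video-cdn.com:80/vodcdn',)


def build_rtmp_formats(servers, play_path):
    """Build one format dict per server, adding the rtmp:// scheme if missing."""
    # Anything other than a non-empty list falls back to the default server.
    if not servers or not isinstance(servers, list):
        servers = DEFAULT_SERVERS
    return [{
        'url': server if server.startswith('rtmp') else 'rtmp://%s' % server,
        'play_path': play_path,
        'app': 'vodcdn',
        'ext': 'flv',
    } for server in servers]
```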
diff --git a/youtube_dl/extractor/karrierevideos.py b/youtube_dl/extractor/karrierevideos.py

index 2cb04e533d2e5c7caf5d3be062b9c0a51635cb1c..c05263e6165159320376939c252af7dea7aeadb2 100644 (file)
--- a/youtube_dl/extractor/karrierevideos.py
+++ b/youtube_dl/extractor/karrierevideos.py
@@ -52,9 +52,12 @@ class KarriereVideosIE(InfoExtractor):
  
          video_id = self._search_regex(
              r'/config/video/(.+?)\.xml', webpage, 'video id')
+        # Server returns malformed headers
+        # Force Accept-Encoding: * to prevent gzipped results
          playlist = self._download_xml(
              'http://www.karrierevideos.at/player-playlist.xml.php?p=%s' % video_id,
-            video_id, transform_source=fix_xml_ampersands)
+            video_id, transform_source=fix_xml_ampersands,
+            headers={'Accept-Encoding': '*'})
  
          NS_MAP = {
              'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'
diff --git a/youtube_dl/extractor/keezmovies.py b/youtube_dl/extractor/keezmovies.py

index 126ca13df1b8c30e9d94b204beb54eab03644fea..588a4d0ec4eda6e38817b26f192536c40a172f3e 100644 (file)
--- a/youtube_dl/extractor/keezmovies.py
+++ b/youtube_dl/extractor/keezmovies.py
@@ -3,64 +3,126 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..aes import aes_decrypt_text
+from ..compat import (
+    compat_str,
+    compat_urllib_parse_unquote,
+)
  from ..utils import (
-    sanitized_Request,
-    url_basename,
+    determine_ext,
+    ExtractorError,
+    int_or_none,
+    str_to_int,
+    strip_or_none,
  )
  
  
  class KeezMoviesIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?keezmovies\.com/video/.+?(?P<id>[0-9]+)(?:[/?&]|$)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?keezmovies\.com/video/(?:(?P<display_id>[^/]+)-)?(?P<id>\d+)'
+    _TESTS = [{
          'url': 'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
          'md5': '1c1e75d22ffa53320f45eeb07bc4cdc0',
          'info_dict': {
              'id': '1214711',
+            'display_id': 'petite-asian-lady-mai-playing-in-bathtub',
              'ext': 'mp4',
              'title': 'Petite Asian Lady Mai Playing In Bathtub',
-            'age_limit': 18,
              'thumbnail': 're:^https?://.*\.jpg$',
+            'view_count': int,
+            'age_limit': 18,
          }
-    }
+    }, {
+        'url': 'http://www.keezmovies.com/video/1214711',
+        'only_matching': True,
+    }]
  
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
+    def _extract_info(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = (mobj.group('display_id')
+                      if 'display_id' in mobj.groupdict()
+                      else None) or mobj.group('id')
  
-        req = sanitized_Request(url)
-        req.add_header('Cookie', 'age_verified=1')
-        webpage = self._download_webpage(req, video_id)
+        webpage = self._download_webpage(
+            url, display_id, headers={'Cookie': 'age_verified=1'})
  
-        # embedded video
-        mobj = re.search(r'href="([^"]+)"></iframe>', webpage)
-        if mobj:
-            embedded_url = mobj.group(1)
-            return self.url_result(embedded_url)
+        formats = []
+        format_urls = set()
  
-        video_title = self._html_search_regex(
-            r'<h1 [^>]*>([^<]+)', webpage, 'title')
-        flashvars = self._parse_json(self._search_regex(
-            r'var\s+flashvars\s*=\s*([^;]+);', webpage, 'flashvars'), video_id)
+        title = None
+        thumbnail = None
+        duration = None
+        encrypted = False
  
-        formats = []
-        for height in (180, 240, 480):
-            if flashvars.get('quality_%dp' % height):
-                video_url = flashvars['quality_%dp' % height]
-                a_format = {
-                    'url': video_url,
-                    'height': height,
-                    'format_id': '%dp' % height,
-                }
-                filename_parts = url_basename(video_url).split('_')
-                if len(filename_parts) >= 2 and re.match(r'\d+[Kk]', filename_parts[1]):
-                    a_format['tbr'] = int(filename_parts[1][:-1])
-                formats.append(a_format)
-
-        age_limit = self._rta_search(webpage)
-
-        return {
+        def extract_format(format_url, height=None):
+            if not isinstance(format_url, compat_str) or not format_url.startswith('http'):
+                return
+            if format_url in format_urls:
+                return
+            format_urls.add(format_url)
+            tbr = int_or_none(self._search_regex(
+                r'[/_](\d+)[kK][/_]', format_url, 'tbr', default=None))
+            if not height:
+                height = int_or_none(self._search_regex(
+                    r'[/_](\d+)[pP][/_]', format_url, 'height', default=None))
+            if encrypted:
+                format_url = aes_decrypt_text(
+                    format_url, title, 32).decode('utf-8')
+            formats.append({
+                'url': format_url,
+                'format_id': '%dp' % height if height else None,
+                'height': height,
+                'tbr': tbr,
+            })
+
+        flashvars = self._parse_json(
+            self._search_regex(
+                r'flashvars\s*=\s*({.+?});', webpage,
+                'flashvars', default='{}'),
+            display_id, fatal=False)
+
+        if flashvars:
+            title = flashvars.get('video_title')
+            thumbnail = flashvars.get('image_url')
+            duration = int_or_none(flashvars.get('video_duration'))
+            encrypted = flashvars.get('encrypted') is True
+            for key, value in flashvars.items():
+                mobj = re.search(r'quality_(\d+)[pP]', key)
+                if mobj:
+                    extract_format(value, int(mobj.group(1)))
+            video_url = flashvars.get('video_url')
+            if video_url and determine_ext(video_url, None):
+                extract_format(video_url)
+
+        video_url = self._html_search_regex(
+            r'flashvars\.video_url\s*=\s*(["\'])(?P<url>http.+?)\1',
+            webpage, 'video url', default=None, group='url')
+        if video_url:
+            extract_format(compat_urllib_parse_unquote(video_url))
+
+        if not formats:
+            if 'title="This video is no longer available"' in webpage:
+                raise ExtractorError(
+                    'Video %s is no longer available' % video_id, expected=True)
+
+        self._sort_formats(formats)
+
+        if not title:
+            title = self._html_search_regex(
+                r'<h1[^>]*>([^<]+)', webpage, 'title')
+
+        return webpage, {
              'id': video_id,
-            'title': video_title,
+            'display_id': display_id,
+            'title': strip_or_none(title),
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'age_limit': 18,
              'formats': formats,
-            'age_limit': age_limit,
-            'thumbnail': flashvars.get('image_url')
          }
+
+    def _real_extract(self, url):
+        webpage, info = self._extract_info(url)
+        info['view_count'] = str_to_int(self._search_regex(
+            r'<b>([\d,.]+)</b> Views?', webpage, 'view count', fatal=False))
+        return info
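
Review note: `extract_format` in the KeezMovies rewrite deduplicates URLs and derives the bitrate and height from path components such as `600K` or `480P`. The sketch below reproduces that logic with plain `re` calls and omits the decryption branch; `parse_format` is a hypothetical name, the sample URL in the usage note is made up, and `str` replaces `compat_str` on the assumption of Python 3.

```python
import re


def parse_format(format_url, seen_urls, height=None):
    """Return a format dict for a new HTTP URL, or None if it is skipped."""
    if not isinstance(format_url, str) or not format_url.startswith('http'):
        return None
    if format_url in seen_urls:
        return None
    seen_urls.add(format_url)
    # Bitrate and height are only recognised when delimited by '/' or '_'.
    tbr_match = re.search(r'[/_](\d+)[kK][/_]', format_url)
    if not height:
        height_match = re.search(r'[/_](\d+)[pP][/_]', format_url)
        height = int(height_match.group(1)) if height_match else None
    return {
        'url': format_url,
        'format_id': '%dp' % height if height else None,
        'height': height,
        'tbr': int(tbr_match.group(1)) if tbr_match else None,
    }
```

Calling it twice with the same URL, for instance `http://cdn.example.com/videos/480P_600K_1214711.mp4`, returns a dict the first time and `None` the second, which is how the extractor avoids listing the same file under both `quality_*` and `video_url`.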
diff --git a/youtube_dl/extractor/ketnet.py b/youtube_dl/extractor/ketnet.py

new file mode 100644 (file)

index 0000000..eb0a160
--- /dev/null
+++ b/youtube_dl/extractor/ketnet.py
@@ -0,0 +1,72 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class KetnetIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?ketnet\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'https://www.ketnet.be/kijken/zomerse-filmpjes',
+        'md5': 'd907f7b1814ef0fa285c0475d9994ed7',
+        'info_dict': {
+            'id': 'zomerse-filmpjes',
+            'ext': 'mp4',
+            'title': 'Gluur mee op de filmset en op Pennenzakkenrock',
+            'description': 'Gluur mee met Ghost Rockers op de filmset',
+            'thumbnail': 're:^https?://.*\.jpg$',
+        }
+    }, {
+        'url': 'https://www.ketnet.be/kijken/karrewiet/uitzending-8-september-2016',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.ketnet.be/achter-de-schermen/sien-repeteert-voor-stars-for-life',
+        'only_matching': True,
+    }, {
+        # mzsource, geo restricted to Belgium
+        'url': 'https://www.ketnet.be/kijken/nachtwacht/de-bermadoe',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        config = self._parse_json(
+            self._search_regex(
+                r'(?s)playerConfig\s*=\s*({.+?})\s*;', webpage,
+                'player config'),
+            video_id)
+
+        title = config['title']
+
+        formats = []
+        for source_key in ('', 'mz'):
+            source = config.get('%ssource' % source_key)
+            if not isinstance(source, dict):
+                continue
+            for format_id, format_url in source.items():
+                if format_id == 'hls':
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, video_id, 'mp4',
+                        entry_protocol='m3u8_native', m3u8_id=format_id,
+                        fatal=False))
+                elif format_id == 'hds':
+                    formats.extend(self._extract_f4m_formats(
+                        format_url, video_id, f4m_id=format_id, fatal=False))
+                else:
+                    formats.append({
+                        'url': format_url,
+                        'format_id': format_id,
+                    })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': config.get('description'),
+            'thumbnail': config.get('image'),
+            'series': config.get('program'),
+            'episode': config.get('episode'),
+            'formats': formats,
+        }
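
Review note: the Ketnet extractor walks the `source` and `mzsource` dictionaries and picks an extraction routine per format id. The sketch below only classifies the entries instead of downloading manifests, so it runs offline; `classify_sources` and the kind labels are hypothetical.

```python
def classify_sources(config):
    """List (kind, format_id, url) for each entry in source and mzsource."""
    plan = []
    for source_key in ('', 'mz'):
        source = config.get('%ssource' % source_key)
        if not isinstance(source, dict):
            continue  # a missing or null source is ignored
        for format_id, format_url in source.items():
            if format_id == 'hls':
                kind = 'm3u8'    # handled by _extract_m3u8_formats in the diff
            elif format_id == 'hds':
                kind = 'f4m'     # handled by _extract_f4m_formats in the diff
            else:
                kind = 'direct'  # appended as a plain URL format
            plan.append((kind, format_id, format_url))
    return plan
```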
diff --git a/youtube_dl/extractor/kickstarter.py b/youtube_dl/extractor/kickstarter.py

index 9f1ade2e46e8e2905adaa65eeaf2de22bfed8d2c..d4da8f48462f61358c649537cb1a41d47d9e82b1 100644 (file)
--- a/youtube_dl/extractor/kickstarter.py
+++ b/youtube_dl/extractor/kickstarter.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -6,7 +6,7 @@ from ..utils import smuggle_url
  
  
  class KickStarterIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>[^/]*)/.*'
+    _VALID_URL = r'https?://(?:www\.)?kickstarter\.com/projects/(?P<id>[^/]*)/.*'
      _TESTS = [{
          'url': 'https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant/description',
          'md5': 'c81addca81327ffa66c642b5d8b08cab',
@@ -37,7 +37,6 @@ class KickStarterIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Power Drive 2000',
          },
-        'expected_warnings': ['OpenGraph description'],
      }]
  
      def _real_extract(self, url):
@@ -67,6 +66,6 @@ class KickStarterIE(InfoExtractor):
              'id': video_id,
              'url': video_url,
              'title': title,
-            'description': self._og_search_description(webpage),
+            'description': self._og_search_description(webpage, default=None),
              'thumbnail': thumbnail,
          }
diff --git a/youtube_dl/extractor/kontrtube.py b/youtube_dl/extractor/kontrtube.py

index 704bd7b34554af60dfec9b811251f5270cbd1f55..1fda451075e4e0638e0799fc2bb976f21a4bcf8e 100644 (file)
--- a/youtube_dl/extractor/kontrtube.py
+++ b/youtube_dl/extractor/kontrtube.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/krasview.py b/youtube_dl/extractor/krasview.py

index 0ae8ebd687034343c364dbc968d90d84f5bc37df..cf8876fa1f2321e7b020e2e773452f82df1bd2f1 100644 (file)
--- a/youtube_dl/extractor/krasview.py
+++ b/youtube_dl/extractor/krasview.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import json
diff --git a/youtube_dl/extractor/kusi.py b/youtube_dl/extractor/kusi.py

index 12cc56e444aaa63839664c8e70f82154045041c7..2e66e8cf9d791abe27d908e04e48fd6cd3bfd4dc 100644 (file)
--- a/youtube_dl/extractor/kusi.py
+++ b/youtube_dl/extractor/kusi.py
@@ -18,31 +18,20 @@ from ..utils import (
  class KUSIIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?kusi\.com/(?P<path>story/.+|video\?clipId=(?P<clipId>\d+))'
      _TESTS = [{
-        'url': 'http://www.kusi.com/story/31183873/turko-files-case-closed-put-on-hold',
-        'md5': 'f926e7684294cf8cb7bdf8858e1b3988',
+        'url': 'http://www.kusi.com/story/32849881/turko-files-refused-to-help-it-aint-right',
+        'md5': '4e76ce8e53660ce9697d06c0ba6fc47d',
          'info_dict': {
-            'id': '12203019',
+            'id': '12689020',
              'ext': 'mp4',
-            'title': 'Turko Files: Case Closed! & Put On Hold!',
-            'duration': 231.0,
-            'upload_date': '20160210',
-            'timestamp': 1455087571,
+            'title': "Turko Files: Refused to Help, It Ain't Right!",
+            'duration': 223.586,
+            'upload_date': '20160826',
+            'timestamp': 1472233118,
              'thumbnail': 're:^https?://.*\.jpg$'
          },
      }, {
          'url': 'http://kusi.com/video?clipId=12203019',
-        'info_dict': {
-            'id': '12203019',
-            'ext': 'mp4',
-            'title': 'Turko Files: Case Closed! & Put On Hold!',
-            'duration': 231.0,
-            'upload_date': '20160210',
-            'timestamp': 1455087571,
-            'thumbnail': 're:^https?://.*\.jpg$'
-        },
-        'params': {
-            'skip_download': True,  # Same as previous one
-        },
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/kuwo.py b/youtube_dl/extractor/kuwo.py

index a586308b2d31e8bbac83b8446c5e00dd9b2bdce9..63e10125e670b96cf706bb6c1c131ea33377a920 100644 (file)
--- a/youtube_dl/extractor/kuwo.py
+++ b/youtube_dl/extractor/kuwo.py
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_urlparse
  from ..utils import (
      get_element_by_id,
      clean_html,
@@ -26,10 +27,18 @@ class KuwoBaseIE(InfoExtractor):
      def _get_formats(self, song_id, tolerate_ip_deny=False):
          formats = []
          for file_format in self._FORMATS:
+            query = {
+                'format': file_format['ext'],
+                'br': file_format.get('br', ''),
+                'rid': 'MUSIC_%s' % song_id,
+                'type': 'convert_url',
+                'response': 'url'
+            }
+
              song_url = self._download_webpage(
-                'http://antiserver.kuwo.cn/anti.s?format=%s&br=%s&rid=MUSIC_%s&type=convert_url&response=url' %
-                (file_format['ext'], file_format.get('br', ''), song_id),
+                'http://antiserver.kuwo.cn/anti.s',
                  song_id, note='Download %s url info' % file_format['format'],
+                query=query, headers=self.geo_verification_headers(),
              )
  
              if song_url == 'IPDeny' and not tolerate_ip_deny:
@@ -44,18 +53,13 @@ class KuwoBaseIE(InfoExtractor):
                      'abr': file_format.get('abr'),
                  })
  
-        # XXX _sort_formats fails if there are not formats, while it's not the
-        # desired behavior if 'IPDeny' is ignored
-        # This check can be removed if https://github.com/rg3/youtube-dl/pull/8051 is merged
-        if not tolerate_ip_deny:
-            self._sort_formats(formats)
          return formats
  
  
  class KuwoIE(KuwoBaseIE):
      IE_NAME = 'kuwo:song'
      IE_DESC = '酷我音乐'
-    _VALID_URL = r'https?://www\.kuwo\.cn/yinyue/(?P<id>\d+?)'
+    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/yinyue/(?P<id>\d+)'
      _TESTS = [{
          'url': 'http://www.kuwo.cn/yinyue/635632/',
          'info_dict': {
@@ -73,12 +77,12 @@ class KuwoIE(KuwoBaseIE):
              'id': '6446136',
              'ext': 'mp3',
              'title': '心',
-            'description': 'md5:b2ab6295d014005bfc607525bfc1e38a',
+            'description': 'md5:5d0e947b242c35dc0eb1d2fce9fbf02c',
              'creator': 'IU',
              'upload_date': '20150518',
          },
          'params': {
-            'format': 'mp3-320'
+            'format': 'mp3-320',
          },
      }, {
          'url': 'http://www.kuwo.cn/yinyue/3197154?catalog=yueku2016',
@@ -87,25 +91,26 @@ class KuwoIE(KuwoBaseIE):
  
      def _real_extract(self, url):
          song_id = self._match_id(url)
-        webpage = self._download_webpage(
+        webpage, urlh = self._download_webpage_handle(
              url, song_id, note='Download song detail info',
              errnote='Unable to get song detail info')
-        if '对不起，该歌曲由于版权问题已被下线，将返回网站首页' in webpage:
+        if song_id not in urlh.geturl() or '对不起，该歌曲由于版权问题已被下线，将返回网站首页' in webpage:
              raise ExtractorError('this song has been offline because of copyright issues', expected=True)
  
          song_name = self._html_search_regex(
-            r'(?s)class="(?:[^"\s]+\s+)*title(?:\s+[^"\s]+)*".*?<h1[^>]+title="([^"]+)"', webpage, 'song name')
-        singer_name = self._html_search_regex(
-            r'<div[^>]+class="s_img">\s*<a[^>]+title="([^>]+)"',
-            webpage, 'singer name', fatal=False)
+            r'<p[^>]+id="lrcName">([^<]+)</p>', webpage, 'song name')
+        singer_name = remove_start(self._html_search_regex(
+            r'<a[^>]+href="http://www\.kuwo\.cn/artist/content\?name=([^"]+)">',
+            webpage, 'singer name', fatal=False), '歌手')
          lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
          if lrc_content == '暂无':     # indicates no lyrics
              lrc_content = None
  
          formats = self._get_formats(song_id)
+        self._sort_formats(formats)
  
          album_id = self._html_search_regex(
-            r'<p[^>]+class="album"[^<]+<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
+            r'<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
              webpage, 'album id', fatal=False)
  
          publish_time = None
@@ -134,13 +139,13 @@ class KuwoIE(KuwoBaseIE):
  class KuwoAlbumIE(InfoExtractor):
      IE_NAME = 'kuwo:album'
      IE_DESC = '酷我音乐 - 专辑'
-    _VALID_URL = r'https?://www\.kuwo\.cn/album/(?P<id>\d+?)/'
+    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/album/(?P<id>\d+?)/'
      _TEST = {
          'url': 'http://www.kuwo.cn/album/502294/',
          'info_dict': {
              'id': '502294',
-            'title': 'M',
-            'description': 'md5:6a7235a84cc6400ec3b38a7bdaf1d60c',
+            'title': 'Made\xa0Series\xa0《M》',
+            'description': 'md5:d463f0d8a0ff3c3ea3d6ed7452a9483f',
          },
          'playlist_count': 2,
      }
@@ -176,7 +181,7 @@ class KuwoChartIE(InfoExtractor):
          'info_dict': {
              'id': '香港中文龙虎榜',
          },
-        'playlist_mincount': 10,
+        'playlist_mincount': 7,
      }
  
      def _real_extract(self, url):
@@ -195,12 +200,12 @@ class KuwoChartIE(InfoExtractor):
  class KuwoSingerIE(InfoExtractor):
      IE_NAME = 'kuwo:singer'
      IE_DESC = '酷我音乐 - 歌手'
-    _VALID_URL = r'https?://www\.kuwo\.cn/mingxing/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/mingxing/(?P<id>[^/]+)'
      _TESTS = [{
          'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
          'info_dict': {
              'id': 'bruno+mars',
-            'title': 'Bruno Mars',
+            'title': 'Bruno\xa0Mars',
          },
          'playlist_mincount': 329,
      }, {
@@ -238,8 +243,9 @@ class KuwoSingerIE(InfoExtractor):
                  query={'artistId': artist_id, 'pn': page_num, 'rn': self.PAGE_SIZE})
  
              return [
-                self.url_result(song_url, 'Kuwo') for song_url in re.findall(
-                    r'<div[^>]+class="name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)',
+                self.url_result(compat_urlparse.urljoin(url, song_url), 'Kuwo')
+                for song_url in re.findall(
+                    r'<div[^>]+class="name"><a[^>]+href="(/yinyue/\d+)',
                      webpage)
              ]
  
@@ -259,7 +265,7 @@ class KuwoCategoryIE(InfoExtractor):
              'title': '八十年代精选',
              'description': '这些都是属于八十年代的回忆！',
          },
-        'playlist_count': 30,
+        'playlist_mincount': 24,
      }
  
      def _real_extract(self, url):
@@ -274,6 +280,8 @@ class KuwoCategoryIE(InfoExtractor):
          category_desc = remove_start(
              get_element_by_id('intro', webpage).strip(),
              '%s简介：' % category_name)
+        if category_desc == '暂无':
+            category_desc = None
  
          jsonm = self._parse_json(self._html_search_regex(
              r'var\s+jsonm\s*=\s*([^;]+);', webpage, 'category songs'), category_id)
@@ -288,7 +296,7 @@ class KuwoCategoryIE(InfoExtractor):
  class KuwoMvIE(KuwoBaseIE):
      IE_NAME = 'kuwo:mv'
      IE_DESC = '酷我音乐 - MV'
-    _VALID_URL = r'https?://www\.kuwo\.cn/mv/(?P<id>\d+?)/'
+    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/mv/(?P<id>\d+?)/'
      _TEST = {
          'url': 'http://www.kuwo.cn/mv/6480076/',
          'info_dict': {
diff --git a/youtube_dl/extractor/la7.py b/youtube_dl/extractor/la7.py

index b08f6e3c9548de02217e43bebbf20b5f2ab871e8..da5a5de4ad7e65b995a257303096b4bc58061b67 100644 (file)
--- a/youtube_dl/extractor/la7.py
+++ b/youtube_dl/extractor/la7.py
@@ -1,60 +1,65 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
  from ..utils import (
-    parse_duration,
+    js_to_json,
+    smuggle_url,
  )
  
  
  class LA7IE(InfoExtractor):
-    IE_NAME = 'la7.tv'
-    _VALID_URL = r'''(?x)
-        https?://(?:www\.)?la7\.tv/
-        (?:
-            richplayer/\?assetid=|
-            \?contentId=
-        )
-        (?P<id>[0-9]+)'''
-
-    _TEST = {
-        'url': 'http://www.la7.tv/richplayer/?assetid=50355319',
-        'md5': 'ec7d1f0224d20ba293ab56cf2259651f',
+    IE_NAME = 'la7.it'
+    _VALID_URL = r'''(?x)(https?://)?(?:
+        (?:www\.)?la7\.it/([^/]+)/(?:rivedila7|video)/|
+        tg\.la7\.it/repliche-tgla7\?id=
+    )(?P<id>.+)'''
+
+    _TESTS = [{
+        # 'src' is a plain URL
+        'url': 'http://www.la7.it/crozza/video/inccool8-02-10-2015-163722',
+        'md5': '8b613ffc0c4bf9b9e377169fc19c214c',
          'info_dict': {
-            'id': '50355319',
+            'id': 'inccool8-02-10-2015-163722',
              'ext': 'mp4',
-            'title': 'IL DIVO',
-            'description': 'Un film di Paolo Sorrentino con Toni Servillo, Anna Bonaiuto, Giulio Bosetti  e Flavio Bucci',
-            'duration': 6254,
+            'title': 'Inc.Cool8',
+            'description': 'Benvenuti nell\'incredibile mondo della INC. COOL. 8. dove “INC.” sta per “Incorporated” “COOL” sta per “fashion” ed Eight sta per il gesto  atletico',
+            'thumbnail': 're:^https?://.*',
+            'uploader_id': 'kdla7pillole@iltrovatore.it',
+            'timestamp': 1443814869,
+            'upload_date': '20151002',
          },
-        'skip': 'Blocked in the US',
-    }
+    }, {
+        # 'src' is a dictionary
+        'url': 'http://tg.la7.it/repliche-tgla7?id=189080',
+        'md5': '6b0d8888d286e39870208dfeceaf456b',
+        'info_dict': {
+            'id': '189080',
+            'ext': 'mp4',
+            'title': 'TG LA7',
+        },
+    }, {
+        'url': 'http://www.la7.it/omnibus/rivedila7/omnibus-news-02-07-2016-189077',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        xml_url = 'http://www.la7.tv/repliche/content/index.php?contentId=%s' % video_id
-        doc = self._download_xml(xml_url, video_id)
-
-        video_title = doc.find('title').text
-        description = doc.find('description').text
-        duration = parse_duration(doc.find('duration').text)
-        thumbnail = doc.find('img').text
-        view_count = int(doc.find('views').text)
  
-        prefix = doc.find('.//fqdn').text.strip().replace('auto:', 'http:')
+        webpage = self._download_webpage(url, video_id)
  
-        formats = [{
-            'format': vnode.find('quality').text,
-            'tbr': int(vnode.find('quality').text),
-            'url': vnode.find('fms').text.strip().replace('mp4:', prefix),
-        } for vnode in doc.findall('.//videos/video')]
-        self._sort_formats(formats)
+        player_data = self._parse_json(
+            self._search_regex(r'videoLa7\(({[^;]+})\);', webpage, 'player data'),
+            video_id, transform_source=js_to_json)
  
          return {
+            '_type': 'url_transparent',
+            'url': smuggle_url('kaltura:103:%s' % player_data['vid'], {
+                'service_url': 'http://kdam.iltrovatore.it',
+            }),
              'id': video_id,
-            'title': video_title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-            'view_count': view_count,
+            'title': player_data['title'],
+            'description': self._og_search_description(webpage, default=None),
+            'thumbnail': player_data.get('poster'),
+            'ie_key': 'Kaltura',
          }
diff --git a/youtube_dl/extractor/laola1tv.py b/youtube_dl/extractor/laola1tv.py

index d4fbafece22cc18a3938ff32cb29dfeb162ac122..2fab38079aac0c5f20a1772d52fa52642cb520bf 100644 (file)
--- a/youtube_dl/extractor/laola1tv.py
+++ b/youtube_dl/extractor/laola1tv.py
@@ -63,6 +63,7 @@ class Laola1TvIE(InfoExtractor):
          'params': {
              'skip_download': True,
          },
+        'skip': 'This live stream has already finished.',
      }]
  
      def _real_extract(self, url):
@@ -74,6 +75,9 @@ class Laola1TvIE(InfoExtractor):
  
          webpage = self._download_webpage(url, display_id)
  
+        if 'Dieser Livestream ist bereits beendet.' in webpage:
+            raise ExtractorError('This live stream has already finished.', expected=True)
+
          iframe_url = self._search_regex(
              r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
              webpage, 'iframe url')
diff --git a/youtube_dl/extractor/lci.py b/youtube_dl/extractor/lci.py

new file mode 100644 (file)

index 0000000..af34829
--- /dev/null
+++ b/youtube_dl/extractor/lci.py
@@ -0,0 +1,24 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class LCIIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?lci\.fr/[^/]+/[\w-]+-(?P<id>\d+)\.html'
+    _TEST = {
+        'url': 'http://www.lci.fr/international/etats-unis-a-j-62-hillary-clinton-reste-sans-voix-2001679.html',
+        'md5': '2fdb2538b884d4d695f9bd2bde137e6c',
+        'info_dict': {
+            'id': '13244802',
+            'ext': 'mp4',
+            'title': 'Hillary Clinton et sa quinte de toux, en plein meeting',
+            'description': 'md5:a4363e3a960860132f8124b62f4a01c9',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        wat_id = self._search_regex(r'data-watid=[\'"](\d+)', webpage, 'wat id')
+        return self.url_result('wat:' + wat_id, 'Wat', wat_id)
diff --git a/youtube_dl/extractor/lcp.py b/youtube_dl/extractor/lcp.py

new file mode 100644 (file)

index 0000000..ade27a9
--- /dev/null
+++ b/youtube_dl/extractor/lcp.py
@@ -0,0 +1,90 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .arkena import ArkenaIE
+
+
+class LcpPlayIE(ArkenaIE):
+    _VALID_URL = r'https?://play\.lcp\.fr/embed/(?P<id>[^/]+)/(?P<account_id>[^/]+)/[^/]+/[^/]+'
+    _TESTS = [{
+        'url': 'http://play.lcp.fr/embed/327336/131064/darkmatter/0',
+        'md5': 'b8bd9298542929c06c1c15788b1f277a',
+        'info_dict': {
+            'id': '327336',
+            'ext': 'mp4',
+            'title': '327336',
+            'timestamp': 1456391602,
+            'upload_date': '20160225',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
+
+
+class LcpIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?lcp\.fr/(?:[^/]+/)*(?P<id>[^/]+)'
+
+    _TESTS = [{
+        # arkena embed
+        'url': 'http://www.lcp.fr/la-politique-en-video/schwartzenberg-prg-preconise-francois-hollande-de-participer-une-primaire',
+        'md5': 'b8bd9298542929c06c1c15788b1f277a',
+        'info_dict': {
+            'id': 'd56d03e9',
+            'ext': 'mp4',
+            'title': 'Schwartzenberg (PRG) préconise à François Hollande de participer à une primaire à gauche',
+            'description': 'md5:96ad55009548da9dea19f4120c6c16a8',
+            'timestamp': 1456488895,
+            'upload_date': '20160226',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # dailymotion live stream
+        'url': 'http://www.lcp.fr/le-direct',
+        'info_dict': {
+            'id': 'xji3qy',
+            'ext': 'mp4',
+            'title': 'La Chaine Parlementaire (LCP), Live TNT',
+            'description': 'md5:5c69593f2de0f38bd9a949f2c95e870b',
+            'uploader': 'LCP',
+            'uploader_id': 'xbz33d',
+            'timestamp': 1308923058,
+            'upload_date': '20110624',
+        },
+        'params': {
+            # m3u8 live stream
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.lcp.fr/emissions/277792-les-volontaires',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        play_url = self._search_regex(
+            r'<iframe[^>]+src=(["\'])(?P<url>%s?(?:(?!\1).)*)\1' % LcpPlayIE._VALID_URL,
+            webpage, 'play iframe', default=None, group='url')
+
+        if not play_url:
+            return self.url_result(url, 'Generic')
+
+        title = self._og_search_title(webpage, default=None) or self._html_search_meta(
+            'twitter:title', webpage, fatal=True)
+        description = self._html_search_meta(
+            ('description', 'twitter:description'), webpage)
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': LcpPlayIE.ie_key(),
+            'url': play_url,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+        }
diff --git a/youtube_dl/extractor/learnr.py b/youtube_dl/extractor/learnr.py

new file mode 100644 (file)

index 0000000..1435e09
--- /dev/null
+++ b/youtube_dl/extractor/learnr.py
@@ -0,0 +1,33 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class LearnrIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?learnr\.pro/view/video/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.learnr.pro/view/video/51624-web-development-tutorial-for-beginners-1-how-to-build-webpages-with-html-css-javascript',
+        'md5': '3719fdf0a68397f49899e82c308a89de',
+        'info_dict': {
+            'id': '51624',
+            'ext': 'mp4',
+            'title': 'Web Development Tutorial for Beginners (#1) - How to build webpages with HTML, CSS, Javascript',
+            'description': 'md5:b36dbfa92350176cdf12b4d388485503',
+            'uploader': 'LearnCode.academy',
+            'uploader_id': 'learncodeacademy',
+            'upload_date': '20131021',
+        },
+        'add_ie': ['Youtube'],
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        return {
+            '_type': 'url_transparent',
+            'url': self._search_regex(
+                r"videoId\s*:\s*'([^']+)'", webpage, 'youtube id'),
+            'id': video_id,
+        }
diff --git a/youtube_dl/extractor/lecture2go.py b/youtube_dl/extractor/lecture2go.py

index 40a3d23468636877cc485ac9e064ee3527a3dcb2..81b5d41be4a676e55c795fe233591913e7a691c8 100644 (file)
--- a/youtube_dl/extractor/lecture2go.py
+++ b/youtube_dl/extractor/lecture2go.py
@@ -6,6 +6,7 @@ import re
  from .common import InfoExtractor
  from ..utils import (
      determine_ext,
+    determine_protocol,
      parse_duration,
      int_or_none,
  )
@@ -18,10 +19,14 @@ class Lecture2GoIE(InfoExtractor):
          'md5': 'ac02b570883020d208d405d5a3fd2f7f',
          'info_dict': {
              'id': '17473',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': '2 - Endliche Automaten und reguläre Sprachen',
              'creator': 'Frank Heitmann',
              'duration': 5220,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
          }
      }
  
@@ -32,14 +37,18 @@ class Lecture2GoIE(InfoExtractor):
          title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title')
  
          formats = []
-        for url in set(re.findall(r'"src","([^"]+)"', webpage)):
+        for url in set(re.findall(r'var\s+playerUri\d+\s*=\s*"([^"]+)"', webpage)):
              ext = determine_ext(url)
+            protocol = determine_protocol({'url': url})
              if ext == 'f4m':
-                formats.extend(self._extract_f4m_formats(url, video_id))
+                formats.extend(self._extract_f4m_formats(url, video_id, f4m_id='hds'))
              elif ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(url, video_id))
+                formats.extend(self._extract_m3u8_formats(url, video_id, ext='mp4', m3u8_id='hls'))
              else:
+                if protocol == 'rtmp':
+                    continue  # XXX: currently broken
                  formats.append({
+                    'format_id': protocol,
                      'url': url,
                  })
  
diff --git a/youtube_dl/extractor/leeco.py b/youtube_dl/extractor/leeco.py

index 375fdaed129421371f8575c3aeceb71ed4712de7..c48a5aad17ad36324b3cf70956d0ed234ffa522b 100644 (file)
--- a/youtube_dl/extractor/leeco.py
+++ b/youtube_dl/extractor/leeco.py
@@ -20,15 +20,16 @@ from ..utils import (
      int_or_none,
      orderedSet,
      parse_iso8601,
-    sanitized_Request,
      str_or_none,
      url_basename,
+    urshift,
+    update_url_query,
  )
  
  
  class LeIE(InfoExtractor):
      IE_DESC = '乐视网'
-    _VALID_URL = r'https?://www\.le\.com/ptv/vplay/(?P<id>\d+)\.html'
+    _VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|(?:sports\.le|(?:www\.)?lesports)\.com/(?:match|video))/(?P<id>\d+)\.html'
  
      _URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
  
@@ -69,17 +70,22 @@ class LeIE(InfoExtractor):
              'hls_prefer_native': True,
          },
          'skip': 'Only available in China',
+    }, {
+        'url': 'http://sports.le.com/video/25737697.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.lesports.com/match/1023203003.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://sports.le.com/match/1023203003.html',
+        'only_matching': True,
      }]
  
-    @staticmethod
-    def urshift(val, n):
-        return val >> n if val >= 0 else (val + 0x100000000) >> n
-
      # ror() and calc_time_key() are reversed from a embedded swf file in KLetvPlayer.swf
      def ror(self, param1, param2):
          _loc3_ = 0
          while _loc3_ < param2:
-            param1 = self.urshift(param1, 1) + ((param1 & 1) << 31)
+            param1 = urshift(param1, 1) + ((param1 & 1) << 31)
              _loc3_ += 1
          return param1
  
@@ -90,6 +96,10 @@ class LeIE(InfoExtractor):
          _loc3_ = self.ror(_loc3_, _loc2_ % 17)
          return _loc3_
  
+    # reversed from http://jstatic.letvcdn.com/sdk/player.js
+    def get_mms_key(self, time):
+        return self.ror(time, 8) ^ 185025305
+
      # see M3U8Encryption class in KLetvPlayer.swf
      @staticmethod
      def decrypt_m3u8(encrypted_data):
@@ -110,28 +120,7 @@ class LeIE(InfoExtractor):
  
          return bytes(_loc7_)
  
-    def _real_extract(self, url):
-        media_id = self._match_id(url)
-        page = self._download_webpage(url, media_id)
-        params = {
-            'id': media_id,
-            'platid': 1,
-            'splatid': 101,
-            'format': 1,
-            'tkey': self.calc_time_key(int(time.time())),
-            'domain': 'www.le.com'
-        }
-        play_json_req = sanitized_Request(
-            'http://api.le.com/mms/out/video/playJson?' + compat_urllib_parse_urlencode(params)
-        )
-        cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
-        if cn_verification_proxy:
-            play_json_req.add_header('Ytdl-request-proxy', cn_verification_proxy)
-
-        play_json = self._download_json(
-            play_json_req,
-            media_id, 'Downloading playJson data')
-
+    def _check_errors(self, play_json):
          # Check for errors
          playstatus = play_json['playstatus']
          if playstatus['status'] == 0:
@@ -142,43 +131,99 @@ class LeIE(InfoExtractor):
                  msg = 'Generic error. flag = %d' % flag
              raise ExtractorError(msg, expected=True)
  
-        playurl = play_json['playurl']
-
-        formats = ['350', '1000', '1300', '720p', '1080p']
-        dispatch = playurl['dispatch']
+    def _real_extract(self, url):
+        media_id = self._match_id(url)
+        page = self._download_webpage(url, media_id)
  
-        urls = []
-        for format_id in formats:
-            if format_id in dispatch:
-                media_url = playurl['domain'][0] + dispatch[format_id][0]
-                media_url += '&' + compat_urllib_parse_urlencode({
-                    'm3v': 1,
+        play_json_h5 = self._download_json(
+            'http://api.le.com/mms/out/video/playJsonH5',
+            media_id, 'Downloading html5 playJson data', query={
+                'id': media_id,
+                'platid': 3,
+                'splatid': 304,
+                'format': 1,
+                'tkey': self.get_mms_key(int(time.time())),
+                'domain': 'www.le.com',
+                'tss': 'no',
+            },
+            headers=self.geo_verification_headers())
+        self._check_errors(play_json_h5)
+
+        play_json_flash = self._download_json(
+            'http://api.le.com/mms/out/video/playJson',
+            media_id, 'Downloading flash playJson data', query={
+                'id': media_id,
+                'platid': 1,
+                'splatid': 101,
+                'format': 1,
+                'tkey': self.calc_time_key(int(time.time())),
+                'domain': 'www.le.com',
+            },
+            headers=self.geo_verification_headers())
+        self._check_errors(play_json_flash)
+
+        def get_h5_urls(media_url, format_id):
+            location = self._download_json(
+                media_url, media_id,
+                'Download JSON metadata for format %s' % format_id, query={
                      'format': 1,
                      'expect': 3,
-                    'rateid': format_id,
-                })
+                    'tss': 'no',
+                })['location']
  
-                nodes_data = self._download_json(
-                    media_url, media_id,
-                    'Download JSON metadata for format %s' % format_id)
+            return {
+                'http': update_url_query(location, {'tss': 'no'}),
+                'hls': update_url_query(location, {'tss': 'ios'}),
+            }
+
+        def get_flash_urls(media_url, format_id):
+            media_url += '&' + compat_urllib_parse_urlencode({
+                'm3v': 1,
+                'format': 1,
+                'expect': 3,
+                'rateid': format_id,
+            })
  
-                req = self._request_webpage(
-                    nodes_data['nodelist'][0]['location'], media_id,
-                    note='Downloading m3u8 information for format %s' % format_id)
+            nodes_data = self._download_json(
+                media_url, media_id,
+                'Download JSON metadata for format %s' % format_id)
  
-                m3u8_data = self.decrypt_m3u8(req.read())
+            req = self._request_webpage(
+                nodes_data['nodelist'][0]['location'], media_id,
+                note='Downloading m3u8 information for format %s' % format_id)
  
-                url_info_dict = {
-                    'url': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
-                    'ext': determine_ext(dispatch[format_id][1]),
-                    'format_id': format_id,
-                    'protocol': 'm3u8',
-                }
+            m3u8_data = self.decrypt_m3u8(req.read())
  
-                if format_id[-1:] == 'p':
-                    url_info_dict['height'] = int_or_none(format_id[:-1])
+            return {
+                'hls': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
+            }
  
-                urls.append(url_info_dict)
+        extracted_formats = []
+        formats = []
+        for play_json, get_urls in ((play_json_h5, get_h5_urls), (play_json_flash, get_flash_urls)):
+            playurl = play_json['playurl']
+            play_domain = playurl['domain'][0]
+
+            for format_id, format_data in playurl.get('dispatch', {}).items():
+                if format_id in extracted_formats:
+                    continue
+                extracted_formats.append(format_id)
+
+                media_url = play_domain + format_data[0]
+                for protocol, format_url in get_urls(media_url, format_id).items():
+                    f = {
+                        'url': format_url,
+                        'ext': determine_ext(format_data[1]),
+                        'format_id': '%s-%s' % (protocol, format_id),
+                        'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
+                        'quality': int_or_none(format_id),
+                    }
+
+                    if format_id[-1:] == 'p':
+                        f['height'] = int_or_none(format_id[:-1])
+
+                    formats.append(f)
+        self._sort_formats(formats, ('height', 'quality', 'format_id'))
  
          publish_time = parse_iso8601(self._html_search_regex(
              r'发布时间&nbsp;([^<>]+) ', page, 'publish time', default=None),
@@ -187,7 +232,7 @@ class LeIE(InfoExtractor):
  
          return {
              'id': media_id,
-            'formats': urls,
+            'formats': formats,
              'title': playurl['title'],
              'thumbnail': playurl['pic'],
              'description': description,
@@ -196,7 +241,7 @@ class LeIE(InfoExtractor):
  
  
  class LePlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://[a-z]+\.le\.com/[a-z]+/(?P<id>[a-z0-9_]+)'
+    _VALID_URL = r'https?://[a-z]+\.le\.com/(?!video)[a-z]+/(?P<id>[a-z0-9_]+)'
  
      _TESTS = [{
          'url': 'http://www.le.com/tv/46177.html',
diff --git a/youtube_dl/extractor/lego.py b/youtube_dl/extractor/lego.py

new file mode 100644 (file)

index 0000000..d3bca64
--- /dev/null
+++ b/youtube_dl/extractor/lego.py
@@ -0,0 +1,128 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    unescapeHTML,
+    parse_duration,
+    get_element_by_class,
+)
+
+
+class LEGOIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
+    _TESTS = [{
+        'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
+        'md5': 'f34468f176cfd76488767fc162c405fa',
+        'info_dict': {
+            'id': '55492d823b1b4d5e985787fa8c2973b1',
+            'ext': 'mp4',
+            'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
+            'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
+        },
+    }, {
+        # geo-restricted, but the contentUrl contains a valid URL
+        'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
+        'md5': '4c3fec48a12e40c6e5995abc3d36cc2e',
+        'info_dict': {
+            'id': '13bdc2299ab24d9685701a915b3d71e7',
+            'ext': 'mp4',
+            'title': 'Aflevering 20 - Helden van het koninkrijk',
+            'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
+        },
+    }, {
+        # special characters in title
+        'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d',
+        'info_dict': {
+            'id': '9685ee9d12e84ff38e84b4e3d0db533d',
+            'ext': 'mp4',
+            'title': 'Force Surprise – LEGO® Star Wars™ Microfighters',
+            'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
+    _BITRATES = [256, 512, 1024, 1536, 2560]
+
+    def _real_extract(self, url):
+        locale, video_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(url, video_id)
+        title = get_element_by_class('video-header', webpage).strip()
+        progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/'
+        streaming_base = 'http://legoprod-f.akamaihd.net/'
+        content_url = self._html_search_meta('contentUrl', webpage)
+        path = self._search_regex(
+            r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
+            content_url, 'video path', default=None)
+        if not path:
+            player_url = self._proto_relative_url(self._search_regex(
+                r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
+                webpage, 'player url', default=None))
+            if not player_url:
+                base_url = self._proto_relative_url(self._search_regex(
+                    r'data-baseurl="([^"]+)"', webpage, 'base url',
+                    default='http://www.lego.com/%s/mediaplayer/video/' % locale))
+                player_url = base_url + video_id
+            player_webpage = self._download_webpage(player_url, video_id)
+            video_data = self._parse_json(unescapeHTML(self._search_regex(
+                r"video='([^']+)'", player_webpage, 'video data')), video_id)
+            progressive_base = self._search_regex(
+                r'data-video-progressive-url="([^"]+)"',
+                player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
+            streaming_base = self._search_regex(
+                r'data-video-streaming-url="([^"]+)"',
+                player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
+            item_id = video_data['ItemId']
+
+            net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
+            base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
+            path = '/'.join([net_storage_path, base_path])
+        streaming_path = ','.join(map(compat_str, self._BITRATES))
+
+        formats = self._extract_akamai_formats(
+            '%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (streaming_base, path, streaming_path), video_id)
+        m3u8_formats = list(filter(
+            lambda f: f.get('protocol') == 'm3u8_native' and f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+            formats))
+        if len(m3u8_formats) == len(self._BITRATES):
+            self._sort_formats(m3u8_formats)
+            for bitrate, m3u8_format in zip(self._BITRATES, m3u8_formats):
+                progressive_base_url = '%spublic/%s_%d.' % (progressive_base, path, bitrate)
+                mp4_f = m3u8_format.copy()
+                mp4_f.update({
+                    'url': progressive_base_url + 'mp4',
+                    'format_id': m3u8_format['format_id'].replace('hls', 'mp4'),
+                    'protocol': 'http',
+                })
+                web_f = {
+                    'url': progressive_base_url + 'webm',
+                    'format_id': m3u8_format['format_id'].replace('hls', 'webm'),
+                    'width': m3u8_format['width'],
+                    'height': m3u8_format['height'],
+                    'tbr': m3u8_format.get('tbr'),
+                    'ext': 'webm',
+                }
+                formats.extend([web_f, mp4_f])
+        else:
+            for bitrate in self._BITRATES:
+                for ext in ('webm', 'mp4'):
+                    formats.append({
+                        'format_id': '%s-%s' % (ext, bitrate),
+                        'url': '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext),
+                        'tbr': bitrate,
+                        'ext': ext,
+                    })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': self._html_search_meta('description', webpage),
+            'thumbnail': self._html_search_meta('thumbnail', webpage),
+            'duration': parse_duration(self._html_search_meta('duration', webpage)),
+            'formats': formats,
+        }
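
The format URLs in `LEGOIE._real_extract` above are assembled purely by string formatting from a media path and the fixed bitrate list. A minimal sketch of that composition follows, assuming the default base URLs shown in the diff. The helper names `build_streaming_url` and `build_progressive_urls` and the sample path are hypothetical and not part of youtube-dl; the sketch uses the `webm` extension, as in the `progressive_base_url + 'webm'` branch.

```python
# Hypothetical helpers mirroring the string formatting in LEGOIE._real_extract.
BITRATES = [256, 512, 1024, 1536, 2560]
STREAMING_BASE = 'http://legoprod-f.akamaihd.net/'
PROGRESSIVE_BASE = 'https://lc-mediaplayerns-live-s.legocdn.com/'


def build_streaming_url(path, bitrates=BITRATES, base=STREAMING_BASE):
    """Return the Akamai HLS master playlist URL covering all bitrates."""
    streaming_path = ','.join(str(b) for b in bitrates)
    return '%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (
        base, path, streaming_path)


def build_progressive_urls(path, bitrates=BITRATES, base=PROGRESSIVE_BASE):
    """Return (ext, bitrate, url) tuples for the direct HTTP downloads."""
    return [
        (ext, bitrate, '%spublic/%s_%d.%s' % (base, path, bitrate, ext))
        for bitrate in bitrates
        for ext in ('webm', 'mp4')
    ]


if __name__ == '__main__':
    sample_path = 'ab/cd/item_video_en-us_1'  # made-up path for illustration
    print(build_streaming_url(sample_path))
    for ext, bitrate, url in build_progressive_urls(sample_path)[:2]:
        print(ext, bitrate, url)
```

Each bitrate yields one WebM and one MP4 URL, so the five bitrates produce ten progressive candidates next to the HLS variants.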
diff --git a/youtube_dl/extractor/libraryofcongress.py b/youtube_dl/extractor/libraryofcongress.py
new file mode 100644
index 0000000..0a94366
--- /dev/null
+++ b/youtube_dl/extractor/libraryofcongress.py
@@ -0,0 +1,143 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+from ..utils import (
+    determine_ext,
+    float_or_none,
+    int_or_none,
+    parse_filesize,
+)
+
+
+class LibraryOfCongressIE(InfoExtractor):
+    IE_NAME = 'loc'
+    IE_DESC = 'Library of Congress'
+    _VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9]+)'
+    _TESTS = [{
+        # embedded via <div class="media-player"
+        'url': 'http://loc.gov/item/90716351/',
+        'md5': '353917ff7f0255aa6d4b80a034833de8',
+        'info_dict': {
+            'id': '90716351',
+            'ext': 'mp4',
+            'title': "Pa's trip to Mars",
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'duration': 0,
+            'view_count': int,
+        },
+    }, {
+        # webcast embedded via mediaObjectId
+        'url': 'https://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=5578',
+        'info_dict': {
+            'id': '5578',
+            'ext': 'mp4',
+            'title': 'Help! Preservation Training Needs Here, There & Everywhere',
+            'duration': 3765,
+            'view_count': int,
+            'subtitles': 'mincount:1',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # with direct download links
+        'url': 'https://www.loc.gov/item/78710669/',
+        'info_dict': {
+            'id': '78710669',
+            'ext': 'mp4',
+            'title': 'La vie et la passion de Jesus-Christ',
+            'duration': 0,
+            'view_count': int,
+            'formats': 'mincount:4',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        media_id = self._search_regex(
+            (r'id=(["\'])media-player-(?P<id>.+?)\1',
+             r'<video[^>]+id=(["\'])uuid-(?P<id>.+?)\1',
+             r'<video[^>]+data-uuid=(["\'])(?P<id>.+?)\1',
+             r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1'),
+            webpage, 'media id', group='id')
+
+        data = self._download_json(
+            'https://media.loc.gov/services/v1/media?id=%s&context=json' % media_id,
+            video_id)['mediaObject']
+
+        derivative = data['derivatives'][0]
+        media_url = derivative['derivativeUrl']
+
+        title = derivative.get('shortName') or data.get('shortName') or self._og_search_title(
+            webpage)
+
+        # The following algorithm was extracted from the setAVSource JS function
+        # found in the webpage
+        media_url = media_url.replace('rtmp', 'https')
+
+        is_video = data.get('mediaType', 'v').lower() == 'v'
+        ext = determine_ext(media_url)
+        if ext not in ('mp4', 'mp3'):
+            media_url += '.mp4' if is_video else '.mp3'
+
+        if 'vod/mp4:' in media_url:
+            formats = [{
+                'url': media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8',
+                'format_id': 'hls',
+                'ext': 'mp4',
+                'protocol': 'm3u8_native',
+                'quality': 1,
+            }]
+        elif 'vod/mp3:' in media_url:
+            formats = [{
+                'url': media_url.replace('vod/mp3:', ''),
+                'vcodec': 'none',
+            }]
+
+        download_urls = set()
+        for m in re.finditer(
+                r'<option[^>]+value=(["\'])(?P<url>.+?)\1[^>]+data-file-download=[^>]+>\s*(?P<id>.+?)(?:(?:&nbsp;|\s+)\((?P<size>.+?)\))?\s*<', webpage):
+            format_id = m.group('id').lower()
+            if format_id == 'gif':
+                continue
+            download_url = m.group('url')
+            if download_url in download_urls:
+                continue
+            download_urls.add(download_url)
+            formats.append({
+                'url': download_url,
+                'format_id': format_id,
+                'filesize_approx': parse_filesize(m.group('size')),
+            })
+
+        self._sort_formats(formats)
+
+        duration = float_or_none(data.get('duration'))
+        view_count = int_or_none(data.get('viewCount'))
+
+        subtitles = {}
+        cc_url = data.get('ccUrl')
+        if cc_url:
+            subtitles.setdefault('en', []).append({
+                'url': cc_url,
+                'ext': 'ttml',
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': self._og_search_thumbnail(webpage, default=None),
+            'duration': duration,
+            'view_count': view_count,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
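
The URL rewriting in `LibraryOfCongressIE._real_extract` above can be exercised in isolation. The sketch below is a rough stand-alone approximation: `guess_ext` is a hypothetical stand-in for youtube-dl's `determine_ext`, `build_primary_format` is a made-up name, and the sample URLs are invented. In the diff, `formats` is assigned only inside the two branches; the sketch initializes it to an empty list up front, which is an assumption about the intended behavior rather than what the commit does.

```python
import posixpath


def guess_ext(url):
    """Hypothetical stand-in for determine_ext: suffix of the last path part."""
    path = url.partition('?')[0]
    return posixpath.splitext(path)[1].lstrip('.')


def build_primary_format(media_url, is_video=True):
    """Rewrite a derivative URL into at most one primary format dict."""
    media_url = media_url.replace('rtmp', 'https')
    if guess_ext(media_url) not in ('mp4', 'mp3'):
        media_url += '.mp4' if is_video else '.mp3'

    formats = []  # assumption: always defined, even when no marker matches
    if 'vod/mp4:' in media_url:
        formats.append({
            'url': media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8',
            'format_id': 'hls',
            'ext': 'mp4',
            'protocol': 'm3u8_native',
        })
    elif 'vod/mp3:' in media_url:
        formats.append({
            'url': media_url.replace('vod/mp3:', ''),
            'vcodec': 'none',
        })
    return formats


if __name__ == '__main__':
    print(build_primary_format('rtmp://example.invalid/vod/mp4:clips/sample'))
```

Returning an empty list for unrecognized URLs leaves room for the direct download links found in the page to be appended afterwards.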
diff --git a/youtube_dl/extractor/lifenews.py b/youtube_dl/extractor/lifenews.py
index ba2f80a757d071042b8d574721bde37a1b7006ba..afce2010eafadc3ceaab1eaa7d846e5e6360d547 100644
--- a/youtube_dl/extractor/lifenews.py
+++ b/youtube_dl/extractor/lifenews.py
@@ -1,54 +1,62 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_urlparse
+from ..compat import (
+    compat_str,
+    compat_urlparse,
+)
  from ..utils import (
      determine_ext,
+    ExtractorError,
      int_or_none,
+    parse_iso8601,
      remove_end,
-    unified_strdate,
-    ExtractorError,
  )
  
  
  class LifeNewsIE(InfoExtractor):
-    IE_NAME = 'lifenews'
-    IE_DESC = 'LIFE | NEWS'
-    _VALID_URL = r'https?://lifenews\.ru/(?:mobile/)?(?P<section>news|video)/(?P<id>\d+)'
+    IE_NAME = 'life'
+    IE_DESC = 'Life.ru'
+    _VALID_URL = r'https?://life\.ru/t/[^/]+/(?P<id>\d+)'
  
      _TESTS = [{
          # single video embedded via video/source
-        'url': 'http://lifenews.ru/news/98736',
+        'url': 'https://life.ru/t/новости/98736',
          'md5': '77c95eaefaca216e32a76a343ad89d23',
          'info_dict': {
              'id': '98736',
              'ext': 'mp4',
              'title': 'Мужчина нашел дома архив оборонного завода',
              'description': 'md5:3b06b1b39b5e2bea548e403d99b8bf26',
+            'timestamp': 1344154740,
              'upload_date': '20120805',
+            'view_count': int,
          }
      }, {
          # single video embedded via iframe
-        'url': 'http://lifenews.ru/news/152125',
+        'url': 'https://life.ru/t/новости/152125',
          'md5': '77d19a6f0886cd76bdbf44b4d971a273',
          'info_dict': {
              'id': '152125',
              'ext': 'mp4',
              'title': 'В Сети появилось видео захвата «Правым сектором» колхозных полей ',
              'description': 'Жители двух поселков Днепропетровской области не простили радикалам угрозу лишения плодородных земель и пошли в лобовую. ',
+            'timestamp': 1427961840,
              'upload_date': '20150402',
+            'view_count': int,
          }
      }, {
          # two videos embedded via iframe
-        'url': 'http://lifenews.ru/news/153461',
+        'url': 'https://life.ru/t/новости/153461',
          'info_dict': {
              'id': '153461',
              'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве',
              'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
-            'upload_date': '20150505',
+            'timestamp': 1430825520,
+            'view_count': int,
          },
          'playlist': [{
              'md5': '9b6ef8bc0ffa25aebc8bdb40d89ab795',
@@ -57,6 +65,7 @@ class LifeNewsIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве (Видео 1)',
                  'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
+                'timestamp': 1430825520,
                  'upload_date': '20150505',
              },
          }, {
@@ -66,28 +75,31 @@ class LifeNewsIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве (Видео 2)',
                  'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
+                'timestamp': 1430825520,
                  'upload_date': '20150505',
              },
          }],
      }, {
-        'url': 'http://lifenews.ru/video/13035',
+        'url': 'https://life.ru/t/новости/213035',
+        'only_matching': True,
+    }, {
+        'url': 'https://life.ru/t/%D0%BD%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8/153461',
+        'only_matching': True,
+    }, {
+        'url': 'https://life.ru/t/новости/411489/manuel_vals_nazval_frantsiiu_tsieliu_nomier_odin_dlia_ighil',
          'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        section = mobj.group('section')
+        video_id = self._match_id(url)
  
-        webpage = self._download_webpage(
-            'http://lifenews.ru/%s/%s' % (section, video_id),
-            video_id, 'Downloading page')
+        webpage = self._download_webpage(url, video_id)
  
          video_urls = re.findall(
              r'<video[^>]+><source[^>]+src=["\'](.+?)["\']', webpage)
  
          iframe_links = re.findall(
-            r'<iframe[^>]+src=["\']((?:https?:)?//embed\.life\.ru/embed/.+?)["\']',
+            r'<iframe[^>]+src=["\']((?:https?:)?//embed\.life\.ru/(?:embed|video)/.+?)["\']',
              webpage)
  
          if not video_urls and not iframe_links:
@@ -95,26 +107,22 @@ class LifeNewsIE(InfoExtractor):
  
          title = remove_end(
              self._og_search_title(webpage),
-            ' - Первый по срочным новостям — LIFE | NEWS')
+            ' - Life.ru')
  
          description = self._og_search_description(webpage)
  
          view_count = self._html_search_regex(
-            r'<div class=\'views\'>\s*(\d+)\s*</div>', webpage, 'view count', fatal=False)
-        comment_count = self._html_search_regex(
-            r'=\'commentCount\'[^>]*>\s*(\d+)\s*<',
-            webpage, 'comment count', fatal=False)
+            r'<div[^>]+class=(["\']).*?\bhits-count\b.*?\1[^>]*>\s*(?P<value>\d+)\s*</div>',
+            webpage, 'view count', fatal=False, group='value')
  
-        upload_date = self._html_search_regex(
-            r'<time[^>]*datetime=\'([^\']+)\'', webpage, 'upload date', fatal=False)
-        if upload_date is not None:
-            upload_date = unified_strdate(upload_date)
+        timestamp = parse_iso8601(self._search_regex(
+            r'<time[^>]+datetime=(["\'])(?P<value>.+?)\1',
+            webpage, 'upload date', fatal=False, group='value'))
  
          common_info = {
              'description': description,
              'view_count': int_or_none(view_count),
-            'comment_count': int_or_none(comment_count),
-            'upload_date': upload_date,
+            'timestamp': timestamp,
          }
  
          def make_entry(video_id, video_url, index=None):
@@ -159,9 +167,9 @@ class LifeNewsIE(InfoExtractor):
  
  class LifeEmbedIE(InfoExtractor):
      IE_NAME = 'life:embed'
-    _VALID_URL = r'https?://embed\.life\.ru/embed/(?P<id>[\da-f]{32})'
+    _VALID_URL = r'https?://embed\.life\.ru/(?:embed|video)/(?P<id>[\da-f]{32})'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://embed.life.ru/embed/e50c2dec2867350528e2574c899b8291',
          'md5': 'b889715c9e49cb1981281d0e5458fbbe',
          'info_dict': {
@@ -170,29 +178,57 @@ class LifeEmbedIE(InfoExtractor):
              'title': 'e50c2dec2867350528e2574c899b8291',
              'thumbnail': 're:http://.*\.jpg',
          }
-    }
+    }, {
+        # with 1080p
+        'url': 'https://embed.life.ru/video/e50c2dec2867350528e2574c899b8291',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
          webpage = self._download_webpage(url, video_id)
  
+        thumbnail = None
          formats = []
-        for video_url in re.findall(r'"file"\s*:\s*"([^"]+)', webpage):
-            video_url = compat_urlparse.urljoin(url, video_url)
-            ext = determine_ext(video_url)
-            if ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(
-                    video_url, video_id, 'mp4', m3u8_id='m3u8'))
-            else:
-                formats.append({
-                    'url': video_url,
-                    'format_id': ext,
-                    'preference': 1,
-                })
+
+        def extract_m3u8(manifest_url):
+            formats.extend(self._extract_m3u8_formats(
+                manifest_url, video_id, 'mp4',
+                entry_protocol='m3u8_native', m3u8_id='m3u8'))
+
+        def extract_original(original_url):
+            formats.append({
+                'url': original_url,
+                'format_id': determine_ext(original_url, None),
+                'preference': 1,
+            })
+
+        playlist = self._parse_json(
+            self._search_regex(
+                r'options\s*=\s*({.+?});', webpage, 'options', default='{}'),
+            video_id).get('playlist', {})
+        if playlist:
+            master = playlist.get('master')
+            if isinstance(master, compat_str) and determine_ext(master) == 'm3u8':
+                extract_m3u8(compat_urlparse.urljoin(url, master))
+            original = playlist.get('original')
+            if isinstance(original, compat_str):
+                extract_original(original)
+            thumbnail = playlist.get('image')
+
+        # Old rendition fallback
+        if not formats:
+            for video_url in re.findall(r'"file"\s*:\s*"([^"]+)', webpage):
+                video_url = compat_urlparse.urljoin(url, video_url)
+                if determine_ext(video_url) == 'm3u8':
+                    extract_m3u8(video_url)
+                else:
+                    extract_original(video_url)
+
          self._sort_formats(formats)
  
-        thumbnail = self._search_regex(
+        thumbnail = thumbnail or self._search_regex(
              r'"image"\s*:\s*"([^"]+)', webpage, 'thumbnail', default=None)
  
          return {
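
The new `LifeEmbedIE` code path above reads an `options = {...};` JSON blob from the embed page and takes the `master`, `original` and `image` entries of its `playlist` object. A minimal Python 3 sketch of that lookup, using only the standard library: `pick_sources` and the sample page are hypothetical, `str` stands in for `compat_str`, and the `.endswith('.m3u8')` test is a simplification of the `determine_ext` check in the diff.

```python
import json
import re


def pick_sources(webpage):
    """Return (master, original, image) from an `options = {...};` blob."""
    match = re.search(r'options\s*=\s*({.+?});', webpage)
    options = json.loads(match.group(1)) if match else {}
    playlist = options.get('playlist') or {}
    master = playlist.get('master')
    original = playlist.get('original')
    # Accept string values only, mirroring the isinstance checks in the diff.
    if not isinstance(master, str) or not master.endswith('.m3u8'):
        master = None
    if not isinstance(original, str):
        original = None
    return master, original, playlist.get('image')


if __name__ == '__main__':
    sample_page = (
        'var options = {"playlist": {"master": "/hls/master.m3u8", '
        '"original": "/video/orig.mp4", "image": "/thumb.jpg"}};')
    print(pick_sources(sample_page))
```

When the blob is missing, all three values come back as `None`, which corresponds to the case where the extractor falls back to scanning for `"file"` entries.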
diff --git a/youtube_dl/extractor/limelight.py b/youtube_dl/extractor/limelight.py
index 2599d45c37e3c7874e12227677962fae3a2fbf84..b7bfa7a6d524e4a5ebd190947b52a369a211e753 100644
--- a/youtube_dl/extractor/limelight.py
+++ b/youtube_dl/extractor/limelight.py
@@ -34,14 +34,16 @@ class LimelightBaseIE(InfoExtractor):
      def _extract_info(self, streams, mobile_urls, properties):
          video_id = properties['media_id']
          formats = []
-
+        urls = []
          for stream in streams:
              stream_url = stream.get('url')
-            if not stream_url:
+            if not stream_url or stream.get('drmProtected') or stream_url in urls:
                  continue
-            if '.f4m' in stream_url:
+            urls.append(stream_url)
+            ext = determine_ext(stream_url)
+            if ext == 'f4m':
                  formats.extend(self._extract_f4m_formats(
-                    stream_url, video_id, fatal=False))
+                    stream_url, video_id, f4m_id='hds', fatal=False))
              else:
                  fmt = {
                      'url': stream_url,
@@ -50,13 +52,21 @@ class LimelightBaseIE(InfoExtractor):
                      'fps': float_or_none(stream.get('videoFrameRate')),
                      'width': int_or_none(stream.get('videoWidthInPixels')),
                      'height': int_or_none(stream.get('videoHeightInPixels')),
-                    'ext': determine_ext(stream_url)
+                    'ext': ext,
                  }
-                rtmp = re.search(r'^(?P<url>rtmpe?://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url)
+                rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url)
                  if rtmp:
                      format_id = 'rtmp'
                      if stream.get('videoBitRate'):
                          format_id += '-%d' % int_or_none(stream['videoBitRate'])
+                    http_url = 'http://cpl.delvenetworks.com/' + rtmp.group('playpath')[4:]
+                    urls.append(http_url)
+                    http_fmt = fmt.copy()
+                    http_fmt.update({
+                        'url': http_url,
+                        'format_id': format_id.replace('rtmp', 'http'),
+                    })
+                    formats.append(http_fmt)
                      fmt.update({
                          'url': rtmp.group('url'),
                          'play_path': rtmp.group('playpath'),
@@ -68,18 +78,24 @@ class LimelightBaseIE(InfoExtractor):
  
          for mobile_url in mobile_urls:
              media_url = mobile_url.get('mobileUrl')
-            if not media_url:
-                continue
              format_id = mobile_url.get('targetMediaPlatform')
-            if determine_ext(media_url) == 'm3u8':
+            if not media_url or format_id in ('Widevine', 'SmoothStreaming') or media_url in urls:
+                continue
+            urls.append(media_url)
+            ext = determine_ext(media_url)
+            if ext == 'm3u8':
                  formats.extend(self._extract_m3u8_formats(
                      media_url, video_id, 'mp4', 'm3u8_native',
                      m3u8_id=format_id, fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    media_url, video_id, f4m_id=format_id, fatal=False))
              else:
                  formats.append({
                      'url': media_url,
                      'format_id': format_id,
                      'preference': -1,
+                    'ext': ext,
                  })
  
          self._sort_formats(formats)
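
The stream loop above derives a plain HTTP URL from each RTMP stream by dropping the `mp4:` prefix of the play path and prepending `http://cpl.delvenetworks.com/`. A small sketch of that split, reusing the regular expression from the diff: the function name `split_rtmp` and the sample stream URL are hypothetical.

```python
import re

# Same pattern as in LimelightBaseIE._extract_info.
RTMP_RE = re.compile(
    r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp4:.+)$')


def split_rtmp(stream_url):
    """Return the RTMP parts and the derived HTTP URL, or None if no match."""
    match = RTMP_RE.search(stream_url)
    if not match:
        return None
    play_path = match.group('playpath')
    return {
        'url': match.group('url'),
        'play_path': play_path,
        # [4:] strips the leading 'mp4:' marker from the play path.
        'http_url': 'http://cpl.delvenetworks.com/' + play_path[4:],
    }


if __name__ == '__main__':
    print(split_rtmp('rtmp://host.example/a1234/o25/mp4:media/abc/video.mp4'))
```

Whether the derived HTTP URL is actually served is not checked here; the sketch only shows the string transformation.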
@@ -98,13 +114,19 @@ class LimelightBaseIE(InfoExtractor):
          } for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]
  
          subtitles = {}
-        for caption in properties.get('captions', {}):
+        for caption in properties.get('captions', []):
              lang = caption.get('language_code')
              subtitles_url = caption.get('url')
              if lang and subtitles_url:
-                subtitles[lang] = [{
+                subtitles.setdefault(lang, []).append({
                      'url': subtitles_url,
-                }]
+                })
+        closed_captions_url = properties.get('closed_captions_url')
+        if closed_captions_url:
+            subtitles.setdefault('en', []).append({
+                'url': closed_captions_url,
+                'ext': 'ttml',
+            })
  
          return {
              'id': video_id,
@@ -123,12 +145,23 @@ class LimelightBaseIE(InfoExtractor):
  
  class LimelightMediaIE(LimelightBaseIE):
      IE_NAME = 'limelight'
-    _VALID_URL = r'(?:limelight:media:|https?://link\.videoplatform\.limelight\.com/media/\??\bmediaId=)(?P<id>[a-z0-9]{32})'
+    _VALID_URL = r'''(?x)
+                        (?:
+                            limelight:media:|
+                            https?://
+                                (?:
+                                    link\.videoplatform\.limelight\.com/media/|
+                                    assets\.delvenetworks\.com/player/loader\.swf
+                                )
+                                \?.*?\bmediaId=
+                        )
+                        (?P<id>[a-z0-9]{32})
+                    '''
      _TESTS = [{
          'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86',
          'info_dict': {
              'id': '3ffd040b522b4485b6d84effc750cd86',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'HaP and the HB Prince Trailer',
              'description': 'md5:8005b944181778e313d95c1237ddb640',
              'thumbnail': 're:^https?://.*\.jpeg$',
@@ -137,27 +170,26 @@ class LimelightMediaIE(LimelightBaseIE):
              'upload_date': '20090604',
          },
          'params': {
-            # rtmp download
+            # m3u8 download
              'skip_download': True,
          },
      }, {
          # video with subtitles
          'url': 'limelight:media:a3e00274d4564ec4a9b29b9466432335',
+        'md5': '2fa3bad9ac321e23860ca23bc2c69e3d',
          'info_dict': {
              'id': 'a3e00274d4564ec4a9b29b9466432335',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': '3Play Media Overview Video',
-            'description': '',
              'thumbnail': 're:^https?://.*\.jpeg$',
              'duration': 78.101,
              'timestamp': 1338929955,
              'upload_date': '20120605',
              'subtitles': 'mincount:9',
          },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
+    }, {
+        'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
+        'only_matching': True,
      }]
      _PLAYLIST_SERVICE_PATH = 'media'
      _API_PATH = 'media'
@@ -176,15 +208,29 @@ class LimelightMediaIE(LimelightBaseIE):
  
  class LimelightChannelIE(LimelightBaseIE):
      IE_NAME = 'limelight:channel'
-    _VALID_URL = r'(?:limelight:channel:|https?://link\.videoplatform\.limelight\.com/media/\??\bchannelId=)(?P<id>[a-z0-9]{32})'
-    _TEST = {
+    _VALID_URL = r'''(?x)
+                        (?:
+                            limelight:channel:|
+                            https?://
+                                (?:
+                                    link\.videoplatform\.limelight\.com/media/|
+                                    assets\.delvenetworks\.com/player/loader\.swf
+                                )
+                                \?.*?\bchannelId=
+                        )
+                        (?P<id>[a-z0-9]{32})
+                    '''
+    _TESTS = [{
          'url': 'http://link.videoplatform.limelight.com/media/?channelId=ab6a524c379342f9b23642917020c082',
          'info_dict': {
              'id': 'ab6a524c379342f9b23642917020c082',
              'title': 'Javascript Sample Code',
          },
          'playlist_mincount': 3,
-    }
+    }, {
+        'url': 'http://assets.delvenetworks.com/player/loader.swf?channelId=ab6a524c379342f9b23642917020c082',
+        'only_matching': True,
+    }]
      _PLAYLIST_SERVICE_PATH = 'channel'
      _API_PATH = 'channels'
  
@@ -207,15 +253,29 @@ class LimelightChannelIE(LimelightBaseIE):
  
  class LimelightChannelListIE(LimelightBaseIE):
      IE_NAME = 'limelight:channel_list'
-    _VALID_URL = r'(?:limelight:channel_list:|https?://link\.videoplatform\.limelight\.com/media/\?.*?\bchannelListId=)(?P<id>[a-z0-9]{32})'
-    _TEST = {
+    _VALID_URL = r'''(?x)
+                        (?:
+                            limelight:channel_list:|
+                            https?://
+                                (?:
+                                    link\.videoplatform\.limelight\.com/media/|
+                                    assets\.delvenetworks\.com/player/loader\.swf
+                                )
+                                \?.*?\bchannelListId=
+                        )
+                        (?P<id>[a-z0-9]{32})
+                    '''
+    _TESTS = [{
          'url': 'http://link.videoplatform.limelight.com/media/?channelListId=301b117890c4465c8179ede21fd92e2b',
          'info_dict': {
              'id': '301b117890c4465c8179ede21fd92e2b',
              'title': 'Website - Hero Player',
          },
          'playlist_mincount': 2,
-    }
+    }, {
+        'url': 'https://assets.delvenetworks.com/player/loader.swf?channelListId=301b117890c4465c8179ede21fd92e2b',
+        'only_matching': True,
+    }]
      _PLAYLIST_SERVICE_PATH = 'channel_list'
  
      def _real_extract(self, url):
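
The rewritten `_VALID_URL` patterns above use verbose mode (`(?x)`) and accept both the `link.videoplatform.limelight.com` links and the `assets.delvenetworks.com` SWF loader. The sketch below copies the media pattern from the diff and checks it against the test URLs listed there; `match_media_id` is a hypothetical helper name.

```python
import re

# Pattern copied from LimelightMediaIE._VALID_URL in the diff.
MEDIA_URL_RE = re.compile(r'''(?x)
    (?:
        limelight:media:|
        https?://
            (?:
                link\.videoplatform\.limelight\.com/media/|
                assets\.delvenetworks\.com/player/loader\.swf
            )
            \?.*?\bmediaId=
    )
    (?P<id>[a-z0-9]{32})
''')


def match_media_id(url):
    """Return the 32-character media ID, or None if the URL does not match."""
    match = MEDIA_URL_RE.match(url)
    return match.group('id') if match else None


if __name__ == '__main__':
    for candidate in (
            'limelight:media:a3e00274d4564ec4a9b29b9466432335',
            'https://assets.delvenetworks.com/player/loader.swf'
            '?mediaId=8018a574f08d416e95ceaccae4ba0452'):
        print(candidate, '->', match_media_id(candidate))
```

A channel URL carries `channelId=` instead of `mediaId=`, so it falls through this pattern and is left to `LimelightChannelIE`.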
diff --git a/youtube_dl/extractor/litv.py b/youtube_dl/extractor/litv.py
new file mode 100644
index 0000000..ded717c
--- /dev/null
+++ b/youtube_dl/extractor/litv.py
@@ -0,0 +1,148 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    smuggle_url,
+    unsmuggle_url,
+)
+
+
+class LiTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?litv\.tv/(?:vod|promo)/[^/]+/(?:content\.do)?\?.*?\b(?:content_)?id=(?P<id>[^&]+)'
+
+    _URL_TEMPLATE = 'https://www.litv.tv/vod/%s/content.do?id=%s'
+
+    _TESTS = [{
+        'url': 'https://www.litv.tv/vod/drama/content.do?brc_id=root&id=VOD00041610&isUHEnabled=true&autoPlay=1',
+        'info_dict': {
+            'id': 'VOD00041606',
+            'title': '花千骨',
+        },
+        'playlist_count': 50,
+    }, {
+        'url': 'https://www.litv.tv/vod/drama/content.do?brc_id=root&id=VOD00041610&isUHEnabled=true&autoPlay=1',
+        'md5': '969e343d9244778cb29acec608e53640',
+        'info_dict': {
+            'id': 'VOD00041610',
+            'ext': 'mp4',
+            'title': '花千骨第1集',
+            'thumbnail': r're:https?://.*\.jpg$',
+            'description': 'md5:c7017aa144c87467c4fb2909c4b05d6f',
+            'episode_number': 1,
+        },
+        'params': {
+            'noplaylist': True,
+        },
+        'skip': 'Georestricted to Taiwan',
+    }, {
+        'url': 'https://www.litv.tv/promo/miyuezhuan/?content_id=VOD00044841&',
+        'md5': '88322ea132f848d6e3e18b32a832b918',
+        'info_dict': {
+            'id': 'VOD00044841',
+            'ext': 'mp4',
+            'title': '芈月傳第1集　霸星芈月降世楚國',
+            'description': '楚威王二年，太史令唐昧夜觀星象，發現霸星即將現世。王后得知霸星的預言後，想盡辦法不讓孩子順利出生，幸得莒姬相護化解危機。沒想到眾人期待下出生的霸星卻是位公主，楚威王對此失望至極。楚王后命人將女嬰丟棄河中，居然奇蹟似的被少司命像攔下，楚威王認為此女非同凡響，為她取名芈月。',
+        },
+        'skip': 'Georestricted to Taiwan',
+    }]
+
+    def _extract_playlist(self, season_list, video_id, program_info, prompt=True):
+        episode_title = program_info['title']
+        content_id = season_list['contentId']
+
+        if prompt:
+            self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (content_id, video_id))
+
+        all_episodes = [
+            self.url_result(smuggle_url(
+                self._URL_TEMPLATE % (program_info['contentType'], episode['contentId']),
+                {'force_noplaylist': True}))  # To prevent infinite recursion
+            for episode in season_list['episode']]
+
+        return self.playlist_result(all_episodes, content_id, episode_title)
+
+    def _real_extract(self, url):
+        url, data = unsmuggle_url(url, {})
+
+        video_id = self._match_id(url)
+
+        noplaylist = self._downloader.params.get('noplaylist')
+        noplaylist_prompt = True
+        if 'force_noplaylist' in data:
+            noplaylist = data['force_noplaylist']
+            noplaylist_prompt = False
+
+        webpage = self._download_webpage(url, video_id)
+
+        program_info = self._parse_json(self._search_regex(
+            r'var\s+programInfo\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'),
+            video_id)
+
+        season_list = list(program_info.get('seasonList', {}).values())
+        if season_list:
+            if not noplaylist:
+                return self._extract_playlist(
+                    season_list[0], video_id, program_info,
+                    prompt=noplaylist_prompt)
+
+            if noplaylist_prompt:
+                self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
+
+        # In browsers, the `getMainUrl` request is always issued. Usually this
+        # endpoint gives the same result as the data embedded in the webpage.
+        # If georestricted, there is no embedded data, so an extra request is
+        # necessary to get the error code.
+        if 'assetId' not in program_info:
+            program_info = self._download_json(
+                'https://www.litv.tv/vod/ajax/getProgramInfo', video_id,
+                query={'contentId': video_id},
+                headers={'Accept': 'application/json'})
+        video_data = self._parse_json(self._search_regex(
+            r'uiHlsUrl\s*=\s*testBackendData\(([^;]+)\);',
+            webpage, 'video data', default='{}'), video_id)
+        if not video_data:
+            payload = {
+                'assetId': program_info['assetId'],
+                'watchDevices': program_info['watchDevices'],
+                'contentType': program_info['contentType'],
+            }
+            video_data = self._download_json(
+                'https://www.litv.tv/vod/getMainUrl', video_id,
+                data=json.dumps(payload).encode('utf-8'),
+                headers={'Content-Type': 'application/json'})
+
+        if not video_data.get('fullpath'):
+            error_msg = video_data.get('errorMessage')
+            if error_msg == 'vod.error.outsideregionerror':
+                self.raise_geo_restricted('This video is available in Taiwan only')
+            if error_msg:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, error_msg), expected=True)
+            raise ExtractorError('Unexpected result from %s' % self.IE_NAME)
+
+        formats = self._extract_m3u8_formats(
+            video_data['fullpath'], video_id, ext='mp4',
+            entry_protocol='m3u8_native', m3u8_id='hls')
+        for a_format in formats:
+            # LiTV HLS segments don't like compression
+            a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = True
+
+        title = program_info['title'] + program_info.get('secondaryMark', '')
+        description = program_info.get('description')
+        thumbnail = program_info.get('imageFile')
+        categories = [item['name'] for item in program_info.get('category', [])]
+        episode = int_or_none(program_info.get('episode'))
+
+        return {
+            'id': video_id,
+            'formats': formats,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'categories': categories,
+            'episode_number': episode,
+        }
diff --git a/youtube_dl/extractor/liveleak.py b/youtube_dl/extractor/liveleak.py

index 4684994e1726fc1785de3b8df6061f2eaf278ed8..ea0565ac05099aab8c05609aee4140a1b4c2c1c7 100644 (file)
--- a/youtube_dl/extractor/liveleak.py
+++ b/youtube_dl/extractor/liveleak.py
@@ -17,7 +17,8 @@ class LiveLeakIE(InfoExtractor):
              'ext': 'flv',
              'description': 'extremely bad day for this guy..!',
              'uploader': 'ljfriel2',
-            'title': 'Most unlucky car accident'
+            'title': 'Most unlucky car accident',
+            'thumbnail': r're:^https?://.*\.jpg$'
          }
      }, {
          'url': 'http://www.liveleak.com/view?i=f93_1390833151',
@@ -28,6 +29,7 @@ class LiveLeakIE(InfoExtractor):
              'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
              'uploader': 'ARD_Stinkt',
              'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
+            'thumbnail': r're:^https?://.*\.jpg$'
          }
      }, {
          'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
@@ -49,10 +51,19 @@ class LiveLeakIE(InfoExtractor):
              'ext': 'mp4',
              'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
              'uploader': 'bony333',
-            'title': 'Crazy Hungarian tourist films close call waterspout in Croatia'
+            'title': 'Crazy Hungarian tourist films close call waterspout in Croatia',
+            'thumbnail': r're:^https?://.*\.jpg$'
          }
      }]
  
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)',
+            webpage)
+        if mobj:
+            return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
+
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
@@ -64,6 +75,7 @@ class LiveLeakIE(InfoExtractor):
          age_limit = int_or_none(self._search_regex(
              r'you confirm that you are ([0-9]+) years and over.',
              webpage, 'age limit', default=None))
+        video_thumbnail = self._og_search_thumbnail(webpage)
  
          sources_raw = self._search_regex(
              r'(?s)sources:\s*(\[.*?\]),', webpage, 'video URLs', default=None)
@@ -116,4 +128,5 @@ class LiveLeakIE(InfoExtractor):
              'uploader': video_uploader,
              'formats': formats,
              'age_limit': age_limit,
+            'thumbnail': video_thumbnail,
          }
diff --git a/youtube_dl/extractor/livestream.py b/youtube_dl/extractor/livestream.py

index eada7c299238953baa9fd3d8219b2754aa7f9356..bc7894bf13ed29963aa1dad7880cf8549be1ca77 100644 (file)
--- a/youtube_dl/extractor/livestream.py
+++ b/youtube_dl/extractor/livestream.py
@@ -150,7 +150,7 @@ class LivestreamIE(InfoExtractor):
          }
  
      def _extract_stream_info(self, stream_info):
-        broadcast_id = stream_info['broadcast_id']
+        broadcast_id = compat_str(stream_info['broadcast_id'])
          is_live = stream_info.get('is_live')
  
          formats = []
@@ -203,9 +203,10 @@ class LivestreamIE(InfoExtractor):
              if not videos_info:
                  break
              for v in videos_info:
+                v_id = compat_str(v['id'])
                  entries.append(self.url_result(
-                    'http://livestream.com/accounts/%s/events/%s/videos/%s' % (account_id, event_id, v['id']),
-                    'Livestream', v['id'], v['caption']))
+                    'http://livestream.com/accounts/%s/events/%s/videos/%s' % (account_id, event_id, v_id),
+                    'Livestream', v_id, v.get('caption')))
              last_video = videos_info[-1]['id']
          return self.playlist_result(entries, event_id, event_data['full_name'])
  
diff --git a/youtube_dl/extractor/localnews8.py b/youtube_dl/extractor/localnews8.py

new file mode 100644 (file)

index 0000000..aad3961
--- /dev/null
+++ b/youtube_dl/extractor/localnews8.py
@@ -0,0 +1,47 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class LocalNews8IE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?localnews8\.com/(?:[^/]+/)*(?P<display_id>[^/]+)/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.localnews8.com/news/rexburg-business-turns-carbon-fiber-scraps-into-wedding-rings/35183304',
+        'md5': 'be4d48aea61aa2bde7be2ee47691ad20',
+        'info_dict': {
+            'id': '35183304',
+            'display_id': 'rexburg-business-turns-carbon-fiber-scraps-into-wedding-rings',
+            'ext': 'mp4',
+            'title': 'Rexburg business turns carbon fiber scraps into wedding ring',
+            'description': 'The process was first invented by Lamborghini and less than a dozen companies around the world use it.',
+            'duration': 153,
+            'timestamp': 1441844822,
+            'upload_date': '20150910',
+            'uploader_id': 'api',
+        }
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id')
+
+        webpage = self._download_webpage(url, display_id)
+
+        partner_id = self._search_regex(
+            r'partnerId\s*[:=]\s*(["\'])(?P<id>\d+)\1',
+            webpage, 'partner id', group='id')
+        kaltura_id = self._search_regex(
+            r'videoIdString\s*[:=]\s*(["\'])kaltura:(?P<id>[0-9a-z_]+)\1',
+            webpage, 'video id', group='id')
+
+        return {
+            '_type': 'url_transparent',
+            'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
+            'ie_key': 'Kaltura',
+            'id': video_id,
+            'display_id': display_id,
+        }
diff --git a/youtube_dl/extractor/lrt.py b/youtube_dl/extractor/lrt.py

index 1072405b30c7663d19ddc4df86f858d94952fda5..f5c997ef4c79398734b2bd5feb09fd335eaf5ade 100644 (file)
--- a/youtube_dl/extractor/lrt.py
+++ b/youtube_dl/extractor/lrt.py
@@ -1,8 +1,11 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..utils import (
+    determine_ext,
      int_or_none,
      parse_duration,
      remove_end,
@@ -12,8 +15,10 @@ from ..utils import (
  class LRTIE(InfoExtractor):
      IE_NAME = 'lrt.lt'
      _VALID_URL = r'https?://(?:www\.)?lrt\.lt/mediateka/irasas/(?P<id>[0-9]+)'
-    _TEST = {
+    _TESTS = [{
+        # m3u8 download
          'url': 'http://www.lrt.lt/mediateka/irasas/54391/',
+        'md5': 'fe44cf7e4ab3198055f2c598fc175cb0',
          'info_dict': {
              'id': '54391',
              'ext': 'mp4',
@@ -23,20 +28,45 @@ class LRTIE(InfoExtractor):
              'view_count': int,
              'like_count': int,
          },
-        'params': {
-            'skip_download': True,  # m3u8 download
+    }, {
+        # direct mp3 download
+        'url': 'http://www.lrt.lt/mediateka/irasas/1013074524/',
+        'md5': '389da8ca3cad0f51d12bed0c844f6a0a',
+        'info_dict': {
+            'id': '1013074524',
+            'ext': 'mp3',
+            'title': 'Kita tema 2016-09-05 15:05',
+            'description': 'md5:1b295a8fc7219ed0d543fc228c931fb5',
+            'duration': 3008,
+            'view_count': int,
+            'like_count': int,
          },
-    }
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
          title = remove_end(self._og_search_title(webpage), ' - LRT')
-        m3u8_url = self._search_regex(
-            r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*location\.hash\.substring\(1\)',
-            webpage, 'm3u8 url', group='url')
-        formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
+
+        formats = []
+        for _, file_url in re.findall(
+                r'file\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage):
+            ext = determine_ext(file_url)
+            if ext not in ('m3u8', 'mp3'):
+                continue
+            # mp3 served as m3u8 produces a stuttering media file
+            if ext == 'm3u8' and '.mp3' in file_url:
+                continue
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    file_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    fatal=False))
+            elif ext == 'mp3':
+                formats.append({
+                    'url': file_url,
+                    'vcodec': 'none',
+                })
          self._sort_formats(formats)
  
          thumbnail = self._og_search_thumbnail(webpage)
diff --git a/youtube_dl/extractor/lynda.py b/youtube_dl/extractor/lynda.py

index 6556274790c771c26b353b9c26e5f9eab187eff4..f4dcfd93fa760878566568636d9c2b864b6c7556 100644 (file)
--- a/youtube_dl/extractor/lynda.py
+++ b/youtube_dl/extractor/lynda.py
@@ -1,106 +1,106 @@
  from __future__ import unicode_literals
  
  import re
-import json
  
  from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import (
+    compat_HTTPError,
+    compat_str,
+    compat_urlparse,
+)
  from ..utils import (
      ExtractorError,
-    clean_html,
      int_or_none,
-    sanitized_Request,
      urlencode_postdata,
  )
  
  
  class LyndaBaseIE(InfoExtractor):
-    _LOGIN_URL = 'https://www.lynda.com/login/login.aspx'
+    _SIGNIN_URL = 'https://www.lynda.com/signin'
+    _PASSWORD_URL = 'https://www.lynda.com/signin/password'
+    _USER_URL = 'https://www.lynda.com/signin/user'
      _ACCOUNT_CREDENTIALS_HINT = 'Use --username and --password options to provide lynda.com account credentials.'
      _NETRC_MACHINE = 'lynda'
  
      def _real_initialize(self):
          self._login()
  
+    @staticmethod
+    def _check_error(json_string, key_or_keys):
+        keys = [key_or_keys] if isinstance(key_or_keys, compat_str) else key_or_keys
+        for key in keys:
+            error = json_string.get(key)
+            if error:
+                raise ExtractorError('Unable to login: %s' % error, expected=True)
+
+    def _login_step(self, form_html, fallback_action_url, extra_form_data, note, referrer_url):
+        action_url = self._search_regex(
+            r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_html,
+            'post url', default=fallback_action_url, group='url')
+
+        if not action_url.startswith('http'):
+            action_url = compat_urlparse.urljoin(self._SIGNIN_URL, action_url)
+
+        form_data = self._hidden_inputs(form_html)
+        form_data.update(extra_form_data)
+
+        try:
+            response = self._download_json(
+                action_url, None, note,
+                data=urlencode_postdata(form_data),
+                headers={
+                    'Referer': referrer_url,
+                    'X-Requested-With': 'XMLHttpRequest',
+                })
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
+                response = self._parse_json(e.cause.read().decode('utf-8'), None)
+                self._check_error(response, ('email', 'password'))
+            raise
+
+        self._check_error(response, 'ErrorMessage')
+
+        return response, action_url
+
      def _login(self):
          username, password = self._get_login_info()
          if username is None:
              return
  
-        login_form = {
-            'username': username.encode('utf-8'),
-            'password': password.encode('utf-8'),
-            'remember': 'false',
-            'stayPut': 'false'
-        }
-        request = sanitized_Request(
-            self._LOGIN_URL, urlencode_postdata(login_form))
-        login_page = self._download_webpage(
-            request, None, 'Logging in as %s' % username)
-
-        # Not (yet) logged in
-        m = re.search(r'loginResultJson\s*=\s*\'(?P<json>[^\']+)\';', login_page)
-        if m is not None:
-            response = m.group('json')
-            response_json = json.loads(response)
-            state = response_json['state']
-
-            if state == 'notlogged':
-                raise ExtractorError(
-                    'Unable to login, incorrect username and/or password',
-                    expected=True)
-
-            # This is when we get popup:
-            # > You're already logged in to lynda.com on two devices.
-            # > If you log in here, we'll log you out of another device.
-            # So, we need to confirm this.
-            if state == 'conflicted':
-                confirm_form = {
-                    'username': '',
-                    'password': '',
-                    'resolve': 'true',
-                    'remember': 'false',
-                    'stayPut': 'false',
-                }
-                request = sanitized_Request(
-                    self._LOGIN_URL, urlencode_postdata(confirm_form))
-                login_page = self._download_webpage(
-                    request, None,
-                    'Confirming log in and log out from another device')
-
-        if all(not re.search(p, login_page) for p in ('isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')):
-            if 'login error' in login_page:
-                mobj = re.search(
-                    r'(?s)<h1[^>]+class="topmost">(?P<title>[^<]+)</h1>\s*<div>(?P<description>.+?)</div>',
-                    login_page)
-                if mobj:
-                    raise ExtractorError(
-                        'lynda returned error: %s - %s'
-                        % (mobj.group('title'), clean_html(mobj.group('description'))),
-                        expected=True)
-            raise ExtractorError('Unable to log in')
-
-    def _logout(self):
-        username, _ = self._get_login_info()
-        if username is None:
+        # Step 1: download signin page
+        signin_page = self._download_webpage(
+            self._SIGNIN_URL, None, 'Downloading signin page')
+
+        # Already logged in
+        if any(re.search(p, signin_page) for p in (
+                r'isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')):
              return
  
-        self._download_webpage(
-            'http://www.lynda.com/ajax/logout.aspx', None,
-            'Logging out', 'Unable to log out', fatal=False)
+        # Step 2: submit email
+        signin_form = self._search_regex(
+            r'(?s)(<form[^>]+data-form-name=["\']signin["\'][^>]*>.+?</form>)',
+            signin_page, 'signin form')
+        signin_page, signin_url = self._login_step(
+            signin_form, self._PASSWORD_URL, {'email': username},
+            'Submitting email', self._SIGNIN_URL)
+
+        # Step 3: submit password
+        password_form = signin_page['body']
+        self._login_step(
+            password_form, self._USER_URL, {'email': username, 'password': password},
+            'Submitting password', signin_url)
  
  
  class LyndaIE(LyndaBaseIE):
      IE_NAME = 'lynda'
      IE_DESC = 'lynda.com videos'
-    _VALID_URL = r'https?://www\.lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
-    _NETRC_MACHINE = 'lynda'
+    _VALID_URL = r'https?://(?:www\.)?lynda\.com/(?:[^/]+/[^/]+/(?P<course_id>\d+)|player/embed)/(?P<id>\d+)'
  
      _TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'
  
      _TESTS = [{
-        'url': 'http://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
-        'md5': 'ecfc6862da89489161fb9cd5f5a6fac1',
+        'url': 'https://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
+        # md5 is unstable
          'info_dict': {
              'id': '114408',
              'ext': 'mp4',
@@ -112,19 +112,71 @@ class LyndaIE(LyndaBaseIE):
          'only_matching': True,
      }]
  
+    def _raise_unavailable(self, video_id):
+        self.raise_login_required(
+            'Video %s is only available for members' % video_id)
+
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        course_id = mobj.group('course_id')
+
+        query = {
+            'videoId': video_id,
+            'type': 'video',
+        }
  
          video = self._download_json(
-            'http://www.lynda.com/ajax/player?videoId=%s&type=video' % video_id,
-            video_id, 'Downloading video JSON')
+            'https://www.lynda.com/ajax/player', video_id,
+            'Downloading video JSON', fatal=False, query=query)
+
+        # Fallback scenario
+        if not video:
+            query['courseId'] = course_id
+
+            play = self._download_json(
+                'https://www.lynda.com/ajax/course/%s/%s/play'
+                % (course_id, video_id), video_id, 'Downloading play JSON')
+
+            if not play:
+                self._raise_unavailable(video_id)
+
+            formats = []
+            for formats_dict in play:
+                urls = formats_dict.get('urls')
+                if not isinstance(urls, dict):
+                    continue
+                cdn = formats_dict.get('name')
+                for format_id, format_url in urls.items():
+                    if not format_url:
+                        continue
+                    formats.append({
+                        'url': format_url,
+                        'format_id': '%s-%s' % (cdn, format_id) if cdn else format_id,
+                        'height': int_or_none(format_id),
+                    })
+            self._sort_formats(formats)
+
+            conviva = self._download_json(
+                'https://www.lynda.com/ajax/player/conviva', video_id,
+                'Downloading conviva JSON', query=query)
+
+            return {
+                'id': video_id,
+                'title': conviva['VideoTitle'],
+                'description': conviva.get('VideoDescription'),
+                'release_year': int_or_none(conviva.get('ReleaseYear')),
+                'duration': int_or_none(conviva.get('Duration')),
+                'creator': conviva.get('Author'),
+                'formats': formats,
+            }
  
          if 'Status' in video:
              raise ExtractorError(
                  'lynda returned error: %s' % video['Message'], expected=True)
  
          if video.get('HasAccess') is False:
-            self.raise_login_required('Video %s is only available for members' % video_id)
+            self._raise_unavailable(video_id)
  
          video_id = compat_str(video.get('ID') or video_id)
          duration = int_or_none(video.get('DurationInSeconds'))
@@ -148,7 +200,7 @@ class LyndaIE(LyndaBaseIE):
              for prioritized_stream_id, prioritized_stream in prioritized_streams.items():
                  formats.extend([{
                      'url': video_url,
-                    'width': int_or_none(format_id),
+                    'height': int_or_none(format_id),
                      'format_id': '%s-%s' % (prioritized_stream_id, format_id),
                  } for format_id, video_url in prioritized_stream.items()])
  
@@ -187,7 +239,7 @@ class LyndaIE(LyndaBaseIE):
              return srt
  
      def _get_subtitles(self, video_id):
-        url = 'http://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
+        url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
          subs = self._download_json(url, None, False)
          if subs:
              return {'en': [{'ext': 'srt', 'data': self._fix_subtitles(subs)}]}
@@ -209,11 +261,9 @@ class LyndaCourseIE(LyndaBaseIE):
          course_id = mobj.group('courseid')
  
          course = self._download_json(
-            'http://www.lynda.com/ajax/player?courseId=%s&type=course' % course_id,
+            'https://www.lynda.com/ajax/player?courseId=%s&type=course' % course_id,
              course_id, 'Downloading course JSON')
  
-        self._logout()
-
          if course.get('Status') == 'NotFound':
              raise ExtractorError(
                  'Course %s does not exist' % course_id, expected=True)
@@ -233,7 +283,7 @@ class LyndaCourseIE(LyndaBaseIE):
                  if video_id:
                      entries.append({
                          '_type': 'url_transparent',
-                        'url': 'http://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
+                        'url': 'https://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
                          'ie_key': LyndaIE.ie_key(),
                          'chapter': chapter.get('Title'),
                          'chapter_number': int_or_none(chapter.get('ChapterIndex')),
@@ -246,5 +296,6 @@ class LyndaCourseIE(LyndaBaseIE):
                  % unaccessible_videos + self._ACCOUNT_CREDENTIALS_HINT)
  
          course_title = course.get('Title')
+        course_description = course.get('Description')
  
-        return self.playlist_result(entries, course_id, course_title)
+        return self.playlist_result(entries, course_id, course_title, course_description)
diff --git a/youtube_dl/extractor/m6.py b/youtube_dl/extractor/m6.py

index d5945ad66b3a784263fb1c5106534081b1f04913..9806875e8d87f2a75e7689fbe0e0fabd6d7eeafe 100644 (file)
--- a/youtube_dl/extractor/m6.py
+++ b/youtube_dl/extractor/m6.py
@@ -1,8 +1,6 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  
  
@@ -23,34 +21,5 @@ class M6IE(InfoExtractor):
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        rss = self._download_xml('http://ws.m6.fr/v1/video/info/m6/bonus/%s' % video_id, video_id,
-                                 'Downloading video RSS')
-
-        title = rss.find('./channel/item/title').text
-        description = rss.find('./channel/item/description').text
-        thumbnail = rss.find('./channel/item/visuel_clip_big').text
-        duration = int(rss.find('./channel/item/duration').text)
-        view_count = int(rss.find('./channel/item/nombre_vues').text)
-
-        formats = []
-        for format_id in ['lq', 'sd', 'hq', 'hd']:
-            video_url = rss.find('./channel/item/url_video_%s' % format_id)
-            if video_url is None:
-                continue
-            formats.append({
-                'url': video_url.text,
-                'format_id': format_id,
-            })
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'view_count': view_count,
-            'formats': formats,
-        }
+        video_id = self._match_id(url)
+        return self.url_result('6play:%s' % video_id, 'SixPlay', video_id)
diff --git a/youtube_dl/extractor/macgamestore.py b/youtube_dl/extractor/macgamestore.py

index 3cd4a3a192ce3f6b611f6b3f4f3d928b75c9bba0..43db9929ca805fa7917824cf1bfd466f5721509e 100644 (file)
--- a/youtube_dl/extractor/macgamestore.py
+++ b/youtube_dl/extractor/macgamestore.py
@@ -7,7 +7,7 @@ from ..utils import ExtractorError
  class MacGameStoreIE(InfoExtractor):
      IE_NAME = 'macgamestore'
      IE_DESC = 'MacGameStore trailers'
-    _VALID_URL = r'https?://www\.macgamestore\.com/mediaviewer\.php\?trailer=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?macgamestore\.com/mediaviewer\.php\?trailer=(?P<id>\d+)'
  
      _TEST = {
          'url': 'http://www.macgamestore.com/mediaviewer.php?trailer=2450',
diff --git a/youtube_dl/extractor/mailru.py b/youtube_dl/extractor/mailru.py

index 9a7098c43c600a3cc3ed697252bc784d9a9cf5b7..f7cc3c83289f1101207c385d5bfed2055c7b7f67 100644 (file)
--- a/youtube_dl/extractor/mailru.py
+++ b/youtube_dl/extractor/mailru.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/malemotion.py b/youtube_dl/extractor/malemotion.py

deleted file mode 100644 (file)

index 92511a6..0000000
--- a/youtube_dl/extractor/malemotion.py
+++ /dev/null
@@ -1,46 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
-
-
-class MalemotionIE(InfoExtractor):
-    _VALID_URL = r'https?://malemotion\.com/video/(.+?)\.(?P<id>.+?)(#|$)'
-    _TEST = {
-        'url': 'http://malemotion.com/video/bete-de-concours.ltc',
-        'md5': '3013e53a0afbde2878bc39998c33e8a5',
-        'info_dict': {
-            'id': 'ltc',
-            'ext': 'mp4',
-            'title': 'Bête de Concours',
-            'age_limit': 18,
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        video_url = compat_urllib_parse_unquote(self._search_regex(
-            r'<source type="video/mp4" src="(.+?)"', webpage, 'video URL'))
-        video_title = self._html_search_regex(
-            r'<title>(.*?)</title', webpage, 'title')
-        video_thumbnail = self._search_regex(
-            r'<video .+?poster="(.+?)"', webpage, 'thumbnail', fatal=False)
-
-        formats = [{
-            'url': video_url,
-            'ext': 'mp4',
-            'format_id': 'mp4',
-            'preference': 1,
-        }]
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'formats': formats,
-            'title': video_title,
-            'thumbnail': video_thumbnail,
-            'age_limit': 18,
-        }
diff --git a/youtube_dl/extractor/mangomolo.py b/youtube_dl/extractor/mangomolo.py

new file mode 100644 (file)

index 0000000..1885ac7
--- /dev/null
+++ b/youtube_dl/extractor/mangomolo.py
@@ -0,0 +1,54 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import base64
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_unquote
+from ..utils import (
+    int_or_none,
+)
+
+
+class MangomoloBaseIE(InfoExtractor):
+    def _get_real_id(self, page_id):
+        return page_id
+
+    def _real_extract(self, url):
+        page_id = self._get_real_id(self._match_id(url))
+        webpage = self._download_webpage(url, page_id)
+        hidden_inputs = self._hidden_inputs(webpage)
+        m3u8_entry_protocol = 'm3u8' if self._IS_LIVE else 'm3u8_native'
+
+        format_url = self._html_search_regex(
+            [
+                r'file\s*:\s*"(https?://[^"]+?/playlist\.m3u8)',
+                r'<a[^>]+href="(rtsp://[^"]+)"'
+            ], webpage, 'format url')
+        formats = self._extract_wowza_formats(
+            format_url, page_id, m3u8_entry_protocol, ['smil'])
+        self._sort_formats(formats)
+
+        return {
+            'id': page_id,
+            'title': self._live_title(page_id) if self._IS_LIVE else page_id,
+            'uploader_id': hidden_inputs.get('userid'),
+            'duration': int_or_none(hidden_inputs.get('duration')),
+            'is_live': self._IS_LIVE,
+            'formats': formats,
+        }
+
+
+class MangomoloVideoIE(MangomoloBaseIE):
+    IE_NAME = 'mangomolo:video'
+    _VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/video\?.*?\bid=(?P<id>\d+)'
+    _IS_LIVE = False
+
+
+class MangomoloLiveIE(MangomoloBaseIE):
+    IE_NAME = 'mangomolo:live'
+    _VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/index\?.*?\bchannelid=(?P<id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)'
+    _IS_LIVE = True
+
+    def _get_real_id(self, page_id):
+        return base64.b64decode(compat_urllib_parse_unquote(page_id).encode()).decode()
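Note on `MangomoloLiveIE._get_real_id` above: the `channelid` value matched by `_VALID_URL` is a percent-encoded Base64 string, and the real id is its decoded form. A minimal standalone sketch of that step, assuming Python 3 and using `urllib.parse.unquote` in place of youtube-dl's `compat_urllib_parse_unquote`. The function name `decode_channel_id` and the sample values are hypothetical.

```python
import base64
from urllib.parse import unquote


def decode_channel_id(page_id):
    # Undo percent-encoding first (%2B -> +, %2F -> /, %3D -> =),
    # then decode the Base64 payload back to text.
    return base64.b64decode(unquote(page_id).encode()).decode()


# 'MA%3D%3D' is the percent-encoded form of 'MA==', the Base64 of '0'.
print(decode_channel_id('MA%3D%3D'))
```

Unquoting before decoding matters because the Base64 padding character `=` and the alphabet characters `+` and `/` are typically percent-encoded in a query string.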
diff --git a/youtube_dl/extractor/matchtv.py b/youtube_dl/extractor/matchtv.py
index 80a0d7013b064b6a919303c80430576f61d36e7c..33b0b539fa9dfde80274d983aa003ea7b39e6622 100644
--- a/youtube_dl/extractor/matchtv.py
+++ b/youtube_dl/extractor/matchtv.py
@@ -4,16 +4,12 @@ from __future__ import unicode_literals
  import random
  
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlencode
-from ..utils import (
-    sanitized_Request,
-    xpath_text,
-)
+from ..utils import xpath_text
  
  
  class MatchTVIE(InfoExtractor):
-    _VALID_URL = r'https?://matchtv\.ru/?#live-player'
-    _TEST = {
+    _VALID_URL = r'https?://matchtv\.ru(?:/on-air|/?#live-player)'
+    _TESTS = [{
          'url': 'http://matchtv.ru/#live-player',
          'info_dict': {
              'id': 'matchtv-live',
@@ -24,12 +20,16 @@ class MatchTVIE(InfoExtractor):
          'params': {
              'skip_download': True,
          },
-    }
+    }, {
+        'url': 'http://matchtv.ru/on-air/',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = 'matchtv-live'
-        request = sanitized_Request(
-            'http://player.matchtv.ntvplus.tv/player/smil?%s' % compat_urllib_parse_urlencode({
+        video_url = self._download_json(
+            'http://player.matchtv.ntvplus.tv/player/smil', video_id,
+            query={
                  'ts': '',
                  'quality': 'SD',
                  'contentId': '561d2c0df7159b37178b4567',
@@ -40,11 +40,10 @@ class MatchTVIE(InfoExtractor):
                  'contentType': 'channel',
                  'timeShift': '0',
                  'platform': 'portal',
-            }),
+            },
              headers={
                  'Referer': 'http://player.matchtv.ntvplus.tv/embed-player/NTVEmbedPlayer.swf',
-            })
-        video_url = self._download_json(request, video_id)['data']['videoUrl']
+            })['data']['videoUrl']
          f4m_url = xpath_text(self._download_xml(video_url, video_id), './to')
          formats = self._extract_f4m_formats(f4m_url, video_id)
          self._sort_formats(formats)
diff --git a/youtube_dl/extractor/mdr.py b/youtube_dl/extractor/mdr.py
index 2338e7f96f36bea7246e7357302cbcbcac39ad8a..2100583df46ab7955846f8e3b08467d13ed3440e 100644
--- a/youtube_dl/extractor/mdr.py
+++ b/youtube_dl/extractor/mdr.py
@@ -49,8 +49,8 @@ class MDRIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Beutolomäus und der geheime Weihnachtswunsch',
              'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd',
-            'timestamp': 1419047100,
-            'upload_date': '20141220',
+            'timestamp': 1450950000,
+            'upload_date': '20151224',
              'duration': 4628,
              'uploader': 'KIKA',
          },
@@ -71,8 +71,8 @@ class MDRIE(InfoExtractor):
          webpage = self._download_webpage(url, video_id)
  
          data_url = self._search_regex(
-            r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>\\?/.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
-            webpage, 'data url', default=None, group='url').replace('\/', '/')
+            r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
+            webpage, 'data url', group='url').replace('\/', '/')
  
          doc = self._download_xml(
              compat_urlparse.urljoin(url, data_url), video_id)
diff --git a/youtube_dl/extractor/meta.py b/youtube_dl/extractor/meta.py
new file mode 100644
index 0000000..cdb46e1
--- /dev/null
+++ b/youtube_dl/extractor/meta.py
@@ -0,0 +1,73 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .pladform import PladformIE
+from ..utils import (
+    unescapeHTML,
+    int_or_none,
+    ExtractorError,
+)
+
+
+class METAIE(InfoExtractor):
+    _VALID_URL = r'https?://video\.meta\.ua/(?:iframe/)?(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://video.meta.ua/5502115.video',
+        'md5': '71b6f3ee274bef16f1ab410f7f56b476',
+        'info_dict': {
+            'id': '5502115',
+            'ext': 'mp4',
+            'title': 'Sony Xperia Z camera test [HQ]',
+            'description': 'Xperia Z shoots video in FullHD HDR.',
+            'uploader_id': 'nomobile',
+            'uploader': 'CHЁZA.TV',
+            'upload_date': '20130211',
+        },
+        'add_ie': ['Youtube'],
+    }, {
+        'url': 'http://video.meta.ua/iframe/5502115',
+        'only_matching': True,
+    }, {
+        # pladform embed
+        'url': 'http://video.meta.ua/7121015.video',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        st_html5 = self._search_regex(
+            r"st_html5\s*=\s*'#([^']+)'", webpage, 'uppod html5 st', default=None)
+
+        if st_html5:
+            # uppod st decryption algorithm is reverse engineered from function un(s) at uppod.js
+            json_str = ''
+            for i in range(0, len(st_html5), 3):
+                json_str += '&#x0%s;' % st_html5[i:i + 3]
+            uppod_data = self._parse_json(unescapeHTML(json_str), video_id)
+            error = uppod_data.get('customnotfound')
+            if error:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
+
+            video_url = uppod_data['file']
+            info = {
+                'id': video_id,
+                'url': video_url,
+                'title': uppod_data.get('comment') or self._og_search_title(webpage),
+                'description': self._og_search_description(webpage, default=None),
+                'thumbnail': uppod_data.get('poster') or self._og_search_thumbnail(webpage),
+                'duration': int_or_none(self._og_search_property(
+                    'video:duration', webpage, default=None)),
+            }
+            if 'youtube.com/' in video_url:
+                info.update({
+                    '_type': 'url_transparent',
+                    'ie_key': 'Youtube',
+                })
+            return info
+
+        pladform_url = PladformIE._extract_url(webpage)
+        if pladform_url:
+            return self.url_result(pladform_url)
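Note on the `st_html5` handling in `METAIE` above: the value is read three hex digits at a time, each group is wrapped as a numeric character reference (`&#x0...;`), and the unescaped result is parsed as JSON. A rough standalone sketch, assuming Python 3 with `html.unescape` standing in for youtube-dl's `unescapeHTML`. The names `decode_uppod_st` and `encode_for_demo` are hypothetical, and the encoder is only an assumed inverse used to build sample input, not something taken from uppod.js.

```python
import html
import json


def decode_uppod_st(st):
    # Each group of three hex digits becomes one numeric character reference.
    refs = ''.join('&#x0%s;' % st[i:i + 3] for i in range(0, len(st), 3))
    # Unescaping the references yields the JSON text.
    return json.loads(html.unescape(refs))


def encode_for_demo(text):
    # Hypothetical inverse, used only to produce sample input for this sketch.
    return ''.join('%03x' % ord(ch) for ch in text)


sample = encode_for_demo('{"file": "http://example.com/v.mp4"}')
print(decode_uppod_st(sample)['file'])
```

The sketch expects the value without the leading `#`, which the extractor's regex already leaves out of the captured group.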
diff --git a/youtube_dl/extractor/metacafe.py b/youtube_dl/extractor/metacafe.py
index 61dadb7a7de9cbd6134e4a285e655487276c4bce..e6e7659a1de0ebe86f48a4128192de5d14d6d586 100644
--- a/youtube_dl/extractor/metacafe.py
+++ b/youtube_dl/extractor/metacafe.py
@@ -11,13 +11,14 @@ from ..utils import (
      determine_ext,
      ExtractorError,
      int_or_none,
-    sanitized_Request,
      urlencode_postdata,
+    get_element_by_attribute,
+    mimetype2ext,
  )
  
  
  class MetacafeIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*'
+    _VALID_URL = r'https?://(?:www\.)?metacafe\.com/watch/(?P<video_id>[^/]+)/(?P<display_id>[^/?#]+)'
      _DISCLAIMER = 'http://www.metacafe.com/family_filter/'
      _FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user'
      IE_NAME = 'metacafe'
@@ -47,6 +48,7 @@ class MetacafeIE(InfoExtractor):
                  'uploader': 'ign',
                  'description': 'Sony released a massive FAQ on the PlayStation Blog detailing the PS4\'s capabilities and limitations.',
              },
+            'skip': 'Page is temporarily unavailable.',
          },
          # AnyClip video
          {
@@ -55,8 +57,8 @@ class MetacafeIE(InfoExtractor):
                  'id': 'an-dVVXnuY7Jh77J',
                  'ext': 'mp4',
                  'title': 'The Andromeda Strain (1971): Stop the Bomb Part 3',
-                'uploader': 'anyclip',
-                'description': 'md5:38c711dd98f5bb87acf973d573442e67',
+                'uploader': 'AnyClip',
+                'description': 'md5:cbef0460d31e3807f6feb4e7a5952e5b',
              },
          },
          # age-restricted video
@@ -81,6 +83,9 @@ class MetacafeIE(InfoExtractor):
                  'title': 'Open: This is Face the Nation, February 9',
                  'description': 'md5:8a9ceec26d1f7ed6eab610834cc1a476',
                  'duration': 96,
+                'uploader': 'CBSI-NEW',
+                'upload_date': '20140209',
+                'timestamp': 1391959800,
              },
              'params': {
                  # rtmp download
@@ -107,28 +112,25 @@ class MetacafeIE(InfoExtractor):
      def report_disclaimer(self):
          self.to_screen('Retrieving disclaimer')
  
-    def _real_initialize(self):
+    def _confirm_age(self):
          # Retrieve disclaimer
          self.report_disclaimer()
          self._download_webpage(self._DISCLAIMER, None, False, 'Unable to retrieve disclaimer')
  
          # Confirm age
-        disclaimer_form = {
-            'filters': '0',
-            'submit': "Continue - I'm over 18",
-        }
-        request = sanitized_Request(self._FILTER_POST, urlencode_postdata(disclaimer_form))
-        request.add_header('Content-Type', 'application/x-www-form-urlencoded')
          self.report_age_confirmation()
-        self._download_webpage(request, None, False, 'Unable to confirm age')
+        self._download_webpage(
+            self._FILTER_POST, None, False, 'Unable to confirm age',
+            data=urlencode_postdata({
+                'filters': '0',
+                'submit': "Continue - I'm over 18",
+            }), headers={
+                'Content-Type': 'application/x-www-form-urlencoded',
+            })
  
      def _real_extract(self, url):
          # Extract id and simplified title from URL
-        mobj = re.match(self._VALID_URL, url)
-        if mobj is None:
-            raise ExtractorError('Invalid URL: %s' % url)
-
-        video_id = mobj.group(1)
+        video_id, display_id = re.match(self._VALID_URL, url).groups()
  
          # the video may come from an external site
          m_external = re.match('^(\w{2})-(.*)$', video_id)
@@ -141,15 +143,24 @@ class MetacafeIE(InfoExtractor):
              if prefix == 'cb':
                  return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
  
-        # Retrieve video webpage to extract further information
-        req = sanitized_Request('http://www.metacafe.com/watch/%s/' % video_id)
+        # self._confirm_age()
  
          # AnyClip videos require the flashversion cookie so that we get the link
          # to the mp4 file
-        mobj_an = re.match(r'^an-(.*?)$', video_id)
-        if mobj_an:
-            req.headers['Cookie'] = 'flashVersion=0;'
-        webpage = self._download_webpage(req, video_id)
+        headers = {}
+        if video_id.startswith('an-'):
+            headers['Cookie'] = 'flashVersion=0;'
+
+        # Retrieve video webpage to extract further information
+        webpage = self._download_webpage(url, video_id, headers=headers)
+
+        error = get_element_by_attribute(
+            'class', 'notfound-page-title', webpage)
+        if error:
+            raise ExtractorError(error, expected=True)
+
+        video_title = self._html_search_meta(
+            ['og:title', 'twitter:title'], webpage, 'title', default=None) or self._search_regex(r'<h1>(.*?)</h1>', webpage, 'title')
  
          # Extract URL, uploader and title from webpage
          self.report_extraction(video_id)
@@ -213,20 +224,40 @@ class MetacafeIE(InfoExtractor):
                          'player_url': player_url,
                          'ext': play_path.partition(':')[0],
                      })
+        if video_url is None:
+            flashvars = self._parse_json(self._search_regex(
+                r'flashvars\s*=\s*({.*});', webpage, 'flashvars',
+                default=None), video_id, fatal=False)
+            if flashvars:
+                video_url = []
+                for source in flashvars.get('sources'):
+                    source_url = source.get('src')
+                    if not source_url:
+                        continue
+                    ext = mimetype2ext(source.get('type')) or determine_ext(source_url)
+                    if ext == 'm3u8':
+                        video_url.extend(self._extract_m3u8_formats(
+                            source_url, video_id, 'mp4',
+                            'm3u8_native', m3u8_id='hls', fatal=False))
+                    else:
+                        video_url.append({
+                            'url': source_url,
+                            'ext': ext,
+                        })
  
          if video_url is None:
              raise ExtractorError('Unsupported video type')
  
-        video_title = self._html_search_regex(
-            r'(?im)<title>(.*) - Video</title>', webpage, 'title')
-        description = self._og_search_description(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
+        description = self._html_search_meta(
+            ['og:description', 'twitter:description', 'description'],
+            webpage, 'description', fatal=False)
+        thumbnail = self._html_search_meta(
+            ['og:image', 'twitter:image'], webpage, 'thumbnail', fatal=False)
          video_uploader = self._html_search_regex(
              r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);',
              webpage, 'uploader nickname', fatal=False)
          duration = int_or_none(
-            self._html_search_meta('video:duration', webpage))
-
+            self._html_search_meta('video:duration', webpage, default=None))
          age_limit = (
              18
              if re.search(r'(?:"contentRating":|"rating",)"restricted"', webpage)
@@ -239,10 +270,11 @@ class MetacafeIE(InfoExtractor):
                  'url': video_url,
                  'ext': video_ext,
              }]
-
          self._sort_formats(formats)
+
          return {
              'id': video_id,
+            'display_id': display_id,
              'description': description,
              'uploader': video_uploader,
              'title': video_title,
diff --git a/youtube_dl/extractor/metacritic.py b/youtube_dl/extractor/metacritic.py
index e30320569805aedaa6694ae54f9086909593f7a4..7d468d78bab45ac4a83bd8aa531dfd67b42c6eb6 100644
--- a/youtube_dl/extractor/metacritic.py
+++ b/youtube_dl/extractor/metacritic.py
@@ -9,9 +9,9 @@ from ..utils import (
  
  
  class MetacriticIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?metacritic\.com/.+?/trailers/(?P<id>\d+)'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
          'info_dict': {
              'id': '3698222',
@@ -20,7 +20,17 @@ class MetacriticIE(InfoExtractor):
              'description': 'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
              'duration': 221,
          },
-    }
+        'skip': 'Not providing trailers anymore',
+    }, {
+        'url': 'http://www.metacritic.com/game/playstation-4/tales-from-the-borderlands-a-telltale-game-series/trailers/5740315',
+        'info_dict': {
+            'id': '5740315',
+            'ext': 'mp4',
+            'title': 'Tales from the Borderlands - Finale: The Vault of the Traveler',
+            'description': 'In the final episode of the season, all hell breaks loose. Jack is now in control of Helios\' systems, and he\'s ready to reclaim his rightful place as king of Hyperion (with or without you).',
+            'duration': 114,
+        },
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
diff --git a/youtube_dl/extractor/mgtv.py b/youtube_dl/extractor/mgtv.py
new file mode 100644
index 0000000..e0bb5d2
--- /dev/null
+++ b/youtube_dl/extractor/mgtv.py
@@ -0,0 +1,70 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class MGTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
+    IE_DESC = '芒果TV'
+
+    _TESTS = [{
+        'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html',
+        'md5': '1bdadcf760a0b90946ca68ee9a2db41a',
+        'info_dict': {
+            'id': '3116640',
+            'ext': 'mp4',
+            'title': '我是歌手第四季双年巅峰会：韩红李玟“双王”领军对抗',
+            'description': '我是歌手第四季双年巅峰会',
+            'duration': 7461,
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+    }, {
+        # no tbr extracted from stream_url
+        'url': 'http://www.mgtv.com/v/1/1/f/3324755.html',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        api_data = self._download_json(
+            'http://v.api.mgtv.com/player/video', video_id,
+            query={'video_id': video_id},
+            headers=self.geo_verification_headers())['data']
+        info = api_data['info']
+
+        formats = []
+        for idx, stream in enumerate(api_data['stream']):
+            stream_url = stream.get('url')
+            if not stream_url:
+                continue
+            tbr = int_or_none(self._search_regex(
+                r'(\d+)\.mp4', stream_url, 'tbr', default=None))
+
+            def extract_format(stream_url, format_id, idx, query={}):
+                format_info = self._download_json(
+                    stream_url, video_id,
+                    note='Download video info for format %s' % (format_id or '#%d' % idx),
+                    query=query)
+                return {
+                    'format_id': format_id,
+                    'url': format_info['info'],
+                    'ext': 'mp4',
+                    'tbr': tbr,
+                }
+
+            formats.append(extract_format(
+                stream_url, 'hls-%d' % tbr if tbr else None, idx * 2))
+            formats.append(extract_format(stream_url.replace(
+                '/playlist.m3u8', ''), 'http-%d' % tbr if tbr else None, idx * 2 + 1, {'pno': 1031}))
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': info['title'].strip(),
+            'formats': formats,
+            'description': info.get('desc'),
+            'duration': int_or_none(info.get('duration')),
+            'thumbnail': info.get('thumb'),
+        }
diff --git a/youtube_dl/extractor/miaopai.py b/youtube_dl/extractor/miaopai.py
new file mode 100644
index 0000000..f9e35ac
--- /dev/null
+++ b/youtube_dl/extractor/miaopai.py
@@ -0,0 +1,40 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class MiaoPaiIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?miaopai\.com/show/(?P<id>[-A-Za-z0-9~_]+)'
+    _TEST = {
+        'url': 'http://www.miaopai.com/show/n~0hO7sfV1nBEw4Y29-Hqg__.htm',
+        'md5': '095ed3f1cd96b821add957bdc29f845b',
+        'info_dict': {
+            'id': 'n~0hO7sfV1nBEw4Y29-Hqg__',
+            'ext': 'mp4',
+            'title': '西游记音乐会的秒拍视频',
+            'thumbnail': 're:^https?://.*/n~0hO7sfV1nBEw4Y29-Hqg___m.jpg',
+        }
+    }
+
+    _USER_AGENT_IPAD = 'Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(
+            url, video_id, headers={'User-Agent': self._USER_AGENT_IPAD})
+
+        title = self._html_search_regex(
+            r'<title>([^<]+)</title>', webpage, 'title')
+        thumbnail = self._html_search_regex(
+            r'<div[^>]+class=(?P<q1>[\'"]).*\bvideo_img\b.*(?P=q1)[^>]+data-url=(?P<q2>[\'"])(?P<url>[^\'"]+)(?P=q2)',
+            webpage, 'thumbnail', fatal=False, group='url')
+        videos = self._parse_html5_media_entries(url, webpage, video_id)
+        info = videos[0]
+
+        info.update({
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+        })
+        return info
diff --git a/youtube_dl/extractor/microsoftvirtualacademy.py b/youtube_dl/extractor/microsoftvirtualacademy.py
new file mode 100644
index 0000000..8e0aee0
--- /dev/null
+++ b/youtube_dl/extractor/microsoftvirtualacademy.py
@@ -0,0 +1,195 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_xpath,
+)
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    smuggle_url,
+    unsmuggle_url,
+    xpath_text,
+)
+
+
+class MicrosoftVirtualAcademyBaseIE(InfoExtractor):
+    def _extract_base_url(self, course_id, display_id):
+        return self._download_json(
+            'https://api-mlxprod.microsoft.com/services/products/anonymous/%s' % course_id,
+            display_id, 'Downloading course base URL')
+
+    def _extract_chapter_and_title(self, title):
+        if not title:
+            return None, None
+        m = re.search(r'(?P<chapter>\d+)\s*\|\s*(?P<title>.+)', title)
+        return (int(m.group('chapter')), m.group('title')) if m else (None, title)
+
+
+class MicrosoftVirtualAcademyIE(MicrosoftVirtualAcademyBaseIE):
+    IE_NAME = 'mva'
+    IE_DESC = 'Microsoft Virtual Academy videos'
+    _VALID_URL = r'(?:%s:|https?://(?:mva\.microsoft|(?:www\.)?microsoftvirtualacademy)\.com/[^/]+/training-courses/[^/?#&]+-)(?P<course_id>\d+)(?::|\?l=)(?P<id>[\da-zA-Z]+_\d+)' % IE_NAME
+
+    _TESTS = [{
+        'url': 'https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788?l=gfVXISmEB_6804984382',
+        'md5': '7826c44fc31678b12ad8db11f6b5abb9',
+        'info_dict': {
+            'id': 'gfVXISmEB_6804984382',
+            'ext': 'mp4',
+            'title': 'Course Introduction',
+            'formats': 'mincount:3',
+            'subtitles': {
+                'en': [{
+                    'ext': 'ttml',
+                }],
+            },
+        }
+    }, {
+        'url': 'mva:11788:gfVXISmEB_6804984382',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        url, smuggled_data = unsmuggle_url(url, {})
+
+        mobj = re.match(self._VALID_URL, url)
+        course_id = mobj.group('course_id')
+        video_id = mobj.group('id')
+
+        base_url = smuggled_data.get('base_url') or self._extract_base_url(course_id, video_id)
+
+        settings = self._download_xml(
+            '%s/content/content_%s/videosettings.xml?v=1' % (base_url, video_id),
+            video_id, 'Downloading video settings XML')
+
+        _, title = self._extract_chapter_and_title(xpath_text(
+            settings, './/Title', 'title', fatal=True))
+
+        formats = []
+
+        for sources in settings.findall(compat_xpath('.//MediaSources')):
+            sources_type = sources.get('videoType')
+            for source in sources.findall(compat_xpath('./MediaSource')):
+                video_url = source.text
+                if not video_url or not video_url.startswith('http'):
+                    continue
+                if sources_type == 'smoothstreaming':
+                    formats.extend(self._extract_ism_formats(
+                        video_url, video_id, 'mss', fatal=False))
+                    continue
+                video_mode = source.get('videoMode')
+                height = int_or_none(self._search_regex(
+                    r'^(\d+)[pP]$', video_mode or '', 'height', default=None))
+                codec = source.get('codec')
+                acodec, vcodec = [None] * 2
+                if codec:
+                    codecs = codec.split(',')
+                    if len(codecs) == 2:
+                        acodec, vcodec = codecs
+                    elif len(codecs) == 1:
+                        vcodec = codecs[0]
+                formats.append({
+                    'url': video_url,
+                    'format_id': video_mode,
+                    'height': height,
+                    'acodec': acodec,
+                    'vcodec': vcodec,
+                })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for source in settings.findall(compat_xpath('.//MarkerResourceSource')):
+            subtitle_url = source.text
+            if not subtitle_url:
+                continue
+            subtitles.setdefault('en', []).append({
+                'url': '%s/%s' % (base_url, subtitle_url),
+                'ext': source.get('type'),
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'subtitles': subtitles,
+            'formats': formats
+        }
+
+
+class MicrosoftVirtualAcademyCourseIE(MicrosoftVirtualAcademyBaseIE):
+    IE_NAME = 'mva:course'
+    IE_DESC = 'Microsoft Virtual Academy courses'
+    _VALID_URL = r'(?:%s:|https?://(?:mva\.microsoft|(?:www\.)?microsoftvirtualacademy)\.com/[^/]+/training-courses/(?P<display_id>[^/?#&]+)-)(?P<id>\d+)' % IE_NAME
+
+    _TESTS = [{
+        'url': 'https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788',
+        'info_dict': {
+            'id': '11788',
+            'title': 'Microsoft Azure Fundamentals: Virtual Machines',
+        },
+        'playlist_count': 36,
+    }, {
+        # with emphasized chapters
+        'url': 'https://mva.microsoft.com/en-US/training-courses/developing-windows-10-games-with-construct-2-16335',
+        'info_dict': {
+            'id': '16335',
+            'title': 'Developing Windows 10 Games with Construct 2',
+        },
+        'playlist_count': 10,
+    }, {
+        'url': 'https://www.microsoftvirtualacademy.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788',
+        'only_matching': True,
+    }, {
+        'url': 'mva:course:11788',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if MicrosoftVirtualAcademyIE.suitable(url) else super(
+            MicrosoftVirtualAcademyCourseIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        course_id = mobj.group('id')
+        display_id = mobj.group('display_id')
+
+        base_url = self._extract_base_url(course_id, display_id)
+
+        manifest = self._download_json(
+            '%s/imsmanifestlite.json' % base_url,
+            display_id, 'Downloading course manifest JSON')['manifest']
+
+        organization = manifest['organizations']['organization'][0]
+
+        entries = []
+        for chapter in organization['item']:
+            chapter_number, chapter_title = self._extract_chapter_and_title(chapter.get('title'))
+            chapter_id = chapter.get('@identifier')
+            for item in chapter.get('item', []):
+                item_id = item.get('@identifier')
+                if not item_id:
+                    continue
+                metadata = item.get('resource', {}).get('metadata') or {}
+                if metadata.get('learningresourcetype') != 'Video':
+                    continue
+                _, title = self._extract_chapter_and_title(item.get('title'))
+                duration = parse_duration(metadata.get('duration'))
+                description = metadata.get('description')
+                entries.append({
+                    '_type': 'url_transparent',
+                    'url': smuggle_url(
+                        'mva:%s:%s' % (course_id, item_id), {'base_url': base_url}),
+                    'title': title,
+                    'description': description,
+                    'duration': duration,
+                    'chapter': chapter_title,
+                    'chapter_number': chapter_number,
+                    'chapter_id': chapter_id,
+                })
+
+        title = organization.get('title') or manifest.get('metadata', {}).get('title')
+
+        return self.playlist_result(entries, course_id, title)
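Note on `_extract_chapter_and_title` above: manifest titles of the form `<number> | <text>` are split into a chapter number and a bare title, and anything else is returned unchanged with no chapter number. A small standalone sketch of the same logic as a plain function, with hypothetical sample titles (the real manifest strings are not shown in this diff):

```python
import re


def extract_chapter_and_title(title):
    # Empty or missing titles yield no chapter number and no title.
    if not title:
        return None, None
    # "<digits> | <rest>" splits into a chapter number and the remaining title.
    m = re.search(r'(?P<chapter>\d+)\s*\|\s*(?P<title>.+)', title)
    return (int(m.group('chapter')), m.group('title')) if m else (None, title)


for sample in ('01 | Course Introduction', 'Course Introduction', None):
    print(extract_chapter_and_title(sample))
```

Both extractors rely on this helper: the course extractor keeps the chapter number for `chapter_number`, while the video extractor discards it and keeps only the title.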
diff --git a/youtube_dl/extractor/ministrygrid.py b/youtube_dl/extractor/ministrygrid.py
index 949ad11db2ecd0c53e5cb4c361bc43aa779cb1e6..10190d5f6e1f3f55b3274855c7614bea62b620e5 100644
--- a/youtube_dl/extractor/ministrygrid.py
+++ b/youtube_dl/extractor/ministrygrid.py
@@ -1,8 +1,5 @@
  from __future__ import unicode_literals
  
-import json
-import re
-
  from .common import InfoExtractor
  from ..utils import (
      ExtractorError,
@@ -11,7 +8,7 @@ from ..utils import (
  
  
  class MinistryGridIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.ministrygrid.com/([^/?#]*/)*(?P<id>[^/#?]+)/?(?:$|[?#])'
+    _VALID_URL = r'https?://(?:www\.)?ministrygrid\.com/([^/?#]*/)*(?P<id>[^/#?]+)/?(?:$|[?#])'
  
      _TEST = {
          'url': 'http://www.ministrygrid.com/training-viewer/-/training/t4g-2014-conference/the-gospel-by-numbers-4/the-gospel-by-numbers',
@@ -20,21 +17,28 @@ class MinistryGridIE(InfoExtractor):
              'id': '3453494717001',
              'ext': 'mp4',
              'title': 'The Gospel by Numbers',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'upload_date': '20140410',
              'description': 'Coming soon from T4G 2014!',
-            'uploader': 'LifeWay Christian Resources (MG)',
+            'uploader_id': '2034960640001',
+            'timestamp': 1397145591,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
          },
+        'add_ie': ['TDSLifeway'],
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
  
          webpage = self._download_webpage(url, video_id)
-        portlets_json = self._search_regex(
-            r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list')
-        portlets = json.loads(portlets_json)
+        portlets = self._parse_json(self._search_regex(
+            r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list'),
+            video_id)
          pl_id = self._search_regex(
-            r'<!--\s*p_l_id - ([0-9]+)<br>', webpage, 'p_l_id')
+            r'getPlid:function\(\){return"(\d+)"}', webpage, 'p_l_id')
  
          for i, portlet in enumerate(portlets):
              portlet_url = 'http://www.ministrygrid.com/c/portal/render_portlet?p_l_id=%s&p_p_id=%s' % (pl_id, portlet)
@@ -46,12 +50,8 @@ class MinistryGridIE(InfoExtractor):
                  r'<iframe.*?src="([^"]+)"', portlet_code, 'video iframe',
                  default=None)
              if video_iframe_url:
-                surl = smuggle_url(
-                    video_iframe_url, {'force_videoid': video_id})
-                return {
-                    '_type': 'url',
-                    'id': video_id,
-                    'url': surl,
-                }
+                return self.url_result(
+                    smuggle_url(video_iframe_url, {'force_videoid': video_id}),
+                    video_id=video_id)
  
          raise ExtractorError('Could not find video iframe in any portlets')
diff --git a/youtube_dl/extractor/miomio.py b/youtube_dl/extractor/miomio.py
index 170ebd9eb9e285f91e4b8bd85c05b13f745039a8..ec1b4c4fea111ded48f530c7020dd9aabd38dbb8 100644
--- a/youtube_dl/extractor/miomio.py
+++ b/youtube_dl/extractor/miomio.py
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
  import random
  
  from .common import InfoExtractor
+from ..compat import compat_urlparse
  from ..utils import (
      xpath_text,
      int_or_none,
@@ -18,13 +19,13 @@ class MioMioIE(InfoExtractor):
      _TESTS = [{
          # "type=video" in flashvars
          'url': 'http://www.miomio.tv/watch/cc88912/',
-        'md5': '317a5f7f6b544ce8419b784ca8edae65',
          'info_dict': {
              'id': '88912',
              'ext': 'flv',
              'title': '【SKY】字幕 铠武昭和VS平成 假面骑士大战FEAT战队 魔星字幕组 字幕',
              'duration': 5923,
          },
+        'skip': 'Unable to load videos',
      }, {
          'url': 'http://www.miomio.tv/watch/cc184024/',
          'info_dict': {
@@ -32,7 +33,7 @@ class MioMioIE(InfoExtractor):
              'title': '《动漫同人插画绘制》',
          },
          'playlist_mincount': 86,
-        'skip': 'This video takes time too long for retrieving the URL',
+        'skip': 'Unable to load videos',
      }, {
          'url': 'http://www.miomio.tv/watch/cc173113/',
          'info_dict': {
@@ -40,20 +41,19 @@ class MioMioIE(InfoExtractor):
              'title': 'The New Macbook 2015 上手试玩与简评'
          },
          'playlist_mincount': 2,
+        'skip': 'Unable to load videos',
+    }, {
+        # new 'h5' player
+        'url': 'http://www.miomio.tv/watch/cc273997/',
+        'md5': '0b27a4b4495055d826813f8c3a6b2070',
+        'info_dict': {
+            'id': '273997',
+            'ext': 'mp4',
+            'title': 'マツコの知らない世界【劇的進化SP！ビニール傘＆冷凍食品2016】 1_2 - 16 05 31',
+        },
      }]
  
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._html_search_meta(
-            'description', webpage, 'title', fatal=True)
-
-        mioplayer_path = self._search_regex(
-            r'src="(/mioplayer/[^"]+)"', webpage, 'ref_path')
-
-        http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
-
+    def _extract_mioplayer(self, webpage, video_id, title, http_headers):
          xml_config = self._search_regex(
              r'flashvars="type=(?:sina|video)&amp;(.+?)&amp;',
              webpage, 'xml config')
@@ -92,10 +92,34 @@ class MioMioIE(InfoExtractor):
                  'http_headers': http_headers,
              })
  
+        return entries
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._html_search_meta(
+            'description', webpage, 'title', fatal=True)
+
+        mioplayer_path = self._search_regex(
+            r'src="(/mioplayer(?:_h5)?/[^"]+)"', webpage, 'ref_path')
+
+        if '_h5' in mioplayer_path:
+            player_url = compat_urlparse.urljoin(url, mioplayer_path)
+            player_webpage = self._download_webpage(
+                player_url, video_id,
+                note='Downloading player webpage', headers={'Referer': url})
+            entries = self._parse_html5_media_entries(player_url, player_webpage, video_id)
+            http_headers = {'Referer': player_url}
+        else:
+            http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
+            entries = self._extract_mioplayer(webpage, video_id, title, http_headers)
+
          if len(entries) == 1:
              segment = entries[0]
              segment['id'] = video_id
              segment['title'] = title
+            segment['http_headers'] = http_headers
              return segment
  
          return {
diff --git a/youtube_dl/extractor/mitele.py b/youtube_dl/extractor/mitele.py
index 7b4581dc58415f508ca0d34d61a5cd96b0b08e31..c41ab1e91a7a2dd5655fcef7230e5ceadd305648 100644
--- a/youtube_dl/extractor/mitele.py
+++ b/youtube_dl/extractor/mitele.py
@@ -1,89 +1,160 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
+import uuid
+
  from .common import InfoExtractor
  from ..compat import (
+    compat_str,
      compat_urllib_parse_urlencode,
      compat_urlparse,
  )
  from ..utils import (
-    get_element_by_attribute,
      int_or_none,
+    extract_attributes,
+    determine_ext,
+    smuggle_url,
+    parse_duration,
  )
  
  
+class MiTeleBaseIE(InfoExtractor):
+    def _get_player_info(self, url, webpage):
+        player_data = extract_attributes(self._search_regex(
+            r'(?s)(<ms-video-player.+?</ms-video-player>)',
+            webpage, 'ms video player'))
+        video_id = player_data['data-media-id']
+        config_url = compat_urlparse.urljoin(url, player_data['data-config'])
+        config = self._download_json(
+            config_url, video_id, 'Downloading config JSON')
+        mmc_url = config['services']['mmc']
+
+        duration = None
+        formats = []
+        for m_url in (mmc_url, mmc_url.replace('/flash.json', '/html5.json')):
+            mmc = self._download_json(
+                m_url, video_id, 'Downloading mmc JSON')
+            if not duration:
+                duration = int_or_none(mmc.get('duration'))
+            for location in mmc['locations']:
+                gat = self._proto_relative_url(location.get('gat'), 'http:')
+                bas = location.get('bas')
+                loc = location.get('loc')
+                ogn = location.get('ogn')
+                if None in (gat, bas, loc, ogn):
+                    continue
+                token_data = {
+                    'bas': bas,
+                    'icd': loc,
+                    'ogn': ogn,
+                    'sta': '0',
+                }
+                media = self._download_json(
+                    '%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
+                    video_id, 'Downloading %s JSON' % location['loc'])
+                file_ = media.get('file')
+                if not file_:
+                    continue
+                ext = determine_ext(file_)
+                if ext == 'f4m':
+                    formats.extend(self._extract_f4m_formats(
+                        file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
+                        video_id, f4m_id='hds', fatal=False))
+                elif ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        file_, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'formats': formats,
+            'thumbnail': player_data.get('data-poster') or config.get('poster', {}).get('imageUrl'),
+            'duration': duration,
+        }
+
+
  class MiTeleIE(InfoExtractor):
      IE_DESC = 'mitele.es'
-    _VALID_URL = r'https?://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/'
+    _VALID_URL = r'https?://(?:www\.)?mitele\.es/programas-tv/(?:[^/]+/)(?P<id>[^/]+)/player'
  
      _TESTS = [{
-        'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
-        'md5': '0ff1a13aebb35d9bc14081ff633dd324',
+        'url': 'http://www.mitele.es/programas-tv/diario-de/57b0dfb9c715da65618b4afa/player',
          'info_dict': {
-            'id': '0NF1jJnxS1Wu3pHrmvFyw2',
-            'display_id': 'programa-144',
-            'ext': 'flv',
+            'id': '57b0dfb9c715da65618b4afa',
+            'ext': 'mp4',
              'title': 'Tor, la web invisible',
              'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
+            'series': 'Diario de',
+            'season': 'La redacción',
+            'episode': 'Programa 144',
              'thumbnail': 're:(?i)^https?://.*\.jpg$',
              'duration': 2913,
          },
+        'add_ie': ['Ooyala'],
+    }, {
+        # no explicit title
+        'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/57b0de3dc915da14058b4876/player',
+        'info_dict': {
+            'id': '57b0de3dc915da14058b4876',
+            'ext': 'mp4',
+            'title': 'Cuarto Milenio Temporada 6 Programa 226',
+            'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
+            'series': 'Cuarto Milenio',
+            'season': 'Temporada 6',
+            'episode': 'Programa 226',
+            'thumbnail': 're:(?i)^https?://.*\.jpg$',
+            'duration': 7313,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': ['Ooyala'],
      }]
  
      def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
-
-        config_url = self._search_regex(
-            r'data-config\s*=\s*"([^"]+)"', webpage, 'data config url')
-        config_url = compat_urlparse.urljoin(url, config_url)
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
  
-        config = self._download_json(
-            config_url, display_id, 'Downloading config JSON')
-
-        mmc = self._download_json(
-            config['services']['mmc'], display_id, 'Downloading mmc JSON')
-
-        formats = []
-        for location in mmc['locations']:
-            gat = self._proto_relative_url(location.get('gat'), 'http:')
-            bas = location.get('bas')
-            loc = location.get('loc')
-            ogn = location.get('ogn')
-            if None in (gat, bas, loc, ogn):
-                continue
-            token_data = {
-                'bas': bas,
-                'icd': loc,
-                'ogn': ogn,
-                'sta': '0',
-            }
-            media = self._download_json(
-                '%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
-                display_id, 'Downloading %s JSON' % location['loc'])
-            file_ = media.get('file')
-            if not file_:
-                continue
-            formats.extend(self._extract_f4m_formats(
-                file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
-                display_id, f4m_id=loc))
-        self._sort_formats(formats)
+        gigya_url = self._search_regex(r'<gigya-api>[^>]*</gigya-api>[^>]*<script\s*src="([^"]*)">[^>]*</script>', webpage, 'gigya', default=None)
+        gigya_sc = self._download_webpage(compat_urlparse.urljoin(r'http://www.mitele.es/', gigya_url), video_id, 'Downloading gigya script')
+        # Get an appKey/uuid to obtain the session key
+        appKey_var = self._search_regex(r'value\("appGridApplicationKey",([0-9a-f]+)\)', gigya_sc, 'appKey variable')
+        appKey = self._search_regex(r'var %s="([0-9a-f]+)"' % appKey_var, gigya_sc, 'appKey')
+        uid = compat_str(uuid.uuid4())
+        session_url = 'https://appgrid-api.cloud.accedo.tv/session?appKey=%s&uuid=%s' % (appKey, uid)
+        session_json = self._download_json(session_url, video_id, 'Downloading session keys')
+        sessionKey = compat_str(session_json['sessionKey'])
  
-        title = self._search_regex(
-            r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>', webpage, 'title')
+        paths_url = 'https://appgrid-api.cloud.accedo.tv/metadata/general_configuration,%20web_configuration?sessionKey=' + sessionKey
+        paths = self._download_json(paths_url, video_id, 'Downloading paths JSON')
+        ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search']
+        data_p = (
+            'http://' + ooyala_s['base_url'] + ooyala_s['full_path'] + ooyala_s['provider_id'] +
+            '/docs/' + video_id + '?include_titles=Series,Season&product_name=test&format=full')
+        data = self._download_json(data_p, video_id, 'Downloading data JSON')
+        source = data['hits']['hits'][0]['_source']
+        embedCode = source['offers'][0]['embed_codes'][0]
  
-        video_id = self._search_regex(
-            r'data-media-id\s*=\s*"([^"]+)"', webpage,
-            'data media id', default=None) or display_id
-        thumbnail = config.get('poster', {}).get('imageUrl')
-        duration = int_or_none(mmc.get('duration'))
+        titles = source['localizable_titles'][0]
+        title = titles.get('title_medium') or titles['title_long']
+        episode = titles['title_sort_name']
+        description = titles['summary_long']
+        titles_series = source['localizable_titles_series'][0]
+        series = titles_series['title_long']
+        titles_season = source['localizable_titles_season'][0]
+        season = titles_season['title_medium']
+        duration = parse_duration(source['videos'][0]['duration'])
  
          return {
+            '_type': 'url_transparent',
+            # for some reason only HLS is supported
+            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8'}),
              'id': video_id,
-            'display_id': display_id,
              'title': title,
-            'description': get_element_by_attribute('class', 'text', webpage),
-            'thumbnail': thumbnail,
+            'description': description,
+            'series': series,
+            'season': season,
+            'episode': episode,
              'duration': duration,
-            'formats': formats,
+            'thumbnail': source['images'][0]['url'],
          }
diff --git a/youtube_dl/extractor/mixcloud.py b/youtube_dl/extractor/mixcloud.py
index 101497118275b7f1b5bf0564048f1dc9fc4b878b..560fe188b675a619785332eea285484fa85154bf 100644
--- a/youtube_dl/extractor/mixcloud.py
+++ b/youtube_dl/extractor/mixcloud.py
@@ -1,26 +1,35 @@
  from __future__ import unicode_literals
  
+import base64
+import functools
+import itertools
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
+from ..compat import (
+    compat_chr,
+    compat_ord,
+    compat_urllib_parse_unquote,
+    compat_urlparse,
+)
  from ..utils import (
+    clean_html,
      ExtractorError,
-    HEADRequest,
+    OnDemandPagedList,
      parse_count,
      str_to_int,
  )
  
  
  class MixcloudIE(InfoExtractor):
-    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/([^/]+)'
+    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
      IE_NAME = 'mixcloud'
  
      _TESTS = [{
          'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/',
          'info_dict': {
              'id': 'dholbach-cryptkeeper',
-            'ext': 'mp3',
+            'ext': 'm4a',
              'title': 'Cryptkeeper',
              'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
              'uploader': 'Daniel Holbach',
@@ -38,22 +47,22 @@ class MixcloudIE(InfoExtractor):
              'description': 'md5:2b8aec6adce69f9d41724647c65875e8',
              'uploader': 'Gilles Peterson Worldwide',
              'uploader_id': 'gillespeterson',
-            'thumbnail': 're:https?://.*/images/',
+            'thumbnail': 're:https?://.*',
              'view_count': int,
              'like_count': int,
          },
      }]
  
-    def _check_url(self, url, track_id, ext):
-        try:
-            # We only want to know if the request succeed
-            # don't download the whole file
-            self._request_webpage(
-                HEADRequest(url), track_id,
-                'Trying %s URL' % ext)
-            return True
-        except ExtractorError:
-            return False
+    # See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
+    @staticmethod
+    def _decrypt_play_info(play_info):
+        KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'
+
+        play_info = base64.b64decode(play_info.encode('ascii'))
+
+        return ''.join([
+            compat_chr(compat_ord(ch) ^ compat_ord(KEY[idx % len(KEY)]))
+            for idx, ch in enumerate(play_info)])
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
@@ -63,14 +72,19 @@ class MixcloudIE(InfoExtractor):
  
          webpage = self._download_webpage(url, track_id)
  
-        preview_url = self._search_regex(
-            r'\s(?:data-preview-url|m-preview)="([^"]+)"', webpage, 'preview url')
-        song_url = re.sub(r'audiocdn(\d+)', r'stream\1', preview_url)
-        song_url = song_url.replace('/previews/', '/c/originals/')
-        if not self._check_url(song_url, track_id, 'mp3'):
-            song_url = song_url.replace('.mp3', '.m4a').replace('originals/', 'm4a/64/')
-            if not self._check_url(song_url, track_id, 'm4a'):
-                raise ExtractorError('Unable to extract track url')
+        message = self._html_search_regex(
+            r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
+            webpage, 'error message', default=None)
+
+        encrypted_play_info = self._search_regex(
+            r'm-play-info="([^"]+)"', webpage, 'play info')
+        play_info = self._parse_json(
+            self._decrypt_play_info(encrypted_play_info), track_id)
+
+        if message and 'stream_url' not in play_info:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
+
+        song_url = play_info['stream_url']
  
          PREFIX = (
              r'm-play-on-spacebar[^>]+'
@@ -88,11 +102,11 @@ class MixcloudIE(InfoExtractor):
          description = self._og_search_description(webpage)
          like_count = parse_count(self._search_regex(
              r'\bbutton-favorite[^>]+>.*?<span[^>]+class=["\']toggle-number[^>]+>\s*([^<]+)',
-            webpage, 'like count', fatal=False))
+            webpage, 'like count', default=None))
          view_count = str_to_int(self._search_regex(
              [r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
               r'/listeners/?">([0-9,.]+)</a>'],
-            webpage, 'play count', fatal=False))
+            webpage, 'play count', default=None))
  
          return {
              'id': track_id,
@@ -105,3 +119,201 @@ class MixcloudIE(InfoExtractor):
              'view_count': view_count,
              'like_count': like_count,
          }
+
+
+class MixcloudPlaylistBaseIE(InfoExtractor):
+    _PAGE_SIZE = 24
+
+    def _find_urls_in_page(self, page):
+        for url in re.findall(r'm-play-button m-url="(?P<url>[^"]+)"', page):
+            yield self.url_result(
+                compat_urlparse.urljoin('https://www.mixcloud.com', clean_html(url)),
+                MixcloudIE.ie_key())
+
+    def _fetch_tracks_page(self, path, video_id, page_name, current_page, real_page_number=None):
+        real_page_number = real_page_number or current_page + 1
+        return self._download_webpage(
+            'https://www.mixcloud.com/%s/' % path, video_id,
+            note='Download %s (page %d)' % (page_name, current_page + 1),
+            errnote='Unable to download %s' % page_name,
+            query={'page': real_page_number, 'list': 'main', '_ajax': '1'},
+            headers={'X-Requested-With': 'XMLHttpRequest'})
+
+    def _tracks_page_func(self, page, video_id, page_name, current_page):
+        resp = self._fetch_tracks_page(page, video_id, page_name, current_page)
+
+        for item in self._find_urls_in_page(resp):
+            yield item
+
+    def _get_user_description(self, page_content):
+        return self._html_search_regex(
+            r'<div[^>]+class="description-text"[^>]*>(.+?)</div>',
+            page_content, 'user description', fatal=False)
+
+
+class MixcloudUserIE(MixcloudPlaylistBaseIE):
+    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/(?P<type>uploads|favorites|listens)?/?$'
+    IE_NAME = 'mixcloud:user'
+
+    _TESTS = [{
+        'url': 'http://www.mixcloud.com/dholbach/',
+        'info_dict': {
+            'id': 'dholbach_uploads',
+            'title': 'Daniel Holbach (uploads)',
+            'description': 'md5:327af72d1efeb404a8216c27240d1370',
+        },
+        'playlist_mincount': 11,
+    }, {
+        'url': 'http://www.mixcloud.com/dholbach/uploads/',
+        'info_dict': {
+            'id': 'dholbach_uploads',
+            'title': 'Daniel Holbach (uploads)',
+            'description': 'md5:327af72d1efeb404a8216c27240d1370',
+        },
+        'playlist_mincount': 11,
+    }, {
+        'url': 'http://www.mixcloud.com/dholbach/favorites/',
+        'info_dict': {
+            'id': 'dholbach_favorites',
+            'title': 'Daniel Holbach (favorites)',
+            'description': 'md5:327af72d1efeb404a8216c27240d1370',
+        },
+        'params': {
+            'playlist_items': '1-100',
+        },
+        'playlist_mincount': 100,
+    }, {
+        'url': 'http://www.mixcloud.com/dholbach/listens/',
+        'info_dict': {
+            'id': 'dholbach_listens',
+            'title': 'Daniel Holbach (listens)',
+            'description': 'md5:327af72d1efeb404a8216c27240d1370',
+        },
+        'params': {
+            'playlist_items': '1-100',
+        },
+        'playlist_mincount': 100,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        user_id = mobj.group('user')
+        list_type = mobj.group('type')
+
+        # if only a profile URL was supplied, default to downloading all uploads
+        if list_type is None:
+            list_type = 'uploads'
+
+        video_id = '%s_%s' % (user_id, list_type)
+
+        profile = self._download_webpage(
+            'https://www.mixcloud.com/%s/' % user_id, video_id,
+            note='Downloading user profile',
+            errnote='Unable to download user profile')
+
+        username = self._og_search_title(profile)
+        description = self._get_user_description(profile)
+
+        entries = OnDemandPagedList(
+            functools.partial(
+                self._tracks_page_func,
+                '%s/%s' % (user_id, list_type), video_id, 'list of %s' % list_type),
+            self._PAGE_SIZE, use_cache=True)
+
+        return self.playlist_result(
+            entries, video_id, '%s (%s)' % (username, list_type), description)
+
+
+class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
+    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/playlists/(?P<playlist>[^/]+)/?$'
+    IE_NAME = 'mixcloud:playlist'
+
+    _TESTS = [{
+        'url': 'https://www.mixcloud.com/RedBullThre3style/playlists/tokyo-finalists-2015/',
+        'info_dict': {
+            'id': 'RedBullThre3style_tokyo-finalists-2015',
+            'title': 'National Champions 2015',
+            'description': 'md5:6ff5fb01ac76a31abc9b3939c16243a3',
+        },
+        'playlist_mincount': 16,
+    }, {
+        'url': 'https://www.mixcloud.com/maxvibes/playlists/jazzcat-on-ness-radio/',
+        'info_dict': {
+            'id': 'maxvibes_jazzcat-on-ness-radio',
+            'title': 'Jazzcat on Ness Radio',
+            'description': 'md5:7bbbf0d6359a0b8cda85224be0f8f263',
+        },
+        'playlist_mincount': 23
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        user_id = mobj.group('user')
+        playlist_id = mobj.group('playlist')
+        video_id = '%s_%s' % (user_id, playlist_id)
+
+        profile = self._download_webpage(
+            url, user_id,
+            note='Downloading playlist page',
+            errnote='Unable to download playlist page')
+
+        description = self._get_user_description(profile)
+        playlist_title = self._html_search_regex(
+            r'<span[^>]+class="[^"]*list-playlist-title[^"]*"[^>]*>(.*?)</span>',
+            profile, 'playlist title')
+
+        entries = OnDemandPagedList(
+            functools.partial(
+                self._tracks_page_func,
+                '%s/playlists/%s' % (user_id, playlist_id), video_id, 'tracklist'),
+            self._PAGE_SIZE)
+
+        return self.playlist_result(entries, video_id, playlist_title, description)
+
+
+class MixcloudStreamIE(MixcloudPlaylistBaseIE):
+    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/stream/?$'
+    IE_NAME = 'mixcloud:stream'
+
+    _TEST = {
+        'url': 'https://www.mixcloud.com/FirstEar/stream/',
+        'info_dict': {
+            'id': 'FirstEar',
+            'title': 'First Ear',
+            'description': 'Curators of good music\nfirstearmusic.com',
+        },
+        'playlist_mincount': 192,
+    }
+
+    def _real_extract(self, url):
+        user_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, user_id)
+
+        entries = []
+        prev_page_url = None
+
+        def _handle_page(page):
+            entries.extend(self._find_urls_in_page(page))
+            return self._search_regex(
+                r'm-next-page-url="([^"]+)"', page,
+                'next page URL', default=None)
+
+        next_page_url = _handle_page(webpage)
+
+        for idx in itertools.count(0):
+            if not next_page_url or prev_page_url == next_page_url:
+                break
+
+            prev_page_url = next_page_url
+            current_page = int(self._search_regex(
+                r'\?page=(\d+)', next_page_url, 'next page number'))
+
+            next_page_url = _handle_page(self._fetch_tracks_page(
+                '%s/stream' % user_id, user_id, 'stream', idx,
+                real_page_number=current_page))
+
+        username = self._og_search_title(webpage)
+        description = self._get_user_description(webpage)
+
+        return self.playlist_result(entries, user_id, username, description)
diff --git a/youtube_dl/extractor/moevideo.py b/youtube_dl/extractor/moevideo.py
index 978d5d5bfeaf5ff64b7279343876a3177c43339a..91ee9c4e95204718cb069fe1dc36908821b7af6d 100644
--- a/youtube_dl/extractor/moevideo.py
+++ b/youtube_dl/extractor/moevideo.py
@@ -35,7 +35,8 @@ class MoeVideoIE(InfoExtractor):
                  'height': 360,
                  'duration': 179,
                  'filesize': 17822500,
-            }
+            },
+            'skip': 'Video has been removed',
          },
          {
              'url': 'http://playreplay.net/video/77107.7f325710a627383d40540d8e991a',
diff --git a/youtube_dl/extractor/mofosex.py b/youtube_dl/extractor/mofosex.py
index e47c8011924cb0f5ecddefd33b35debd0324d5a9..e3bbe5aa8997694f62a07d8a2e0c383aa64daae1 100644
--- a/youtube_dl/extractor/mofosex.py
+++ b/youtube_dl/extractor/mofosex.py
@@ -1,53 +1,56 @@
  from __future__ import unicode_literals
  
-import os
-import re
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_unquote,
-    compat_urllib_parse_urlparse,
+from ..utils import (
+    int_or_none,
+    str_to_int,
+    unified_strdate,
  )
-from ..utils import sanitized_Request
+from .keezmovies import KeezMoviesIE
  
  
-class MofosexIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>mofosex\.com/videos/(?P<id>[0-9]+)/.*?\.html)'
-    _TEST = {
-        'url': 'http://www.mofosex.com/videos/5018/japanese-teen-music-video.html',
-        'md5': '1b2eb47ac33cc75d4a80e3026b613c5a',
+class MofosexIE(KeezMoviesIE):
+    _VALID_URL = r'https?://(?:www\.)?mofosex\.com/videos/(?P<id>\d+)/(?P<display_id>[^/?#&.]+)\.html'
+    _TESTS = [{
+        'url': 'http://www.mofosex.com/videos/318131/amateur-teen-playing-and-masturbating-318131.html',
+        'md5': '39a15853632b7b2e5679f92f69b78e91',
          'info_dict': {
-            'id': '5018',
+            'id': '318131',
+            'display_id': 'amateur-teen-playing-and-masturbating-318131',
              'ext': 'mp4',
-            'title': 'Japanese Teen Music Video',
+            'title': 'amateur teen playing and masturbating',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'upload_date': '20121114',
+            'view_count': int,
+            'like_count': int,
+            'dislike_count': int,
              'age_limit': 18,
          }
-    }
+    }, {
+        # This video is no longer available
+        'url': 'http://www.mofosex.com/videos/5018/japanese-teen-music-video.html',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        url = 'http://www.' + mobj.group('url')
-
-        req = sanitized_Request(url)
-        req.add_header('Cookie', 'age_verified=1')
-        webpage = self._download_webpage(req, video_id)
-
-        video_title = self._html_search_regex(r'<h1>(.+?)<', webpage, 'title')
-        video_url = compat_urllib_parse_unquote(self._html_search_regex(r'flashvars.video_url = \'([^\']+)', webpage, 'video_url'))
-        path = compat_urllib_parse_urlparse(video_url).path
-        extension = os.path.splitext(path)[1][1:]
-        format = path.split('/')[5].split('_')[:2]
-        format = '-'.join(format)
-
-        age_limit = self._rta_search(webpage)
-
-        return {
-            'id': video_id,
-            'title': video_title,
-            'url': video_url,
-            'ext': extension,
-            'format': format,
-            'format_id': format,
-            'age_limit': age_limit,
-        }
+        webpage, info = self._extract_info(url)
+
+        view_count = str_to_int(self._search_regex(
+            r'VIEWS:</span>\s*([\d,.]+)', webpage, 'view count', fatal=False))
+        like_count = int_or_none(self._search_regex(
+            r'id=["\']amountLikes["\'][^>]*>(\d+)', webpage,
+            'like count', fatal=False))
+        dislike_count = int_or_none(self._search_regex(
+            r'id=["\']amountDislikes["\'][^>]*>(\d+)', webpage,
+            'dislike count', fatal=False))
+        upload_date = unified_strdate(self._html_search_regex(
+            r'Added:</span>([^<]+)', webpage, 'upload date', fatal=False))
+
+        info.update({
+            'view_count': view_count,
+            'like_count': like_count,
+            'dislike_count': dislike_count,
+            'upload_date': upload_date,
+            'thumbnail': self._og_search_thumbnail(webpage),
+        })
+
+        return info
diff --git a/youtube_dl/extractor/mooshare.py b/youtube_dl/extractor/mooshare.py
deleted file mode 100644
index a85109a..0000000
--- a/youtube_dl/extractor/mooshare.py
+++ /dev/null
@@ -1,110 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    sanitized_Request,
-    urlencode_postdata,
-)
-
-
-class MooshareIE(InfoExtractor):
-    IE_NAME = 'mooshare'
-    IE_DESC = 'Mooshare.biz'
-    _VALID_URL = r'https?://(?:www\.)?mooshare\.biz/(?P<id>[\da-z]{12})'
-
-    _TESTS = [
-        {
-            'url': 'http://mooshare.biz/8dqtk4bjbp8g',
-            'md5': '4e14f9562928aecd2e42c6f341c8feba',
-            'info_dict': {
-                'id': '8dqtk4bjbp8g',
-                'ext': 'mp4',
-                'title': 'Comedy Football 2011 - (part 1-2)',
-                'duration': 893,
-            },
-        },
-        {
-            'url': 'http://mooshare.biz/aipjtoc4g95j',
-            'info_dict': {
-                'id': 'aipjtoc4g95j',
-                'ext': 'mp4',
-                'title': 'Orange Caramel  Dashing Through the Snow',
-                'duration': 212,
-            },
-            'params': {
-                # rtmp download
-                'skip_download': True,
-            }
-        }
-    ]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        page = self._download_webpage(url, video_id, 'Downloading page')
-
-        if re.search(r'>Video Not Found or Deleted<', page) is not None:
-            raise ExtractorError('Video %s does not exist' % video_id, expected=True)
-
-        hash_key = self._html_search_regex(r'<input type="hidden" name="hash" value="([^"]+)">', page, 'hash')
-        title = self._html_search_regex(r'(?m)<div class="blockTitle">\s*<h2>Watch ([^<]+)</h2>', page, 'title')
-
-        download_form = {
-            'op': 'download1',
-            'id': video_id,
-            'hash': hash_key,
-        }
-
-        request = sanitized_Request(
-            'http://mooshare.biz/%s' % video_id, urlencode_postdata(download_form))
-        request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-
-        self._sleep(5, video_id)
-
-        video_page = self._download_webpage(request, video_id, 'Downloading video page')
-
-        thumbnail = self._html_search_regex(r'image:\s*"([^"]+)",', video_page, 'thumbnail', fatal=False)
-        duration_str = self._html_search_regex(r'duration:\s*"(\d+)",', video_page, 'duration', fatal=False)
-        duration = int(duration_str) if duration_str is not None else None
-
-        formats = []
-
-        # SD video
-        mobj = re.search(r'(?m)file:\s*"(?P<url>[^"]+)",\s*provider:', video_page)
-        if mobj is not None:
-            formats.append({
-                'url': mobj.group('url'),
-                'format_id': 'sd',
-                'format': 'SD',
-            })
-
-        # HD video
-        mobj = re.search(r'\'hd-2\': { file: \'(?P<url>[^\']+)\' },', video_page)
-        if mobj is not None:
-            formats.append({
-                'url': mobj.group('url'),
-                'format_id': 'hd',
-                'format': 'HD',
-            })
-
-        # rtmp video
-        mobj = re.search(r'(?m)file: "(?P<playpath>[^"]+)",\s*streamer: "(?P<rtmpurl>rtmp://[^"]+)",', video_page)
-        if mobj is not None:
-            formats.append({
-                'url': mobj.group('rtmpurl'),
-                'play_path': mobj.group('playpath'),
-                'rtmp_live': False,
-                'ext': 'mp4',
-                'format_id': 'rtmp',
-                'format': 'HD',
-            })
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/motorsport.py b/youtube_dl/extractor/motorsport.py

index 370328b362c2a0661925d054be121a7216dc94c7..c9d1ab64dc36f12940ff6fee6f92524ce4ae3f2e 100644 (file)
--- a/youtube_dl/extractor/motorsport.py
+++ b/youtube_dl/extractor/motorsport.py
@@ -9,7 +9,7 @@ from ..compat import (
  
  class MotorsportIE(InfoExtractor):
      IE_DESC = 'motorsport.com'
-    _VALID_URL = r'https?://www\.motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
+    _VALID_URL = r'https?://(?:www\.)?motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
      _TEST = {
          'url': 'http://www.motorsport.com/f1/video/main-gallery/red-bull-racing-2014-rules-explained/',
          'info_dict': {
diff --git a/youtube_dl/extractor/movieclips.py b/youtube_dl/extractor/movieclips.py

index 1564cb71f6844a3e83fd958ddd0c192be835fc3a..30c206f9b61e22d3e029a68979643fc6ee7de635 100644 (file)
--- a/youtube_dl/extractor/movieclips.py
+++ b/youtube_dl/extractor/movieclips.py
@@ -2,39 +2,48 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import sanitized_Request
+from ..utils import (
+    smuggle_url,
+    float_or_none,
+    parse_iso8601,
+    update_url_query,
+)
  
  
  class MovieClipsIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www.)?movieclips\.com/videos/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?movieclips\.com/videos/.+-(?P<id>\d+)(?:\?|$)'
      _TEST = {
-        'url': 'http://www.movieclips.com/videos/warcraft-trailer-1-561180739597?autoPlay=true&playlistId=5',
+        'url': 'http://www.movieclips.com/videos/warcraft-trailer-1-561180739597',
+        'md5': '42b5a0352d4933a7bd54f2104f481244',
          'info_dict': {
              'id': 'pKIGmG83AqD9',
-            'display_id': 'warcraft-trailer-1-561180739597',
              'ext': 'mp4',
              'title': 'Warcraft Trailer 1',
              'description': 'Watch Trailer 1 from Warcraft (2016). Legendary’s WARCRAFT is a 3D epic adventure of world-colliding conflict based.',
              'thumbnail': 're:^https?://.*\.jpg$',
+            'timestamp': 1446843055,
+            'upload_date': '20151106',
+            'uploader': 'Movieclips',
          },
          'add_ie': ['ThePlatform'],
      }
  
      def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        req = sanitized_Request(url)
-        # it doesn't work if it thinks the browser it's too old
-        req.add_header('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/43.0 (Chrome)')
-        webpage = self._download_webpage(req, display_id)
-        theplatform_link = self._html_search_regex(r'src="(http://player.theplatform.com/p/.*?)"', webpage, 'theplatform link')
-        title = self._html_search_regex(r'<title[^>]*>([^>]+)-\s*\d+\s*|\s*Movieclips.com</title>', webpage, 'title')
-        description = self._html_search_meta('description', webpage)
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        video = next(v for v in self._parse_json(self._search_regex(
+            r'var\s+__REACT_ENGINE__\s*=\s*({.+});',
+            webpage, 'react engine'), video_id)['playlist']['videos'] if v['id'] == video_id)
  
          return {
              '_type': 'url_transparent',
-            'url': theplatform_link,
-            'title': title,
-            'display_id': display_id,
-            'description': description,
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(update_url_query(
+                video['contentUrl'], {'mbr': 'true'}), {'force_smil_url': True}),
+            'title': self._og_search_title(webpage),
+            'description': self._html_search_meta('description', webpage),
+            'duration': float_or_none(video.get('duration')),
+            'timestamp': parse_iso8601(video.get('dateCreated')),
+            'thumbnail': video.get('defaultImage'),
+            'uploader': video.get('provider'),
          }
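
Review note on the movieclips.py change above: the bare next(...) over the generator raises StopIteration when no playlist entry matches the ID. Below is a minimal sketch of the same lookup with a default, assuming a playlist shaped like the one parsed from __REACT_ENGINE__. The helper find_video and the sample data are hypothetical and not part of this commit.

```python
def find_video(videos, video_id):
    """Return the first entry whose 'id' equals video_id, or None."""
    # next() with a default returns None instead of raising StopIteration.
    return next((v for v in videos if v.get('id') == video_id), None)


# Hypothetical sample data, for illustration only.
sample_videos = [
    {'id': '111', 'contentUrl': 'http://example.com/a'},
    {'id': '222', 'contentUrl': 'http://example.com/b'},
]

match = find_video(sample_videos, '222')
missing = find_video(sample_videos, '999')
```

With a None result in hand, an extractor could raise ExtractorError with a readable message rather than letting StopIteration propagate.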
diff --git a/youtube_dl/extractor/moviezine.py b/youtube_dl/extractor/moviezine.py

index f130b75c416ad3fe2e8d4ac3221799d1eb4aa1b9..478e3996743d1eca8434a786b58c4bd799a7dc55 100644 (file)
--- a/youtube_dl/extractor/moviezine.py
+++ b/youtube_dl/extractor/moviezine.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -7,7 +7,7 @@ from .common import InfoExtractor
  
  
  class MoviezineIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.moviezine\.se/video/(?P<id>[^?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?moviezine\.se/video/(?P<id>[^?#]+)'
  
      _TEST = {
          'url': 'http://www.moviezine.se/video/205866',
diff --git a/youtube_dl/extractor/movingimage.py b/youtube_dl/extractor/movingimage.py

new file mode 100644 (file)

index 0000000..bb789c3
--- /dev/null
+++ b/youtube_dl/extractor/movingimage.py
@@ -0,0 +1,52 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    unescapeHTML,
+    parse_duration,
+)
+
+
+class MovingImageIE(InfoExtractor):
+    _VALID_URL = r'https?://movingimage\.nls\.uk/film/(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://movingimage.nls.uk/film/3561',
+        'md5': '4caa05c2b38453e6f862197571a7be2f',
+        'info_dict': {
+            'id': '3561',
+            'ext': 'mp4',
+            'title': 'SHETLAND WOOL',
+            'description': 'md5:c5afca6871ad59b4271e7704fe50ab04',
+            'duration': 900,
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        formats = self._extract_m3u8_formats(
+            self._html_search_regex(r'file\s*:\s*"([^"]+)"', webpage, 'm3u8 manifest URL'),
+            video_id, ext='mp4', entry_protocol='m3u8_native')
+
+        def search_field(field_name, fatal=False):
+            return self._search_regex(
+                r'<span\s+class="field_title">%s:</span>\s*<span\s+class="field_content">([^<]+)</span>' % field_name,
+                webpage, field_name, fatal=fatal)
+
+        title = unescapeHTML(search_field('Title', fatal=True)).strip('()[]')
+        description = unescapeHTML(search_field('Description'))
+        duration = parse_duration(search_field('Running time'))
+        thumbnail = self._search_regex(
+            r"image\s*:\s*'([^']+)'", webpage, 'thumbnail', fatal=False)
+
+        return {
+            'id': video_id,
+            'formats': formats,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'thumbnail': thumbnail,
+        }
diff --git a/youtube_dl/extractor/msn.py b/youtube_dl/extractor/msn.py

new file mode 100644 (file)

index 0000000..d75ce8b
--- /dev/null
+++ b/youtube_dl/extractor/msn.py
@@ -0,0 +1,121 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    determine_ext,
+    ExtractorError,
+    int_or_none,
+    unescapeHTML,
+)
+
+
+class MSNIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
+    _TESTS = [{
+        'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/criminal-minds-shemar-moore-shares-a-touching-goodbye-message/vp-BBqQYNE',
+        'md5': '8442f66c116cbab1ff7098f986983458',
+        'info_dict': {
+            'id': 'BBqQYNE',
+            'display_id': 'criminal-minds-shemar-moore-shares-a-touching-goodbye-message',
+            'ext': 'mp4',
+            'title': 'Criminal Minds - Shemar Moore Shares A Touching Goodbye Message',
+            'description': 'md5:e8e89b897b222eb33a6b5067a8f1bc25',
+            'duration': 104,
+            'uploader': 'CBS Entertainment',
+            'uploader_id': 'IT0X5aoJ6bJgYerJXSDCgFmYPB1__54v',
+        },
+    }, {
+        'url': 'http://www.msn.com/en-ae/news/offbeat/meet-the-nine-year-old-self-made-millionaire/ar-BBt6ZKf',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.msn.com/en-ae/video/watch/obama-a-lot-of-people-will-be-disappointed/vi-AAhxUMH',
+        'only_matching': True,
+    }, {
+        # geo restricted
+        'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/the-first-fart-makes-you-laugh-the-last-fart-makes-you-cry/vp-AAhzIBU',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.msn.com/en-ae/entertainment/bollywood/watch-how-salman-khan-reacted-when-asked-if-he-would-apologize-for-his-‘raped-woman’-comment/vi-AAhvzW6',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id, display_id = mobj.group('id', 'display_id')
+
+        webpage = self._download_webpage(url, display_id)
+
+        video = self._parse_json(
+            self._search_regex(
+                r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1',
+                webpage, 'video data', default='{}', group='data'),
+            display_id, transform_source=unescapeHTML)
+
+        if not video:
+            error = unescapeHTML(self._search_regex(
+                r'data-error=(["\'])(?P<error>.+?)\1',
+                webpage, 'error', group='error'))
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
+
+        title = video['title']
+
+        formats = []
+        for file_ in video.get('videoFiles', []):
+            format_url = file_.get('url')
+            if not format_url:
+                continue
+            ext = determine_ext(format_url)
+            if ext == 'ism':
+                formats.extend(self._extract_ism_formats(
+                    format_url + '/Manifest', display_id, 'mss', fatal=False))
+            elif 'm3u8' in format_url:
+                # m3u8_native should not be used here until
+                # https://github.com/rg3/youtube-dl/issues/9913 is fixed
+                m3u8_formats = self._extract_m3u8_formats(
+                    format_url, display_id, 'mp4',
+                    m3u8_id='hls', fatal=False)
+                # Despite metadata in m3u8 all video+audio formats are
+                # actually video-only (no audio)
+                for f in m3u8_formats:
+                    if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
+                        f['acodec'] = 'none'
+                formats.extend(m3u8_formats)
+            else:
+                formats.append({
+                    'url': format_url,
+                    'ext': 'mp4',
+                    'format_id': 'http',
+                    'width': int_or_none(file_.get('width')),
+                    'height': int_or_none(file_.get('height')),
+                })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for file_ in video.get('files', []):
+            format_url = file_.get('url')
+            format_code = file_.get('formatCode')
+            if not format_url or not format_code:
+                continue
+            if compat_str(format_code) == '3100':
+                subtitles.setdefault(file_.get('culture', 'en'), []).append({
+                    'ext': determine_ext(format_url, 'ttml'),
+                    'url': format_url,
+                })
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': video.get('description'),
+            'thumbnail': video.get('headlineImage', {}).get('url'),
+            'duration': int_or_none(video.get('durationSecs')),
+            'uploader': video.get('sourceFriendly'),
+            'uploader_id': video.get('providerId'),
+            'creator': video.get('creator'),
+            'subtitles': subtitles,
+            'formats': formats,
+        }
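
Review note on the msn.py metadata step above: the page carries the video description as HTML-escaped JSON inside a data-metadata attribute, matched with a quote backreference and then unescaped before parsing. Below is a standalone sketch of that idea using only the standard library. The sample markup and the helper parse_metadata are assumptions for illustration; the extractor itself goes through _search_regex and _parse_json.

```python
import html
import json
import re

# Hypothetical page fragment; the attribute value is HTML-escaped JSON.
SAMPLE = (
    '<div data-metadata="{&quot;title&quot;: &quot;Demo&quot;, '
    '&quot;durationSecs&quot;: 104}"></div>'
)


def parse_metadata(webpage):
    # (["\']) captures the opening quote and \1 requires the same closing quote.
    m = re.search(r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1', webpage)
    if not m:
        return {}  # mirrors the default='{}' fallback in the diff
    # Unescape the HTML entities first, then parse the JSON text.
    return json.loads(html.unescape(m.group('data')))


video = parse_metadata(SAMPLE)
empty = parse_metadata('<div></div>')
```

The lazy .+? stops at the first literal quote because the quotes inside the JSON are escaped as entities.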
diff --git a/youtube_dl/extractor/mtv.py b/youtube_dl/extractor/mtv.py

index 640ee3d9339c48e2b3fef0ade15ee8ebcae8b292..74a3a035e771803154b6685fcd4cbfd3dbb20a9b 100644 (file)
--- a/youtube_dl/extractor/mtv.py
+++ b/youtube_dl/extractor/mtv.py
@@ -4,8 +4,8 @@ import re
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_urllib_parse_urlencode,
      compat_str,
+    compat_xpath,
  )
  from ..utils import (
      ExtractorError,
@@ -13,10 +13,13 @@ from ..utils import (
      fix_xml_ampersands,
      float_or_none,
      HEADRequest,
+    RegexNotFoundError,
      sanitized_Request,
+    strip_or_none,
+    timeconvert,
      unescapeHTML,
+    update_url_query,
      url_basename,
-    RegexNotFoundError,
      xpath_text,
  )
  
@@ -33,14 +36,19 @@ class MTVServicesInfoExtractor(InfoExtractor):
      def _id_from_uri(uri):
          return uri.split(':')[-1]
  
-    # This was originally implemented for ComedyCentral, but it also works here
      @staticmethod
-    def _transform_rtmp_url(rtmp_video_url):
+    def _remove_template_parameter(url):
+        # Remove the templates, like &device={device}
+        return re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', url)
+
+    # This was originally implemented for ComedyCentral, but it also works here
+    @classmethod
+    def _transform_rtmp_url(cls, rtmp_video_url):
          m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp\..+?/.*)$', rtmp_video_url)
          if not m:
-            return rtmp_video_url
+            return {'rtmp': rtmp_video_url}
          base = 'http://viacommtvstrmfs.fplive.net/'
-        return base + m.group('finalid')
+        return {'http': base + m.group('finalid')}
  
      def _get_feed_url(self, uri):
          return self._FEED_URL
@@ -84,13 +92,14 @@ class MTVServicesInfoExtractor(InfoExtractor):
                  rtmp_video_url = rendition.find('./src').text
                  if rtmp_video_url.endswith('siteunavail.png'):
                      continue
-                formats.append({
-                    'ext': ext,
-                    'url': self._transform_rtmp_url(rtmp_video_url),
-                    'format_id': rendition.get('bitrate'),
+                new_urls = self._transform_rtmp_url(rtmp_video_url)
+                formats.extend([{
+                    'ext': 'flv' if new_url.startswith('rtmp') else ext,
+                    'url': new_url,
+                    'format_id': '-'.join(filter(None, [kind, rendition.get('bitrate')])),
                      'width': int(rendition.get('width')),
                      'height': int(rendition.get('height')),
-                })
+                } for kind, new_url in new_urls.items()])
              except (KeyError, TypeError):
                  raise ExtractorError('Invalid rendition field.')
          self._sort_formats(formats)
@@ -113,9 +122,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
          video_id = self._id_from_uri(uri)
          self.report_extraction(video_id)
          content_el = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content')))
-        mediagen_url = content_el.attrib['url']
-        # Remove the templates, like &device={device}
-        mediagen_url = re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', mediagen_url)
+        mediagen_url = self._remove_template_parameter(content_el.attrib['url'])
          if 'acceptMethods' not in mediagen_url:
              mediagen_url += '&' if '?' in mediagen_url else '?'
              mediagen_url += 'acceptMethods=fms'
@@ -131,7 +138,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
              message += item.text
              raise ExtractorError(message, expected=True)
  
-        description = xpath_text(itemdoc, 'description')
+        description = strip_or_none(xpath_text(itemdoc, 'description'))
+
+        timestamp = timeconvert(xpath_text(itemdoc, 'pubDate'))
  
          title_el = None
          if title_el is None:
@@ -139,9 +148,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
                  itemdoc, './/{http://search.yahoo.com/mrss/}category',
                  'scheme', 'urn:mtvn:video_title')
          if title_el is None:
-            title_el = itemdoc.find('.//{http://search.yahoo.com/mrss/}title')
+            title_el = itemdoc.find(compat_xpath('.//{http://search.yahoo.com/mrss/}title'))
          if title_el is None:
-            title_el = itemdoc.find('.//title') or itemdoc.find('./title')
+            title_el = itemdoc.find(compat_xpath('.//title'))
              if title_el.text is None:
                  title_el = None
  
@@ -165,26 +174,32 @@ class MTVServicesInfoExtractor(InfoExtractor):
              'thumbnail': self._get_thumbnail_url(uri, itemdoc),
              'description': description,
              'duration': float_or_none(content_el.attrib.get('duration')),
+            'timestamp': timestamp,
          }
  
      def _get_feed_query(self, uri):
          data = {'uri': uri}
          if self._LANG:
              data['lang'] = self._LANG
-        return compat_urllib_parse_urlencode(data)
+        return data
  
      def _get_videos_info(self, uri):
          video_id = self._id_from_uri(uri)
          feed_url = self._get_feed_url(uri)
-        info_url = feed_url + '?' + self._get_feed_query(uri)
+        info_url = update_url_query(feed_url, self._get_feed_query(uri))
          return self._get_videos_info_from_url(info_url, video_id)
  
      def _get_videos_info_from_url(self, url, video_id):
          idoc = self._download_xml(
              url, video_id,
              'Downloading info', transform_source=fix_xml_ampersands)
+
+        title = xpath_text(idoc, './channel/title')
+        description = xpath_text(idoc, './channel/description')
+
          return self.playlist_result(
-            [self._get_video_info(item) for item in idoc.findall('.//item')])
+            [self._get_video_info(item) for item in idoc.findall('.//item')],
+            playlist_title=title, playlist_description=description)
  
      def _extract_mgid(self, webpage):
          try:
@@ -230,6 +245,8 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
              'ext': 'mp4',
              'title': 'Peter Dinklage Sums Up \'Game Of Thrones\' In 45 Seconds',
              'description': '"Sexy sexy sexy, stabby stabby stabby, beautiful language," says Peter Dinklage as he tries summarizing "Game of Thrones" in under a minute.',
+            'timestamp': 1400126400,
+            'upload_date': '20140515',
          },
      }
  
@@ -242,13 +259,9 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
  
      def _get_feed_url(self, uri):
          video_id = self._id_from_uri(uri)
-        site_id = uri.replace(video_id, '')
-        config_url = ('http://media.mtvnservices.com/pmt/e1/players/{0}/'
-                      'context4/context5/config.xml'.format(site_id))
-        config_doc = self._download_xml(config_url, video_id)
-        feed_node = config_doc.find('.//feed')
-        feed_url = feed_node.text.strip().split('?')[0]
-        return feed_url
+        config = self._download_json(
+            'http://media.mtvnservices.com/pmt/e1/access/index.html?uri=%s&configtype=edge' % uri, video_id)
+        return self._remove_template_parameter(config['feedWithQueryParams'])
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
@@ -257,6 +270,29 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
  
  
  class MTVIE(MTVServicesInfoExtractor):
+    IE_NAME = 'mtv'
+    _VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|full-episodes)/(?P<id>[^/?#.]+)'
+    _FEED_URL = 'http://www.mtv.com/feeds/mrss/'
+
+    _TESTS = [{
+        'url': 'http://www.mtv.com/video-clips/vl8qof/unlocking-the-truth-trailer',
+        'md5': '1edbcdf1e7628e414a8c5dcebca3d32b',
+        'info_dict': {
+            'id': '5e14040d-18a4-47c4-a582-43ff602de88e',
+            'ext': 'mp4',
+            'title': 'Unlocking The Truth|July 18, 2016|1|101|Trailer',
+            'description': '"Unlocking the Truth" premieres August 17th at 11/10c.',
+            'timestamp': 1468846800,
+            'upload_date': '20160718',
+        },
+    }, {
+        'url': 'http://www.mtv.com/full-episodes/94tujl/unlocking-the-truth-gates-of-hell-season-1-ep-101',
+        'only_matching': True,
+    }]
+
+
+class MTVVideoIE(MTVServicesInfoExtractor):
+    IE_NAME = 'mtv:video'
      _VALID_URL = r'''(?x)^https?://
          (?:(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$|
             m\.mtv\.com/videos/video\.rbml\?.*?id=(?P<mgid>[^&]+))'''
@@ -272,6 +308,8 @@ class MTVIE(MTVServicesInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Taylor Swift - "Ours (VH1 Storytellers)"',
                  'description': 'Album: Taylor Swift performs "Ours" for VH1 Storytellers at Harvey Mudd College.',
+                'timestamp': 1352610000,
+                'upload_date': '20121111',
              },
          },
      ]
@@ -298,20 +336,6 @@ class MTVIE(MTVServicesInfoExtractor):
          return self._get_videos_info(uri)
  
  
-class MTVIggyIE(MTVServicesInfoExtractor):
-    IE_NAME = 'mtviggy.com'
-    _VALID_URL = r'https?://www\.mtviggy\.com/videos/.+'
-    _TEST = {
-        'url': 'http://www.mtviggy.com/videos/arcade-fire-behind-the-scenes-at-the-biggest-music-experiment-yet/',
-        'info_dict': {
-            'id': '984696',
-            'ext': 'mp4',
-            'title': 'Arcade Fire: Behind the Scenes at the Biggest Music Experiment Yet',
-        }
-    }
-    _FEED_URL = 'http://all.mtvworldverticals.com/feed-xml/'
-
-
  class MTVDEIE(MTVServicesInfoExtractor):
      IE_NAME = 'mtv.de'
      _VALID_URL = r'https?://(?:www\.)?mtv\.de/(?:artists|shows|news)/(?:[^/]+/)*(?P<id>\d+)-[^/#?]+/*(?:[#?].*)?$'
@@ -319,7 +343,7 @@ class MTVDEIE(MTVServicesInfoExtractor):
          'url': 'http://www.mtv.de/artists/10571-cro/videos/61131-traum',
          'info_dict': {
              'id': 'music_video-a50bc5f0b3aa4b3190aa',
-            'ext': 'mp4',
+            'ext': 'flv',
              'title': 'MusicVideo_cro-traum',
              'description': 'Cro - Traum',
          },
@@ -327,20 +351,21 @@ class MTVDEIE(MTVServicesInfoExtractor):
              # rtmp download
              'skip_download': True,
          },
+        'skip': 'Blocked at Travis CI',
      }, {
          # mediagen URL without query (e.g. http://videos.mtvnn.com/mediagen/e865da714c166d18d6f80893195fcb97)
          'url': 'http://www.mtv.de/shows/933-teen-mom-2/staffeln/5353/folgen/63565-enthullungen',
          'info_dict': {
              'id': 'local_playlist-f5ae778b9832cc837189',
-            'ext': 'mp4',
+            'ext': 'flv',
              'title': 'Episode_teen-mom-2_shows_season-5_episode-1_full-episode_part1',
          },
          'params': {
              # rtmp download
              'skip_download': True,
          },
+        'skip': 'Blocked at Travis CI',
      }, {
-        # single video in pagePlaylist with different id
          'url': 'http://www.mtv.de/news/77491-mtv-movies-spotlight-pixels-teil-3',
          'info_dict': {
              'id': 'local_playlist-4e760566473c4c8c5344',
@@ -352,6 +377,7 @@ class MTVDEIE(MTVServicesInfoExtractor):
              # rtmp download
              'skip_download': True,
          },
+        'skip': 'Das Video kann zur Zeit nicht abgespielt werden.',
      }]
  
      def _real_extract(self, url):
@@ -364,11 +390,14 @@ class MTVDEIE(MTVServicesInfoExtractor):
                  r'window\.pagePlaylist\s*=\s*(\[.+?\]);\n', webpage, 'page playlist'),
              video_id)
  
+        def _mrss_url(item):
+            return item['mrss'] + item.get('mrssvars', '')
+
          # news pages contain single video in playlist with different id
          if len(playlist) == 1:
-            return self._get_videos_info_from_url(playlist[0]['mrss'], video_id)
+            return self._get_videos_info_from_url(_mrss_url(playlist[0]), video_id)
  
          for item in playlist:
              item_id = item.get('id')
              if item_id and compat_str(item_id) == video_id:
-                return self._get_videos_info_from_url(item['mrss'], video_id)
+                return self._get_videos_info_from_url(_mrss_url(item), video_id)
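
Review note on _remove_template_parameter in the mtv.py change above: the regex drops query parameters whose value is still an unfilled {placeholder}. Below is a small standalone sketch of that behaviour, using the same pattern as the diff. The function name without the leading underscore and the example URLs are hypothetical.

```python
import re


def remove_template_parameter(url):
    # Drop query parameters whose value is an unfilled template,
    # such as &device={device}; same pattern as in the diff.
    return re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', url)


# Hypothetical feed URLs, for illustration only.
feed = 'http://example.com/feed?uri=mgid:demo&device={device}&lang=en'
cleaned = remove_template_parameter(feed)

trailing = remove_template_parameter('http://example.com/feed?a=1&b={b}')
```

The pattern only matches parameters introduced by &, so a template in the first position after ? would be left in place.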
diff --git a/youtube_dl/extractor/muenchentv.py b/youtube_dl/extractor/muenchentv.py

index b4e8ad17e9003940e5753349e30b4857c4c8aa6a..d9f17613633d245283f5f5745acca2feb273cbf5 100644 (file)
--- a/youtube_dl/extractor/muenchentv.py
+++ b/youtube_dl/extractor/muenchentv.py
@@ -36,7 +36,7 @@ class MuenchenTVIE(InfoExtractor):
          title = self._live_title(self._og_search_title(webpage))
  
          data_js = self._search_regex(
-            r'(?s)\nplaylist:\s*(\[.*?}\]),related:',
+            r'(?s)\nplaylist:\s*(\[.*?}\]),',
              webpage, 'playlist configuration')
          data_json = js_to_json(data_js)
          data = json.loads(data_json)[0]
diff --git a/youtube_dl/extractor/musicplayon.py b/youtube_dl/extractor/musicplayon.py

index 50d92b50ae5ec2fa49e45cc64aea7f08cc21ccea..1854d59a5307a5b22f2efdda08a2b6c944aa8c50 100644 (file)
--- a/youtube_dl/extractor/musicplayon.py
+++ b/youtube_dl/extractor/musicplayon.py
@@ -1,17 +1,21 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..utils import int_or_none
+from ..compat import compat_urlparse
+from ..utils import (
+    int_or_none,
+    js_to_json,
+    mimetype2ext,
+)
  
  
  class MusicPlayOnIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=100&play)=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=\d+&play)=(?P<id>\d+)'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://en.musicplayon.com/play?v=433377',
+        'md5': '00cdcdea1726abdf500d1e7fd6dd59bb',
          'info_dict': {
              'id': '433377',
              'ext': 'mp4',
@@ -20,15 +24,16 @@ class MusicPlayOnIE(InfoExtractor):
              'duration': 342,
              'uploader': 'ultrafish',
          },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        },
-    }
+    }, {
+        'url': 'http://en.musicplayon.com/play?pl=102&play=442629',
+        'only_matching': True,
+    }]
+
+    _URL_TEMPLATE = 'http://en.musicplayon.com/play?v=%s'
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
+        url = self._URL_TEMPLATE % video_id
  
          page = self._download_webpage(url, video_id)
  
@@ -40,28 +45,14 @@ class MusicPlayOnIE(InfoExtractor):
          uploader = self._html_search_regex(
              r'<div>by&nbsp;<a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False)
  
-        formats = [
-            {
-                'url': 'http://media0-eu-nl.musicplayon.com/stream-mobile?id=%s&type=.mp4' % video_id,
-                'ext': 'mp4',
-            }
-        ]
-
-        manifest = self._download_webpage(
-            'http://en.musicplayon.com/manifest.m3u8?v=%s' % video_id, video_id, 'Downloading manifest')
-
-        for entry in manifest.split('#')[1:]:
-            if entry.startswith('EXT-X-STREAM-INF:'):
-                meta, url, _ = entry.split('\n')
-                params = dict(param.split('=') for param in meta.split(',')[1:])
-                formats.append({
-                    'url': url,
-                    'ext': 'mp4',
-                    'tbr': int(params['BANDWIDTH']),
-                    'width': int(params['RESOLUTION'].split('x')[1]),
-                    'height': int(params['RESOLUTION'].split('x')[-1]),
-                    'format_note': params['NAME'].replace('"', '').strip(),
-                })
+        sources = self._parse_json(
+            self._search_regex(r'setup\[\'_sources\'\]\s*=\s*([^;]+);', page, 'video sources'),
+            video_id, transform_source=js_to_json)
+        formats = [{
+            'url': compat_urlparse.urljoin(url, source['src']),
+            'ext': mimetype2ext(source.get('type')),
+            'format_note': source.get('data-res'),
+        } for source in sources]
  
          return {
              'id': video_id,
diff --git a/youtube_dl/extractor/muzu.py b/youtube_dl/extractor/muzu.py

deleted file mode 100644 (file)

index cbc8004..0000000
--- a/youtube_dl/extractor/muzu.py
+++ /dev/null
@@ -1,63 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlencode
-
-
-class MuzuTVIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.muzu\.tv/(.+?)/(.+?)/(?P<id>\d+)'
-    IE_NAME = 'muzu.tv'
-
-    _TEST = {
-        'url': 'http://www.muzu.tv/defected/marcashken-featuring-sos-cat-walk-original-mix-music-video/1981454/',
-        'md5': '98f8b2c7bc50578d6a0364fff2bfb000',
-        'info_dict': {
-            'id': '1981454',
-            'ext': 'mp4',
-            'title': 'Cat Walk (Original Mix)',
-            'description': 'md5:90e868994de201b2570e4e5854e19420',
-            'uploader': 'MarcAshken featuring SOS',
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        info_data = compat_urllib_parse_urlencode({
-            'format': 'json',
-            'url': url,
-        })
-        info = self._download_json(
-            'http://www.muzu.tv/api/oembed/?%s' % info_data,
-            video_id, 'Downloading video info')
-
-        player_info = self._download_json(
-            'http://player.muzu.tv/player/playerInit?ai=%s' % video_id,
-            video_id, 'Downloading player info')
-        video_info = player_info['videos'][0]
-        for quality in ['1080', '720', '480', '360']:
-            if video_info.get('v%s' % quality):
-                break
-
-        data = compat_urllib_parse_urlencode({
-            'ai': video_id,
-            # Even if each time you watch a video the hash changes,
-            # it seems to work for different videos, and it will work
-            # even if you use any non empty string as a hash
-            'viewhash': 'VBNff6djeV4HV5TRPW5kOHub2k',
-            'device': 'web',
-            'qv': quality,
-        })
-        video_url_info = self._download_json(
-            'http://player.muzu.tv/player/requestVideo?%s' % data,
-            video_id, 'Downloading video url')
-        video_url = video_url_info['url']
-
-        return {
-            'id': video_id,
-            'title': info['title'],
-            'url': video_url,
-            'thumbnail': info['thumbnail_url'],
-            'description': info['description'],
-            'uploader': info['author_name'],
-        }
diff --git a/youtube_dl/extractor/mwave.py b/youtube_dl/extractor/mwave.py
index 66b5231979ce8399c38074a9a14a2b4ea2114ee8..fea1caf478b2a862ae3a028b4a80041b734a5e1b 100644
--- a/youtube_dl/extractor/mwave.py
+++ b/youtube_dl/extractor/mwave.py
@@ -9,10 +9,11 @@ from ..utils import (
  
  
  class MwaveIE(InfoExtractor):
-    _VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
-    _TEST = {
+    _VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
+    _URL_TEMPLATE = 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=%s'
+    _TESTS = [{
          'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
-        'md5': 'c930e27b7720aaa3c9d0018dfc8ff6cc',
+        # md5 is unstable
          'info_dict': {
              'id': '168859',
              'ext': 'flv',
@@ -22,7 +23,10 @@ class MwaveIE(InfoExtractor):
              'duration': 206,
              'view_count': int,
          }
-    }
+    }, {
+        'url': 'http://mwave.interest.me/en/mnettv/videodetail.m?searchVideoDetailVO.clip_id=176199',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -56,3 +60,31 @@ class MwaveIE(InfoExtractor):
              'view_count': int_or_none(vod_info.get('hit')),
              'formats': formats,
          }
+
+
+class MwaveMeetGreetIE(InfoExtractor):
+    _VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?meetgreet/view/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://mwave.interest.me/meetgreet/view/256',
+        'info_dict': {
+            'id': '173294',
+            'ext': 'flv',
+            'title': '[MEET&GREET] Park BoRam',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'Mwave',
+            'duration': 3634,
+            'view_count': int,
+        }
+    }, {
+        'url': 'http://mwave.interest.me/en/meetgreet/view/256',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        clip_id = self._html_search_regex(
+            r'<iframe[^>]+src="/mnettv/ifr_clip\.m\?searchVideoDetailVO\.clip_id=(\d+)',
+            webpage, 'clip ID')
+        clip_url = MwaveIE._URL_TEMPLATE % clip_id
+        return self.url_result(clip_url, 'Mwave', clip_id)
diff --git a/youtube_dl/extractor/myspace.py b/youtube_dl/extractor/myspace.py
index 83414a2325586d7319c06247fa037c42bb2b199a..ab32e632e34375561980f168834443754f606383 100644
--- a/youtube_dl/extractor/myspace.py
+++ b/youtube_dl/extractor/myspace.py
@@ -1,14 +1,14 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
-import json
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_str,
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    parse_iso8601,
  )
-from ..utils import ExtractorError
  
  
  class MySpaceIE(InfoExtractor):
@@ -24,6 +24,8 @@ class MySpaceIE(InfoExtractor):
                  'description': 'This country quartet was all smiles while playing a sold out show at the Pacific Amphitheatre in Orange County, California.',
                  'uploader': 'Five Minutes to the Stage',
                  'uploader_id': 'fiveminutestothestage',
+                'timestamp': 1414108751,
+                'upload_date': '20141023',
              },
              'params': {
                  # rtmp download
@@ -64,7 +66,7 @@ class MySpaceIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Starset - First Light',
                  'description': 'md5:2d5db6c9d11d527683bcda818d332414',
-                'uploader': 'Jacob Soren',
+                'uploader': 'Yumi K',
                  'uploader_id': 'SorenPromotions',
                  'upload_date': '20140725',
              }
@@ -78,6 +80,19 @@ class MySpaceIE(InfoExtractor):
          player_url = self._search_regex(
              r'playerSwf":"([^"?]*)', webpage, 'player URL')
  
+        def rtmp_format_from_stream_url(stream_url, width=None, height=None):
+            rtmp_url, play_path = stream_url.split(';', 1)
+            return {
+                'format_id': 'rtmp',
+                'url': rtmp_url,
+                'play_path': play_path,
+                'player_url': player_url,
+                'protocol': 'rtmp',
+                'ext': 'flv',
+                'width': width,
+                'height': height,
+            }
+
          if mobj.group('mediatype').startswith('music/song'):
              # songs don't store any useful info in the 'context' variable
              song_data = self._search_regex(
@@ -93,8 +108,8 @@ class MySpaceIE(InfoExtractor):
                  return self._search_regex(
                      r'''data-%s=([\'"])(?P<data>.*?)\1''' % name,
                      song_data, name, default='', group='data')
-            streamUrl = search_data('stream-url')
-            if not streamUrl:
+            stream_url = search_data('stream-url')
+            if not stream_url:
                  vevo_id = search_data('vevo-id')
                  youtube_id = search_data('youtube-id')
                  if vevo_id:
@@ -106,36 +121,47 @@ class MySpaceIE(InfoExtractor):
                  else:
                      raise ExtractorError(
                          'Found song but don\'t know how to download it')
-            info = {
+            return {
                  'id': video_id,
                  'title': self._og_search_title(webpage),
                  'uploader': search_data('artist-name'),
                  'uploader_id': search_data('artist-username'),
                  'thumbnail': self._og_search_thumbnail(webpage),
+                'duration': int_or_none(search_data('duration')),
+                'formats': [rtmp_format_from_stream_url(stream_url)]
              }
          else:
-            context = json.loads(self._search_regex(
-                r'context = ({.*?});', webpage, 'context'))
-            video = context['video']
-            streamUrl = video['streamUrl']
-            info = {
-                'id': compat_str(video['mediaId']),
+            video = self._parse_json(self._search_regex(
+                r'context = ({.*?});', webpage, 'context'),
+                video_id)['video']
+            formats = []
+            hls_stream_url = video.get('hlsStreamUrl')
+            if hls_stream_url:
+                formats.append({
+                    'format_id': 'hls',
+                    'url': hls_stream_url,
+                    'protocol': 'm3u8_native',
+                    'ext': 'mp4',
+                })
+            stream_url = video.get('streamUrl')
+            if stream_url:
+                formats.append(rtmp_format_from_stream_url(
+                    stream_url,
+                    int_or_none(video.get('width')),
+                    int_or_none(video.get('height'))))
+            self._sort_formats(formats)
+            return {
+                'id': video_id,
                  'title': video['title'],
-                'description': video['description'],
-                'thumbnail': video['imageUrl'],
-                'uploader': video['artistName'],
-                'uploader_id': video['artistUsername'],
+                'description': video.get('description'),
+                'thumbnail': video.get('imageUrl'),
+                'uploader': video.get('artistName'),
+                'uploader_id': video.get('artistUsername'),
+                'duration': int_or_none(video.get('duration')),
+                'timestamp': parse_iso8601(video.get('dateAdded')),
+                'formats': formats,
              }
  
-        rtmp_url, play_path = streamUrl.split(';', 1)
-        info.update({
-            'url': rtmp_url,
-            'play_path': play_path,
-            'player_url': player_url,
-            'ext': 'flv',
-        })
-        return info
-
  
  class MySpaceAlbumIE(InfoExtractor):
      IE_NAME = 'MySpace:album'
diff --git a/youtube_dl/extractor/myspass.py b/youtube_dl/extractor/myspass.py
index 1ca7b1a9e958c221f44c48bced04c314c0957f8c..2afe535b5de0804927f2798850572caf9267b044 100644
--- a/youtube_dl/extractor/myspass.py
+++ b/youtube_dl/extractor/myspass.py
@@ -11,7 +11,7 @@ from ..utils import (
  
  
  class MySpassIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.myspass\.de/.*'
+    _VALID_URL = r'https?://(?:www\.)?myspass\.de/.*'
      _TEST = {
          'url': 'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
          'md5': '0b49f4844a068f8b33f4b7c88405862b',
diff --git a/youtube_dl/extractor/myvidster.py b/youtube_dl/extractor/myvidster.py
index 731c245428103b3ea96f5c396b063afadac82702..2117d302d6493e995a522a2726d312d46e76bda2 100644
--- a/youtube_dl/extractor/myvidster.py
+++ b/youtube_dl/extractor/myvidster.py
@@ -13,7 +13,7 @@ class MyVidsterIE(InfoExtractor):
              'id': '3685814',
              'title': 'md5:7d8427d6d02c4fbcef50fe269980c749',
              'upload_date': '20141027',
-            'uploader_id': 'utkualp',
+            'uploader': 'utkualp',
              'ext': 'mp4',
              'age_limit': 18,
          },
diff --git a/youtube_dl/extractor/nationalgeographic.py b/youtube_dl/extractor/nationalgeographic.py
index d5e53365cc52d93da99953a774871e862ca3cc2a..b91d865286e47affdc66c138dde9507963d62733 100644
--- a/youtube_dl/extractor/nationalgeographic.py
+++ b/youtube_dl/extractor/nationalgeographic.py
@@ -1,33 +1,48 @@
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
+from .adobepass import AdobePassIE
+from .theplatform import ThePlatformIE
  from ..utils import (
      smuggle_url,
      url_basename,
+    update_url_query,
+    get_element_by_class,
  )
  
  
-class NationalGeographicIE(InfoExtractor):
+class NationalGeographicVideoIE(InfoExtractor):
+    IE_NAME = 'natgeo:video'
      _VALID_URL = r'https?://video\.nationalgeographic\.com/.*?'
  
      _TESTS = [
          {
              'url': 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo',
+            'md5': '730855d559abbad6b42c2be1fa584917',
              'info_dict': {
-                'id': '4DmDACA6Qtk_',
-                'ext': 'flv',
+                'id': '0000014b-70a1-dd8c-af7f-f7b559330001',
+                'ext': 'mp4',
                  'title': 'Mating Crabs Busted by Sharks',
                  'description': 'md5:16f25aeffdeba55aaa8ec37e093ad8b3',
+                'timestamp': 1423523799,
+                'upload_date': '20150209',
+                'uploader': 'NAGS',
              },
              'add_ie': ['ThePlatform'],
          },
          {
              'url': 'http://video.nationalgeographic.com/wild/when-sharks-attack/the-real-jaws',
+            'md5': '6a3105eb448c070503b3105fb9b320b5',
              'info_dict': {
-                'id': '_JeBD_D7PlS5',
-                'ext': 'flv',
+                'id': 'ngc-I0IauNSWznb_UV008GxSbwY35BZvgi2e',
+                'ext': 'mp4',
                  'title': 'The Real Jaws',
                  'description': 'md5:8d3e09d9d53a85cd397b4b21b2c77be6',
+                'timestamp': 1433772632,
+                'upload_date': '20150608',
+                'uploader': 'NAGS',
              },
              'add_ie': ['ThePlatform'],
          },
@@ -37,18 +52,132 @@ class NationalGeographicIE(InfoExtractor):
          name = url_basename(url)
  
          webpage = self._download_webpage(url, name)
-        feed_url = self._search_regex(
-            r'data-feed-url="([^"]+)"', webpage, 'feed url')
          guid = self._search_regex(
              r'id="(?:videoPlayer|player-container)"[^>]+data-guid="([^"]+)"',
              webpage, 'guid')
  
-        feed = self._download_xml('%s?byGuid=%s' % (feed_url, guid), name)
-        content = feed.find('.//{http://search.yahoo.com/mrss/}content')
-        theplatform_id = url_basename(content.attrib.get('url'))
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(
+                'http://link.theplatform.com/s/ngs/media/guid/2423130747/%s?mbr=true' % guid,
+                {'force_smil_url': True}),
+            'id': guid,
+        }
+
+
+class NationalGeographicIE(ThePlatformIE, AdobePassIE):
+    IE_NAME = 'natgeo'
+    _VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/(?:videos|episodes)/(?P<id>[^/?]+)'
+
+    _TESTS = [
+        {
+            'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
+            'md5': '518c9aa655686cf81493af5cc21e2a04',
+            'info_dict': {
+                'id': 'vKInpacll2pC',
+                'ext': 'mp4',
+                'title': 'Uncovering a Universal Knowledge',
+                'description': 'md5:1a89148475bf931b3661fcd6ddb2ae3a',
+                'timestamp': 1458680907,
+                'upload_date': '20160322',
+                'uploader': 'NEWA-FNG-NGTV',
+            },
+            'add_ie': ['ThePlatform'],
+        },
+        {
+            'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
+            'md5': 'c4912f656b4cbe58f3e000c489360989',
+            'info_dict': {
+                'id': 'Pok5lWCkiEFA',
+                'ext': 'mp4',
+                'title': 'The Stunning Red Bird of Paradise',
+                'description': 'md5:7bc8cd1da29686be4d17ad1230f0140c',
+                'timestamp': 1459362152,
+                'upload_date': '20160330',
+                'uploader': 'NEWA-FNG-NGTV',
+            },
+            'add_ie': ['ThePlatform'],
+        },
+        {
+            'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episodes/the-power-of-miracles/',
+            'only_matching': True,
+        }
+    ]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        release_url = self._search_regex(
+            r'video_auth_playlist_url\s*=\s*"([^"]+)"',
+            webpage, 'release url')
+        theplatform_path = self._search_regex(r'https?://link.theplatform.com/s/([^?]+)', release_url, 'theplatform path')
+        video_id = theplatform_path.split('/')[-1]
+        query = {
+            'mbr': 'true',
+        }
+        is_auth = self._search_regex(r'video_is_auth\s*=\s*"([^"]+)"', webpage, 'is auth', fatal=False)
+        if is_auth == 'auth':
+            auth_resource_id = self._search_regex(
+                r"video_auth_resourceId\s*=\s*'([^']+)'",
+                webpage, 'auth resource id')
+            query['auth'] = self._extract_mvpd_auth(url, video_id, 'natgeo', auth_resource_id)
+
+        formats = []
+        subtitles = {}
+        for key, value in (('switch', 'http'), ('manifest', 'm3u')):
+            tp_query = query.copy()
+            tp_query.update({
+                key: value,
+            })
+            tp_formats, tp_subtitles = self._extract_theplatform_smil(
+                update_url_query(release_url, tp_query), video_id, 'Downloading %s SMIL data' % value)
+            formats.extend(tp_formats)
+            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
+        self._sort_formats(formats)
+
+        info = self._extract_theplatform_metadata(theplatform_path, display_id)
+        info.update({
+            'id': video_id,
+            'formats': formats,
+            'subtitles': subtitles,
+            'display_id': display_id,
+        })
+        return info
+
  
-        return self.url_result(smuggle_url(
-            'http://link.theplatform.com/s/ngs/%s?formats=MPEG4&manifest=f4m' % theplatform_id,
-            # For some reason, the normal links don't work and we must force
-            # the use of f4m
-            {'force_smil_url': True}))
+class NationalGeographicEpisodeGuideIE(InfoExtractor):
+    IE_NAME = 'natgeo:episodeguide'
+    _VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?(?P<id>[^/]+)/episode-guide'
+    _TESTS = [
+        {
+            'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episode-guide/',
+            'info_dict': {
+                'id': 'the-story-of-god-with-morgan-freeman-season-1',
+                'title': 'The Story of God with Morgan Freeman - Season 1',
+            },
+            'playlist_mincount': 6,
+        },
+        {
+            'url': 'http://channel.nationalgeographic.com/underworld-inc/episode-guide/?s=2',
+            'info_dict': {
+                'id': 'underworld-inc-season-2',
+                'title': 'Underworld, Inc. - Season 2',
+            },
+            'playlist_mincount': 7,
+        },
+    ]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        show = get_element_by_class('show', webpage)
+        selected_season = self._search_regex(
+            r'<div[^>]+class="select-seasons[^"]*".*?<a[^>]*>(.*?)</a>',
+            webpage, 'selected season')
+        entries = [
+            self.url_result(self._proto_relative_url(entry_url), 'NationalGeographic')
+            for entry_url in re.findall('(?s)<div[^>]+class="col-inner"[^>]*?>.*?<a[^>]+href="([^"]+)"', webpage)]
+        return self.playlist_result(
+            entries, '%s-%s' % (display_id, selected_season.lower().replace(' ', '-')),
+            '%s - %s' % (show, selected_season))
diff --git a/youtube_dl/extractor/naver.py b/youtube_dl/extractor/naver.py
index 6d6f69b440a4b91d95c42210b2e597aca99144f6..055070ff54fd8990c2e58ab1d6df037b19f3a029 100644
--- a/youtube_dl/extractor/naver.py
+++ b/youtube_dl/extractor/naver.py
@@ -1,15 +1,13 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlencode,
-    compat_urlparse,
-)
  from ..utils import (
      ExtractorError,
+    int_or_none,
+    update_url_query,
  )
  
  
@@ -51,48 +49,74 @@ class NaverIE(InfoExtractor):
              if error:
                  raise ExtractorError(error, expected=True)
              raise ExtractorError('couldn\'t extract vid and key')
-        vid = m_id.group(1)
-        key = m_id.group(2)
-        query = compat_urllib_parse_urlencode({'vid': vid, 'inKey': key, })
-        query_urls = compat_urllib_parse_urlencode({
-            'masterVid': vid,
-            'protocol': 'p2p',
-            'inKey': key,
-        })
-        info = self._download_xml(
-            'http://serviceapi.rmcnmv.naver.com/flash/videoInfo.nhn?' + query,
-            video_id, 'Downloading video info')
-        urls = self._download_xml(
-            'http://serviceapi.rmcnmv.naver.com/flash/playableEncodingOption.nhn?' + query_urls,
-            video_id, 'Downloading video formats info')
-
+        video_data = self._download_json(
+            'http://play.rmcnmv.naver.com/vod/play/v2.0/' + m_id.group(1),
+            video_id, query={
+                'key': m_id.group(2),
+            })
+        meta = video_data['meta']
+        title = meta['subject']
          formats = []
-        for format_el in urls.findall('EncodingOptions/EncodingOption'):
-            domain = format_el.find('Domain').text
-            uri = format_el.find('uri').text
-            f = {
-                'url': compat_urlparse.urljoin(domain, uri),
-                'ext': 'mp4',
-                'width': int(format_el.find('width').text),
-                'height': int(format_el.find('height').text),
-            }
-            if domain.startswith('rtmp'):
-                # urlparse does not support custom schemes
-                # https://bugs.python.org/issue18828
-                f.update({
-                    'url': domain + uri,
-                    'ext': 'flv',
-                    'rtmp_protocol': '1',  # rtmpt
+
+        def extract_formats(streams, stream_type, query={}):
+            for stream in streams:
+                stream_url = stream.get('source')
+                if not stream_url:
+                    continue
+                stream_url = update_url_query(stream_url, query)
+                encoding_option = stream.get('encodingOption', {})
+                bitrate = stream.get('bitrate', {})
+                formats.append({
+                    'format_id': '%s_%s' % (stream.get('type') or stream_type, encoding_option.get('id') or encoding_option.get('name')),
+                    'url': stream_url,
+                    'width': int_or_none(encoding_option.get('width')),
+                    'height': int_or_none(encoding_option.get('height')),
+                    'vbr': int_or_none(bitrate.get('video')),
+                    'abr': int_or_none(bitrate.get('audio')),
+                    'filesize': int_or_none(stream.get('size')),
+                    'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
                  })
-            formats.append(f)
+
+        extract_formats(video_data.get('videos', {}).get('list', []), 'H264')
+        for stream_set in video_data.get('streams', []):
+            query = {}
+            for param in stream_set.get('keys', []):
+                query[param['name']] = param['value']
+            stream_type = stream_set.get('type')
+            videos = stream_set.get('videos')
+            if videos:
+                extract_formats(videos, stream_type, query)
+            elif stream_type == 'HLS':
+                stream_url = stream_set.get('source')
+                if not stream_url:
+                    continue
+                formats.extend(self._extract_m3u8_formats(
+                    update_url_query(stream_url, query), video_id,
+                    'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False))
          self._sort_formats(formats)
  
+        subtitles = {}
+        for caption in video_data.get('captions', {}).get('list', []):
+            caption_url = caption.get('source')
+            if not caption_url:
+                continue
+            subtitles.setdefault(caption.get('language') or caption.get('locale'), []).append({
+                'url': caption_url,
+            })
+
+        upload_date = self._search_regex(
+            r'<span[^>]+class="date".*?(\d{4}\.\d{2}\.\d{2})',
+            webpage, 'upload date', fatal=False)
+        if upload_date:
+            upload_date = upload_date.replace('.', '')
+
          return {
              'id': video_id,
-            'title': info.find('Subject').text,
+            'title': title,
              'formats': formats,
+            'subtitles': subtitles,
              'description': self._og_search_description(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
-            'upload_date': info.find('WriteDate').text.replace('.', ''),
-            'view_count': int(info.find('PlayCount').text),
+            'thumbnail': meta.get('cover', {}).get('source') or self._og_search_thumbnail(webpage),
+            'view_count': int_or_none(meta.get('count')),
+            'upload_date': upload_date,
          }
diff --git a/youtube_dl/extractor/nba.py b/youtube_dl/extractor/nba.py
index d896b0d04810655c1d7c993819b88e7b32029832..53561961c12611eeead082ed662e44a75e38acbf 100644
--- a/youtube_dl/extractor/nba.py
+++ b/youtube_dl/extractor/nba.py
@@ -1,25 +1,20 @@
  from __future__ import unicode_literals
  
  import functools
-import os.path
  import re
  
-from .common import InfoExtractor
+from .turner import TurnerBaseIE
  from ..compat import (
      compat_urllib_parse_urlencode,
      compat_urlparse,
  )
  from ..utils import (
-    int_or_none,
      OnDemandPagedList,
-    parse_duration,
      remove_start,
-    xpath_text,
-    xpath_attr,
  )
  
  
-class NBAIE(InfoExtractor):
+class NBAIE(TurnerBaseIE):
      _VALID_URL = r'https?://(?:watch\.|www\.)?nba\.com/(?P<path>(?:[^/]+/)+(?P<id>[^?]*?))/?(?:/index\.html)?(?:\?.*)?$'
      _TESTS = [{
          'url': 'http://www.nba.com/video/games/nets/2012/12/04/0021200253-okc-bkn-recap.nba/index.html',
@@ -44,28 +39,30 @@ class NBAIE(InfoExtractor):
          'url': 'http://watch.nba.com/video/channels/playoffs/2015/05/20/0041400301-cle-atl-recap.nba',
          'md5': 'b2b39b81cf28615ae0c3360a3f9668c4',
          'info_dict': {
-            'id': '0041400301-cle-atl-recap',
+            'id': 'channels/playoffs/2015/05/20/0041400301-cle-atl-recap.nba',
              'ext': 'mp4',
              'title': 'Hawks vs. Cavaliers Game 1',
              'description': 'md5:8094c3498d35a9bd6b1a8c396a071b4d',
              'duration': 228,
              'timestamp': 1432134543,
              'upload_date': '20150520',
-        }
+        },
+        'expected_warnings': ['Unable to download f4m manifest'],
      }, {
          'url': 'http://www.nba.com/clippers/news/doc-rivers-were-not-trading-blake',
          'info_dict': {
-            'id': '1455672027478-Doc_Feb16_720',
+            'id': 'teams/clippers/2016/02/17/1455672027478-Doc_Feb16_720.mov-297324',
              'ext': 'mp4',
              'title': 'Practice: Doc Rivers - 2/16/16',
              'description': 'Head Coach Doc Rivers addresses the media following practice.',
-            'upload_date': '20160217',
+            'upload_date': '20160216',
              'timestamp': 1455672000,
          },
          'params': {
              # m3u8 download
              'skip_download': True,
          },
+        'expected_warnings': ['Unable to download f4m manifest'],
      }, {
          'url': 'http://www.nba.com/timberwolves/wiggins-shootaround#',
          'info_dict': {
@@ -80,7 +77,7 @@ class NBAIE(InfoExtractor):
      }, {
          'url': 'http://www.nba.com/timberwolves/wiggins-shootaround#',
          'info_dict': {
-            'id': 'Wigginsmp4',
+            'id': 'teams/timberwolves/2014/12/12/Wigginsmp4-3462601',
              'ext': 'mp4',
              'title': 'Shootaround Access - Dec. 12 | Andrew Wiggins',
              'description': 'Wolves rookie Andrew Wiggins addresses the media after Friday\'s shootaround.',
@@ -92,6 +89,7 @@ class NBAIE(InfoExtractor):
              # m3u8 download
              'skip_download': True,
          },
+        'expected_warnings': ['Unable to download f4m manifest'],
      }]
  
      _PAGE_SIZE = 30
@@ -145,53 +143,12 @@ class NBAIE(InfoExtractor):
              if path.startswith('video/teams'):
                  path = 'video/channels/proxy/' + path[6:]
  
-        video_info = self._download_xml('http://www.nba.com/%s.xml' % path, video_id)
-        video_id = os.path.splitext(xpath_text(video_info, 'slug'))[0]
-        title = xpath_text(video_info, 'headline')
-        description = xpath_text(video_info, 'description')
-        duration = parse_duration(xpath_text(video_info, 'length'))
-        timestamp = int_or_none(xpath_attr(video_info, 'dateCreated', 'uts'))
-
-        thumbnails = []
-        for image in video_info.find('images'):
-            thumbnails.append({
-                'id': image.attrib.get('cut'),
-                'url': image.text,
-                'width': int_or_none(image.attrib.get('width')),
-                'height': int_or_none(image.attrib.get('height')),
+        return self._extract_cvp_info(
+            'http://www.nba.com/%s.xml' % path, video_id, {
+                'default': {
+                    'media_src': 'http://nba.cdn.turner.com/nba/big',
+                },
+                'm3u8': {
+                    'media_src': 'http://nbavod-f.akamaihd.net',
+                },
              })
-
-        formats = []
-        for video_file in video_info.findall('.//file'):
-            video_url = video_file.text
-            if video_url.startswith('/'):
-                continue
-            if video_url.endswith('.m3u8'):
-                formats.extend(self._extract_m3u8_formats(video_url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
-            elif video_url.endswith('.f4m'):
-                formats.extend(self._extract_f4m_formats(video_url + '?hdcore=3.4.1.1', video_id, f4m_id='hds', fatal=False))
-            else:
-                key = video_file.attrib.get('bitrate')
-                format_info = {
-                    'format_id': key,
-                    'url': video_url,
-                }
-                mobj = re.search(r'(\d+)x(\d+)(?:_(\d+))?', key)
-                if mobj:
-                    format_info.update({
-                        'width': int(mobj.group(1)),
-                        'height': int(mobj.group(2)),
-                        'tbr': int_or_none(mobj.group(3)),
-                    })
-                formats.append(format_info)
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'duration': duration,
-            'timestamp': timestamp,
-            'thumbnails': thumbnails,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/nbc.py b/youtube_dl/extractor/nbc.py
index a622f2212d8af38519b2906f6b27d5c0ad0dac57..7f1bd9229303ec0390c9d10937374a0cc986790b 100644
--- a/youtube_dl/extractor/nbc.py
+++ b/youtube_dl/extractor/nbc.py
@@ -9,15 +9,11 @@ from ..utils import (
      lowercase_escape,
      smuggle_url,
      unescapeHTML,
-    update_url_query,
-    int_or_none,
-    HEADRequest,
-    parse_iso8601,
  )
  
  
  class NBCIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.nbc\.com/(?:[^/]+/)+(?P<id>n?\d+)'
+    _VALID_URL = r'https?://(?:www\.)?nbc\.com/(?:[^/]+/)+(?P<id>n?\d+)'
  
      _TESTS = [
          {
@@ -27,6 +23,9 @@ class NBCIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Jimmy Fallon Surprises Fans at Ben & Jerry\'s',
                  'description': 'Jimmy gives out free scoops of his new "Tonight Dough" ice cream flavor by surprising customers at the Ben & Jerry\'s scoop shop.',
+                'timestamp': 1424246400,
+                'upload_date': '20150218',
+                'uploader': 'NBCU-COM',
              },
              'params': {
                  # m3u8 download
@@ -50,6 +49,9 @@ class NBCIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Star Wars Teaser',
                  'description': 'md5:0b40f9cbde5b671a7ff62fceccc4f442',
+                'timestamp': 1417852800,
+                'upload_date': '20141206',
+                'uploader': 'NBCU-COM',
              },
              'params': {
                  # m3u8 download
@@ -61,6 +63,23 @@ class NBCIE(InfoExtractor):
              # This video has expired but with an escaped embedURL
              'url': 'http://www.nbc.com/parenthood/episode-guide/season-5/just-like-at-home/515',
              'only_matching': True,
+        },
+        {
+            # HLS streams require the 'hdnea3' cookie
+            'url': 'http://www.nbc.com/Kings/video/goliath/n1806',
+            'info_dict': {
+                'id': 'n1806',
+                'ext': 'mp4',
+                'title': 'Goliath',
+                'description': 'When an unknown soldier saves the life of the King\'s son in battle, he\'s thrust into the limelight and politics of the kingdom.',
+                'timestamp': 1237100400,
+                'upload_date': '20090315',
+                'uploader': 'NBCU-COM',
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'skip': 'Only works from US',
          }
      ]
  
@@ -78,6 +97,7 @@ class NBCIE(InfoExtractor):
              theplatform_url = 'http:' + theplatform_url
          return {
              '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
              'url': smuggle_url(theplatform_url, {'source_url': url}),
              'id': video_id,
          }
@@ -93,6 +113,9 @@ class NBCSportsVPlayerIE(InfoExtractor):
              'ext': 'flv',
              'description': 'md5:df390f70a9ba7c95ff1daace988f0d8d',
              'title': 'Tyler Kalinoski hits buzzer-beater to lift Davidson',
+            'timestamp': 1426270238,
+            'upload_date': '20150313',
+            'uploader': 'NBCU-SPORTS',
          }
      }, {
          'url': 'http://vplayer.nbcsports.com/p/BxmELC/nbc_embedshare/select/_hqLjQ95yx8Z',
@@ -115,7 +138,7 @@ class NBCSportsVPlayerIE(InfoExtractor):
  
  class NBCSportsIE(InfoExtractor):
      # Does not include https because its certificate is invalid
-    _VALID_URL = r'https?://www\.nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
+    _VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
  
      _TEST = {
          'url': 'http://www.nbcsports.com//college-basketball/ncaab/tom-izzo-michigan-st-has-so-much-respect-duke',
@@ -124,6 +147,9 @@ class NBCSportsIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke',
              'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113',
+            'uploader': 'NBCU-SPORTS',
+            'upload_date': '20150330',
+            'timestamp': 1427726529,
          }
      }
  
@@ -134,10 +160,37 @@ class NBCSportsIE(InfoExtractor):
              NBCSportsVPlayerIE._extract_url(webpage), 'NBCSportsVPlayer')
  
  
+class CSNNEIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?csnne\.com/video/(?P<id>[0-9a-z-]+)'
+
+    _TEST = {
+        'url': 'http://www.csnne.com/video/snc-evening-update-wright-named-red-sox-no-5-starter',
+        'info_dict': {
+            'id': 'yvBLLUgQ8WU0',
+            'ext': 'mp4',
+            'title': 'SNC evening update: Wright named Red Sox\' No. 5 starter.',
+            'description': 'md5:1753cfee40d9352b19b4c9b3e589b9e3',
+            'timestamp': 1459369979,
+            'upload_date': '20160330',
+            'uploader': 'NBCU-SPORTS',
+        }
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': self._html_search_meta('twitter:player:stream', webpage),
+            'display_id': display_id,
+        }
+
+
  class NBCNewsIE(ThePlatformIE):
-    _VALID_URL = r'''(?x)https?://(?:www\.)?nbcnews\.com/
+    _VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/
          (?:video/.+?/(?P<id>\d+)|
-        ([^/]+/)*(?P<display_id>[^/?]+))
+        ([^/]+/)*(?:.*-)?(?P<mpx_id>[^/?]+))
          '''
  
      _TESTS = [
@@ -159,13 +212,16 @@ class NBCNewsIE(ThePlatformIE):
                  'ext': 'mp4',
                  'title': 'How Twitter Reacted To The Snowden Interview',
                  'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
+                'uploader': 'NBCU-NEWS',
+                'timestamp': 1401363060,
+                'upload_date': '20140529',
              },
          },
          {
              'url': 'http://www.nbcnews.com/feature/dateline-full-episodes/full-episode-family-business-n285156',
              'md5': 'fdbf39ab73a72df5896b6234ff98518a',
              'info_dict': {
-                'id': 'Wjf9EDR3A_60',
+                'id': '529953347624',
                  'ext': 'mp4',
                  'title': 'FULL EPISODE: Family Business',
                  'description': 'md5:757988edbaae9d7be1d585eb5d55cc04',
@@ -180,6 +236,9 @@ class NBCNewsIE(ThePlatformIE):
                  'ext': 'mp4',
                  'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
                  'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
+                'timestamp': 1423104900,
+                'uploader': 'NBCU-NEWS',
+                'upload_date': '20150205',
              },
          },
          {
@@ -188,15 +247,50 @@ class NBCNewsIE(ThePlatformIE):
              'info_dict': {
                  'id': '529953347624',
                  'ext': 'mp4',
-                'title': 'Volkswagen U.S. Chief: We \'Totally Screwed Up\'',
-                'description': 'md5:d22d1281a24f22ea0880741bb4dd6301',
+                'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up',
+                'description': 'md5:c8be487b2d80ff0594c005add88d8351',
+                'upload_date': '20150922',
+                'timestamp': 1442917800,
+                'uploader': 'NBCU-NEWS',
+            },
+        },
+        {
+            'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
+            'md5': '118d7ca3f0bea6534f119c68ef539f71',
+            'info_dict': {
+                'id': '669831235788',
+                'ext': 'mp4',
+                'title': 'See the aurora borealis from space in stunning new NASA video',
+                'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
+                'upload_date': '20160420',
+                'timestamp': 1461152093,
+                'uploader': 'NBCU-NEWS',
+            },
+        },
+        {
+            'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
+            'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
+            'info_dict': {
+                'id': '314487875924',
+                'ext': 'mp4',
+                'title': 'The chaotic GOP immigration vote',
+                'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
+                'thumbnail': r're:^https?://.*\.jpg$',
+                'timestamp': 1406937606,
+                'upload_date': '20140802',
+                'uploader': 'NBCU-NEWS',
+                'categories': ['MSNBC/Topics/Franchise/Best of last night', 'MSNBC/Topics/General/Congress'],
              },
-            'expected_warnings': ['http-6000 is not available']
          },
          {
              'url': 'http://www.nbcnews.com/watch/dateline/full-episode--deadly-betrayal-386250819952',
              'only_matching': True,
          },
+        {
+            # From http://www.vulture.com/2016/06/letterman-couldnt-care-less-about-late-night.html
+            'url': 'http://www.nbcnews.com/widget/video-embed/701714499682',
+            'only_matching': True,
+        },
      ]
  
      def _real_extract(self, url):
@@ -216,103 +310,68 @@ class NBCNewsIE(ThePlatformIE):
              }
          else:
              # "feature" and "nightly-news" pages use theplatform.com
-            display_id = mobj.group('display_id')
-            webpage = self._download_webpage(url, display_id)
-            info = None
-            bootstrap_json = self._search_regex(
-                r'(?m)var\s+(?:bootstrapJson|playlistData)\s*=\s*({.+});?\s*$',
-                webpage, 'bootstrap json', default=None)
-            if bootstrap_json:
-                bootstrap = self._parse_json(bootstrap_json, display_id)
-                info = bootstrap['results'][0]['video']
-            else:
-                player_instance_json = self._search_regex(
-                    r'videoObj\s*:\s*({.+})', webpage, 'player instance')
-                info = self._parse_json(player_instance_json, display_id)
-            video_id = info['mpxId']
-            title = info['title']
-
-            subtitles = {}
-            caption_links = info.get('captionLinks')
-            if caption_links:
-                for (sub_key, sub_ext) in (('smpte-tt', 'ttml'), ('web-vtt', 'vtt'), ('srt', 'srt')):
-                    sub_url = caption_links.get(sub_key)
-                    if sub_url:
-                        subtitles.setdefault('en', []).append({
-                            'url': sub_url,
-                            'ext': sub_ext,
-                        })
-
-            formats = []
-            for video_asset in info['videoAssets']:
-                video_url = video_asset.get('publicUrl')
-                if not video_url:
-                    continue
-                container = video_asset.get('format')
-                asset_type = video_asset.get('assetType') or ''
-                if container == 'ISM' or asset_type == 'FireTV-Once':
-                    continue
-                elif asset_type == 'OnceURL':
-                    tp_formats, tp_subtitles = self._extract_theplatform_smil(
-                        video_url, video_id)
-                    formats.extend(tp_formats)
-                    subtitles = self._merge_subtitles(subtitles, tp_subtitles)
+            video_id = mobj.group('mpx_id')
+            if not video_id.isdigit():
+                webpage = self._download_webpage(url, video_id)
+                info = None
+                bootstrap_json = self._search_regex(
+                    [r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
+                     r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"'],
+                    webpage, 'bootstrap json', default=None)
+                bootstrap = self._parse_json(
+                    bootstrap_json, video_id, transform_source=unescapeHTML)
+                if 'results' in bootstrap:
+                    info = bootstrap['results'][0]['video']
+                elif 'video' in bootstrap:
+                    info = bootstrap['video']
                  else:
-                    tbr = int_or_none(video_asset.get('bitRate'), 1000)
-                    format_id = 'http%s' % ('-%d' % tbr if tbr else '')
-                    video_url = update_url_query(
-                        video_url, {'format': 'redirect'})
-                    # resolve the url so that we can check availability and detect the correct extension
-                    head = self._request_webpage(
-                        HEADRequest(video_url), video_id,
-                        'Checking %s url' % format_id,
-                        '%s is not available' % format_id,
-                        fatal=False)
-                    if head:
-                        video_url = head.geturl()
-                        formats.append({
-                            'format_id': format_id,
-                            'url': video_url,
-                            'width': int_or_none(video_asset.get('width')),
-                            'height': int_or_none(video_asset.get('height')),
-                            'tbr': tbr,
-                            'container': video_asset.get('format'),
-                        })
-            self._sort_formats(formats)
+                    info = bootstrap
+                video_id = info['mpxId']
  
              return {
+                '_type': 'url_transparent',
                  'id': video_id,
-                'title': title,
-                'description': info.get('description'),
-                'thumbnail': info.get('description'),
-                'thumbnail': info.get('thumbnail'),
-                'duration': int_or_none(info.get('duration')),
-                'timestamp': parse_iso8601(info.get('pubDate')),
-                'formats': formats,
-                'subtitles': subtitles,
+                # http://feed.theplatform.com/f/2E2eJC/nbcnews also works
+                'url': 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews?byId=%s' % video_id,
+                'ie_key': 'ThePlatformFeed',
              }
  
  
-class MSNBCIE(InfoExtractor):
-    # https URLs redirect to corresponding http ones
-    _VALID_URL = r'https?://www\.msnbc\.com/[^/]+/watch/(?P<id>[^/]+)'
+class NBCOlympicsIE(InfoExtractor):
+    _VALID_URL = r'https?://www\.nbcolympics\.com/video/(?P<id>[a-z-]+)'
+
      _TEST = {
-        'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
-        'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
+        # Geo-restricted to US
+        'url': 'http://www.nbcolympics.com/video/justin-roses-son-leo-was-tears-after-his-dad-won-gold',
+        'md5': '54fecf846d05429fbaa18af557ee523a',
          'info_dict': {
-            'id': 'n_hayes_Aimm_140801_272214',
+            'id': 'WjTBzDXx5AUq',
+            'display_id': 'justin-roses-son-leo-was-tears-after-his-dad-won-gold',
              'ext': 'mp4',
-            'title': 'The chaotic GOP immigration vote',
-            'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'timestamp': 1406937606,
-            'upload_date': '20140802',
-            'categories': ['MSNBC/Topics/Franchise/Best of last night', 'MSNBC/Topics/General/Congress'],
+            'title': 'Rose\'s son Leo was in tears after his dad won gold',
+            'description': 'Olympic gold medalist Justin Rose gets emotional talking to the impact his win in men\'s golf has already had on his children.',
+            'timestamp': 1471274964,
+            'upload_date': '20160815',
+            'uploader': 'NBCU-SPORTS',
          },
      }
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        embed_url = self._html_search_meta('embedURL', webpage)
-        return self.url_result(embed_url)
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        drupal_settings = self._parse_json(self._search_regex(
+            r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
+            webpage, 'drupal settings'), display_id)
+
+        iframe_url = drupal_settings['vod']['iframe_url']
+        theplatform_url = iframe_url.replace(
+            'vplayer.nbcolympics.com', 'player.theplatform.com')
+
+        return {
+            '_type': 'url_transparent',
+            'url': theplatform_url,
+            'ie_key': ThePlatformIE.ie_key(),
+            'display_id': display_id,
+        }
diff --git a/youtube_dl/extractor/ndr.py b/youtube_dl/extractor/ndr.py

index 0cded6b5c3d0bbcb095de8672de70fa81b9f7fd1..e3b0da2e966eb9486ab5307a933c51d74f2a14ba 100644 (file)
--- a/youtube_dl/extractor/ndr.py
+++ b/youtube_dl/extractor/ndr.py
@@ -23,7 +23,7 @@ class NDRBaseIE(InfoExtractor):
  class NDRIE(NDRBaseIE):
      IE_NAME = 'ndr'
      IE_DESC = 'NDR.de - Norddeutscher Rundfunk'
-    _VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
+    _VALID_URL = r'https?://(?:www\.)?ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
      _TESTS = [{
          # httpVideo, same content id
          'url': 'http://www.ndr.de/fernsehen/Party-Poette-und-Parade,hafengeburtstag988.html',
@@ -105,7 +105,7 @@ class NDRIE(NDRBaseIE):
  class NJoyIE(NDRBaseIE):
      IE_NAME = 'njoy'
      IE_DESC = 'N-JOY'
-    _VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?n-joy\.de/(?:[^/]+/)*(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
      _TESTS = [{
          # httpVideo, same content id
          'url': 'http://www.n-joy.de/entertainment/comedy/comedy_contest/Benaissa-beim-NDR-Comedy-Contest,comedycontest2480.html',
@@ -238,7 +238,7 @@ class NDREmbedBaseIE(InfoExtractor):
  
  class NDREmbedIE(NDREmbedBaseIE):
      IE_NAME = 'ndr:embed'
-    _VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
+    _VALID_URL = r'https?://(?:www\.)?ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
      _TESTS = [{
          'url': 'http://www.ndr.de/fernsehen/sendungen/ndr_aktuell/ndraktuell28488-player.html',
          'md5': '8b9306142fe65bbdefb5ce24edb6b0a9',
@@ -332,7 +332,7 @@ class NDREmbedIE(NDREmbedBaseIE):
  
  class NJoyEmbedIE(NDREmbedBaseIE):
      IE_NAME = 'njoy:embed'
-    _VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
+    _VALID_URL = r'https?://(?:www\.)?n-joy\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
      _TESTS = [{
          # httpVideo
          'url': 'http://www.n-joy.de/events/reeperbahnfestival/doku948-player_image-bc168e87-5263-4d6d-bd27-bb643005a6de_theme-n-joy.html',
diff --git a/youtube_dl/extractor/ndtv.py b/youtube_dl/extractor/ndtv.py

index 2a1ca80df797f0abe63cc6327c5e283965865f70..96528f6499d1e02c5208e61fe8abd1f606b29392 100644 (file)
--- a/youtube_dl/extractor/ndtv.py
+++ b/youtube_dl/extractor/ndtv.py
@@ -1,19 +1,18 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  from ..utils import (
-    month_by_name,
      int_or_none,
+    remove_end,
+    unified_strdate,
  )
  
  
  class NDTVIE(InfoExtractor):
-    _VALID_URL = r'^https?://(?:www\.)?ndtv\.com/video/player/[^/]*/[^/]*/(?P<id>[a-z0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?ndtv\.com/video/(?:[^/]+/)+[^/?^&]+-(?P<id>\d+)'
  
      _TEST = {
-        'url': 'http://www.ndtv.com/video/player/news/ndtv-exclusive-don-t-need-character-certificate-from-rahul-gandhi-says-arvind-kejriwal/300710',
+        'url': 'http://www.ndtv.com/video/news/news/ndtv-exclusive-don-t-need-character-certificate-from-rahul-gandhi-says-arvind-kejriwal-300710',
          'md5': '39f992dbe5fb531c395d8bbedb1e5e88',
          'info_dict': {
              'id': '300710',
@@ -22,7 +21,7 @@ class NDTVIE(InfoExtractor):
              'description': 'md5:ab2d4b4a6056c5cb4caa6d729deabf02',
              'upload_date': '20131208',
              'duration': 1327,
-            'thumbnail': 'http://i.ndtvimg.com/video/images/vod/medium/2013-12/big_300710_1386518307.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
          },
      }
  
@@ -30,36 +29,19 @@ class NDTVIE(InfoExtractor):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
+        title = remove_end(self._og_search_title(webpage), ' - NDTV')
+
          filename = self._search_regex(
              r"__filename='([^']+)'", webpage, 'video filename')
-        video_url = ('http://bitcast-b.bitgravity.com/ndtvod/23372/ndtv/%s' %
-                     filename)
+        video_url = 'http://bitcast-b.bitgravity.com/ndtvod/23372/ndtv/%s' % filename
  
          duration = int_or_none(self._search_regex(
              r"__duration='([^']+)'", webpage, 'duration', fatal=False))
  
-        date_m = re.search(r'''(?x)
-            <p\s+class="vod_dateline">\s*
-                Published\s+On:\s*
-                (?P<monthname>[A-Za-z]+)\s+(?P<day>[0-9]+),\s*(?P<year>[0-9]+)
-            ''', webpage)
-        upload_date = None
-
-        if date_m is not None:
-            month = month_by_name(date_m.group('monthname'))
-            if month is not None:
-                upload_date = '%s%02d%02d' % (
-                    date_m.group('year'), month, int(date_m.group('day')))
-
-        description = self._og_search_description(webpage)
-        READ_MORE = ' (Read more)'
-        if description.endswith(READ_MORE):
-            description = description[:-len(READ_MORE)]
+        upload_date = unified_strdate(self._html_search_meta(
+            'publish-date', webpage, 'upload date', fatal=False))
  
-        title = self._og_search_title(webpage)
-        TITLE_SUFFIX = ' - NDTV'
-        if title.endswith(TITLE_SUFFIX):
-            title = title[:-len(TITLE_SUFFIX)]
+        description = remove_end(self._og_search_description(webpage), ' (Read more)')
  
          return {
              'id': video_id,
diff --git a/youtube_dl/extractor/nerdist.py b/youtube_dl/extractor/nerdist.py

deleted file mode 100644 (file)

index c6dc34b..0000000
--- a/youtube_dl/extractor/nerdist.py
+++ /dev/null
@@ -1,80 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-
-from ..utils import (
-    determine_ext,
-    parse_iso8601,
-    xpath_text,
-)
-
-
-class NerdistIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?nerdist\.com/vepisode/(?P<id>[^/?#]+)'
-    _TEST = {
-        'url': 'http://www.nerdist.com/vepisode/exclusive-which-dc-characters-w',
-        'md5': '3698ed582931b90d9e81e02e26e89f23',
-        'info_dict': {
-            'display_id': 'exclusive-which-dc-characters-w',
-            'id': 'RPHpvJyr',
-            'ext': 'mp4',
-            'title': 'Your TEEN TITANS Revealed! Who\'s on the show?',
-            'thumbnail': 're:^https?://.*/thumbs/.*\.jpg$',
-            'description': 'Exclusive: Find out which DC Comics superheroes will star in TEEN TITANS Live-Action TV Show on Nerdist News with Jessica Chobot!',
-            'uploader': 'Eric Diaz',
-            'upload_date': '20150202',
-            'timestamp': 1422892808,
-        }
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-
-        video_id = self._search_regex(
-            r'''(?x)<script\s+(?:type="text/javascript"\s+)?
-                src="https?://content\.nerdist\.com/players/([a-zA-Z0-9_]+)-''',
-            webpage, 'video ID')
-        timestamp = parse_iso8601(self._html_search_meta(
-            'shareaholic:article_published_time', webpage, 'upload date'))
-        uploader = self._html_search_meta(
-            'shareaholic:article_author_name', webpage, 'article author')
-
-        doc = self._download_xml(
-            'http://content.nerdist.com/jw6/%s.xml' % video_id, video_id)
-        video_info = doc.find('.//item')
-        title = xpath_text(video_info, './title', fatal=True)
-        description = xpath_text(video_info, './description')
-        thumbnail = xpath_text(
-            video_info, './{http://rss.jwpcdn.com/}image', 'thumbnail')
-
-        formats = []
-        for source in video_info.findall('./{http://rss.jwpcdn.com/}source'):
-            vurl = source.attrib['file']
-            ext = determine_ext(vurl)
-            if ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(
-                    vurl, video_id, entry_protocol='m3u8_native', ext='mp4',
-                    preference=0))
-            elif ext == 'smil':
-                formats.extend(self._extract_smil_formats(
-                    vurl, video_id, fatal=False
-                ))
-            else:
-                formats.append({
-                    'format_id': ext,
-                    'url': vurl,
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'timestamp': timestamp,
-            'formats': formats,
-            'uploader': uploader,
-        }
diff --git a/youtube_dl/extractor/neteasemusic.py b/youtube_dl/extractor/neteasemusic.py

index 0d36474fa069b793ff32e90e6de1804de09058ac..978a05841ce68161330f9db24169dd330e51efc1 100644 (file)
--- a/youtube_dl/extractor/neteasemusic.py
+++ b/youtube_dl/extractor/neteasemusic.py
@@ -89,6 +89,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
              'timestamp': 1431878400,
              'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
          },
+        'skip': 'Blocked outside Mainland China',
      }, {
          'note': 'No lyrics translation.',
          'url': 'http://music.163.com/#/song?id=29822014',
@@ -101,6 +102,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
              'timestamp': 1419523200,
              'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
          },
+        'skip': 'Blocked outside Mainland China',
      }, {
          'note': 'No lyrics.',
          'url': 'http://music.163.com/song?id=17241424',
@@ -112,6 +114,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
              'upload_date': '20080211',
              'timestamp': 1202745600,
          },
+        'skip': 'Blocked outside Mainland China',
      }, {
          'note': 'Has translated name.',
          'url': 'http://music.163.com/#/song?id=22735043',
@@ -124,7 +127,8 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
              'upload_date': '20100127',
              'timestamp': 1264608000,
              'alt_title': '说出愿望吧(Genie)',
-        }
+        },
+        'skip': 'Blocked outside Mainland China',
      }]
  
      def _process_lyrics(self, lyrics_info):
@@ -192,6 +196,7 @@ class NetEaseMusicAlbumIE(NetEaseMusicBaseIE):
              'title': 'B\'day',
          },
          'playlist_count': 23,
+        'skip': 'Blocked outside Mainland China',
      }
  
      def _real_extract(self, url):
@@ -223,6 +228,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
              'title': '张惠妹 - aMEI;阿密特',
          },
          'playlist_count': 50,
+        'skip': 'Blocked outside Mainland China',
      }, {
          'note': 'Singer has translated name.',
          'url': 'http://music.163.com/#/artist?id=124098',
@@ -231,6 +237,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
              'title': '李昇基 - 이승기',
          },
          'playlist_count': 50,
+        'skip': 'Blocked outside Mainland China',
      }]
  
      def _real_extract(self, url):
@@ -266,6 +273,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
              'description': 'md5:12fd0819cab2965b9583ace0f8b7b022'
          },
          'playlist_count': 99,
+        'skip': 'Blocked outside Mainland China',
      }, {
          'note': 'Toplist/Charts sample',
          'url': 'http://music.163.com/#/discover/toplist?id=3733003',
@@ -275,6 +283,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
              'description': 'md5:73ec782a612711cadc7872d9c1e134fc',
          },
          'playlist_count': 50,
+        'skip': 'Blocked outside Mainland China',
      }]
  
      def _real_extract(self, url):
@@ -314,6 +323,7 @@ class NetEaseMusicMvIE(NetEaseMusicBaseIE):
              'creator': '白雅言',
              'upload_date': '20150520',
          },
+        'skip': 'Blocked outside Mainland China',
      }
  
      def _real_extract(self, url):
@@ -357,6 +367,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
              'upload_date': '20150613',
              'duration': 900,
          },
+        'skip': 'Blocked outside Mainland China',
      }, {
          'note': 'This program has accompanying songs.',
          'url': 'http://music.163.com/#/program?id=10141022',
@@ -366,6 +377,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
              'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
          },
          'playlist_count': 4,
+        'skip': 'Blocked outside Mainland China',
      }, {
          'note': 'This program has accompanying songs.',
          'url': 'http://music.163.com/#/program?id=10141022',
@@ -379,7 +391,8 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
          },
          'params': {
              'noplaylist': True
-        }
+        },
+        'skip': 'Blocked outside Mainland China',
      }]
  
      def _real_extract(self, url):
@@ -438,6 +451,7 @@ class NetEaseMusicDjRadioIE(NetEaseMusicBaseIE):
              'description': 'md5:766220985cbd16fdd552f64c578a6b15'
          },
          'playlist_mincount': 40,
+        'skip': 'Blocked outside Mainland China',
      }
      _PAGE_SIZE = 1000
  
diff --git a/youtube_dl/extractor/newgrounds.py b/youtube_dl/extractor/newgrounds.py

index cd117b04edeff88d90842f2ed8e15a8c43bde714..9bea610c88a4ac48cc1b84ed5c3fae45789d643e 100644 (file)
--- a/youtube_dl/extractor/newgrounds.py
+++ b/youtube_dl/extractor/newgrounds.py
@@ -1,15 +1,12 @@
  from __future__ import unicode_literals
  
-import json
-import re
-
  from .common import InfoExtractor
  
  
  class NewgroundsIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?newgrounds\.com/audio/listen/(?P<id>[0-9]+)'
-    _TEST = {
-        'url': 'http://www.newgrounds.com/audio/listen/549479',
+    _VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'https://www.newgrounds.com/audio/listen/549479',
          'md5': 'fe6033d297591288fa1c1f780386f07a',
          'info_dict': {
              'id': '549479',
@@ -17,25 +14,32 @@ class NewgroundsIE(InfoExtractor):
              'title': 'B7 - BusMode',
              'uploader': 'Burn7',
          }
-    }
+    }, {
+        'url': 'https://www.newgrounds.com/portal/view/673111',
+        'md5': '3394735822aab2478c31b1004fe5e5bc',
+        'info_dict': {
+            'id': '673111',
+            'ext': 'mp4',
+            'title': 'Dancin',
+            'uploader': 'Squirrelman82',
+        },
+    }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        music_id = mobj.group('id')
-        webpage = self._download_webpage(url, music_id)
+        media_id = self._match_id(url)
+        webpage = self._download_webpage(url, media_id)
  
          title = self._html_search_regex(
-            r',"name":"([^"]+)",', webpage, 'music title')
+            r'<title>([^>]+)</title>', webpage, 'title')
+
          uploader = self._html_search_regex(
-            r',"artist":"([^"]+)",', webpage, 'music uploader')
+            r'Author\s*<a[^>]+>([^<]+)', webpage, 'uploader', fatal=False)
  
-        music_url_json_string = self._html_search_regex(
-            r'({"url":"[^"]+"),', webpage, 'music url') + '}'
-        music_url_json = json.loads(music_url_json_string)
-        music_url = music_url_json['url']
+        music_url = self._parse_json(self._search_regex(
+            r'"url":("[^"]+"),', webpage, 'media url'), media_id)
  
          return {
-            'id': music_id,
+            'id': media_id,
              'title': title,
              'url': music_url,
              'uploader': uploader,
diff --git a/youtube_dl/extractor/newstube.py b/youtube_dl/extractor/newstube.py
index 5a9e73cd66a1b1224bdec848722f5e9d14f65c38..e3f35f1d8b6d526d13bf7b301e57c9570bff3af4 100644
--- a/youtube_dl/extractor/newstube.py
+++ b/youtube_dl/extractor/newstube.py
@@ -1,27 +1,27 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
  
  from .common import InfoExtractor
-from ..utils import ExtractorError
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+)
  
  
  class NewstubeIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?newstube\.ru/media/(?P<id>.+)'
      _TEST = {
          'url': 'http://www.newstube.ru/media/telekanal-cnn-peremestil-gorod-slavyansk-v-krym',
+        'md5': '801eef0c2a9f4089fa04e4fe3533abdc',
          'info_dict': {
              'id': '728e0ef2-e187-4012-bac0-5a081fdcb1f6',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Телеканал CNN переместил город Славянск в Крым',
              'description': 'md5:419a8c9f03442bc0b0a794d689360335',
              'duration': 31.05,
          },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
      }
  
      def _real_extract(self, url):
@@ -62,7 +62,6 @@ class NewstubeIE(InfoExtractor):
              server = media_location.find(ns('./Server')).text
              app = media_location.find(ns('./App')).text
              media_id = stream_info.find(ns('./Id')).text
-            quality_id = stream_info.find(ns('./QualityId')).text
              name = stream_info.find(ns('./Name')).text
              width = int(stream_info.find(ns('./Width')).text)
              height = int(stream_info.find(ns('./Height')).text)
@@ -74,12 +73,38 @@ class NewstubeIE(InfoExtractor):
                  'rtmp_conn': ['S:%s' % session_id, 'S:%s' % media_id, 'S:n2'],
                  'page_url': url,
                  'ext': 'flv',
-                'format_id': quality_id,
-                'format_note': name,
+                'format_id': 'rtmp' + ('-%s' % name if name else ''),
                  'width': width,
                  'height': height,
              })
  
+        sources_data = self._download_json(
+            'http://www.newstube.ru/player2/getsources?guid=%s' % video_guid,
+            video_guid, fatal=False)
+        if sources_data:
+            for source in sources_data.get('Sources', []):
+                source_url = source.get('Src')
+                if not source_url:
+                    continue
+                height = int_or_none(source.get('Height'))
+                f = {
+                    'format_id': 'http' + ('-%dp' % height if height else ''),
+                    'url': source_url,
+                    'width': int_or_none(source.get('Width')),
+                    'height': height,
+                }
+                source_type = source.get('Type')
+                if source_type:
+                    mobj = re.search(r'codecs="([^,]+),\s*([^"]+)"', source_type)
+                    if mobj:
+                        vcodec, acodec = mobj.groups()
+                        f.update({
+                            'vcodec': vcodec,
+                            'acodec': acodec,
+                        })
+                formats.append(f)
+
+        self._check_formats(formats, video_guid)
          self._sort_formats(formats)
  
          return {
diff --git a/youtube_dl/extractor/nextmedia.py b/youtube_dl/extractor/nextmedia.py
index aae7aeeebb8e2adebd2669bcd899caec3432275d..dee9056d39e9bb0076d390054006c6dd4246afae 100644
--- a/youtube_dl/extractor/nextmedia.py
+++ b/youtube_dl/extractor/nextmedia.py
@@ -7,7 +7,7 @@ from ..utils import parse_iso8601
  
  class NextMediaIE(InfoExtractor):
      IE_DESC = '蘋果日報'
-    _VALID_URL = r'https?://hk.apple.nextmedia.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
+    _VALID_URL = r'https?://hk\.apple\.nextmedia\.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
      _TESTS = [{
          'url': 'http://hk.apple.nextmedia.com/realtime/news/20141108/53109199',
          'md5': 'dff9fad7009311c421176d1ac90bfe4f',
@@ -68,7 +68,7 @@ class NextMediaIE(InfoExtractor):
  
  class NextMediaActionNewsIE(NextMediaIE):
      IE_DESC = '蘋果日報 - 動新聞'
-    _VALID_URL = r'https?://hk.dv.nextmedia.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
+    _VALID_URL = r'https?://hk\.dv\.nextmedia\.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
      _TESTS = [{
          'url': 'http://hk.dv.nextmedia.com/actionnews/hit/20150121/19009428/20061460',
          'md5': '05fce8ffeed7a5e00665d4b7cf0f9201',
@@ -93,7 +93,7 @@ class NextMediaActionNewsIE(NextMediaIE):
  
  class AppleDailyIE(NextMediaIE):
      IE_DESC = '臺灣蘋果日報'
-    _VALID_URL = r'https?://(www|ent).appledaily.com.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
+    _VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews|actionnews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
      _TESTS = [{
          'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
          'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',
@@ -154,6 +154,9 @@ class AppleDailyIE(NextMediaIE):
              'description': 'md5:7b859991a6a4fedbdf3dd3b66545c748',
              'upload_date': '20140417',
          },
+    }, {
+        'url': 'http://www.appledaily.com.tw/actionnews/appledaily/7/20161003/960588/',
+        'only_matching': True,
      }]
  
      _URL_PATTERN = r'\{url: \'(.+)\'\}'
diff --git a/youtube_dl/extractor/nextmovie.py b/youtube_dl/extractor/nextmovie.py
deleted file mode 100644
index 9ccd7d7..0000000
--- a/youtube_dl/extractor/nextmovie.py
+++ /dev/null
@@ -1,30 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .mtv import MTVServicesInfoExtractor
-from ..compat import compat_urllib_parse_urlencode
-
-
-class NextMovieIE(MTVServicesInfoExtractor):
-    IE_NAME = 'nextmovie.com'
-    _VALID_URL = r'https?://(?:www\.)?nextmovie\.com/shows/[^/]+/\d{4}-\d{2}-\d{2}/(?P<id>[^/?#]+)'
-    _FEED_URL = 'http://lite.dextr.mtvi.com/service1/dispatch.htm'
-    _TESTS = [{
-        'url': 'http://www.nextmovie.com/shows/exclusives/2013-03-10/mgid:uma:videolist:nextmovie.com:1715019/',
-        'md5': '09a9199f2f11f10107d04fcb153218aa',
-        'info_dict': {
-            'id': '961726',
-            'ext': 'mp4',
-            'title': 'The Muppets\' Gravity',
-        },
-    }]
-
-    def _get_feed_query(self, uri):
-        return compat_urllib_parse_urlencode({
-            'feed': '1505',
-            'mgid': uri,
-        })
-
-    def _real_extract(self, url):
-        mgid = self._match_id(url)
-        return self._get_videos_info(mgid)
diff --git a/youtube_dl/extractor/nfb.py b/youtube_dl/extractor/nfb.py
index 51e4a34f789f0e7e9dff2eeb9ec839e655632c75..adcc636bc32c062fec74074044783e452a3725d9 100644
--- a/youtube_dl/extractor/nfb.py
+++ b/youtube_dl/extractor/nfb.py
@@ -2,8 +2,12 @@ from __future__ import unicode_literals
  
  from .common import InfoExtractor
  from ..utils import (
-    sanitized_Request,
+    clean_html,
+    determine_ext,
+    int_or_none,
+    qualities,
      urlencode_postdata,
+    xpath_text,
  )
  
  
@@ -16,12 +20,12 @@ class NFBIE(InfoExtractor):
          'url': 'https://www.nfb.ca/film/qallunaat_why_white_people_are_funny',
          'info_dict': {
              'id': 'qallunaat_why_white_people_are_funny',
-            'ext': 'mp4',
+            'ext': 'flv',
              'title': 'Qallunaat! Why White People Are Funny ',
-            'description': 'md5:836d8aff55e087d04d9f6df554d4e038',
+            'description': 'md5:6b8e32dde3abf91e58857b174916620c',
              'duration': 3128,
+            'creator': 'Mark Sandiford',
              'uploader': 'Mark Sandiford',
-            'uploader_id': 'mark-sandiford',
          },
          'params': {
              # rtmp download
@@ -31,65 +35,78 @@ class NFBIE(InfoExtractor):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        page = self._download_webpage(
-            'https://www.nfb.ca/film/%s' % video_id, video_id,
-            'Downloading film page')
  
-        uploader_id = self._html_search_regex(r'<a class="director-link" href="/explore-all-directors/([^/]+)/"',
-                                              page, 'director id', fatal=False)
-        uploader = self._html_search_regex(r'<em class="director-name" itemprop="name">([^<]+)</em>',
-                                           page, 'director name', fatal=False)
-
-        request = sanitized_Request(
+        config = self._download_xml(
              'https://www.nfb.ca/film/%s/player_config' % video_id,
-            urlencode_postdata({'getConfig': 'true'}))
-        request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        request.add_header('X-NFB-Referer', 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf')
-
-        config = self._download_xml(request, video_id, 'Downloading player config XML')
+            video_id, 'Downloading player config XML',
+            data=urlencode_postdata({'getConfig': 'true'}),
+            headers={
+                'Content-Type': 'application/x-www-form-urlencoded',
+                'X-NFB-Referer': 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf'
+            })
  
-        title = None
-        description = None
-        thumbnail = None
-        duration = None
-        formats = []
-
-        def extract_thumbnail(media):
-            thumbnails = {}
-            for asset in media.findall('assets/asset'):
-                thumbnails[asset.get('quality')] = asset.find('default/url').text
-            if not thumbnails:
-                return None
-            if 'high' in thumbnails:
-                return thumbnails['high']
-            return list(thumbnails.values())[0]
+        title, description, thumbnail, duration, uploader, author = [None] * 6
+        thumbnails, formats = [[]] * 2
+        subtitles = {}
  
          for media in config.findall('./player/stream/media'):
              if media.get('type') == 'posterImage':
-                thumbnail = extract_thumbnail(media)
+                quality_key = qualities(('low', 'high'))
+                thumbnails = []
+                for asset in media.findall('assets/asset'):
+                    asset_url = xpath_text(asset, 'default/url', default=None)
+                    if not asset_url:
+                        continue
+                    quality = asset.get('quality')
+                    thumbnails.append({
+                        'url': asset_url,
+                        'id': quality,
+                        'preference': quality_key(quality),
+                    })
              elif media.get('type') == 'video':
-                duration = int(media.get('duration'))
-                title = media.find('title').text
-                description = media.find('description').text
-                # It seems assets always go from lower to better quality, so no need to sort
+                title = xpath_text(media, 'title', fatal=True)
                  for asset in media.findall('assets/asset'):
-                    for x in asset:
+                    quality = asset.get('quality')
+                    height = int_or_none(self._search_regex(
+                        r'^(\d+)[pP]$', quality or '', 'height', default=None))
+                    for node in asset:
+                        streamer = xpath_text(node, 'streamerURI', default=None)
+                        if not streamer:
+                            continue
+                        play_path = xpath_text(node, 'url', default=None)
+                        if not play_path:
+                            continue
                          formats.append({
-                            'url': x.find('streamerURI').text,
-                            'app': x.find('streamerURI').text.split('/', 3)[3],
-                            'play_path': x.find('url').text,
+                            'url': streamer,
+                            'app': streamer.split('/', 3)[3],
+                            'play_path': play_path,
                              'rtmp_live': False,
-                            'ext': 'mp4',
-                            'format_id': '%s-%s' % (x.tag, asset.get('quality')),
+                            'ext': 'flv',
+                            'format_id': '%s-%s' % (node.tag, quality) if quality else node.tag,
+                            'height': height,
                          })
+                self._sort_formats(formats)
+                description = clean_html(xpath_text(media, 'description'))
+                uploader = xpath_text(media, 'author')
+                duration = int_or_none(media.get('duration'))
+                for subtitle in media.findall('./subtitles/subtitle'):
+                    subtitle_url = xpath_text(subtitle, 'url', default=None)
+                    if not subtitle_url:
+                        continue
+                    lang = xpath_text(subtitle, 'lang', default='en')
+                    subtitles.setdefault(lang, []).append({
+                        'url': subtitle_url,
+                        'ext': (subtitle.get('format') or determine_ext(subtitle_url)).lower(),
+                    })
  
          return {
              'id': video_id,
              'title': title,
              'description': description,
-            'thumbnail': thumbnail,
+            'thumbnails': thumbnails,
              'duration': duration,
+            'creator': uploader,
              'uploader': uploader,
-            'uploader_id': uploader_id,
              'formats': formats,
+            'subtitles': subtitles,
          }
diff --git a/youtube_dl/extractor/nfl.py b/youtube_dl/extractor/nfl.py
index 200874d68e765e43a6b9787473c6d2b5af54cfb2..3930d16f16e4d295e9afeb84f88eb36dc7ffc30b 100644
--- a/youtube_dl/extractor/nfl.py
+++ b/youtube_dl/extractor/nfl.py
@@ -165,7 +165,7 @@ class NFLIE(InfoExtractor):
              group='config'))
          # For articles, the id in the url is not the video id
          video_id = self._search_regex(
-            r'(?:<nflcs:avplayer[^>]+data-content[Ii]d\s*=\s*|content[Ii]d\s*:\s*)(["\'])(?P<id>.+?)\1',
+            r'(?:<nflcs:avplayer[^>]+data-content[Ii]d\s*=\s*|content[Ii]d\s*:\s*)(["\'])(?P<id>(?:(?!\1).)+)\1',
              webpage, 'video id', default=video_id, group='id')
          config = self._download_json(config_url, video_id, 'Downloading player config')
          url_template = NFLIE.prepend_host(
diff --git a/youtube_dl/extractor/nhk.py b/youtube_dl/extractor/nhk.py
new file mode 100644
index 0000000..5c8cd76
--- /dev/null
+++ b/youtube_dl/extractor/nhk.py
@@ -0,0 +1,51 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import ExtractorError
+
+
+class NhkVodIE(InfoExtractor):
+    _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/vod/(?P<id>[^/]+/[^/?#&]+)'
+    _TEST = {
+        # Videos available only for a limited period of time. Visit
+        # http://www3.nhk.or.jp/nhkworld/en/vod/ for working samples.
+        'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815',
+        'info_dict': {
+            'id': 'A1bnNiNTE6nY3jLllS-BIISfcC_PpvF5',
+            'ext': 'flv',
+            'title': 'TOKYO FASHION EXPRESS - The Kimono as Global Fashion',
+            'description': 'md5:db338ee6ce8204f415b754782f819824',
+            'series': 'TOKYO FASHION EXPRESS',
+            'episode': 'The Kimono as Global Fashion',
+        },
+        'skip': 'Videos available only for a limited period of time',
+    }
+    _API_URL = 'http://api.nhk.or.jp/nhkworld/vodesdlist/v1/all/all/all.json?apikey=EJfK8jdS57GqlupFgAfAAwr573q01y6k'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        data = self._download_json(self._API_URL, video_id)
+
+        try:
+            episode = next(
+                e for e in data['data']['episodes']
+                if e.get('url') and video_id in e['url'])
+        except StopIteration:
+            raise ExtractorError('Unable to find episode')
+
+        embed_code = episode['vod_id']
+
+        title = episode.get('sub_title_clean') or episode['sub_title']
+        description = episode.get('description_clean') or episode.get('description')
+        series = episode.get('title_clean') or episode.get('title')
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'Ooyala',
+            'url': 'ooyala:%s' % embed_code,
+            'title': '%s - %s' % (series, title) if series and title else title,
+            'description': description,
+            'series': series,
+            'episode': title,
+        }
diff --git a/youtube_dl/extractor/nhl.py b/youtube_dl/extractor/nhl.py
index c1dea8b6c2a32da6728dee430d1167c04b1747f1..62ce800c072d2a316a0c6b8b7479cc89dc29b90d 100644
--- a/youtube_dl/extractor/nhl.py
+++ b/youtube_dl/extractor/nhl.py
@@ -8,10 +8,15 @@ from .common import InfoExtractor
  from ..compat import (
      compat_urlparse,
      compat_urllib_parse_urlencode,
-    compat_urllib_parse_urlparse
+    compat_urllib_parse_urlparse,
+    compat_str,
  )
  from ..utils import (
      unified_strdate,
+    determine_ext,
+    int_or_none,
+    parse_iso8601,
+    parse_duration,
  )
  
  
@@ -70,8 +75,8 @@ class NHLBaseInfoExtractor(InfoExtractor):
          return ret
  
  
-class NHLIE(NHLBaseInfoExtractor):
-    IE_NAME = 'nhl.com'
+class NHLVideocenterIE(NHLBaseInfoExtractor):
+    IE_NAME = 'nhl.com:videocenter'
      _VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console|embed)?(?:\?(?:.*?[?&])?)(?:id|hlg|playlist)=(?P<id>[-0-9a-zA-Z,]+)'
  
      _TESTS = [{
@@ -186,8 +191,8 @@ class NHLNewsIE(NHLBaseInfoExtractor):
          return self._real_extract_video(video_id)
  
  
-class NHLVideocenterIE(NHLBaseInfoExtractor):
-    IE_NAME = 'nhl.com:videocenter'
+class NHLVideocenterCategoryIE(NHLBaseInfoExtractor):
+    IE_NAME = 'nhl.com:videocenter:category'
      IE_DESC = 'NHL videocenter category'
      _VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?[^(id=)]*catid=(?P<catid>[0-9]+)(?![&?]id=).*?)?$'
      _TEST = {
@@ -236,3 +241,111 @@ class NHLVideocenterIE(NHLBaseInfoExtractor):
              'id': cat_id,
              'entries': [self._extract_video(v) for v in videos],
          }
+
+
+class NHLIE(InfoExtractor):
+    IE_NAME = 'nhl.com'
+    _VALID_URL = r'https?://(?:www\.)?(?P<site>nhl|wch2016)\.com/(?:[^/]+/)*c-(?P<id>\d+)'
+    _SITES_MAP = {
+        'nhl': 'nhl',
+        'wch2016': 'wch',
+    }
+    _TESTS = [{
+        # type=video
+        'url': 'https://www.nhl.com/video/anisimov-cleans-up-mess/t-277752844/c-43663503',
+        'md5': '0f7b9a8f986fb4b4eeeece9a56416eaf',
+        'info_dict': {
+            'id': '43663503',
+            'ext': 'mp4',
+            'title': 'Anisimov cleans up mess',
+            'description': 'md5:a02354acdfe900e940ce40706939ca63',
+            'timestamp': 1461288600,
+            'upload_date': '20160422',
+        },
+    }, {
+        # type=article
+        'url': 'https://www.nhl.com/news/dennis-wideman-suspended/c-278258934',
+        'md5': '1f39f4ea74c1394dea110699a25b366c',
+        'info_dict': {
+            'id': '40784403',
+            'ext': 'mp4',
+            'title': 'Wideman suspended by NHL',
+            'description': 'Flames defenseman Dennis Wideman was banned 20 games for violation of Rule 40 (Physical Abuse of Officials)',
+            'upload_date': '20160204',
+            'timestamp': 1454544904,
+        },
+    }, {
+        # Some m3u8 URLs are invalid (https://github.com/rg3/youtube-dl/issues/10713)
+        'url': 'https://www.nhl.com/predators/video/poile-laviolette-on-subban-trade/t-277437416/c-44315003',
+        'md5': '50b2bb47f405121484dda3ccbea25459',
+        'info_dict': {
+            'id': '44315003',
+            'ext': 'mp4',
+            'title': 'Poile, Laviolette on Subban trade',
+            'description': 'General manager David Poile and head coach Peter Laviolette share their thoughts on acquiring P.K. Subban from Montreal (06/29/16)',
+            'timestamp': 1467242866,
+            'upload_date': '20160629',
+        },
+    }, {
+        'url': 'https://www.wch2016.com/video/caneur-best-of-game-2-micd-up/t-281230378/c-44983703',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.wch2016.com/news/3-stars-team-europe-vs-team-canada/c-282195068',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        tmp_id, site = mobj.group('id'), mobj.group('site')
+        video_data = self._download_json(
+            'https://nhl.bamcontent.com/%s/id/v1/%s/details/web-v1.json'
+            % (self._SITES_MAP[site], tmp_id), tmp_id)
+        if video_data.get('type') == 'article':
+            video_data = video_data['media']
+
+        video_id = compat_str(video_data['id'])
+        title = video_data['title']
+
+        formats = []
+        for playback in video_data.get('playbacks', []):
+            playback_url = playback.get('url')
+            if not playback_url:
+                continue
+            ext = determine_ext(playback_url)
+            if ext == 'm3u8':
+                m3u8_formats = self._extract_m3u8_formats(
+                    playback_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id=playback.get('name', 'hls'), fatal=False)
+                self._check_formats(m3u8_formats, video_id)
+                formats.extend(m3u8_formats)
+            else:
+                height = int_or_none(playback.get('height'))
+                formats.append({
+                    'format_id': playback.get('name', 'http' + ('-%dp' % height if height else '')),
+                    'url': playback_url,
+                    'width': int_or_none(playback.get('width')),
+                    'height': height,
+                })
+        self._sort_formats(formats, ('preference', 'width', 'height', 'tbr', 'format_id'))
+
+        thumbnails = []
+        for thumbnail_id, thumbnail_data in video_data.get('image', {}).get('cuts', {}).items():
+            thumbnail_url = thumbnail_data.get('src')
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'id': thumbnail_id,
+                'url': thumbnail_url,
+                'width': int_or_none(thumbnail_data.get('width')),
+                'height': int_or_none(thumbnail_data.get('height')),
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'timestamp': parse_iso8601(video_data.get('date')),
+            'duration': parse_duration(video_data.get('duration')),
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/nick.py b/youtube_dl/extractor/nick.py
index ce065f2b086adbca9c551afeb0d2437a59248d88..7672845bfd0c6ebbc08ef326f024f4a02bb44a71 100644
--- a/youtube_dl/extractor/nick.py
+++ b/youtube_dl/extractor/nick.py
@@ -1,13 +1,16 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .mtv import MTVServicesInfoExtractor
-from ..compat import compat_urllib_parse_urlencode
+from ..utils import update_url_query
  
  
  class NickIE(MTVServicesInfoExtractor):
+    # None of videos on the website are still alive?
      IE_NAME = 'nick.com'
-    _VALID_URL = r'https?://(?:www\.)?nick\.com/videos/clip/(?P<id>[^/?#.]+)'
+    _VALID_URL = r'https?://(?:www\.)?nick(?:jr)?\.com/(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
      _FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm'
      _TESTS = [{
          'url': 'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html',
@@ -51,13 +54,70 @@ class NickIE(MTVServicesInfoExtractor):
                  }
              },
          ],
+    }, {
+        'url': 'http://www.nickjr.com/paw-patrol/videos/pups-save-a-goldrush-s3-ep302-full-episode/',
+        'only_matching': True,
      }]
  
      def _get_feed_query(self, uri):
-        return compat_urllib_parse_urlencode({
+        return {
              'feed': 'nick_arc_player_prime',
              'mgid': uri,
-        })
+        }
  
      def _extract_mgid(self, webpage):
          return self._search_regex(r'data-contenturi="([^"]+)', webpage, 'mgid')
+
+
+class NickDeIE(MTVServicesInfoExtractor):
+    IE_NAME = 'nick.de'
+    _VALID_URL = r'https?://(?:www\.)?(?P<host>nick\.de|nickelodeon\.(?:nl|at))/(?:playlist|shows)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://www.nick.de/playlist/3773-top-videos/videos/episode/17306-zu-wasser-und-zu-land-rauchende-erdnusse',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nick.de/shows/342-icarly',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nickelodeon.nl/shows/474-spongebob/videos/17403-een-kijkje-in-de-keuken-met-sandy-van-binnenuit',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nickelodeon.at/playlist/3773-top-videos/videos/episode/77993-das-letzte-gefecht',
+        'only_matching': True,
+    }]
+
+    def _extract_mrss_url(self, webpage, host):
+        return update_url_query(self._search_regex(
+            r'data-mrss=(["\'])(?P<url>http.+?)\1', webpage, 'mrss url', group='url'),
+            {'siteKey': host})
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        host = mobj.group('host')
+
+        webpage = self._download_webpage(url, video_id)
+
+        mrss_url = self._extract_mrss_url(webpage, host)
+
+        return self._get_videos_info_from_url(mrss_url, video_id)
+
+
+class NickNightIE(NickDeIE):
+    IE_NAME = 'nicknight'
+    _VALID_URL = r'https?://(?:www\.)(?P<host>nicknight\.(?:de|at|tv))/(?:playlist|shows)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://www.nicknight.at/shows/977-awkward/videos/85987-nimmer-beste-freunde',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nicknight.at/shows/977-awkward',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nicknight.at/shows/1900-faking-it',
+        'only_matching': True,
+    }]
+
+    def _extract_mrss_url(self, webpage, *args):
+        return self._search_regex(
+            r'mrss\s*:\s*(["\'])(?P<url>http.+?)\1', webpage,
+            'mrss url', group='url')
diff --git a/youtube_dl/extractor/niconico.py b/youtube_dl/extractor/niconico.py
index dd75a48afcc9dfa4a728c600c836741785056770..a104e33f8bdea73540779e41db45d92c1249668a 100644
--- a/youtube_dl/extractor/niconico.py
+++ b/youtube_dl/extractor/niconico.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -252,7 +252,7 @@ class NiconicoIE(InfoExtractor):
  
  
  class NiconicoPlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.nicovideo\.jp/mylist/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/mylist/(?P<id>\d+)'
  
      _TEST = {
          'url': 'http://www.nicovideo.jp/mylist/27411728',
diff --git a/youtube_dl/extractor/ninecninemedia.py b/youtube_dl/extractor/ninecninemedia.py
new file mode 100644
index 0000000..ec4d675
--- /dev/null
+++ b/youtube_dl/extractor/ninecninemedia.py
@@ -0,0 +1,127 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    parse_iso8601,
+    float_or_none,
+    ExtractorError,
+    int_or_none,
+)
+
+
+class NineCNineMediaBaseIE(InfoExtractor):
+    _API_BASE_TEMPLATE = 'http://capi.9c9media.com/destinations/%s/platforms/desktop/contents/%s/'
+
+
+class NineCNineMediaStackIE(NineCNineMediaBaseIE):
+    IE_NAME = '9c9media:stack'
+    _VALID_URL = r'9c9media:stack:(?P<destination_code>[^:]+):(?P<content_id>\d+):(?P<content_package>\d+):(?P<id>\d+)'
+
+    def _real_extract(self, url):
+        destination_code, content_id, package_id, stack_id = re.match(self._VALID_URL, url).groups()
+        stack_base_url_template = self._API_BASE_TEMPLATE + 'contentpackages/%s/stacks/%s/manifest.'
+        stack_base_url = stack_base_url_template % (destination_code, content_id, package_id, stack_id)
+
+        formats = []
+        formats.extend(self._extract_m3u8_formats(
+            stack_base_url + 'm3u8', stack_id, 'mp4',
+            'm3u8_native', m3u8_id='hls', fatal=False))
+        formats.extend(self._extract_f4m_formats(
+            stack_base_url + 'f4m', stack_id,
+            f4m_id='hds', fatal=False))
+        mp4_url = self._download_webpage(stack_base_url + 'pd', stack_id, fatal=False)
+        if mp4_url:
+            formats.append({
+                'url': mp4_url,
+                'format_id': 'mp4',
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': stack_id,
+            'formats': formats,
+        }
+
+
+class NineCNineMediaIE(NineCNineMediaBaseIE):
+    IE_NAME = '9c9media'
+    _VALID_URL = r'9c9media:(?P<destination_code>[^:]+):(?P<id>\d+)'
+
+    def _real_extract(self, url):
+        destination_code, content_id = re.match(self._VALID_URL, url).groups()
+        api_base_url = self._API_BASE_TEMPLATE % (destination_code, content_id)
+        content = self._download_json(api_base_url, content_id, query={
+            '$include': '[Media,Season,ContentPackages]',
+        })
+        title = content['Name']
+        if len(content['ContentPackages']) > 1:
+            raise ExtractorError('multiple content packages')
+        content_package = content['ContentPackages'][0]
+        package_id = content_package['Id']
+        content_package_url = api_base_url + 'contentpackages/%s/' % package_id
+        content_package = self._download_json(content_package_url, content_id)
+
+        if content_package.get('Constraints', {}).get('Security', {}).get('Type') == 'adobe-drm':
+            raise ExtractorError('This video is DRM protected.', expected=True)
+
+        stacks = self._download_json(content_package_url + 'stacks/', package_id)['Items']
+        multistacks = len(stacks) > 1
+
+        thumbnails = []
+        for image in content.get('Images', []):
+            image_url = image.get('Url')
+            if not image_url:
+                continue
+            thumbnails.append({
+                'url': image_url,
+                'width': int_or_none(image.get('Width')),
+                'height': int_or_none(image.get('Height')),
+            })
+
+        tags, categories = [], []
+        for source_name, container in (('Tags', tags), ('Genres', categories)):
+            for e in content.get(source_name, []):
+                e_name = e.get('Name')
+                if not e_name:
+                    continue
+                container.append(e_name)
+
+        description = content.get('Desc') or content.get('ShortDesc')
+        season = content.get('Season', {})
+        base_info = {
+            'description': description,
+            'timestamp': parse_iso8601(content.get('BroadcastDateTime')),
+            'episode_number': int_or_none(content.get('Episode')),
+            'season': season.get('Name'),
+            'season_number': season.get('Number'),
+            'season_id': season.get('Id'),
+            'series': content.get('Media', {}).get('Name'),
+            'tags': tags,
+            'categories': categories,
+        }
+
+        entries = []
+        for stack in stacks:
+            stack_id = compat_str(stack['Id'])
+            entry = {
+                '_type': 'url_transparent',
+                'url': '9c9media:stack:%s:%s:%s:%s' % (destination_code, content_id, package_id, stack_id),
+                'id': stack_id,
+                'title': '%s_part%s' % (title, stack['Name']) if multistacks else title,
+                'duration': float_or_none(stack.get('Duration')),
+                'ie_key': 'NineCNineMediaStack',
+            }
+            entry.update(base_info)
+            entries.append(entry)
+
+        return {
+            '_type': 'multi_video',
+            'id': content_id,
+            'title': title,
+            'description': description,
+            'entries': entries,
+        }
diff --git a/youtube_dl/extractor/ninenow.py b/youtube_dl/extractor/ninenow.py
new file mode 100644
index 0000000..351bea7
--- /dev/null
+++ b/youtube_dl/extractor/ninenow.py
@@ -0,0 +1,85 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    float_or_none,
+    ExtractorError,
+)
+
+
+class NineNowIE(InfoExtractor):
+    IE_NAME = '9now.com.au'
+    _VALID_URL = r'https?://(?:www\.)?9now\.com\.au/(?:[^/]+/){2}(?P<id>[^/?#]+)'
+    _TESTS = [{
+        # clip
+        'url': 'https://www.9now.com.au/afl-footy-show/2016/clip-ciql02091000g0hp5oktrnytc',
+        'md5': '17cf47d63ec9323e562c9957a968b565',
+        'info_dict': {
+            'id': '16801',
+            'ext': 'mp4',
+            'title': 'St. Kilda\'s Joey Montagna on the potential for a player\'s strike',
+            'description': 'Is a boycott of the NAB Cup "on the table"?',
+            'uploader_id': '4460760524001',
+            'upload_date': '20160713',
+            'timestamp': 1468421266,
+        },
+        'skip': 'Only available in Australia',
+    }, {
+        # episode
+        'url': 'https://www.9now.com.au/afl-footy-show/2016/episode-19',
+        'only_matching': True,
+    }, {
+        # DRM protected
+        'url': 'https://www.9now.com.au/andrew-marrs-history-of-the-world/season-1/episode-1',
+        'only_matching': True,
+    }]
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4460760524001/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        page_data = self._parse_json(self._search_regex(
+            r'window\.__data\s*=\s*({.*?});', webpage,
+            'page data'), display_id)
+
+        for kind in ('episode', 'clip'):
+            current_key = page_data.get(kind, {}).get(
+                'current%sKey' % kind.capitalize())
+            if not current_key:
+                continue
+            cache = page_data.get(kind, {}).get('%sCache' % kind, {})
+            if not cache:
+                continue
+            common_data = (cache.get(current_key) or list(cache.values())[0])[kind]
+            break
+        else:
+            raise ExtractorError('Unable to find video data')
+
+        video_data = common_data['video']
+
+        if video_data.get('drm'):
+            raise ExtractorError('This video is DRM protected.', expected=True)
+
+        brightcove_id = video_data.get('brightcoveId') or 'ref:' + video_data['referenceId']
+        video_id = compat_str(video_data.get('id') or brightcove_id)
+        title = common_data['name']
+
+        thumbnails = [{
+            'id': thumbnail_id,
+            'url': thumbnail_url,
+            'width': int_or_none(thumbnail_id[1:])
+        } for thumbnail_id, thumbnail_url in common_data.get('image', {}).get('sizes', {}).items()]
+
+        return {
+            '_type': 'url_transparent',
+            'url': self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
+            'id': video_id,
+            'title': title,
+            'description': common_data.get('description'),
+            'duration': float_or_none(video_data.get('duration'), 1000),
+            'thumbnails': thumbnails,
+            'ie_key': 'BrightcoveNew',
+        }
diff --git a/youtube_dl/extractor/nintendo.py b/youtube_dl/extractor/nintendo.py
new file mode 100644
index 0000000..4b4e66b
--- /dev/null
+++ b/youtube_dl/extractor/nintendo.py
@@ -0,0 +1,46 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from .ooyala import OoyalaIE
+from ..utils import unescapeHTML
+
+
+class NintendoIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?nintendo\.com/games/detail/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://www.nintendo.com/games/detail/yEiAzhU2eQI1KZ7wOHhngFoAHc1FpHwj',
+        'info_dict': {
+            'id': 'MzMmticjp0VPzO3CCj4rmFOuohEuEWoW',
+            'ext': 'flv',
+            'title': 'Duck Hunt Wii U VC NES - Trailer',
+            'duration': 60.326,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': ['Ooyala'],
+    }, {
+        'url': 'http://www.nintendo.com/games/detail/tokyo-mirage-sessions-fe-wii-u',
+        'info_dict': {
+            'id': 'tokyo-mirage-sessions-fe-wii-u',
+            'title': 'Tokyo Mirage Sessions ♯FE',
+        },
+        'playlist_count': 3,
+    }]
+
+    def _real_extract(self, url):
+        page_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, page_id)
+
+        entries = [
+            OoyalaIE._build_url_result(m.group('code'))
+            for m in re.finditer(
+                r'class=(["\'])embed-video\1[^>]+data-video-code=(["\'])(?P<code>(?:(?!\2).)+)\2',
+                webpage)]
+
+        return self.playlist_result(
+            entries, page_id, unescapeHTML(self._og_search_title(webpage, fatal=False)))
diff --git a/youtube_dl/extractor/nobelprize.py b/youtube_dl/extractor/nobelprize.py
new file mode 100644
index 0000000..4dfdb09
--- /dev/null
+++ b/youtube_dl/extractor/nobelprize.py
@@ -0,0 +1,62 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    js_to_json,
+    mimetype2ext,
+    determine_ext,
+    update_url_query,
+    get_element_by_attribute,
+    int_or_none,
+)
+
+
+class NobelPrizeIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?nobelprize\.org/mediaplayer.*?\bid=(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.nobelprize.org/mediaplayer/?id=2636',
+        'md5': '04c81e5714bb36cc4e2232fee1d8157f',
+        'info_dict': {
+            'id': '2636',
+            'ext': 'mp4',
+            'title': 'Announcement of the 2016 Nobel Prize in Physics',
+            'description': 'md5:05beba57f4f5a4bbd4cf2ef28fcff739',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        media = self._parse_json(self._search_regex(
+            r'(?s)var\s*config\s*=\s*({.+?});', webpage,
+            'config'), video_id, js_to_json)['media']
+        title = media['title']
+
+        formats = []
+        for source in media.get('source', []):
+            source_src = source.get('src')
+            if not source_src:
+                continue
+            ext = mimetype2ext(source.get('type')) or determine_ext(source_src)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    source_src, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    update_url_query(source_src, {'hdcore': '3.7.0'}),
+                    video_id, f4m_id='hds', fatal=False))
+            else:
+                formats.append({
+                    'url': source_src,
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': get_element_by_attribute('itemprop', 'description', webpage),
+            'duration': int_or_none(media.get('duration')),
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/noco.py b/youtube_dl/extractor/noco.py
index 06f2bda07dd5db2c54e1e0492f244dbf0fc5a526..70ff2ab3653525664b4f1ae590393ee680a2f6e5 100644
--- a/youtube_dl/extractor/noco.py
+++ b/youtube_dl/extractor/noco.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/normalboots.py b/youtube_dl/extractor/normalboots.py
index 77e09107299824f5ae4063817d73e505e893c2af..6aa0895b82e5949657a62b009addc8e93885936e 100644
--- a/youtube_dl/extractor/normalboots.py
+++ b/youtube_dl/extractor/normalboots.py
@@ -1,7 +1,8 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from .screenwavemedia import ScreenwaveMediaIE
  
  from ..utils import (
      unified_strdate,
@@ -12,7 +13,6 @@ class NormalbootsIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?normalboots\.com/video/(?P<id>[0-9a-z-]*)/?$'
      _TEST = {
          'url': 'http://normalboots.com/video/home-alone-games-jontron/',
-        'md5': '8bf6de238915dd501105b44ef5f1e0f6',
          'info_dict': {
              'id': 'home-alone-games-jontron',
              'ext': 'mp4',
@@ -22,9 +22,10 @@ class NormalbootsIE(InfoExtractor):
              'upload_date': '20140125',
          },
          'params': {
-            # rtmp download
+            # m3u8 download
              'skip_download': True,
          },
+        'add_ie': ['ScreenwaveMedia'],
      }
  
      def _real_extract(self, url):
@@ -38,16 +39,15 @@ class NormalbootsIE(InfoExtractor):
              r'<span style="text-transform:uppercase; font-size:inherit;">[A-Za-z]+, (?P<date>.*)</span>',
              webpage, 'date', fatal=False))
  
-        player_url = self._html_search_regex(
-            r'<iframe\swidth="[0-9]+"\sheight="[0-9]+"\ssrc="(?P<url>[\S]+)"',
-            webpage, 'player url')
-        player_page = self._download_webpage(player_url, video_id)
-        video_url = self._html_search_regex(
-            r"file:\s'(?P<file>[^']+\.mp4)'", player_page, 'file')
+        screenwavemedia_url = self._html_search_regex(
+            ScreenwaveMediaIE.EMBED_PATTERN, webpage, 'screenwave URL',
+            group='url')
  
          return {
+            '_type': 'url_transparent',
              'id': video_id,
-            'url': video_url,
+            'url': screenwavemedia_url,
+            'ie_key': ScreenwaveMediaIE.ie_key(),
              'title': self._og_search_title(webpage),
              'description': self._og_search_description(webpage),
              'thumbnail': self._og_search_thumbnail(webpage),
diff --git a/youtube_dl/extractor/nova.py b/youtube_dl/extractor/nova.py
index 17671ad398b9e9a8148bceff74db678969d26d3f..103952345aa98ed186515452baf2f945409ffdaa 100644
--- a/youtube_dl/extractor/nova.py
+++ b/youtube_dl/extractor/nova.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/novamov.py b/youtube_dl/extractor/novamov.py
index a131f7dbdd95f5cbd39add52cf3a721068085e78..3bbd4735502e113fcc46a07981ff5863c52fef15 100644
--- a/youtube_dl/extractor/novamov.py
+++ b/youtube_dl/extractor/novamov.py
@@ -16,7 +16,14 @@ class NovaMovIE(InfoExtractor):
      IE_NAME = 'novamov'
      IE_DESC = 'NovaMov'
  
-    _VALID_URL_TEMPLATE = r'http://(?:(?:www\.)?%(host)s/(?:file|video|mobile/#/videos)/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<id>[a-z\d]{13})'
+    _VALID_URL_TEMPLATE = r'''(?x)
+                            http://
+                                (?:
+                                    (?:www\.)?%(host)s/(?:file|video|mobile/\#/videos)/|
+                                    (?:(?:embed|www)\.)%(host)s/embed(?:\.php|/)?\?(?:.*?&)?\bv=
+                                )
+                                (?P<id>[a-z\d]{13})
+                            '''
      _VALID_URL = _VALID_URL_TEMPLATE % {'host': 'novamov\.com'}
  
      _HOST = 'www.novamov.com'
@@ -27,17 +34,7 @@ class NovaMovIE(InfoExtractor):
      _DESCRIPTION_REGEX = r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>[^<]+</h3><p>([^<]+)</p>'
      _URL_TEMPLATE = 'http://%s/video/%s'
  
-    _TEST = {
-        'url': 'http://www.novamov.com/video/4rurhn9x446jj',
-        'md5': '7205f346a52bbeba427603ba10d4b935',
-        'info_dict': {
-            'id': '4rurhn9x446jj',
-            'ext': 'flv',
-            'title': 'search engine optimization',
-            'description': 'search engine optimization is used to rank the web page in the google search engine'
-        },
-        'skip': '"Invalid token" errors abound (in web interface as well as youtube-dl, there is nothing we can do about it.)'
-    }
+    _TEST = None
  
      def _check_existence(self, webpage, video_id):
          if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
@@ -81,7 +78,7 @@ class NovaMovIE(InfoExtractor):
  
          filekey = extract_filekey()
  
-        title = self._html_search_regex(self._TITLE_REGEX, webpage, 'title', fatal=False)
+        title = self._html_search_regex(self._TITLE_REGEX, webpage, 'title')
          description = self._html_search_regex(self._DESCRIPTION_REGEX, webpage, 'description', default='', fatal=False)
  
          api_response = self._download_webpage(
@@ -187,3 +184,29 @@ class CloudTimeIE(NovaMovIE):
      _TITLE_REGEX = r'<div[^>]+class=["\']video_det["\'][^>]*>\s*<strong>([^<]+)</strong>'
  
      _TEST = None
+
+
+class AuroraVidIE(NovaMovIE):
+    IE_NAME = 'auroravid'
+    IE_DESC = 'AuroraVid'
+
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'auroravid\.to'}
+
+    _HOST = 'www.auroravid.to'
+
+    _FILE_DELETED_REGEX = r'This file no longer exists on our servers!<'
+
+    _TESTS = [{
+        'url': 'http://www.auroravid.to/video/4rurhn9x446jj',
+        'md5': '7205f346a52bbeba427603ba10d4b935',
+        'info_dict': {
+            'id': '4rurhn9x446jj',
+            'ext': 'flv',
+            'title': 'search engine optimization',
+            'description': 'search engine optimization is used to rank the web page in the google search engine'
+        },
+        'skip': '"Invalid token" errors abound (in web interface as well as youtube-dl, there is nothing we can do about it.)'
+    }, {
+        'url': 'http://www.auroravid.to/embed/?v=4rurhn9x446jj',
+        'only_matching': True,
+    }]
diff --git a/youtube_dl/extractor/nowness.py b/youtube_dl/extractor/nowness.py
index 446f5901c1701166fa1a345827d839a25f6d98af..7e53463164b281e84a349a6fc382f5e203f278a4 100644
--- a/youtube_dl/extractor/nowness.py
+++ b/youtube_dl/extractor/nowness.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .brightcove import (
@@ -63,8 +63,11 @@ class NownessIE(NownessBaseIE):
              'title': 'Candor: The Art of Gesticulation',
              'description': 'Candor: The Art of Gesticulation',
              'thumbnail': 're:^https?://.*\.jpg',
-            'uploader': 'Nowness',
+            'timestamp': 1446745676,
+            'upload_date': '20151105',
+            'uploader_id': '2385340575001',
          },
+        'add_ie': ['BrightcoveNew'],
      }, {
          'url': 'https://cn.nowness.com/story/kasper-bjorke-ft-jaakko-eino-kalevi-tnr',
          'md5': 'e79cf125e387216f86b2e0a5b5c63aa3',
@@ -74,8 +77,11 @@ class NownessIE(NownessBaseIE):
              'title': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
              'description': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
              'thumbnail': 're:^https?://.*\.jpg',
-            'uploader': 'Nowness',
+            'timestamp': 1407315371,
+            'upload_date': '20140806',
+            'uploader_id': '2385340575001',
          },
+        'add_ie': ['BrightcoveNew'],
      }, {
          # vimeo
          'url': 'https://www.nowness.com/series/nowness-picks/jean-luc-godard-supercut',
@@ -90,6 +96,7 @@ class NownessIE(NownessBaseIE):
              'uploader': 'Cinema Sem Lei',
              'uploader_id': 'cinemasemlei',
          },
+        'add_ie': ['Vimeo'],
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/npo.py b/youtube_dl/extractor/npo.py
index 87f5675c7ff8b14169291420feb9bcf85edf894d..c91f5846171be2a720523a4531313703d18920fd 100644
--- a/youtube_dl/extractor/npo.py
+++ b/youtube_dl/extractor/npo.py
@@ -3,12 +3,15 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_HTTPError
  from ..utils import (
      fix_xml_ampersands,
+    orderedSet,
      parse_duration,
      qualities,
      strip_jsonp,
      unified_strdate,
+    ExtractorError,
  )
  
  
@@ -180,9 +183,16 @@ class NPOIE(NPOBaseIE):
                      continue
                  streams = format_info.get('streams')
                  if streams:
-                    video_info = self._download_json(
-                        streams[0] + '&type=json',
-                        video_id, 'Downloading %s stream JSON' % format_id)
+                    try:
+                        video_info = self._download_json(
+                            streams[0] + '&type=json',
+                            video_id, 'Downloading %s stream JSON' % format_id)
+                    except ExtractorError as ee:
+                        if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
+                            error = (self._parse_json(ee.cause.read().decode(), video_id, fatal=False) or {}).get('errorstring')
+                            if error:
+                                raise ExtractorError(error, expected=True)
+                        raise
                  else:
                      video_info = format_info
                  video_url = video_info.get('url')
@@ -429,7 +439,7 @@ class SchoolTVIE(InfoExtractor):
          display_id = self._match_id(url)
          webpage = self._download_webpage(url, display_id)
          video_id = self._search_regex(
-            r'data-mid=(["\'])(?P<id>.+?)\1', webpage, 'video_id', group='id')
+            r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video_id', group='id')
          return {
              '_type': 'url_transparent',
              'ie_key': 'NPO',
@@ -438,9 +448,30 @@ class SchoolTVIE(InfoExtractor):
          }
  
  
-class VPROIE(NPOIE):
+class NPOPlaylistBaseIE(NPOIE):
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        entries = [
+            self.url_result('npo:%s' % video_id if not video_id.startswith('http') else video_id)
+            for video_id in orderedSet(re.findall(self._PLAYLIST_ENTRY_RE, webpage))
+        ]
+
+        playlist_title = self._html_search_regex(
+            self._PLAYLIST_TITLE_RE, webpage, 'playlist title',
+            default=None) or self._og_search_title(webpage)
+
+        return self.playlist_result(entries, playlist_id, playlist_title)
+
+
+class VPROIE(NPOPlaylistBaseIE):
      IE_NAME = 'vpro'
-    _VALID_URL = r'https?://(?:www\.)?(?:tegenlicht\.)?vpro\.nl/(?:[^/]+/){2,}(?P<id>[^/]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?(?:(?:tegenlicht\.)?vpro|2doc)\.nl/(?:[^/]+/)*(?P<id>[^/]+)\.html'
+    _PLAYLIST_TITLE_RE = (r'<h1[^>]+class=["\'].*?\bmedia-platform-title\b.*?["\'][^>]*>([^<]+)',
+                          r'<h5[^>]+class=["\'].*?\bmedia-platform-subtitle\b.*?["\'][^>]*>([^<]+)')
+    _PLAYLIST_ENTRY_RE = r'data-media-id="([^"]+)"'
  
      _TESTS = [
          {
@@ -453,12 +484,13 @@ class VPROIE(NPOIE):
                  'description': 'md5:52cf4eefbc96fffcbdc06d024147abea',
                  'upload_date': '20130225',
              },
+            'skip': 'Video gone',
          },
          {
              'url': 'http://www.vpro.nl/programmas/2doc/2015/sergio-herman.html',
              'info_dict': {
                  'id': 'sergio-herman',
-                'title': 'Sergio Herman: Fucking perfect',
+                'title': 'sergio herman: fucking perfect',
              },
              'playlist_count': 2,
          },
@@ -467,54 +499,61 @@ class VPROIE(NPOIE):
              'url': 'http://www.vpro.nl/programmas/2doc/2015/education-education.html',
              'info_dict': {
                  'id': 'education-education',
-                'title': '2Doc',
+                'title': 'education education',
+            },
+            'playlist_count': 2,
+        },
+        {
+            'url': 'http://www.2doc.nl/documentaires/series/2doc/2015/oktober/de-tegenprestatie.html',
+            'info_dict': {
+                'id': 'de-tegenprestatie',
+                'title': 'De Tegenprestatie',
              },
              'playlist_count': 2,
+        }, {
+            'url': 'http://www.2doc.nl/speel~VARA_101375237~mh17-het-verdriet-van-nederland~.html',
+            'info_dict': {
+                'id': 'VARA_101375237',
+                'ext': 'm4v',
+                'title': 'MH17: Het verdriet van Nederland',
+                'description': 'md5:09e1a37c1fdb144621e22479691a9f18',
+                'upload_date': '20150716',
+            },
+            'params': {
+                # Skip because of m3u8 download
+                'skip_download': True
+            },
          }
      ]
  
-    def _real_extract(self, url):
-        playlist_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, playlist_id)
-
-        entries = [
-            self.url_result('npo:%s' % video_id if not video_id.startswith('http') else video_id)
-            for video_id in re.findall(r'data-media-id="([^"]+)"', webpage)
-        ]
-
-        playlist_title = self._search_regex(
-            r'<title>\s*([^>]+?)\s*-\s*Teledoc\s*-\s*VPRO\s*</title>',
-            webpage, 'playlist title', default=None) or self._og_search_title(webpage)
-
-        return self.playlist_result(entries, playlist_id, playlist_title)
-
  
-class WNLIE(InfoExtractor):
+class WNLIE(NPOPlaylistBaseIE):
+    IE_NAME = 'wnl'
      _VALID_URL = r'https?://(?:www\.)?omroepwnl\.nl/video/detail/(?P<id>[^/]+)__\d+'
+    _PLAYLIST_TITLE_RE = r'(?s)<h1[^>]+class="subject"[^>]*>(.+?)</h1>'
+    _PLAYLIST_ENTRY_RE = r'<a[^>]+href="([^"]+)"[^>]+class="js-mid"[^>]*>Deel \d+'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.omroepwnl.nl/video/detail/vandaag-de-dag-6-mei__060515',
          'info_dict': {
              'id': 'vandaag-de-dag-6-mei',
              'title': 'Vandaag de Dag 6 mei',
          },
          'playlist_count': 4,
-    }
-
-    def _real_extract(self, url):
-        playlist_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, playlist_id)
+    }]
  
-        entries = [
-            self.url_result('npo:%s' % video_id, 'NPO')
-            for video_id, part in re.findall(
-                r'<a[^>]+href="([^"]+)"[^>]+class="js-mid"[^>]*>(Deel \d+)', webpage)
-        ]
  
-        playlist_title = self._html_search_regex(
-            r'(?s)<h1[^>]+class="subject"[^>]*>(.+?)</h1>',
-            webpage, 'playlist title')
+class AndereTijdenIE(NPOPlaylistBaseIE):
+    IE_NAME = 'anderetijden'
+    _VALID_URL = r'https?://(?:www\.)?anderetijden\.nl/programma/(?:[^/]+/)+(?P<id>[^/?#&]+)'
+    _PLAYLIST_TITLE_RE = r'(?s)<h1[^>]+class=["\'].*?\bpage-title\b.*?["\'][^>]*>(.+?)</h1>'
+    _PLAYLIST_ENTRY_RE = r'<figure[^>]+class=["\']episode-container episode-page["\'][^>]+data-prid=["\'](.+?)["\']'
  
-        return self.playlist_result(entries, playlist_id, playlist_title)
+    _TESTS = [{
+        'url': 'http://anderetijden.nl/programma/1/Andere-Tijden/aflevering/676/Duitse-soldaten-over-de-Slag-bij-Arnhem',
+        'info_dict': {
+            'id': 'Duitse-soldaten-over-de-Slag-bij-Arnhem',
+            'title': 'Duitse soldaten over de Slag bij Arnhem',
+        },
+        'playlist_count': 3,
+    }]
diff --git a/youtube_dl/extractor/nrk.py b/youtube_dl/extractor/nrk.py
index 9df20082224f84099657d2c2415cb9b2e66df8b6..c89aac63ee90f133074d8ade8b7af23cf020f148 100644
--- a/youtube_dl/extractor/nrk.py
+++ b/youtube_dl/extractor/nrk.py
@@ -1,93 +1,263 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
+import random
  import re
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_urlparse,
-    compat_urllib_parse_unquote,
-)
+from ..compat import compat_urllib_parse_unquote
  from ..utils import (
-    determine_ext,
      ExtractorError,
-    float_or_none,
+    int_or_none,
+    parse_age_limit,
      parse_duration,
-    unified_strdate,
  )
  
  
-class NRKIE(InfoExtractor):
-    _VALID_URL = r'(?:nrk:|https?://(?:www\.)?nrk\.no/video/PS\*)(?P<id>\d+)'
+class NRKBaseIE(InfoExtractor):
+    _faked_ip = None
  
-    _TESTS = [
-        {
-            'url': 'http://www.nrk.no/video/PS*150533',
-            'md5': 'bccd850baebefe23b56d708a113229c2',
-            'info_dict': {
-                'id': '150533',
-                'ext': 'flv',
-                'title': 'Dompap og andre fugler i Piip-Show',
-                'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
-                'duration': 263,
-            }
-        },
-        {
-            'url': 'http://www.nrk.no/video/PS*154915',
-            'md5': '0b1493ba1aae7d9579a5ad5531bc395a',
-            'info_dict': {
-                'id': '154915',
-                'ext': 'flv',
-                'title': 'Slik høres internett ut når du er blind',
-                'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568',
-                'duration': 20,
-            }
-        },
-    ]
+    def _download_webpage_handle(self, *args, **kwargs):
+        # NRK checks X-Forwarded-For HTTP header in order to figure out the
+        # origin of the client behind proxy. This allows to bypass geo
+        # restriction by faking this header's value to some Norway IP.
+        # We will do so once we encounter any geo restriction error.
+        if self._faked_ip:
+            # NB: str is intentional
+            kwargs.setdefault(str('headers'), {})['X-Forwarded-For'] = self._faked_ip
+        return super(NRKBaseIE, self)._download_webpage_handle(*args, **kwargs)
+
+    def _fake_ip(self):
+        # Use fake IP from 37.191.128.0/17 in order to workaround geo
+        # restriction
+        def octet(lb=0, ub=255):
+            return random.randint(lb, ub)
+        self._faked_ip = '37.191.%d.%d' % (octet(128), octet())
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
          data = self._download_json(
-            'http://v8.psapi.nrk.no/mediaelement/%s' % video_id,
-            video_id, 'Downloading media JSON')
-
-        media_url = data.get('mediaUrl')
-
-        if not media_url:
-            if data['usageRights']['isGeoBlocked']:
-                raise ExtractorError(
-                    'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
-                    expected=True)
-
-        if determine_ext(media_url) == 'f4m':
-            formats = self._extract_f4m_formats(
-                media_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81', video_id, f4m_id='hds')
-            self._sort_formats(formats)
-        else:
-            formats = [{
-                'url': media_url,
-                'ext': 'flv',
-            }]
+            'http://%s/mediaelement/%s' % (self._API_HOST, video_id),
+            video_id, 'Downloading mediaelement JSON')
+
+        title = data.get('fullTitle') or data.get('mainTitle') or data['title']
+        video_id = data.get('id') or video_id
  
-        duration = parse_duration(data.get('duration'))
+        http_headers = {'X-Forwarded-For': self._faked_ip} if self._faked_ip else {}
  
+        entries = []
+
+        media_assets = data.get('mediaAssets')
+        if media_assets and isinstance(media_assets, list):
+            def video_id_and_title(idx):
+                return ((video_id, title) if len(media_assets) == 1
+                        else ('%s-%d' % (video_id, idx), '%s (Part %d)' % (title, idx)))
+            for num, asset in enumerate(media_assets, 1):
+                asset_url = asset.get('url')
+                if not asset_url:
+                    continue
+                formats = self._extract_akamai_formats(asset_url, video_id)
+                if not formats:
+                    continue
+                self._sort_formats(formats)
+                entry_id, entry_title = video_id_and_title(num)
+                duration = parse_duration(asset.get('duration'))
+                subtitles = {}
+                for subtitle in ('webVtt', 'timedText'):
+                    subtitle_url = asset.get('%sSubtitlesUrl' % subtitle)
+                    if subtitle_url:
+                        subtitles.setdefault('no', []).append({
+                            'url': compat_urllib_parse_unquote(subtitle_url)
+                        })
+                entries.append({
+                    'id': asset.get('carrierId') or entry_id,
+                    'title': entry_title,
+                    'duration': duration,
+                    'subtitles': subtitles,
+                    'formats': formats,
+                    'http_headers': http_headers,
+                })
+
+        if not entries:
+            media_url = data.get('mediaUrl')
+            if media_url:
+                formats = self._extract_akamai_formats(media_url, video_id)
+                self._sort_formats(formats)
+                duration = parse_duration(data.get('duration'))
+                entries = [{
+                    'id': video_id,
+                    'title': title,
+                    'duration': duration,
+                    'formats': formats,
+                }]
+
+        if not entries:
+            message_type = data.get('messageType', '')
+            # Can be ProgramIsGeoBlocked or ChannelIsGeoBlocked*
+            if 'IsGeoBlocked' in message_type and not self._faked_ip:
+                self.report_warning(
+                    'Video is geo restricted, trying to fake IP')
+                self._fake_ip()
+                return self._real_extract(url)
+
+            MESSAGES = {
+                'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
+                'ProgramRightsHasExpired': 'Programmet har gått ut',
+                'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
+            }
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, MESSAGES.get(
+                    message_type, message_type)),
+                expected=True)
+
+        conviva = data.get('convivaStatistics') or {}
+        series = conviva.get('seriesName') or data.get('seriesTitle')
+        episode = conviva.get('episodeName') or data.get('episodeNumberOrDate')
+
+        thumbnails = None
          images = data.get('images')
-        if images:
-            thumbnails = images['webImages']
-            thumbnails.sort(key=lambda image: image['pixelWidth'])
-            thumbnail = thumbnails[-1]['imageUrl']
-        else:
-            thumbnail = None
-
-        return {
-            'id': video_id,
-            'title': data['title'],
-            'description': data['description'],
-            'duration': duration,
-            'thumbnail': thumbnail,
-            'formats': formats,
+        if images and isinstance(images, dict):
+            web_images = images.get('webImages')
+            if isinstance(web_images, list):
+                thumbnails = [{
+                    'url': image['imageUrl'],
+                    'width': int_or_none(image.get('width')),
+                    'height': int_or_none(image.get('height')),
+                } for image in web_images if image.get('imageUrl')]
+
+        description = data.get('description')
+
+        common_info = {
+            'description': description,
+            'series': series,
+            'episode': episode,
+            'age_limit': parse_age_limit(data.get('legalAge')),
+            'thumbnails': thumbnails,
+        }
+
+        vcodec = 'none' if data.get('mediaType') == 'Audio' else None
+
+        # TODO: extract chapters when https://github.com/rg3/youtube-dl/pull/9409 is merged
+
+        for entry in entries:
+            entry.update(common_info)
+            for f in entry['formats']:
+                f['vcodec'] = vcodec
+
+        return self.playlist_result(entries, video_id, title, description)
+
+
+class NRKIE(NRKBaseIE):
+    _VALID_URL = r'''(?x)
+                        (?:
+                            nrk:|
+                            https?://
+                                (?:
+                                    (?:www\.)?nrk\.no/video/PS\*|
+                                    v8-psapi\.nrk\.no/mediaelement/
+                                )
+                            )
+                            (?P<id>[^/?#&]+)
+                        '''
+    _API_HOST = 'v8.psapi.nrk.no'
+    _TESTS = [{
+        # video
+        'url': 'http://www.nrk.no/video/PS*150533',
+        'md5': '2f7f6eeb2aacdd99885f355428715cfa',
+        'info_dict': {
+            'id': '150533',
+            'ext': 'mp4',
+            'title': 'Dompap og andre fugler i Piip-Show',
+            'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
+            'duration': 263,
          }
+    }, {
+        # audio
+        'url': 'http://www.nrk.no/video/PS*154915',
+        # MD5 is unstable
+        'info_dict': {
+            'id': '154915',
+            'ext': 'flv',
+            'title': 'Slik høres internett ut når du er blind',
+            'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568',
+            'duration': 20,
+        }
+    }, {
+        'url': 'nrk:ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
+        'only_matching': True,
+    }, {
+        'url': 'https://v8-psapi.nrk.no/mediaelement/ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
+        'only_matching': True,
+    }]
+
+
+class NRKTVIE(NRKBaseIE):
+    IE_DESC = 'NRK TV and NRK Radio'
+    _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
+    _API_HOST = 'psapi-we.nrk.no'
+
+    _TESTS = [{
+        'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
+        'md5': '4e9ca6629f09e588ed240fb11619922a',
+        'info_dict': {
+            'id': 'MUHH48000314AA',
+            'ext': 'mp4',
+            'title': '20 spørsmål 23.05.2014',
+            'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
+            'duration': 1741,
+        },
+    }, {
+        'url': 'https://tv.nrk.no/program/mdfp15000514',
+        'md5': '43d0be26663d380603a9cf0c24366531',
+        'info_dict': {
+            'id': 'MDFP15000514CA',
+            'ext': 'mp4',
+            'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting 24.05.2014',
+            'description': 'md5:89290c5ccde1b3a24bb8050ab67fe1db',
+            'duration': 4605,
+        },
+    }, {
+        # single playlist video
+        'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
+        'md5': 'adbd1dbd813edaf532b0a253780719c2',
+        'info_dict': {
+            'id': 'MSPO40010515-part2',
+            'ext': 'flv',
+            'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
+            'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
+        },
+        'skip': 'Only works from Norway',
+    }, {
+        'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
+        'playlist': [{
+            'md5': '9480285eff92d64f06e02a5367970a7a',
+            'info_dict': {
+                'id': 'MSPO40010515-part1',
+                'ext': 'flv',
+                'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 1:2)',
+                'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
+            },
+        }, {
+            'md5': 'adbd1dbd813edaf532b0a253780719c2',
+            'info_dict': {
+                'id': 'MSPO40010515-part2',
+                'ext': 'flv',
+                'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
+                'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
+            },
+        }],
+        'info_dict': {
+            'id': 'MSPO40010515',
+            'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn',
+            'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
+            'duration': 6947.52,
+        },
+        'skip': 'Only works from Norway',
+    }, {
+        'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
+        'only_matching': True,
+    }]
  
  
  class NRKPlaylistIE(InfoExtractor):
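Note on the hunk above: the new thumbnail handling no longer indexes straight into `images['webImages']`; it checks the shape of the payload first and skips entries without an `imageUrl`. A minimal standalone sketch of that pattern follows. The `int_or_none` here is a simplified stand-in for the youtube-dl helper of the same name, and `build_thumbnails` and the sample payload are hypothetical names used only for illustration.

```python
def int_or_none(value):
    """Simplified stand-in for youtube_dl.utils.int_or_none (assumption)."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def build_thumbnails(data):
    """Return a list of thumbnail dicts, or None when no usable images exist."""
    images = data.get('images')
    if not isinstance(images, dict):
        return None
    web_images = images.get('webImages')
    if not isinstance(web_images, list):
        return None
    # Entries without an image URL are dropped instead of raising KeyError.
    return [{
        'url': image['imageUrl'],
        'width': int_or_none(image.get('width')),
        'height': int_or_none(image.get('height')),
    } for image in web_images if image.get('imageUrl')]


# Made-up payload: one usable image and one entry lacking 'imageUrl'.
sample = {'images': {'webImages': [
    {'imageUrl': 'http://example.invalid/a.jpg', 'width': '320'},
    {'width': 640},
]}}
thumbnails = build_thumbnails(sample)
```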
@@ -132,206 +302,34 @@ class NRKPlaylistIE(InfoExtractor):
  
  class NRKSkoleIE(InfoExtractor):
      IE_DESC = 'NRK Skole'
-    _VALID_URL = r'https?://(?:www\.)?nrk\.no/skole/klippdetalj?.*\btopic=(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?nrk\.no/skole/?\?.*\bmediaId=(?P<id>\d+)'
  
      _TESTS = [{
-        'url': 'http://nrk.no/skole/klippdetalj?topic=nrk:klipp/616532',
-        'md5': '04cd85877cc1913bce73c5d28a47e00f',
+        'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099',
+        'md5': '6bc936b01f9dd8ed45bc58b252b2d9b6',
          'info_dict': {
              'id': '6021',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Genetikk og eneggede tvillinger',
              'description': 'md5:3aca25dcf38ec30f0363428d2b265f8d',
              'duration': 399,
          },
      }, {
-        'url': 'http://www.nrk.no/skole/klippdetalj?topic=nrk%3Aklipp%2F616532#embed',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.nrk.no/skole/klippdetalj?topic=urn:x-mediadb:21379',
+        'url': 'https://www.nrk.no/skole/?page=objectives&subject=naturfag&objective=K15114&mediaId=19355',
          'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        video_id = compat_urllib_parse_unquote(self._match_id(url))
-
-        webpage = self._download_webpage(url, video_id)
-
-        nrk_id = self._search_regex(r'data-nrk-id=["\'](\d+)', webpage, 'nrk id')
-        return self.url_result('nrk:%s' % nrk_id)
-
-
-class NRKTVIE(InfoExtractor):
-    IE_DESC = 'NRK TV and NRK Radio'
-    _VALID_URL = r'(?P<baseurl>https?://(?:tv|radio)\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
+        video_id = self._match_id(url)
  
-    _TESTS = [
-        {
-            'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
-            'info_dict': {
-                'id': 'MUHH48000314',
-                'ext': 'mp4',
-                'title': '20 spørsmål',
-                'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
-                'upload_date': '20140523',
-                'duration': 1741.52,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            'url': 'https://tv.nrk.no/program/mdfp15000514',
-            'info_dict': {
-                'id': 'mdfp15000514',
-                'ext': 'mp4',
-                'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting',
-                'description': 'md5:654c12511f035aed1e42bdf5db3b206a',
-                'upload_date': '20140524',
-                'duration': 4605.08,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            },
-        },
-        {
-            # single playlist video
-            'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
-            'md5': 'adbd1dbd813edaf532b0a253780719c2',
-            'info_dict': {
-                'id': 'MSPO40010515-part2',
-                'ext': 'flv',
-                'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
-                'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
-                'upload_date': '20150106',
-            },
-            'skip': 'Only works from Norway',
-        },
-        {
-            'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
-            'playlist': [
-                {
-                    'md5': '9480285eff92d64f06e02a5367970a7a',
-                    'info_dict': {
-                        'id': 'MSPO40010515-part1',
-                        'ext': 'flv',
-                        'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 1:2)',
-                        'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
-                        'upload_date': '20150106',
-                    },
-                },
-                {
-                    'md5': 'adbd1dbd813edaf532b0a253780719c2',
-                    'info_dict': {
-                        'id': 'MSPO40010515-part2',
-                        'ext': 'flv',
-                        'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
-                        'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
-                        'upload_date': '20150106',
-                    },
-                },
-            ],
-            'info_dict': {
-                'id': 'MSPO40010515',
-                'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn',
-                'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
-                'upload_date': '20150106',
-                'duration': 6947.5199999999995,
-            },
-            'skip': 'Only works from Norway',
-        },
-        {
-            'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
-            'only_matching': True,
-        }
-    ]
+        webpage = self._download_webpage(
+            'https://mimir.nrk.no/plugin/1.0/static?mediaId=%s' % video_id,
+            video_id)
  
-    def _extract_f4m(self, manifest_url, video_id):
-        return self._extract_f4m_formats(
-            manifest_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', video_id, f4m_id='hds')
+        nrk_id = self._parse_json(
+            self._search_regex(
+                r'<script[^>]+type=["\']application/json["\'][^>]*>({.+?})</script>',
+                webpage, 'application json'),
+            video_id)['activeMedia']['psId']
  
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        part_id = mobj.group('part_id')
-        base_url = mobj.group('baseurl')
-
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._html_search_meta(
-            'title', webpage, 'title')
-        description = self._html_search_meta(
-            'description', webpage, 'description')
-
-        thumbnail = self._html_search_regex(
-            r'data-posterimage="([^"]+)"',
-            webpage, 'thumbnail', fatal=False)
-        upload_date = unified_strdate(self._html_search_meta(
-            'rightsfrom', webpage, 'upload date', fatal=False))
-        duration = float_or_none(self._html_search_regex(
-            r'data-duration="([^"]+)"',
-            webpage, 'duration', fatal=False))
-
-        # playlist
-        parts = re.findall(
-            r'<a href="#del=(\d+)"[^>]+data-argument="([^"]+)">([^<]+)</a>', webpage)
-        if parts:
-            entries = []
-            for current_part_id, stream_url, part_title in parts:
-                if part_id and current_part_id != part_id:
-                    continue
-                video_part_id = '%s-part%s' % (video_id, current_part_id)
-                formats = self._extract_f4m(stream_url, video_part_id)
-                entries.append({
-                    'id': video_part_id,
-                    'title': part_title,
-                    'description': description,
-                    'thumbnail': thumbnail,
-                    'upload_date': upload_date,
-                    'formats': formats,
-                })
-            if part_id:
-                if entries:
-                    return entries[0]
-            else:
-                playlist = self.playlist_result(entries, video_id, title, description)
-                playlist.update({
-                    'thumbnail': thumbnail,
-                    'upload_date': upload_date,
-                    'duration': duration,
-                })
-                return playlist
-
-        formats = []
-
-        f4m_url = re.search(r'data-media="([^"]+)"', webpage)
-        if f4m_url:
-            formats.extend(self._extract_f4m(f4m_url.group(1), video_id))
-
-        m3u8_url = re.search(r'data-hls-media="([^"]+)"', webpage)
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(m3u8_url.group(1), video_id, 'mp4', m3u8_id='hls'))
-        self._sort_formats(formats)
-
-        subtitles_url = self._html_search_regex(
-            r'data-subtitlesurl\s*=\s*(["\'])(?P<url>.+?)\1',
-            webpage, 'subtitle URL', default=None, group='url')
-        subtitles = {}
-        if subtitles_url:
-            subtitles['no'] = [{
-                'ext': 'ttml',
-                'url': compat_urlparse.urljoin(base_url, subtitles_url),
-            }]
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'upload_date': upload_date,
-            'duration': duration,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
+        return self.url_result('nrk:%s' % nrk_id)
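Note on the NRKSkoleIE change above: the extractor now reads the PS id from a JSON blob embedded in a `<script type="application/json">` tag rather than from a `data-nrk-id` attribute. The lookup can be sketched outside youtube-dl with only the standard library. The regular expression is copied from the diff; `extract_ps_id` and the sample page are hypothetical, and the real page layout is assumed rather than verified here.

```python
import json
import re

# Pattern taken from the diff: capture the JSON object inside the script tag.
SCRIPT_JSON_RE = r'<script[^>]+type=["\']application/json["\'][^>]*>({.+?})</script>'


def extract_ps_id(webpage):
    """Return activeMedia.psId from the embedded application/json block."""
    match = re.search(SCRIPT_JSON_RE, webpage)
    if match is None:
        raise ValueError('no application/json script block found')
    return json.loads(match.group(1))['activeMedia']['psId']


# Made-up page fragment for illustration only.
sample_page = (
    '<html><script type="application/json">'
    '{"activeMedia": {"psId": "6021"}}'
    '</script></html>')
nrk_url = 'nrk:%s' % extract_ps_id(sample_page)
```

The resulting `nrk:` URL is what the extractor hands to `url_result`, so the actual format extraction stays in NRKIE.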
diff --git a/youtube_dl/extractor/ntvde.py b/youtube_dl/extractor/ntvde.py

index a83e85cb8109ef44468851355f2b522e22fc5831..d28a8154247f75cbc612f7999083cd60275c5a88 100644
--- a/youtube_dl/extractor/ntvde.py
+++ b/youtube_dl/extractor/ntvde.py
@@ -1,6 +1,8 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..compat import compat_urlparse
  from ..utils import (
@@ -40,8 +42,8 @@ class NTVDeIE(InfoExtractor):
          timestamp = int_or_none(info.get('publishedDateAsUnixTimeStamp'))
          vdata = self._parse_json(self._search_regex(
              r'(?s)\$\(\s*"\#player"\s*\)\s*\.data\(\s*"player",\s*(\{.*?\})\);',
-            webpage, 'player data'),
-            video_id, transform_source=js_to_json)
+            webpage, 'player data'), video_id,
+            transform_source=lambda s: js_to_json(re.sub(r'advertising:\s*{[^}]+},', '', s)))
          duration = parse_duration(vdata.get('duration'))
  
          formats = []
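Note on the hunk above: the new `transform_source` strips the `advertising: {...},` member from the player object before it is handed to `js_to_json`. The stripping step alone can be sketched as below. The pattern is the one from the diff, while `strip_advertising` and the sample string are hypothetical. Because the pattern uses `[^}]+`, it presumably relies on the advertising object containing no nested braces.

```python
import re

# Pattern taken from the diff; it stops at the first closing brace.
ADVERTISING_RE = r'advertising:\s*{[^}]+},'


def strip_advertising(player_js):
    """Remove the advertising member from a JavaScript object literal."""
    return re.sub(ADVERTISING_RE, '', player_js)


# Made-up player data for illustration only.
sample = '{vid: 42, advertising: {preroll: "x", midroll: "y"}, duration: "1:23"}'
cleaned = strip_advertising(sample)
```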
diff --git a/youtube_dl/extractor/ntvru.py b/youtube_dl/extractor/ntvru.py

index 0895d7ea4cb88f805605a55cb0c1fe56ff1d475d..7d7a785ab10e7b71ceb4729a012ebb574c7752d5 100644
--- a/youtube_dl/extractor/ntvru.py
+++ b/youtube_dl/extractor/ntvru.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -11,70 +11,64 @@ from ..utils import (
  
  class NTVRuIE(InfoExtractor):
      IE_NAME = 'ntv.ru'
-    _VALID_URL = r'https?://(?:www\.)?ntv\.ru/(?P<id>.+)'
+    _VALID_URL = r'https?://(?:www\.)?ntv\.ru/(?:[^/]+/)*(?P<id>[^/?#&]+)'
  
-    _TESTS = [
-        {
-            'url': 'http://www.ntv.ru/novosti/863142/',
-            'md5': 'ba7ea172a91cb83eb734cad18c10e723',
-            'info_dict': {
-                'id': '746000',
-                'ext': 'mp4',
-                'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
-                'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
-                'thumbnail': 're:^http://.*\.jpg',
-                'duration': 136,
-            },
+    _TESTS = [{
+        'url': 'http://www.ntv.ru/novosti/863142/',
+        'md5': 'ba7ea172a91cb83eb734cad18c10e723',
+        'info_dict': {
+            'id': '746000',
+            'ext': 'mp4',
+            'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
+            'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
+            'thumbnail': 're:^http://.*\.jpg',
+            'duration': 136,
          },
-        {
-            'url': 'http://www.ntv.ru/video/novosti/750370/',
-            'md5': 'adecff79691b4d71e25220a191477124',
-            'info_dict': {
-                'id': '750370',
-                'ext': 'mp4',
-                'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
-                'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
-                'thumbnail': 're:^http://.*\.jpg',
-                'duration': 172,
-            },
+    }, {
+        'url': 'http://www.ntv.ru/video/novosti/750370/',
+        'md5': 'adecff79691b4d71e25220a191477124',
+        'info_dict': {
+            'id': '750370',
+            'ext': 'mp4',
+            'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
+            'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
+            'thumbnail': 're:^http://.*\.jpg',
+            'duration': 172,
          },
-        {
-            'url': 'http://www.ntv.ru/peredacha/segodnya/m23700/o232416',
-            'md5': '82dbd49b38e3af1d00df16acbeab260c',
-            'info_dict': {
-                'id': '747480',
-                'ext': 'mp4',
-                'title': '«Сегодня». 21 марта 2014 года. 16:00',
-                'description': '«Сегодня». 21 марта 2014 года. 16:00',
-                'thumbnail': 're:^http://.*\.jpg',
-                'duration': 1496,
-            },
+    }, {
+        'url': 'http://www.ntv.ru/peredacha/segodnya/m23700/o232416',
+        'md5': '82dbd49b38e3af1d00df16acbeab260c',
+        'info_dict': {
+            'id': '747480',
+            'ext': 'mp4',
+            'title': '«Сегодня». 21 марта 2014 года. 16:00',
+            'description': '«Сегодня». 21 марта 2014 года. 16:00',
+            'thumbnail': 're:^http://.*\.jpg',
+            'duration': 1496,
          },
-        {
-            'url': 'http://www.ntv.ru/kino/Koma_film',
-            'md5': 'f825770930937aa7e5aca0dc0d29319a',
-            'info_dict': {
-                'id': '1007609',
-                'ext': 'mp4',
-                'title': 'Остросюжетный фильм «Кома»',
-                'description': 'Остросюжетный фильм «Кома»',
-                'thumbnail': 're:^http://.*\.jpg',
-                'duration': 5592,
-            },
+    }, {
+        'url': 'http://www.ntv.ru/kino/Koma_film',
+        'md5': 'f825770930937aa7e5aca0dc0d29319a',
+        'info_dict': {
+            'id': '1007609',
+            'ext': 'mp4',
+            'title': 'Остросюжетный фильм «Кома»',
+            'description': 'Остросюжетный фильм «Кома»',
+            'thumbnail': 're:^http://.*\.jpg',
+            'duration': 5592,
          },
-        {
-            'url': 'http://www.ntv.ru/serial/Delo_vrachey/m31760/o233916/',
-            'md5': '9320cd0e23f3ea59c330dc744e06ff3b',
-            'info_dict': {
-                'id': '751482',
-                'ext': 'mp4',
-                'title': '«Дело врачей»: «Деревце жизни»',
-                'description': '«Дело врачей»: «Деревце жизни»',
-                'thumbnail': 're:^http://.*\.jpg',
-                'duration': 2590,
-            },
+    }, {
+        'url': 'http://www.ntv.ru/serial/Delo_vrachey/m31760/o233916/',
+        'md5': '9320cd0e23f3ea59c330dc744e06ff3b',
+        'info_dict': {
+            'id': '751482',
+            'ext': 'mp4',
+            'title': '«Дело врачей»: «Деревце жизни»',
+            'description': '«Дело врачей»: «Деревце жизни»',
+            'thumbnail': 're:^http://.*\.jpg',
+            'duration': 2590,
          },
-    ]
+    }]
  
      _VIDEO_ID_REGEXES = [
          r'<meta property="og:url" content="http://www\.ntv\.ru/video/(\d+)',
@@ -87,11 +81,21 @@ class NTVRuIE(InfoExtractor):
  
          webpage = self._download_webpage(url, video_id)
  
-        video_id = self._html_search_regex(self._VIDEO_ID_REGEXES, webpage, 'video id')
+        video_url = self._og_search_property(
+            ('video', 'video:iframe'), webpage, default=None)
+        if video_url:
+            video_id = self._search_regex(
+                r'https?://(?:www\.)?ntv\.ru/video/(?:embed/)?(\d+)',
+                video_url, 'video id', default=None)
+
+        if not video_id:
+            video_id = self._html_search_regex(
+                self._VIDEO_ID_REGEXES, webpage, 'video id')
  
          player = self._download_xml(
              'http://www.ntv.ru/vi%s/' % video_id,
              video_id, 'Downloading video XML')
+
          title = clean_html(xpath_text(player, './data/title', 'title', fatal=True))
          description = clean_html(xpath_text(player, './data/description', 'description'))
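Note on the hunk above: the video id is now taken from the `og:video` / `og:video:iframe` URL when one is present, with the older `_VIDEO_ID_REGEXES` kept as a fallback. The URL-to-id step can be sketched on its own. The pattern is copied from the diff, `video_id_from_og_url` is a hypothetical helper name, and the embed URL shape in the usage lines is an assumption.

```python
import re

# Pattern taken from the diff: plain and /embed/ video URLs both carry the id.
EMBED_ID_RE = r'https?://(?:www\.)?ntv\.ru/video/(?:embed/)?(\d+)'


def video_id_from_og_url(video_url):
    """Return the numeric video id from an og:video URL, or None."""
    if not video_url:
        return None
    match = re.search(EMBED_ID_RE, video_url)
    return match.group(1) if match else None


embed_id = video_id_from_og_url('http://www.ntv.ru/video/embed/750370')
plain_id = video_id_from_og_url('http://www.ntv.ru/video/746000/')
no_id = video_id_from_og_url('http://www.ntv.ru/kino/Koma_film')
```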
  
diff --git a/youtube_dl/extractor/nuevo.py b/youtube_dl/extractor/nuevo.py

index ef093dec2201afb7cb24384d7120f6ec00158de4..87fb94d1f583f5b174fe8d9ace84e4791f3afa4e 100644
--- a/youtube_dl/extractor/nuevo.py
+++ b/youtube_dl/extractor/nuevo.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/nuvid.py b/youtube_dl/extractor/nuvid.py

index 9fa7cefadc79ef1d8bda971dc52483a0b8d998eb..ab6bfcd7f485218d19034930607d35d1665e7406 100644
--- a/youtube_dl/extractor/nuvid.py
+++ b/youtube_dl/extractor/nuvid.py
@@ -5,8 +5,6 @@ import re
  from .common import InfoExtractor
  from ..utils import (
      parse_duration,
-    sanitized_Request,
-    unified_strdate,
  )
  
  
@@ -20,7 +18,6 @@ class NuvidIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Horny babes show their awesome bodeis and',
              'duration': 129,
-            'upload_date': '20140508',
              'age_limit': 18,
          }
      }
@@ -28,28 +25,31 @@ class NuvidIE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        formats = []
+        page_url = 'http://m.nuvid.com/video/%s' % video_id
+        webpage = self._download_webpage(
+            page_url, video_id, 'Downloading video page')
+        # When dwnld_speed exists and has a value larger than the MP4 file's
+        # bitrate, Nuvid returns the MP4 URL
+        # Its unit is 100 bytes/millisecond; see mobile-nuvid-min.js for the algorithm
+        self._set_cookie('nuvid.com', 'dwnld_speed', '10.0')
+        mp4_webpage = self._download_webpage(
+            page_url, video_id, 'Downloading video page for MP4 format')
  
-        for dwnld_speed, format_id in [(0, '3gp'), (5, 'mp4')]:
-            request = sanitized_Request(
-                'http://m.nuvid.com/play/%s' % video_id)
-            request.add_header('Cookie', 'skip_download_page=1; dwnld_speed=%d; adv_show=1' % dwnld_speed)
-            webpage = self._download_webpage(
-                request, video_id, 'Downloading %s page' % format_id)
-            video_url = self._html_search_regex(
-                r'<a\s+href="([^"]+)"\s+class="b_link">', webpage, '%s video URL' % format_id, fatal=False)
-            if not video_url:
-                continue
+        html5_video_re = r'(?s)<(?:video|audio)[^<]*(?:>.*?<source[^>]*)?\s+src=["\'](.*?)["\']'
+        video_url = self._html_search_regex(html5_video_re, webpage, video_id)
+        mp4_video_url = self._html_search_regex(html5_video_re, mp4_webpage, video_id)
+        formats = [{
+            'url': video_url,
+        }]
+        if mp4_video_url != video_url:
              formats.append({
-                'url': video_url,
-                'format_id': format_id,
+                'url': mp4_video_url,
              })
  
-        webpage = self._download_webpage(
-            'http://m.nuvid.com/video/%s' % video_id, video_id, 'Downloading video page')
          title = self._html_search_regex(
              [r'<span title="([^"]+)">',
-             r'<div class="thumb-holder video">\s*<h5[^>]*>([^<]+)</h5>'], webpage, 'title').strip()
+             r'<div class="thumb-holder video">\s*<h5[^>]*>([^<]+)</h5>',
+             r'<span[^>]+class="title_thumb">([^<]+)</span>'], webpage, 'title').strip()
          thumbnails = [
              {
                  'url': thumb_url,
@@ -57,9 +57,8 @@ class NuvidIE(InfoExtractor):
          ]
          thumbnail = thumbnails[0]['url'] if thumbnails else None
          duration = parse_duration(self._html_search_regex(
-            r'<i class="fa fa-clock-o"></i>\s*(\d{2}:\d{2})', webpage, 'duration', fatal=False))
-        upload_date = unified_strdate(self._html_search_regex(
-            r'<i class="fa fa-user"></i>\s*(\d{4}-\d{2}-\d{2})', webpage, 'upload date', fatal=False))
+            [r'<i class="fa fa-clock-o"></i>\s*(\d{2}:\d{2})',
+             r'<span[^>]+class="view_time">([^<]+)</span>'], webpage, 'duration', fatal=False))
  
          return {
              'id': video_id,
@@ -67,7 +66,6 @@ class NuvidIE(InfoExtractor):
              'thumbnails': thumbnails,
              'thumbnail': thumbnail,
              'duration': duration,
-            'upload_date': upload_date,
              'age_limit': 18,
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/nytimes.py b/youtube_dl/extractor/nytimes.py

index 681683e86f54e796f1c954de2c0cb374016fe303..2bb77ab249239163d8318a57e8fd0fdb57d2e32a 100644
--- a/youtube_dl/extractor/nytimes.py
+++ b/youtube_dl/extractor/nytimes.py
@@ -1,26 +1,40 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
+import hmac
+import hashlib
+import base64
+
  from .common import InfoExtractor
  from ..utils import (
+    determine_ext,
      float_or_none,
      int_or_none,
+    js_to_json,
+    mimetype2ext,
      parse_iso8601,
+    remove_start,
  )
  
  
  class NYTimesBaseIE(InfoExtractor):
+    _SECRET = b'pX(2MbU2);4N{7J8)>YwKRJ+/pQ3JkiU2Q^V>mFYv6g6gYvt6v'
+
      def _extract_video_from_id(self, video_id):
-        video_data = self._download_json(
-            'http://www.nytimes.com/svc/video/api/v2/video/%s' % video_id,
-            video_id, 'Downloading video JSON')
+        # Authorization generation algorithm is reverse engineered from `signer` in
+        # http://graphics8.nytimes.com/video/vhs/vhs-2.x.min.js
+        path = '/svc/video/api/v3/video/' + video_id
+        hm = hmac.new(self._SECRET, (path + ':vhs').encode(), hashlib.sha512).hexdigest()
+        video_data = self._download_json('http://www.nytimes.com' + path, video_id, 'Downloading video JSON', headers={
+            'Authorization': 'NYTV ' + base64.b64encode(hm.encode()).decode(),
+            'X-NYTV': 'vhs',
+        }, fatal=False)
+        if not video_data:
+            video_data = self._download_json(
+                'http://www.nytimes.com/svc/video/api/v2/video/' + video_id,
+                video_id, 'Downloading video JSON')
  
          title = video_data['headline']
-        description = video_data.get('summary')
-        duration = float_or_none(video_data.get('duration'), 1000)
-
-        uploader = video_data.get('byline')
-        publication_date = video_data.get('publication_date')
-        timestamp = parse_iso8601(publication_date[:-8]) if publication_date else None
  
          def get_file_size(file_size):
              if isinstance(file_size, int):
@@ -28,35 +42,59 @@ class NYTimesBaseIE(InfoExtractor):
              elif isinstance(file_size, dict):
                  return int(file_size.get('value', 0))
              else:
-                return 0
-
-        formats = [
-            {
-                'url': video['url'],
-                'format_id': video.get('type'),
-                'vcodec': video.get('video_codec'),
-                'width': int_or_none(video.get('width')),
-                'height': int_or_none(video.get('height')),
-                'filesize': get_file_size(video.get('fileSize')),
-            } for video in video_data['renditions'] if video.get('url')
-        ]
+                return None
+
+        urls = []
+        formats = []
+        for video in video_data.get('renditions', []):
+            video_url = video.get('url')
+            format_id = video.get('type')
+            if not video_url or format_id == 'thumbs' or video_url in urls:
+                continue
+            urls.append(video_url)
+            ext = mimetype2ext(video.get('mimetype')) or determine_ext(video_url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id=format_id or 'hls', fatal=False))
+            elif ext == 'mpd':
+                continue
+            #     formats.extend(self._extract_mpd_formats(
+            #         video_url, video_id, format_id or 'dash', fatal=False))
+            else:
+                formats.append({
+                    'url': video_url,
+                    'format_id': format_id,
+                    'vcodec': video.get('videoencoding') or video.get('video_codec'),
+                    'width': int_or_none(video.get('width')),
+                    'height': int_or_none(video.get('height')),
+                    'filesize': get_file_size(video.get('file_size') or video.get('fileSize')),
+                    'tbr': int_or_none(video.get('bitrate'), 1000),
+                    'ext': ext,
+                })
          self._sort_formats(formats)
  
-        thumbnails = [
-            {
-                'url': 'http://www.nytimes.com/%s' % image['url'],
+        thumbnails = []
+        for image in video_data.get('images', []):
+            image_url = image.get('url')
+            if not image_url:
+                continue
+            thumbnails.append({
+                'url': 'http://www.nytimes.com/' + image_url,
                  'width': int_or_none(image.get('width')),
                  'height': int_or_none(image.get('height')),
-            } for image in video_data.get('images', []) if image.get('url')
-        ]
+            })
+
+        publication_date = video_data.get('publication_date')
+        timestamp = parse_iso8601(publication_date[:-8]) if publication_date else None
  
          return {
              'id': video_id,
              'title': title,
-            'description': description,
+            'description': video_data.get('summary'),
              'timestamp': timestamp,
-            'uploader': uploader,
-            'duration': duration,
+            'uploader': video_data.get('byline'),
+            'duration': float_or_none(video_data.get('duration'), 1000),
              'formats': formats,
              'thumbnails': thumbnails,
          }
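Note on the hunk above: the v3 API request is signed by taking an HMAC-SHA512 over `<path>:vhs`, hex-encoding the digest, base64-encoding that hex string, and sending it as `Authorization: NYTV <value>` together with `X-NYTV: vhs`. A minimal standalone sketch of that header construction follows. `build_nytv_request` is a hypothetical helper, and the secret in the usage line is a deliberate placeholder, not the key from the diff.

```python
import base64
import hashlib
import hmac


def build_nytv_request(secret, video_id):
    """Return (url, headers) for the v3 video API, mirroring the diff's steps."""
    path = '/svc/video/api/v3/video/' + video_id
    # HMAC-SHA512 over "<path>:vhs", kept as a hex string.
    digest = hmac.new(
        secret, (path + ':vhs').encode('utf-8'), hashlib.sha512).hexdigest()
    headers = {
        # The hex digest itself is base64-encoded, as in the diff.
        'Authorization': 'NYTV ' + base64.b64encode(
            digest.encode('ascii')).decode('ascii'),
        'X-NYTV': 'vhs',
    }
    return 'http://www.nytimes.com' + path, headers


# Placeholder secret for illustration only.
url, headers = build_nytv_request(b'not-the-real-secret', '100000002847155')
```

In the diff, a failed v3 request (`fatal=False`) falls back to the unsigned v2 endpoint, so the signature is an optimisation rather than a hard requirement.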
@@ -67,7 +105,7 @@ class NYTimesIE(NYTimesBaseIE):
  
      _TESTS = [{
          'url': 'http://www.nytimes.com/video/opinion/100000002847155/verbatim-what-is-a-photocopier.html?playlistId=100000001150263',
-        'md5': '18a525a510f942ada2720db5f31644c0',
+        'md5': 'd665342765db043f7e225cff19df0f2d',
          'info_dict': {
              'id': '100000002847155',
              'ext': 'mov',
@@ -103,16 +141,83 @@ class NYTimesArticleIE(NYTimesBaseIE):
              'upload_date': '20150414',
              'uploader': 'Matthew Williams',
          }
+    }, {
+        'url': 'http://www.nytimes.com/2016/10/14/podcasts/revelations-from-the-final-weeks.html',
+        'md5': 'e0d52040cafb07662acf3c9132db3575',
+        'info_dict': {
+            'id': '100000004709062',
+            'title': 'The Run-Up: ‘He Was Like an Octopus’',
+            'ext': 'mp3',
+            'description': 'md5:fb5c6b93b12efc51649b4847fe066ee4',
+            'series': 'The Run-Up',
+            'episode': '‘He Was Like an Octopus’',
+            'episode_number': 20,
+            'duration': 2130,
+        }
+    }, {
+        'url': 'http://www.nytimes.com/2016/10/16/books/review/inside-the-new-york-times-book-review-the-rise-of-hitler.html',
+        'info_dict': {
+            'id': '100000004709479',
+            'title': 'The Rise of Hitler',
+            'ext': 'mp3',
+            'description': 'md5:bce877fd9e3444990cb141875fab0028',
+            'creator': 'Pamela Paul',
+            'duration': 3475,
+        },
+        'params': {
+            'skip_download': True,
+        },
      }, {
          'url': 'http://www.nytimes.com/news/minute/2014/03/17/times-minute-whats-next-in-crimea/?_php=true&_type=blogs&_php=true&_type=blogs&_r=1',
          'only_matching': True,
      }]
  
+    def _extract_podcast_from_json(self, json, page_id, webpage):
+        podcast_audio = self._parse_json(
+            json, page_id, transform_source=js_to_json)
+
+        audio_data = podcast_audio['data']
+        track = audio_data['track']
+
+        episode_title = track['title']
+        video_url = track['source']
+
+        description = track.get('description') or self._html_search_meta(
+            ['og:description', 'twitter:description'], webpage)
+
+        podcast_title = audio_data.get('podcast', {}).get('title')
+        title = ('%s: %s' % (podcast_title, episode_title)
+                 if podcast_title else episode_title)
+
+        episode = audio_data.get('podcast', {}).get('episode') or ''
+        episode_number = int_or_none(self._search_regex(
+            r'[Ee]pisode\s+(\d+)', episode, 'episode number', default=None))
+
+        return {
+            'id': remove_start(podcast_audio.get('target'), 'FT') or page_id,
+            'url': video_url,
+            'title': title,
+            'description': description,
+            'creator': track.get('credit'),
+            'series': podcast_title,
+            'episode': episode_title,
+            'episode_number': episode_number,
+            'duration': int_or_none(track.get('duration')),
+        }
+
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        page_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(url, page_id)
  
-        video_id = self._html_search_regex(r'data-videoid="(\d+)"', webpage, 'video id')
+        video_id = self._search_regex(
+            r'data-videoid=["\'](\d+)', webpage, 'video id',
+            default=None, fatal=False)
+        if video_id is not None:
+            return self._extract_video_from_id(video_id)
  
-        return self._extract_video_from_id(video_id)
+        podcast_data = self._search_regex(
+            (r'NYTD\.FlexTypes\.push\s*\(\s*({.+?})\s*\)\s*;\s*</script',
+             r'NYTD\.FlexTypes\.push\s*\(\s*({.+})\s*\)\s*;'),
+            webpage, 'podcast data')
+        return self._extract_podcast_from_json(podcast_data, page_id, webpage)
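Review note: the title and episode-number handling added in `_extract_podcast_from_json` can be tried in isolation. The following is a minimal sketch using only the standard library. `podcast_fields` is a hypothetical helper that does not exist in youtube-dl, and plain `re.search` stands in for `_search_regex`.

```python
import re


def podcast_fields(podcast_title, episode_title, episode_label):
    """Build the combined title and parse the episode number (hypothetical helper)."""
    # Prefix the series name only when one is present.
    title = ('%s: %s' % (podcast_title, episode_title)
             if podcast_title else episode_title)
    # The label may be missing, so fall back to an empty string before searching.
    match = re.search(r'[Ee]pisode\s+(\d+)', episode_label or '')
    episode_number = int(match.group(1)) if match else None
    return title, episode_number
```

A page without podcast metadata yields the bare episode title and `None` for the number, which matches the second new test case, where `series` and `episode_number` are not asserted.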
diff --git a/youtube_dl/extractor/nzz.py b/youtube_dl/extractor/nzz.py
new file mode 100644
index 0000000..2d352f5
--- /dev/null
+++ b/youtube_dl/extractor/nzz.py
@@ -0,0 +1,36 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    extract_attributes,
+)
+
+
+class NZZIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?nzz\.ch/(?:[^/]+/)*[^/?#]+-ld\.(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.nzz.ch/zuerich/gymizyte/gymizyte-schreiben-schueler-heute-noch-diktate-ld.9153',
+        'info_dict': {
+            'id': '9153',
+        },
+        'playlist_mincount': 6,
+    }
+
+    def _real_extract(self, url):
+        page_id = self._match_id(url)
+        webpage = self._download_webpage(url, page_id)
+
+        entries = []
+        for player_element in re.findall(r'(<[^>]+class="kalturaPlayer"[^>]*>)', webpage):
+            player_params = extract_attributes(player_element)
+            if player_params.get('data-type') not in ('kaltura_singleArticle',):
+                self.report_warning('Unsupported player type')
+                continue
+            entry_id = player_params['data-id']
+            entries.append(self.url_result(
+                'kaltura:1750922:' + entry_id, 'Kaltura', entry_id))
+
+        return self.playlist_result(entries, page_id)
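Review note: the player-element scan in `NZZIE._real_extract` can be approximated with the standard library. This is a sketch under assumptions: `extract_attributes` here is a simplified stand-in built on `html.parser`, not the youtube-dl utility itself, and `kaltura_urls` is a hypothetical name. Only the regex, the `data-type` check, and the partner id `1750922` come from the diff.

```python
import re
from html.parser import HTMLParser


class _AttrParser(HTMLParser):
    """Collects the attributes of the last start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        self.attrs = dict(attrs)


def extract_attributes(element):
    # Simplified stand-in for youtube_dl.utils.extract_attributes.
    parser = _AttrParser()
    parser.feed(element)
    parser.close()
    return parser.attrs


def kaltura_urls(webpage, partner_id='1750922'):
    """Return kaltura: URLs for every supported player element (hypothetical helper)."""
    urls = []
    for element in re.findall(r'(<[^>]+class="kalturaPlayer"[^>]*>)', webpage):
        params = extract_attributes(element)
        # Unsupported player types are skipped rather than treated as errors.
        if params.get('data-type') != 'kaltura_singleArticle':
            continue
        urls.append('kaltura:%s:%s' % (partner_id, params['data-id']))
    return urls
```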
diff --git a/youtube_dl/extractor/odatv.py b/youtube_dl/extractor/odatv.py
new file mode 100644
index 0000000..314527f
--- /dev/null
+++ b/youtube_dl/extractor/odatv.py
@@ -0,0 +1,50 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    NO_DEFAULT,
+    remove_start
+)
+
+
+class OdaTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?odatv\.com/(?:mob|vid)_video\.php\?.*\bid=(?P<id>[^&]+)'
+    _TESTS = [{
+        'url': 'http://odatv.com/vid_video.php?id=8E388',
+        'md5': 'dc61d052f205c9bf2da3545691485154',
+        'info_dict': {
+            'id': '8E388',
+            'ext': 'mp4',
+            'title': 'Artık Davutoğlu ile devam edemeyiz'
+        }
+    }, {
+        # mobile URL
+        'url': 'http://odatv.com/mob_video.php?id=8E388',
+        'only_matching': True,
+    }, {
+        # no video
+        'url': 'http://odatv.com/mob_video.php?id=8E900',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        no_video = 'NO VIDEO!' in webpage
+
+        video_url = self._search_regex(
+            r'mp4\s*:\s*(["\'])(?P<url>http.+?)\1', webpage, 'video url',
+            default=None if no_video else NO_DEFAULT, group='url')
+
+        if no_video:
+            raise ExtractorError('Video %s does not exist' % video_id, expected=True)
+
+        return {
+            'id': video_id,
+            'url': video_url,
+            'title': remove_start(self._og_search_title(webpage), 'Video: '),
+            'thumbnail': self._og_search_thumbnail(webpage),
+        }
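Review note: `OdaTVIE` passes `default=None if no_video else NO_DEFAULT` so that a missing URL is tolerated only when the page itself says there is no video. The sentinel pattern can be sketched as follows. `search_regex` and `find_video_url` are simplified, hypothetical stand-ins, and the exception types are placeholders for `ExtractorError`.

```python
import re

# Sentinel meaning "no default supplied, so a failed match is an error".
NO_DEFAULT = object()


def search_regex(pattern, text, name, default=NO_DEFAULT, group=1):
    """Simplified stand-in for InfoExtractor._search_regex."""
    match = re.search(pattern, text)
    if match:
        return match.group(group)
    if default is not NO_DEFAULT:
        return default
    raise ValueError('Unable to extract %s' % name)


def find_video_url(webpage):
    no_video = 'NO VIDEO!' in webpage
    # Only swallow a failed match when the page announces that no video exists.
    video_url = search_regex(
        r'mp4\s*:\s*(["\'])(?P<url>http.+?)\1', webpage, 'video url',
        default=None if no_video else NO_DEFAULT, group='url')
    if no_video:
        raise LookupError('Video does not exist')
    return video_url
```

Using a dedicated object rather than `None` as the sentinel lets callers pass `default=None` explicitly and still have it honoured.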
diff --git a/youtube_dl/extractor/odnoklassniki.py b/youtube_dl/extractor/odnoklassniki.py
index f9e064a60e445668200b759ca4e0ad1a6f7c28ab..986708e75e45f7f24f656f319767e6adbd9504ea 100644
--- a/youtube_dl/extractor/odnoklassniki.py
+++ b/youtube_dl/extractor/odnoklassniki.py
@@ -2,7 +2,11 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
+from ..compat import (
+    compat_parse_qs,
+    compat_urllib_parse_unquote,
+    compat_urllib_parse_urlparse,
+)
  from ..utils import (
      ExtractorError,
      unified_strdate,
@@ -32,7 +36,7 @@ class OdnoklassnikiIE(InfoExtractor):
          'skip': 'Video has been blocked',
      }, {
          # metadataUrl
-        'url': 'http://ok.ru/video/63567059965189-0',
+        'url': 'http://ok.ru/video/63567059965189-0?fromTime=5',
          'md5': '9676cf86eff5391d35dea675d224e131',
          'info_dict': {
              'id': '63567059965189-0',
@@ -44,6 +48,7 @@ class OdnoklassnikiIE(InfoExtractor):
              'uploader': '☭ Андрей Мещанинов ☭',
              'like_count': int,
              'age_limit': 0,
+            'start_time': 5,
          },
      }, {
          # YouTube embed (metadataUrl, provider == USER_YOUTUBE)
@@ -60,6 +65,22 @@ class OdnoklassnikiIE(InfoExtractor):
              'uploader': 'Алина П',
              'age_limit': 0,
          },
+    }, {
+        # YouTube embed (metadata, provider == USER_YOUTUBE, no metadata.movie.title field)
+        'url': 'http://ok.ru/video/62036049272859-0',
+        'info_dict': {
+            'id': '62036049272859-0',
+            'ext': 'mp4',
+            'title': 'МУЗЫКА     ДОЖДЯ .',
+            'description': 'md5:6f1867132bd96e33bf53eda1091e8ed0',
+            'upload_date': '20120106',
+            'uploader_id': '473534735899',
+            'uploader': 'МARINA D',
+            'age_limit': 0,
+        },
+        'params': {
+            'skip_download': True,
+        },
      }, {
          'url': 'http://ok.ru/web-api/video/moviePlayer/20079905452',
          'only_matching': True,
@@ -78,6 +99,9 @@ class OdnoklassnikiIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
+        start_time = int_or_none(compat_parse_qs(
+            compat_urllib_parse_urlparse(url).query).get('fromTime', [None])[0])
+
          video_id = self._match_id(url)
  
          webpage = self._download_webpage(
@@ -106,7 +130,14 @@ class OdnoklassnikiIE(InfoExtractor):
                  video_id, 'Downloading metadata JSON')
  
          movie = metadata['movie']
-        title = movie['title']
+
+        # Some embedded videos may not contain title in movie dict (e.g.
+        # http://ok.ru/video/62036049272859-0) thus we allow missing title
+        # here and it's going to be extracted later by an extractor that
+        # will process the actual embed.
+        provider = metadata.get('provider')
+        title = movie['title'] if provider == 'UPLOADED_ODKL' else movie.get('title')
+
          thumbnail = movie.get('poster')
          duration = int_or_none(movie.get('duration'))
  
@@ -135,9 +166,10 @@ class OdnoklassnikiIE(InfoExtractor):
              'uploader_id': uploader_id,
              'like_count': like_count,
              'age_limit': age_limit,
+            'start_time': start_time,
          }
  
-        if metadata.get('provider') == 'USER_YOUTUBE':
+        if provider == 'USER_YOUTUBE':
              info.update({
                  '_type': 'url_transparent',
                  'url': movie['contentId'],
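Review note: the new `start_time` handling reads `fromTime` from the query string. A minimal Python 3 sketch of the same idea, assuming `urllib.parse` in place of the `compat_*` wrappers and a hypothetical helper name `start_time_from_url`:

```python
from urllib.parse import parse_qs, urlparse


def start_time_from_url(url):
    """Return the integer fromTime query parameter, or None when absent or invalid."""
    # parse_qs maps each key to a list of values; default to a list holding None.
    values = parse_qs(urlparse(url).query).get('fromTime', [None])
    try:
        return int(values[0])
    except (TypeError, ValueError):
        # Mirrors int_or_none: anything that is not an integer becomes None.
        return None
```

For the updated test URL `http://ok.ru/video/63567059965189-0?fromTime=5` this yields `5`, matching the `'start_time': 5` expectation added to the test.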
diff --git a/youtube_dl/extractor/oktoberfesttv.py b/youtube_dl/extractor/oktoberfesttv.py
index 4a41c0542102165334124ea22a99a48473d19e50..50fbbc79c12761449adc70e74a58f0442f5b9cfa 100644
--- a/youtube_dl/extractor/oktoberfesttv.py
+++ b/youtube_dl/extractor/oktoberfesttv.py
@@ -1,11 +1,11 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
  
  
  class OktoberfestTVIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.oktoberfest-tv\.de/[^/]+/[^/]+/video/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?oktoberfest-tv\.de/[^/]+/[^/]+/video/(?P<id>[^/?#]+)'
  
      _TEST = {
          'url': 'http://www.oktoberfest-tv.de/de/kameras/video/hb-zelt',
diff --git a/youtube_dl/extractor/onet.py b/youtube_dl/extractor/onet.py
new file mode 100644
index 0000000..0a501b3
--- /dev/null
+++ b/youtube_dl/extractor/onet.py
@@ -0,0 +1,169 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    ExtractorError,
+    float_or_none,
+    get_element_by_class,
+    int_or_none,
+    js_to_json,
+    parse_iso8601,
+    remove_start,
+    strip_or_none,
+    url_basename,
+)
+
+
+class OnetBaseIE(InfoExtractor):
+    def _search_mvp_id(self, webpage):
+        return self._search_regex(
+            r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
+
+    def _extract_from_id(self, video_id, webpage):
+        response = self._download_json(
+            'http://qi.ckm.onetapi.pl/', video_id,
+            query={
+                'body[id]': video_id,
+                'body[jsonrpc]': '2.0',
+                'body[method]': 'get_asset_detail',
+                'body[params][ID_Publikacji]': video_id,
+                'body[params][Service]': 'www.onet.pl',
+                'content-type': 'application/jsonp',
+                'x-onet-app': 'player.front.onetapi.pl',
+            })
+
+        error = response.get('error')
+        if error:
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, error['message']), expected=True)
+
+        video = response['result'].get('0')
+
+        formats = []
+        for _, formats_dict in video['formats'].items():
+            if not isinstance(formats_dict, dict):
+                continue
+            for format_id, format_list in formats_dict.items():
+                if not isinstance(format_list, list):
+                    continue
+                for f in format_list:
+                    video_url = f.get('url')
+                    if not video_url:
+                        continue
+                    ext = determine_ext(video_url)
+                    if format_id == 'ism':
+                        formats.extend(self._extract_ism_formats(
+                            video_url, video_id, 'mss', fatal=False))
+                    elif ext == 'mpd':
+                        formats.extend(self._extract_mpd_formats(
+                            video_url, video_id, mpd_id='dash', fatal=False))
+                    else:
+                        formats.append({
+                            'url': video_url,
+                            'format_id': format_id,
+                            'height': int_or_none(f.get('vertical_resolution')),
+                            'width': int_or_none(f.get('horizontal_resolution')),
+                            'abr': float_or_none(f.get('audio_bitrate')),
+                            'vbr': float_or_none(f.get('video_bitrate')),
+                        })
+        self._sort_formats(formats)
+
+        meta = video.get('meta', {})
+
+        title = self._og_search_title(webpage, default=None) or meta['title']
+        description = self._og_search_description(webpage, default=None) or meta.get('description')
+        duration = meta.get('length') or meta.get('lenght')
+        timestamp = parse_iso8601(meta.get('addDate'), ' ')
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'timestamp': timestamp,
+            'formats': formats,
+        }
+
+
+class OnetIE(OnetBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
+    IE_NAME = 'onet.tv'
+
+    _TEST = {
+        'url': 'http://onet.tv/k/openerfestival/open-er-festival-2016-najdziwniejsze-wymagania-gwiazd/qbpyqc',
+        'md5': 'e3ffbf47590032ac3f27249204173d50',
+        'info_dict': {
+            'id': 'qbpyqc',
+            'display_id': 'open-er-festival-2016-najdziwniejsze-wymagania-gwiazd',
+            'ext': 'mp4',
+            'title': 'Open\'er Festival 2016: najdziwniejsze wymagania gwiazd',
+            'description': 'Trzy samochody, których nigdy nie użyto, prywatne spa, hotel dekorowany czarnym suknem czy nielegalne używki. Organizatorzy koncertów i festiwali muszą stawać przed nie lada wyzwaniem zapraszając gwia...',
+            'upload_date': '20160705',
+            'timestamp': 1467721580,
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        display_id, video_id = mobj.group('display_id', 'id')
+
+        webpage = self._download_webpage(url, display_id)
+
+        mvp_id = self._search_mvp_id(webpage)
+
+        info_dict = self._extract_from_id(mvp_id, webpage)
+        info_dict.update({
+            'id': video_id,
+            'display_id': display_id,
+        })
+
+        return info_dict
+
+
+class OnetChannelIE(OnetBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/(?P<id>[a-z]+)(?:[?#]|$)'
+    IE_NAME = 'onet.tv:channel'
+
+    _TEST = {
+        'url': 'http://onet.tv/k/openerfestival',
+        'info_dict': {
+            'id': 'openerfestival',
+            'title': 'Open\'er Festival Live',
+            'description': 'Dziękujemy, że oglądaliście transmisje. Zobaczcie nasze relacje i wywiady z artystami.',
+        },
+        'playlist_mincount': 46,
+    }
+
+    def _real_extract(self, url):
+        channel_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, channel_id)
+
+        current_clip_info = self._parse_json(self._search_regex(
+            r'var\s+currentClip\s*=\s*({[^}]+})', webpage, 'video info'), channel_id,
+            transform_source=lambda s: js_to_json(re.sub(r'\'\s*\+\s*\'', '', s)))
+        video_id = remove_start(current_clip_info['ckmId'], 'mvp:')
+        video_name = url_basename(current_clip_info['url'])
+
+        if self._downloader.params.get('noplaylist'):
+            self.to_screen(
+                'Downloading just video %s because of --no-playlist' % video_name)
+            return self._extract_from_id(video_id, webpage)
+
+        self.to_screen(
+            'Downloading channel %s - add --no-playlist to just download video %s' % (
+                channel_id, video_name))
+        matches = re.findall(
+            r'<a[^>]+href=[\'"](https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/[0-9a-z-]+/[0-9a-z]+)',
+            webpage)
+        entries = [
+            self.url_result(video_link, OnetIE.ie_key())
+            for video_link in matches]
+
+        channel_title = strip_or_none(get_element_by_class('o_channelName', webpage))
+        channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage))
+        return self.playlist_result(entries, channel_id, channel_title, channel_description)
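Review note: `OnetChannelIE` strips `' + '` joins from the `currentClip` object before handing it to `js_to_json`, presumably because the page builds some string values by concatenation. The preprocessing step and the `mvp:` prefix removal can be sketched on their own. Both functions below are simplified illustrations, and the sample input is invented.

```python
import re


def collapse_js_concat(source):
    """Merge adjacent single-quoted JS string literals joined with '+'."""
    return re.sub(r"'\s*\+\s*'", '', source)


def remove_start(s, start):
    """Drop a leading prefix when present (same contract as the youtube-dl utility)."""
    return s[len(start):] if s is not None and s.startswith(start) else s
```

Applied to `{ckmId: 'mvp:' + '1234.5678'}`, the first step produces `{ckmId: 'mvp:1234.5678'}`, which a JS-to-JSON converter can then parse as a single string value.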
diff --git a/youtube_dl/extractor/onionstudios.py b/youtube_dl/extractor/onionstudios.py
index 0f1f448fe3126670932b371498b73b0bf0a7924e..6fb1a3fcc0bd565677b232adcb883b3649715dde 100644
--- a/youtube_dl/extractor/onionstudios.py
+++ b/youtube_dl/extractor/onionstudios.py
@@ -4,7 +4,12 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import determine_ext
+from ..utils import (
+    determine_ext,
+    int_or_none,
+    float_or_none,
+    mimetype2ext,
+)
  
  
  class OnionStudiosIE(InfoExtractor):
@@ -12,15 +17,14 @@ class OnionStudiosIE(InfoExtractor):
  
      _TESTS = [{
          'url': 'http://www.onionstudios.com/videos/hannibal-charges-forward-stops-for-a-cocktail-2937',
-        'md5': 'd4851405d31adfadf71cd7a487b765bb',
+        'md5': 'e49f947c105b8a78a675a0ee1bddedfe',
          'info_dict': {
              'id': '2937',
              'ext': 'mp4',
              'title': 'Hannibal charges forward, stops for a cocktail',
-            'description': 'md5:545299bda6abf87e5ec666548c6a9448',
              'thumbnail': 're:^https?://.*\.jpg$',
              'uploader': 'The A.V. Club',
-            'uploader_id': 'TheAVClub',
+            'uploader_id': 'the-av-club',
          },
      }, {
          'url': 'http://www.onionstudios.com/embed?id=2855&autoplay=true',
@@ -37,40 +41,38 @@ class OnionStudiosIE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        webpage = self._download_webpage(
-            'http://www.onionstudios.com/embed?id=%s' % video_id, video_id)
+        video_data = self._download_json(
+            'http://www.onionstudios.com/video/%s.json' % video_id, video_id)
+
+        title = video_data['title']
  
          formats = []
-        for src in re.findall(r'<source[^>]+src="([^"]+)"', webpage):
-            if determine_ext(src) != 'm3u8':  # m3u8 always results in 403
+        for source in video_data.get('sources', []):
+            source_url = source.get('url')
+            if not source_url:
+                continue
+            ext = mimetype2ext(source.get('content_type')) or determine_ext(source_url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    source_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+            else:
+                tbr = int_or_none(source.get('bitrate'))
                  formats.append({
-                    'url': src,
+                    'format_id': ext + ('-%d' % tbr if tbr else ''),
+                    'url': source_url,
+                    'width': int_or_none(source.get('width')),
+                    'tbr': tbr,
+                    'ext': ext,
                  })
          self._sort_formats(formats)
  
-        title = self._search_regex(
-            r'share_title\s*=\s*(["\'])(?P<title>[^\1]+?)\1',
-            webpage, 'title', group='title')
-        description = self._search_regex(
-            r'share_description\s*=\s*(["\'])(?P<description>[^\1]+?)\1',
-            webpage, 'description', default=None, group='description')
-        thumbnail = self._search_regex(
-            r'poster\s*=\s*(["\'])(?P<thumbnail>[^\1]+?)\1',
-            webpage, 'thumbnail', default=False, group='thumbnail')
-
-        uploader_id = self._search_regex(
-            r'twitter_handle\s*=\s*(["\'])(?P<uploader_id>[^\1]+?)\1',
-            webpage, 'uploader id', fatal=False, group='uploader_id')
-        uploader = self._search_regex(
-            r'window\.channelName\s*=\s*(["\'])Embedded:(?P<uploader>[^\1]+?)\1',
-            webpage, 'uploader', default=False, group='uploader')
-
          return {
              'id': video_id,
              'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
+            'thumbnail': video_data.get('poster_url'),
+            'uploader': video_data.get('channel_name'),
+            'uploader_id': video_data.get('channel_slug'),
+            'duration': float_or_none(video_data.get('duration')),
+            'tags': video_data.get('tags'),
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/ooyala.py b/youtube_dl/extractor/ooyala.py
index 16f040191aa31bd9e8dd49b37a42085c2b340582..c2807d0f61b2ab5134944bd0c79b2030df80d3a1 100644
--- a/youtube_dl/extractor/ooyala.py
+++ b/youtube_dl/extractor/ooyala.py
@@ -8,6 +8,7 @@ from ..utils import (
      float_or_none,
      ExtractorError,
      unsmuggle_url,
+    determine_ext,
  )
  from ..compat import compat_urllib_parse_urlencode
  
@@ -15,71 +16,80 @@ from ..compat import compat_urllib_parse_urlencode
  class OoyalaBaseIE(InfoExtractor):
      _PLAYER_BASE = 'http://player.ooyala.com/'
      _CONTENT_TREE_BASE = _PLAYER_BASE + 'player_api/v1/content_tree/'
-    _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v1/authorization/embed_code/%s/%s?'
+    _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v2/authorization/embed_code/%s/%s?'
  
-    def _extract(self, content_tree_url, video_id, domain='example.org'):
+    def _extract(self, content_tree_url, video_id, domain='example.org', supportedformats=None):
          content_tree = self._download_json(content_tree_url, video_id)['content_tree']
          metadata = content_tree[list(content_tree)[0]]
          embed_code = metadata['embed_code']
          pcode = metadata.get('asset_pcode') or embed_code
-        video_info = {
-            'id': embed_code,
-            'title': metadata['title'],
-            'description': metadata.get('description'),
-            'thumbnail': metadata.get('thumbnail_image') or metadata.get('promo_image'),
-            'duration': float_or_none(metadata.get('duration'), 1000),
-        }
+        title = metadata['title']
+
+        auth_data = self._download_json(
+            self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code) +
+            compat_urllib_parse_urlencode({
+                'domain': domain,
+                'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds',
+            }), video_id)
+
+        cur_auth_data = auth_data['authorization_data'][embed_code]
  
          urls = []
          formats = []
-        for supported_format in ('mp4', 'm3u8', 'hds', 'rtmp'):
-            auth_data = self._download_json(
-                self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code) +
-                compat_urllib_parse_urlencode({
-                    'domain': domain,
-                    'supportedFormats': supported_format
-                }),
-                video_id, 'Downloading %s JSON' % supported_format)
-
-            cur_auth_data = auth_data['authorization_data'][embed_code]
-
-            if cur_auth_data['authorized']:
-                for stream in cur_auth_data['streams']:
-                    url = base64.b64decode(
-                        stream['url']['data'].encode('ascii')).decode('utf-8')
-                    if url in urls:
-                        continue
-                    urls.append(url)
-                    delivery_type = stream['delivery_type']
-                    if delivery_type == 'hls' or '.m3u8' in url:
-                        formats.extend(self._extract_m3u8_formats(
-                            url, embed_code, 'mp4', 'm3u8_native',
-                            m3u8_id='hls', fatal=False))
-                    elif delivery_type == 'hds' or '.f4m' in url:
-                        formats.extend(self._extract_f4m_formats(
-                            url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
-                    elif '.smil' in url:
-                        formats.extend(self._extract_smil_formats(
-                            url, embed_code, fatal=False))
-                    else:
-                        formats.append({
-                            'url': url,
-                            'ext': stream.get('delivery_type'),
-                            'vcodec': stream.get('video_codec'),
-                            'format_id': delivery_type,
-                            'width': int_or_none(stream.get('width')),
-                            'height': int_or_none(stream.get('height')),
-                            'abr': int_or_none(stream.get('audio_bitrate')),
-                            'vbr': int_or_none(stream.get('video_bitrate')),
-                            'fps': float_or_none(stream.get('framerate')),
-                        })
-            else:
-                raise ExtractorError('%s said: %s' % (
-                    self.IE_NAME, cur_auth_data['message']), expected=True)
+        if cur_auth_data['authorized']:
+            for stream in cur_auth_data['streams']:
+                s_url = base64.b64decode(
+                    stream['url']['data'].encode('ascii')).decode('utf-8')
+                if s_url in urls:
+                    continue
+                urls.append(s_url)
+                ext = determine_ext(s_url, None)
+                delivery_type = stream['delivery_type']
+                if delivery_type == 'hls' or ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        re.sub(r'/ip(?:ad|hone)/', '/all/', s_url), embed_code, 'mp4', 'm3u8_native',
+                        m3u8_id='hls', fatal=False))
+                elif delivery_type == 'hds' or ext == 'f4m':
+                    formats.extend(self._extract_f4m_formats(
+                        s_url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
+                elif ext == 'smil':
+                    formats.extend(self._extract_smil_formats(
+                        s_url, embed_code, fatal=False))
+                else:
+                    formats.append({
+                        'url': s_url,
+                        'ext': ext or stream.get('delivery_type'),
+                        'vcodec': stream.get('video_codec'),
+                        'format_id': delivery_type,
+                        'width': int_or_none(stream.get('width')),
+                        'height': int_or_none(stream.get('height')),
+                        'abr': int_or_none(stream.get('audio_bitrate')),
+                        'vbr': int_or_none(stream.get('video_bitrate')),
+                        'fps': float_or_none(stream.get('framerate')),
+                    })
+        else:
+            raise ExtractorError('%s said: %s' % (
+                self.IE_NAME, cur_auth_data['message']), expected=True)
          self._sort_formats(formats)
  
-        video_info['formats'] = formats
-        return video_info
+        subtitles = {}
+        for lang, sub in metadata.get('closed_captions_vtt', {}).get('captions', {}).items():
+            sub_url = sub.get('url')
+            if not sub_url:
+                continue
+            subtitles[lang] = [{
+                'url': sub_url,
+            }]
+
+        return {
+            'id': embed_code,
+            'title': title,
+            'description': metadata.get('description'),
+            'thumbnail': metadata.get('thumbnail_image') or metadata.get('promo_image'),
+            'duration': float_or_none(metadata.get('duration'), 1000),
+            'subtitles': subtitles,
+            'formats': formats,
+        }
  
  
  class OoyalaIE(OoyalaBaseIE):
@@ -96,6 +106,8 @@ class OoyalaIE(OoyalaBaseIE):
                  'description': 'How badly damaged does a drive have to be to defeat Russell and his crew? Apparently, smashed to bits.',
                  'duration': 853.386,
              },
+            # The video in the original webpage now uses PlayWire
+            'skip': 'Ooyala said: movie expired',
          }, {
              # Only available for ipad
              'url': 'http://player.ooyala.com/player.js?embedCode=x1b3lqZDq9y_7kMyC2Op5qo-p077tXD0',
@@ -133,8 +145,9 @@ class OoyalaIE(OoyalaBaseIE):
          url, smuggled_data = unsmuggle_url(url, {})
          embed_code = self._match_id(url)
          domain = smuggled_data.get('domain')
+        supportedformats = smuggled_data.get('supportedformats')
          content_tree_url = self._CONTENT_TREE_BASE + 'embed_code/%s/%s' % (embed_code, embed_code)
-        return self._extract(content_tree_url, embed_code, domain)
+        return self._extract(content_tree_url, embed_code, domain, supportedformats)
  
  
  class OoyalaExternalIE(OoyalaBaseIE):
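Review note: after this change `OoyalaBaseIE._extract` issues a single authorization request, decodes each base64 stream URL, skips duplicates, and rewrites device-specific HLS paths to `/all/`. A standalone sketch of the decode and rewrite steps follows. The helper names are hypothetical, and the `streams` shape is inferred from the `stream['url']['data']` accesses in the diff.

```python
import base64
import re


def decode_stream_urls(streams):
    """Decode base64 stream URLs, keeping first occurrences in order."""
    urls = []
    for stream in streams:
        s_url = base64.b64decode(
            stream['url']['data'].encode('ascii')).decode('utf-8')
        if s_url in urls:
            continue  # the same URL can be listed for several delivery types
        urls.append(s_url)
    return urls


def widen_hls_url(s_url):
    """Replace an /ipad/ or /iphone/ path segment with /all/."""
    return re.sub(r'/ip(?:ad|hone)/', '/all/', s_url)
```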
diff --git a/youtube_dl/extractor/openload.py b/youtube_dl/extractor/openload.py
index 4468f31fcae074090346d134180fee98752b7822..7f19b1ba5c3c355977c694334694b51f71a9840c 100644
--- a/youtube_dl/extractor/openload.py
+++ b/youtube_dl/extractor/openload.py
@@ -1,18 +1,25 @@
  # coding: utf-8
-from __future__ import unicode_literals
+from __future__ import unicode_literals, division
  
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_chr
+from ..compat import (
+    compat_chr,
+    compat_ord,
+)
  from ..utils import (
-    encode_base_n,
+    determine_ext,
      ExtractorError,
  )
+from ..jsinterp import (
+    JSInterpreter,
+    _NAME_RE
+)
  
  
  class OpenloadIE(InfoExtractor):
-    _VALID_URL = r'https://openload.(?:co|io)/(?:f|embed)/(?P<id>[a-zA-Z0-9-]+)'
+    _VALID_URL = r'https?://openload\.(?:co|io)/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
  
      _TESTS = [{
          'url': 'https://openload.co/f/kUEfGclsU9o',
@@ -23,85 +30,130 @@ class OpenloadIE(InfoExtractor):
              'title': 'skyrim_no-audio_1080.mp4',
              'thumbnail': 're:^https?://.*\.jpg$',
          },
+    }, {
+        'url': 'https://openload.co/embed/rjC09fkPLYs',
+        'info_dict': {
+            'id': 'rjC09fkPLYs',
+            'ext': 'mp4',
+            'title': 'movie.mp4',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'subtitles': {
+                'en': [{
+                    'ext': 'vtt',
+                }],
+            },
+        },
+        'params': {
+            'skip_download': True,  # test subtitles only
+        },
      }, {
          'url': 'https://openload.co/embed/kUEfGclsU9o/skyrim_no-audio_1080.mp4',
          'only_matching': True,
      }, {
          'url': 'https://openload.io/f/ZAn6oz-VZGE/',
          'only_matching': True,
+    }, {
+        'url': 'https://openload.co/f/_-ztPaZtMhM/',
+        'only_matching': True,
+    }, {
+        # unavailable via https://openload.co/f/Sxz5sADo82g/, different layout
+        # for title and ext
+        'url': 'https://openload.co/embed/Sxz5sADo82g/',
+        'only_matching': True,
      }]
  
-    @staticmethod
-    def openload_level2_debase(m):
-        radix, num = int(m.group(1)) + 27, int(m.group(2))
-        return '"' + encode_base_n(num, radix) + '"'
-
-    @classmethod
-    def openload_level2(cls, txt):
-        # The function name is ǃ \u01c3
-        # Using escaped unicode literals does not work in Python 3.2
-        return re.sub(r'ǃ\((\d+),(\d+)\)', cls.openload_level2_debase, txt, re.UNICODE).replace('"+"', '')
-
-    # Openload uses a variant of aadecode
-    # openload_decode and related functions are originally written by
-    # vitas@matfyz.cz and released with public domain
-    # See https://github.com/rg3/youtube-dl/issues/8489
-    @classmethod
-    def openload_decode(cls, txt):
-        symbol_table = [
-            ('_', '(ﾟДﾟ) [ﾟΘﾟ]'),
-            ('a', '(ﾟДﾟ) [ﾟωﾟﾉ]'),
-            ('b', '(ﾟДﾟ) [ﾟΘﾟﾉ]'),
-            ('c', '(ﾟДﾟ) [\'c\']'),
-            ('d', '(ﾟДﾟ) [ﾟｰﾟﾉ]'),
-            ('e', '(ﾟДﾟ) [ﾟДﾟﾉ]'),
-            ('f', '(ﾟДﾟ) [1]'),
-
-            ('o', '(ﾟДﾟ) [\'o\']'),
-            ('u', '(oﾟｰﾟo)'),
-            ('c', '(ﾟДﾟ) [\'c\']'),
-
-            ('7', '((ﾟｰﾟ) + (o^_^o))'),
-            ('6', '((o^_^o) +(o^_^o) +(c^_^o))'),
-            ('5', '((ﾟｰﾟ) + (ﾟΘﾟ))'),
-            ('4', '(-~3)'),
-            ('3', '(-~-~1)'),
-            ('2', '(-~1)'),
-            ('1', '(-~0)'),
-            ('0', '((c^_^o)-(c^_^o))'),
-        ]
+    def openload_decode(self, txt):
+        symbol_dict = {
+            '(ﾟДﾟ) [ﾟΘﾟ]': '_',
+            '(ﾟДﾟ) [ﾟωﾟﾉ]': 'a',
+            '(ﾟДﾟ) [ﾟΘﾟﾉ]': 'b',
+            '(ﾟДﾟ) [\'c\']': 'c',
+            '(ﾟДﾟ) [ﾟｰﾟﾉ]': 'd',
+            '(ﾟДﾟ) [ﾟДﾟﾉ]': 'e',
+            '(ﾟДﾟ) [1]': 'f',
+            '(ﾟДﾟ) [\'o\']': 'o',
+            '(oﾟｰﾟo)': 'u',
+            '(ﾟДﾟ) [\'c\']': 'c',
+            '((ﾟｰﾟ) + (o^_^o))': '7',
+            '((o^_^o) +(o^_^o) +(c^_^o))': '6',
+            '((ﾟｰﾟ) + (ﾟΘﾟ))': '5',
+            '(-~3)': '4',
+            '(-~-~1)': '3',
+            '(-~1)': '2',
+            '(-~0)': '1',
+            '((c^_^o)-(c^_^o))': '0',
+        }
          delim = '(ﾟДﾟ)[ﾟεﾟ]+'
+        end_token = '(ﾟДﾟ)[ﾟoﾟ]'
+        symbols = '|'.join(map(re.escape, symbol_dict.keys()))
+        txt = re.sub(r'(%s)\+\s?' % symbols, lambda m: symbol_dict[m.group(1)], txt)
          ret = ''
-        for aachar in txt.split(delim):
-            for val, pat in symbol_table:
-                aachar = aachar.replace(pat, val)
-            aachar = aachar.replace('+ ', '')
-            m = re.match(r'^\d+', aachar)
-            if m:
-                ret += compat_chr(int(m.group(0), 8))
-            else:
-                m = re.match(r'^u([\da-f]+)', aachar)
-                if m:
-                    ret += compat_chr(int(m.group(1), 16))
-        return cls.openload_level2(ret)
+        for aacode in re.findall(r'{0}\+\s?{1}(.*?){0}'.format(re.escape(end_token), re.escape(delim)), txt):
+            for aachar in aacode.split(delim):
+                if aachar.isdigit():
+                    ret += compat_chr(int(aachar, 8))
+                else:
+                    m = re.match(r'^u([\da-f]{4})$', aachar)
+                    if m:
+                        ret += compat_chr(int(m.group(1), 16))
+                    else:
+                        self.report_warning("Cannot decode: %s" % aachar)
+        return ret
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage('https://openload.co/embed/%s/' % video_id, video_id)
  
-        if 'File not found' in webpage:
+        if 'File not found' in webpage or 'deleted by the owner' in webpage:
              raise ExtractorError('File not found', expected=True)
  
-        code = self._search_regex(
-            r'<video[^>]+>\s*<script[^>]+>([^<]+)</script>',
-            webpage, 'JS code')
+        # The following decryption algorithm is written by @yokrysty and
+        # declared to be freely used in youtube-dl
+        # See https://github.com/rg3/youtube-dl/issues/10408
+        enc_data = self._html_search_regex(
+            r'<span[^>]*>([^<]+)</span>\s*<span[^>]*>[^<]+</span>\s*<span[^>]+id="streamurl"',
+            webpage, 'encrypted data')
+
+        enc_code = self._html_search_regex(r'<script[^>]+>(ﾟωﾟ[^<]+)</script>',
+                                           webpage, 'encrypted code')
+
+        js_code = self.openload_decode(enc_code)
+        jsi = JSInterpreter(js_code)
+
+        m_offset_fun = self._search_regex(r'slice\(0\s*-\s*(%s)\(\)' % _NAME_RE, js_code, 'javascript offset function')
+        m_diff_fun = self._search_regex(r'charCodeAt\(0\)\s*\+\s*(%s)\(\)' % _NAME_RE, js_code, 'javascript diff function')
+
+        offset = jsi.call_function(m_offset_fun)
+        diff = jsi.call_function(m_diff_fun)
  
-        video_url = self._search_regex(
-            r'return\s+"(https?://[^"]+)"', self.openload_decode(code), 'video URL')
+        video_url_chars = []
  
-        return {
+        for idx, c in enumerate(enc_data):
+            j = compat_ord(c)
+            if 33 <= j <= 126:
+                j = ((j + 14) % 94) + 33
+            if idx == len(enc_data) - offset:
+                j += diff
+            video_url_chars.append(compat_chr(j))
+
+        video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)
+
+        title = self._og_search_title(webpage, default=None) or self._search_regex(
+            r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
+            'title', default=None) or self._html_search_meta(
+            'description', webpage, 'title', fatal=True)
+
+        entries = self._parse_html5_media_entries(url, webpage, video_id)
+        subtitles = entries[0]['subtitles'] if entries else None
+
+        info_dict = {
              'id': video_id,
-            'title': self._og_search_title(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
+            'title': title,
+            'thumbnail': self._og_search_thumbnail(webpage, default=None),
              'url': video_url,
+            # Seems all videos have extensions in their titles
+            'ext': determine_ext(title),
+            'subtitles': subtitles,
          }
+
+        return info_dict
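Note on the decryption loop in `_real_extract` above: the mapping `((j + 14) % 94) + 33` over code points 33..126 is the same arithmetic as ROT47, after which one character near the end of the string is shifted by `diff`. The sketch below restates that loop as a standalone function, using plain `ord`/`chr` in place of `compat_ord`/`compat_chr`. The name `decode_stream_id` and the sample inputs are illustrative assumptions; in the extractor, `offset` and `diff` are obtained by running functions from the decoded page JavaScript, not passed as constants.

```python
def decode_stream_id(enc_data, offset, diff):
    """Standalone restatement of the per-character loop in the hunk above.

    enc_data: the obfuscated string taken from the page
    offset:   distance from the end of the string of the adjusted character
    diff:     amount added to that one character's code point
    """
    chars = []
    for idx, c in enumerate(enc_data):
        j = ord(c)
        # Printable ASCII is rotated by 47 positions within 33..126 (ROT47).
        if 33 <= j <= 126:
            j = ((j + 14) % 94) + 33
        # Exactly one position, counted from the end, gets an extra shift.
        if idx == len(enc_data) - offset:
            j += diff
        chars.append(chr(j))
    return ''.join(chars)


# With offset 0 no index matches, so the result is plain ROT47,
# which is its own inverse.
print(decode_stream_id('Hello', 0, 0))   # w6==@
print(decode_stream_id('w6==@', 0, 0))   # Hello
```

Because the extra shift is applied after the rotation, a wrong `offset` or `diff` corrupts only a single character of the resulting stream identifier.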
diff --git a/youtube_dl/extractor/ora.py b/youtube_dl/extractor/ora.py

index 8545fb1b88cbf29ae1acb999566ee9041335dc4a..1d42be39b3303c95952a8ec54a34abbb9d09f0b1 100644
--- a/youtube_dl/extractor/ora.py
+++ b/youtube_dl/extractor/ora.py
@@ -12,8 +12,8 @@ from ..utils import (
  
  
  class OraTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?ora\.tv/([^/]+/)*(?P<id>[^/\?#]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?(?:ora\.tv|unsafespeech\.com)/([^/]+/)*(?P<id>[^/\?#]+)'
+    _TESTS = [{
          'url': 'https://www.ora.tv/larrykingnow/2015/12/16/vine-youtube-stars-zach-king-king-bach-on-their-viral-videos-0_36jupg6090pq',
          'md5': 'fa33717591c631ec93b04b0e330df786',
          'info_dict': {
@@ -22,7 +22,10 @@ class OraTVIE(InfoExtractor):
              'title': 'Vine & YouTube Stars Zach King & King Bach On Their Viral Videos!',
              'description': 'md5:ebbc5b1424dd5dba7be7538148287ac1',
          }
-    }
+    }, {
+        'url': 'http://www.unsafespeech.com/video/2016/5/10/student-self-censorship-and-the-thought-police-on-university-campuses-0_6622bnkppw4d',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          display_id = self._match_id(url)
diff --git a/youtube_dl/extractor/orf.py b/youtube_dl/extractor/orf.py

index 66c75f8b3559752127c091d437e4764b7c722e9d..b4cce7ea9334c7bbaf9e617932189504dcd25121 100644
--- a/youtube_dl/extractor/orf.py
+++ b/youtube_dl/extractor/orf.py
@@ -1,28 +1,28 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import json
  import re
  import calendar
  import datetime
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
      HEADRequest,
      unified_strdate,
-    ExtractorError,
      strip_jsonp,
      int_or_none,
      float_or_none,
      determine_ext,
      remove_end,
+    unescapeHTML,
  )
  
  
  class ORFTVthekIE(InfoExtractor):
      IE_NAME = 'orf:tvthek'
      IE_DESC = 'ORF TVthek'
-    _VALID_URL = r'https?://tvthek\.orf\.at/(?:programs/.+?/episodes|topics?/.+?|program/[^/]+)/(?P<id>\d+)'
+    _VALID_URL = r'https?://tvthek\.orf\.at/(?:[^/]+/)+(?P<id>\d+)'
  
      _TESTS = [{
          'url': 'http://tvthek.orf.at/program/Aufgetischt/2745173/Aufgetischt-Mit-der-Steirischen-Tafelrunde/8891389',
@@ -40,37 +40,34 @@ class ORFTVthekIE(InfoExtractor):
          'skip': 'Blocked outside of Austria / Germany',
      }, {
          'url': 'http://tvthek.orf.at/topic/Im-Wandel-der-Zeit/8002126/Best-of-Ingrid-Thurnher/7982256',
-        'playlist': [{
-            'md5': '68f543909aea49d621dfc7703a11cfaf',
-            'info_dict': {
-                'id': '7982259',
-                'ext': 'mp4',
-                'title': 'Best of Ingrid Thurnher',
-                'upload_date': '20140527',
-                'description': 'Viele Jahre war Ingrid Thurnher das "Gesicht" der ZIB 2. Vor ihrem Wechsel zur ZIB 2 im jahr 1995 moderierte sie unter anderem "Land und Leute", "Österreich-Bild" und "Niederösterreich heute".',
-            }
-        }],
+        'info_dict': {
+            'id': '7982259',
+            'ext': 'mp4',
+            'title': 'Best of Ingrid Thurnher',
+            'upload_date': '20140527',
+            'description': 'Viele Jahre war Ingrid Thurnher das "Gesicht" der ZIB 2. Vor ihrem Wechsel zur ZIB 2 im Jahr 1995 moderierte sie unter anderem "Land und Leute", "Österreich-Bild" und "Niederösterreich heute".',
+        },
+        'params': {
+            'skip_download': True,  # rtsp downloads
+        },
          '_skip': 'Blocked outside of Austria / Germany',
+    }, {
+        'url': 'http://tvthek.orf.at/topic/Fluechtlingskrise/10463081/Heimat-Fremde-Heimat/13879132/Senioren-betreuen-Migrantenkinder/13879141',
+        'only_matching': True,
+    }, {
+        'url': 'http://tvthek.orf.at/profile/Universum/35429',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
          playlist_id = self._match_id(url)
          webpage = self._download_webpage(url, playlist_id)
  
-        data_json = self._search_regex(
-            r'initializeAdworx\((.+?)\);\n', webpage, 'video info')
-        all_data = json.loads(data_json)
-
-        def get_segments(all_data):
-            for data in all_data:
-                if data['name'] in (
-                        'Tracker::EPISODE_DETAIL_PAGE_OVER_PROGRAM',
-                        'Tracker::EPISODE_DETAIL_PAGE_OVER_TOPIC'):
-                    return data['values']['segments']
-
-        sdata = get_segments(all_data)
-        if not sdata:
-            raise ExtractorError('Unable to extract segments')
+        data_jsb = self._parse_json(
+            self._search_regex(
+                r'<div[^>]+class=(["\']).*?VideoPlaylist.*?\1[^>]+data-jsb=(["\'])(?P<json>.+?)\2',
+                webpage, 'playlist', group='json'),
+            playlist_id, transform_source=unescapeHTML)['playlist']['videos']
  
          def quality_to_int(s):
              m = re.search('([0-9]+)', s)
@@ -79,8 +76,11 @@ class ORFTVthekIE(InfoExtractor):
              return int(m.group(1))
  
          entries = []
-        for sd in sdata:
-            video_id = sd['id']
+        for sd in data_jsb:
+            video_id, title = sd.get('id'), sd.get('title')
+            if not video_id or not title:
+                continue
+            video_id = compat_str(video_id)
              formats = [{
                  'preference': -10 if fd['delivery'] == 'hls' else None,
                  'format_id': '%s-%s-%s' % (
@@ -88,7 +88,7 @@ class ORFTVthekIE(InfoExtractor):
                  'url': fd['src'],
                  'protocol': fd['protocol'],
                  'quality': quality_to_int(fd['quality']),
-            } for fd in sd['playlist_item_array']['sources']]
+            } for fd in sd['sources']]
  
              # Check for geoblocking.
              # There is a property is_geoprotection, but that's always false
@@ -115,14 +115,24 @@ class ORFTVthekIE(InfoExtractor):
              self._check_formats(formats, video_id)
              self._sort_formats(formats)
  
-            upload_date = unified_strdate(sd['created_date'])
+            subtitles = {}
+            for sub in sd.get('subtitles', []):
+                sub_src = sub.get('src')
+                if not sub_src:
+                    continue
+                subtitles.setdefault(sub.get('lang', 'de-AT'), []).append({
+                    'url': sub_src,
+                })
+
+            upload_date = unified_strdate(sd.get('created_date'))
              entries.append({
                  '_type': 'video',
                  'id': video_id,
-                'title': sd['header'],
+                'title': title,
                  'formats': formats,
+                'subtitles': subtitles,
                  'description': sd.get('description'),
-                'duration': int(sd['duration_in_seconds']),
+                'duration': int_or_none(sd.get('duration_in_seconds')),
                  'upload_date': upload_date,
                  'thumbnail': sd.get('image_full_url'),
              })
@@ -137,13 +147,16 @@ class ORFTVthekIE(InfoExtractor):
  class ORFOE1IE(InfoExtractor):
      IE_NAME = 'orf:oe1'
      IE_DESC = 'Radio Österreich 1'
-    _VALID_URL = r'https?://oe1\.orf\.at/(?:programm/|konsole.*?#\?track_id=)(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://oe1\.orf\.at/(?:programm/|konsole\?.*?\btrack_id=)(?P<id>[0-9]+)'
  
      # Audios on ORF radio are only available for 7 days, so we can't add tests.
-    _TEST = {
+    _TESTS = [{
          'url': 'http://oe1.orf.at/konsole?show=on_demand#?track_id=394211',
          'only_matching': True,
-    }
+    }, {
+        'url': 'http://oe1.orf.at/konsole?show=ondemand&track_id=443608&load_day=/programm/konsole/tag/20160726',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          show_id = self._match_id(url)
@@ -185,6 +198,7 @@ class ORFFM4IE(InfoExtractor):
              'timestamp': 1452456073,
              'upload_date': '20160110',
          },
+        'skip': 'Live streams on FM4 get deleted soon',
      }
  
      def _real_extract(self, url):
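Note on the ORFTVthekIE change above: the playlist is now read from an HTML-escaped JSON blob stored in the `data-jsb` attribute of the `VideoPlaylist` div, instead of from the `initializeAdworx(...)` call. The sketch below shows the same three steps (regex match, HTML unescape, JSON parse) using only the standard library. The function name `parse_playlist_videos` and the sample markup are assumptions made for illustration; the real page carries many more fields per video.

```python
import html
import json
import re

# Same pattern as in the hunk above: the class attribute must mention
# VideoPlaylist, and the JSON lives in the data-jsb attribute.
PLAYLIST_RE = (
    r'<div[^>]+class=(["\']).*?VideoPlaylist.*?\1'
    r'[^>]+data-jsb=(["\'])(?P<json>.+?)\2')


def parse_playlist_videos(webpage):
    m = re.search(PLAYLIST_RE, webpage)
    if m is None:
        raise ValueError('no VideoPlaylist block found')
    # The attribute value is HTML-escaped JSON, so unescape before parsing.
    data = json.loads(html.unescape(m.group('json')))
    return data['playlist']['videos']


# Hypothetical, heavily reduced sample markup.
sample = (
    '<div class="jsb_ jsb_VideoPlaylist" '
    'data-jsb="{&quot;playlist&quot;:{&quot;videos&quot;:'
    '[{&quot;id&quot;:13879141,&quot;title&quot;:&quot;Example&quot;}]}}">'
    '</div>')

videos = parse_playlist_videos(sample)
print(videos)
```

The numeric `id` in the parsed JSON is why the patch converts `video_id` with `compat_str` before using it.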
diff --git a/youtube_dl/extractor/pandatv.py b/youtube_dl/extractor/pandatv.py

new file mode 100644

index 0000000..133cc9b
--- /dev/null
+++ b/youtube_dl/extractor/pandatv.py
@@ -0,0 +1,91 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    qualities,
+)
+
+
+class PandaTVIE(InfoExtractor):
+    IE_DESC = '熊猫TV'
+    _VALID_URL = r'http://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.panda.tv/10091',
+        'info_dict': {
+            'id': '10091',
+            'title': 're:.+',
+            'uploader': '囚徒',
+            'ext': 'flv',
+            'is_live': True,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'skip': 'Live stream is offline',
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        config = self._download_json(
+            'http://www.panda.tv/api_room?roomid=%s' % video_id, video_id)
+
+        error_code = config.get('errno', 0)
+        if error_code != 0:
+            raise ExtractorError(
+                '%s returned error %s: %s'
+                % (self.IE_NAME, error_code, config['errmsg']),
+                expected=True)
+
+        data = config['data']
+        video_info = data['videoinfo']
+
+        # 2 = live, 3 = offline
+        if video_info.get('status') != '2':
+            raise ExtractorError(
+                'Live stream is offline', expected=True)
+
+        title = data['roominfo']['name']
+        uploader = data.get('hostinfo', {}).get('name')
+        room_key = video_info['room_key']
+        stream_addr = video_info.get(
+            'stream_addr', {'OD': '1', 'HD': '1', 'SD': '1'})
+
+        # Reverse engineered from web player swf
+        # (http://s6.pdim.gs/static/07153e425f581151.swf at the moment of
+        # writing).
+        plflag0, plflag1 = video_info['plflag'].split('_')
+        plflag0 = int(plflag0) - 1
+        if plflag1 == '21':
+            plflag0 = 10
+            plflag1 = '4'
+        live_panda = 'live_panda' if plflag0 < 1 else ''
+
+        quality_key = qualities(['OD', 'HD', 'SD'])
+        suffix = ['_small', '_mid', '']
+        formats = []
+        for k, v in stream_addr.items():
+            if v != '1':
+                continue
+            quality = quality_key(k)
+            if quality <= 0:
+                continue
+            for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
+                formats.append({
+                    'url': 'http://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s'
+                    % (pl, plflag1, room_key, live_panda, suffix[quality], ext),
+                    'format_id': '%s-%s' % (k, ext),
+                    'quality': quality,
+                    'source_preference': pref,
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': self._live_title(title),
+            'uploader': uploader,
+            'formats': formats,
+            'is_live': True,
+        }
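Note on the URL construction in PandaTVIE above: the stream host and path are derived from `plflag`, `room_key` and the quality name. The sketch below isolates that logic in a hypothetical helper, `build_stream_urls`, and replaces `qualities([...])` with a plain list index so that it runs without youtube-dl; the room key and flag values in the usage lines are made up. As written in the patch, index 0 (`OD`) is filtered out by the `quality <= 0` check, and the sketch keeps that behaviour.

```python
def build_stream_urls(room_key, plflag, stream_addr):
    """Return {format_id: url} following the logic of the hunk above."""
    plflag0, plflag1 = plflag.split('_')
    plflag0 = int(plflag0) - 1
    if plflag1 == '21':
        plflag0 = 10
        plflag1 = '4'
    # The extra path component is appended only when plflag0 < 1.
    live_panda = 'live_panda' if plflag0 < 1 else ''

    names = ['OD', 'HD', 'SD']          # stands in for qualities([...])
    suffix = ['_small', '_mid', '']
    urls = {}
    for name, enabled in stream_addr.items():
        if enabled != '1' or name not in names:
            continue
        quality = names.index(name)
        if quality <= 0:                # mirrors the check in the patch
            continue
        for ext, pl in (('m3u8', '-hls'), ('flv', '')):
            urls['%s-%s' % (name, ext)] = (
                'http://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s'
                % (pl, plflag1, room_key, live_panda, suffix[quality], ext))
    return urls


# Made-up inputs for illustration only.
urls = build_stream_urls('room', '2_3', {'OD': '1', 'HD': '1', 'SD': '0'})
for format_id, url in sorted(urls.items()):
    print(format_id, url)
```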
diff --git a/youtube_dl/extractor/pandoratv.py b/youtube_dl/extractor/pandoratv.py

index 8d49f5c4aff04954e773b9eb575af912130a8401..2b07958bb1f5815a162dadadff4f450f7ea0e97d 100644
--- a/youtube_dl/extractor/pandoratv.py
+++ b/youtube_dl/extractor/pandoratv.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/parliamentliveuk.py b/youtube_dl/extractor/parliamentliveuk.py

index 0a423a08f0dd9b746ecf708509a525b4c0bed541..ebdab8db9faa0c8911c53c5764a18456926b6a55 100644
--- a/youtube_dl/extractor/parliamentliveuk.py
+++ b/youtube_dl/extractor/parliamentliveuk.py
@@ -1,53 +1,43 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  
  
  class ParliamentLiveUKIE(InfoExtractor):
      IE_NAME = 'parliamentlive.tv'
      IE_DESC = 'UK parliament videos'
-    _VALID_URL = r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)'
+    _VALID_URL = r'(?i)https?://(?:www\.)?parliamentlive\.tv/Event/Index/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
  
-    _TEST = {
-        'url': 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia',
+    _TESTS = [{
+        'url': 'http://parliamentlive.tv/Event/Index/c1e9d44d-fd6c-4263-b50f-97ed26cc998b',
          'info_dict': {
-            'id': '15121',
-            'ext': 'asf',
-            'title': 'hoc home affairs committee, 18 mar 2014.pm',
-            'description': 'md5:033b3acdf83304cd43946b2d5e5798d1',
+            'id': 'c1e9d44d-fd6c-4263-b50f-97ed26cc998b',
+            'ext': 'mp4',
+            'title': 'Home Affairs Committee',
+            'uploader_id': 'FFMPEG-01',
+            'timestamp': 1422696664,
+            'upload_date': '20150131',
          },
-        'params': {
-            'skip_download': True,  # Requires mplayer (mms)
-        }
-    }
+    }, {
+        'url': 'http://parliamentlive.tv/event/index/3f24936f-130f-40bf-9a5d-b3d6479da6a4',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        webpage = self._download_webpage(url, video_id)
-
-        asx_url = self._html_search_regex(
-            r'embed.*?src="([^"]+)" name="MediaPlayer"', webpage,
-            'metadata URL')
-        asx = self._download_xml(asx_url, video_id, 'Downloading ASX metadata')
-        video_url = asx.find('.//REF').attrib['HREF']
-
-        title = self._search_regex(
-            r'''(?x)player\.setClipDetails\(
-                (?:(?:[0-9]+|"[^"]+"),\s*){2}
-                "([^"]+",\s*"[^"]+)"
-                ''',
-            webpage, 'title').replace('", "', ', ')
-        description = self._html_search_regex(
-            r'(?s)<span id="MainContentPlaceHolder_CaptionsBlock_WitnessInfo">(.*?)</span>',
-            webpage, 'description')
-
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(
+            'http://vodplayer.parliamentlive.tv/?mid=' + video_id, video_id)
+        widget_config = self._parse_json(self._search_regex(
+            r'kWidgetConfig\s*=\s*({.+});',
+            webpage, 'kaltura widget config'), video_id)
+        kaltura_url = 'kaltura:%s:%s' % (widget_config['wid'][1:], widget_config['entry_id'])
+        event_title = self._download_json(
+            'http://parliamentlive.tv/Event/GetShareVideo/' + video_id, video_id)['event']['title']
          return {
+            '_type': 'url_transparent',
              'id': video_id,
-            'ext': 'asf',
-            'url': video_url,
-            'title': title,
-            'description': description,
+            'title': event_title,
+            'description': '',
+            'url': kaltura_url,
+            'ie_key': 'Kaltura',
          }
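Note on the ParliamentLiveUKIE rewrite above: the extractor no longer resolves an ASX file itself. It reads the `kWidgetConfig` object from the player page and hands a `kaltura:<partner>:<entry>` URL to the Kaltura extractor via `url_transparent`. The sketch below shows only the config-to-URL step with the standard library. The helper name `kaltura_url_from_page` and the sample values are assumptions, and the sketch assumes the config is strict JSON, as the patch does by using `_parse_json` without a transform.

```python
import json
import re


def kaltura_url_from_page(webpage):
    # Same pattern as in the hunk above.
    m = re.search(r'kWidgetConfig\s*=\s*({.+});', webpage)
    if m is None:
        raise ValueError('kWidgetConfig not found')
    config = json.loads(m.group(1))
    # The patch drops the first character of `wid` before building the URL.
    return 'kaltura:%s:%s' % (config['wid'][1:], config['entry_id'])


# Made-up sample values for illustration only.
page = 'var kWidgetConfig = {"wid": "_1234", "entry_id": "0_example"};'
print(kaltura_url_from_page(page))
```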
diff --git a/youtube_dl/extractor/patreon.py b/youtube_dl/extractor/patreon.py

index 22975066516a0d37e74c9c520dd4faf0a68305a6..a6a2c273f240db52c967a12a96484261bd37664a 100644
--- a/youtube_dl/extractor/patreon.py
+++ b/youtube_dl/extractor/patreon.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/pbs.py b/youtube_dl/extractor/pbs.py

index f43e3a146e7bd35d9a99ab730289f4a1d4f5b91c..b490ef74c5fb768751d4598ff88e70a13d41c060 100644
--- a/youtube_dl/extractor/pbs.py
+++ b/youtube_dl/extractor/pbs.py
@@ -4,13 +4,13 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_HTTPError
  from ..utils import (
      ExtractorError,
      determine_ext,
      int_or_none,
      js_to_json,
      strip_jsonp,
+    strip_or_none,
      unified_strdate,
      US_RATINGS,
  )
@@ -196,31 +196,25 @@ class PBSIE(InfoExtractor):
      _TESTS = [
          {
              'url': 'http://www.pbs.org/tpt/constitution-usa-peter-sagal/watch/a-more-perfect-union/',
-            'md5': 'ce1888486f0908d555a8093cac9a7362',
+            'md5': '173dc391afd361fa72eab5d3d918968d',
              'info_dict': {
                  'id': '2365006249',
                  'ext': 'mp4',
                  'title': 'Constitution USA with Peter Sagal - A More Perfect Union',
-                'description': 'md5:36f341ae62e251b8f5bd2b754b95a071',
+                'description': 'md5:31b664af3c65fd07fa460d306b837d00',
                  'duration': 3190,
              },
-            'params': {
-                'skip_download': True,  # requires ffmpeg
-            },
          },
          {
              'url': 'http://www.pbs.org/wgbh/pages/frontline/losing-iraq/',
-            'md5': '143c98aa54a346738a3d78f54c925321',
+            'md5': '6f722cb3c3982186d34b0f13374499c7',
              'info_dict': {
                  'id': '2365297690',
                  'ext': 'mp4',
                  'title': 'FRONTLINE - Losing Iraq',
-                'description': 'md5:4d3eaa01f94e61b3e73704735f1196d9',
+                'description': 'md5:5979a4d069b157f622d02bff62fbe654',
                  'duration': 5050,
              },
-            'params': {
-                'skip_download': True,  # requires ffmpeg
-            }
          },
          {
              'url': 'http://www.pbs.org/newshour/bb/education-jan-june12-cyberschools_02-23/',
@@ -229,7 +223,7 @@ class PBSIE(InfoExtractor):
                  'id': '2201174722',
                  'ext': 'mp4',
                  'title': 'PBS NewsHour - Cyber Schools Gain Popularity, but Quality Questions Persist',
-                'description': 'md5:95a19f568689d09a166dff9edada3301',
+                'description': 'md5:86ab9a3d04458b876147b355788b8781',
                  'duration': 801,
              },
          },
@@ -244,9 +238,6 @@ class PBSIE(InfoExtractor):
                  'duration': 6559,
                  'thumbnail': 're:^https?://.*\.jpg$',
              },
-            'params': {
-                'skip_download': True,  # requires ffmpeg
-            },
          },
          {
              'url': 'http://www.pbs.org/wgbh/nova/earth/killer-typhoon.html',
@@ -262,9 +253,6 @@ class PBSIE(InfoExtractor):
                  'upload_date': '20140122',
                  'age_limit': 10,
              },
-            'params': {
-                'skip_download': True,  # requires ffmpeg
-            },
          },
          {
              'url': 'http://www.pbs.org/wgbh/pages/frontline/united-states-of-secrets/',
@@ -280,7 +268,7 @@ class PBSIE(InfoExtractor):
                  'display_id': 'player',
                  'ext': 'mp4',
                  'title': 'American Experience - Death and the Civil War, Chapter 1',
-                'description': 'md5:1b80a74e0380ed2a4fb335026de1600d',
+                'description': 'md5:67fa89a9402e2ee7d08f53b920674c18',
                  'duration': 682,
                  'thumbnail': 're:^https?://.*\.jpg$',
              },
@@ -290,6 +278,7 @@ class PBSIE(InfoExtractor):
          },
          {
              'url': 'http://www.pbs.org/video/2365245528/',
+            'md5': '115223d41bd55cda8ae5cd5ed4e11497',
              'info_dict': {
                  'id': '2365245528',
                  'display_id': '2365245528',
@@ -299,27 +288,22 @@ class PBSIE(InfoExtractor):
                  'duration': 6851,
                  'thumbnail': 're:^https?://.*\.jpg$',
              },
-            'params': {
-                'skip_download': True,  # requires ffmpeg
-            },
          },
          {
              # Video embedded in iframe containing angle brackets as attribute's value (e.g.
              # "<iframe style='position: absolute;<br />\ntop: 0; left: 0;' ...", see
              # https://github.com/rg3/youtube-dl/issues/7059)
              'url': 'http://www.pbs.org/food/features/a-chefs-life-season-3-episode-5-prickly-business/',
+            'md5': '59b0ef5009f9ac8a319cc5efebcd865e',
              'info_dict': {
                  'id': '2365546844',
                  'display_id': 'a-chefs-life-season-3-episode-5-prickly-business',
                  'ext': 'mp4',
                  'title': "A Chef's Life - Season 3, Ep. 5: Prickly Business",
-                'description': 'md5:54033c6baa1f9623607c6e2ed245888b',
+                'description': 'md5:c0ff7475a4b70261c7e58f493c2792a5',
                  'duration': 1480,
                  'thumbnail': 're:^https?://.*\.jpg$',
              },
-            'params': {
-                'skip_download': True,  # requires ffmpeg
-            },
          },
          {
              # Frontline video embedded via flp2012.js
@@ -329,7 +313,7 @@ class PBSIE(InfoExtractor):
                  'display_id': 'the-atomic-artists',
                  'ext': 'mp4',
                  'title': 'FRONTLINE - The Atomic Artists',
-                'description': 'md5:1a2481e86b32b2e12ec1905dd473e2c1',
+                'description': 'md5:f677e4520cfacb4a5ce1471e31b57800',
                  'duration': 723,
                  'thumbnail': 're:^https?://.*\.jpg$',
              },
@@ -340,6 +324,7 @@ class PBSIE(InfoExtractor):
          {
              # Serves hd only via wigget/partnerplayer page
              'url': 'http://www.pbs.org/video/2365641075/',
+            'md5': 'fdf907851eab57211dd589cf12006666',
              'info_dict': {
                  'id': '2365641075',
                  'ext': 'mp4',
@@ -348,9 +333,6 @@ class PBSIE(InfoExtractor):
                  'thumbnail': 're:^https?://.*\.jpg$',
                  'formats': 'mincount:8',
              },
-            'params': {
-                'skip_download': True,  # requires ffmpeg
-            },
          },
          {
              'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true',
@@ -371,11 +353,16 @@ class PBSIE(InfoExtractor):
      def _extract_webpage(self, url):
          mobj = re.match(self._VALID_URL, url)
  
+        description = None
+
          presumptive_id = mobj.group('presumptive_id')
          display_id = presumptive_id
          if presumptive_id:
              webpage = self._download_webpage(url, display_id)
  
+            description = strip_or_none(self._og_search_description(
+                webpage, default=None) or self._html_search_meta(
+                'description', webpage, default=None))
              upload_date = unified_strdate(self._search_regex(
                  r'<input type="hidden" id="air_date_[0-9]+" value="([^"]+)"',
                  webpage, 'upload date', default=None))
@@ -388,7 +375,7 @@ class PBSIE(InfoExtractor):
              for p in MULTI_PART_REGEXES:
                  tabbed_videos = re.findall(p, webpage)
                  if tabbed_videos:
-                    return tabbed_videos, presumptive_id, upload_date
+                    return tabbed_videos, presumptive_id, upload_date, description
  
              MEDIA_ID_REGEXES = [
                  r"div\s*:\s*'videoembed'\s*,\s*mediaid\s*:\s*'(\d+)'",  # frontline video embed
@@ -400,7 +387,7 @@ class PBSIE(InfoExtractor):
              media_id = self._search_regex(
                  MEDIA_ID_REGEXES, webpage, 'media ID', fatal=False, default=None)
              if media_id:
-                return media_id, presumptive_id, upload_date
+                return media_id, presumptive_id, upload_date, description
  
              # Fronline video embedded via flp
              video_id = self._search_regex(
@@ -417,7 +404,7 @@ class PBSIE(InfoExtractor):
                      'http://www.pbs.org/wgbh/pages/frontline/.json/getdir/getdir%d.json' % prg_id,
                      presumptive_id, 'Downloading getdir JSON',
                      transform_source=strip_jsonp)
-                return getdir['mid'], presumptive_id, upload_date
+                return getdir['mid'], presumptive_id, upload_date, description
  
              for iframe in re.findall(r'(?s)<iframe(.+?)></iframe>', webpage):
                  url = self._search_regex(
@@ -441,10 +428,10 @@ class PBSIE(InfoExtractor):
              video_id = mobj.group('id')
              display_id = video_id
  
-        return video_id, display_id, None
+        return video_id, display_id, None, description
  
      def _real_extract(self, url):
-        video_id, display_id, upload_date = self._extract_webpage(url)
+        video_id, display_id, upload_date, description = self._extract_webpage(url)
  
          if isinstance(video_id, list):
              entries = [self.url_result(
@@ -466,17 +453,6 @@ class PBSIE(InfoExtractor):
                      redirects.append(redirect)
                      redirect_urls.add(redirect_url)
  
-        try:
-            video_info = self._download_json(
-                'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
-                display_id, 'Downloading video info JSON')
-            extract_redirect_urls(video_info)
-            info = video_info
-        except ExtractorError as e:
-            # videoInfo API may not work for some videos
-            if not isinstance(e.cause, compat_HTTPError) or e.cause.code != 404:
-                raise
-
          # Player pages may also serve different qualities
          for page in ('widget/partnerplayer', 'portalplayer'):
              player = self._download_webpage(
@@ -494,6 +470,7 @@ class PBSIE(InfoExtractor):
                          info = video_info
  
          formats = []
+        http_url = None
          for num, redirect in enumerate(redirects):
              redirect_id = redirect.get('eeid')
  
@@ -514,13 +491,41 @@ class PBSIE(InfoExtractor):
  
              if determine_ext(format_url) == 'm3u8':
                  formats.extend(self._extract_m3u8_formats(
-                    format_url, display_id, 'mp4', preference=1, m3u8_id='hls'))
+                    format_url, display_id, 'mp4', m3u8_id='hls', fatal=False))
              else:
                  formats.append({
                      'url': format_url,
                      'format_id': redirect_id,
                  })
+                if re.search(r'^https?://.*(?:\d+k|baseline)', format_url):
+                    http_url = format_url
          self._remove_duplicate_formats(formats)
+        m3u8_formats = list(filter(
+            lambda f: f.get('protocol') == 'm3u8' and f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+            formats))
+        if http_url:
+            for m3u8_format in m3u8_formats:
+                bitrate = self._search_regex(r'(\d+)k', m3u8_format['url'], 'bitrate', default=None)
+                # Lower qualities (150k and 192k) are not available as HTTP formats (see [1]),
+                # we won't try extracting them.
+                # Since summer 2016 higher quality formats (4500k and 6500k) are also available
+                # albeit they are not documented in [2].
+                # 1. https://github.com/rg3/youtube-dl/commit/cbc032c8b70a038a69259378c92b4ba97b42d491#commitcomment-17313656
+                # 2. https://projects.pbs.org/confluence/display/coveapi/COVE+Video+Specifications
+                if not bitrate or int(bitrate) < 400:
+                    continue
+                f_url = re.sub(r'\d+k|baseline', bitrate + 'k', http_url)
+                # This may produce invalid links sometimes (e.g.
+                # http://www.pbs.org/wgbh/frontline/film/suicide-plan)
+                if not self._is_valid_url(f_url, display_id, 'http-%sk video' % bitrate):
+                    continue
+                f = m3u8_format.copy()
+                f.update({
+                    'url': f_url,
+                    'format_id': m3u8_format['format_id'].replace('hls', 'http'),
+                    'protocol': 'http',
+                })
+                formats.append(f)
          self._sort_formats(formats)
  
          rating_str = info.get('rating')
@@ -535,6 +540,19 @@ class PBSIE(InfoExtractor):
                  'ext': 'ttml',
                  'url': closed_captions_url,
              }]
+            mobj = re.search(r'/(\d+)_Encoded\.dfxp', closed_captions_url)
+            if mobj:
+                ttml_caption_suffix, ttml_caption_id = mobj.group(0, 1)
+                ttml_caption_id = int(ttml_caption_id)
+                subtitles['en'].extend([{
+                    'url': closed_captions_url.replace(
+                        ttml_caption_suffix, '/%d_Encoded.srt' % (ttml_caption_id + 1)),
+                    'ext': 'srt',
+                }, {
+                    'url': closed_captions_url.replace(
+                        ttml_caption_suffix, '/%d_Encoded.vtt' % (ttml_caption_id + 2)),
+                    'ext': 'vtt',
+                }])
  
          # info['title'] is often incomplete (e.g. 'Full Episode', 'Episode 5', etc)
          # Try turning it to 'program - title' naming scheme if possible
@@ -542,11 +560,14 @@ class PBSIE(InfoExtractor):
          if alt_title:
              info['title'] = alt_title + ' - ' + re.sub(r'^' + alt_title + '[\s\-:]+', '', info['title'])
  
+        description = info.get('description') or info.get(
+            'program', {}).get('description') or description
+
          return {
              'id': video_id,
              'display_id': display_id,
              'title': info['title'],
-            'description': info.get('description') or info.get('program', {}).get('description'),
+            'description': description,
              'thumbnail': info.get('image_url'),
              'duration': int_or_none(info.get('duration')),
              'age_limit': age_limit,
diff --git a/youtube_dl/extractor/people.py b/youtube_dl/extractor/people.py
new file mode 100644
index 0000000..9ecdbc1
--- /dev/null
+++ b/youtube_dl/extractor/people.py
@@ -0,0 +1,32 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class PeopleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?people\.com/people/videos/0,,(?P<id>\d+),00\.html'
+
+    _TEST = {
+        'url': 'http://www.people.com/people/videos/0,,20995451,00.html',
+        'info_dict': {
+            'id': 'ref:20995451',
+            'ext': 'mp4',
+            'title': 'Astronaut Love Triangle Victim Speaks Out: “The Crime in 2007 Hasn’t Defined Us”',
+            'description': 'Colleen Shipman speaks to PEOPLE for the first time about life after the attack',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 246.318,
+            'timestamp': 1458720585,
+            'upload_date': '20160323',
+            'uploader_id': '416418724',
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': ['BrightcoveNew'],
+    }
+
+    def _real_extract(self, url):
+        return self.url_result(
+            'http://players.brightcove.net/416418724/default_default/index.html?videoId=ref:%s'
+            % self._match_id(url), 'BrightcoveNew')
diff --git a/youtube_dl/extractor/periscope.py b/youtube_dl/extractor/periscope.py
index 514e9b4339be43b509f9c9a8a6d2b87187e5f056..0e362302425cbe504b33b90aa1937dc68b9e288a 100644
--- a/youtube_dl/extractor/periscope.py
+++ b/youtube_dl/extractor/periscope.py
@@ -1,12 +1,25 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
-from ..utils import parse_iso8601
+from ..utils import (
+    parse_iso8601,
+    unescapeHTML,
+)
+
+
+class PeriscopeBaseIE(InfoExtractor):
+    def _call_api(self, method, query, item_id):
+        return self._download_json(
+            'https://api.periscope.tv/api/v2/%s' % method,
+            item_id, query=query)
  
  
-class PeriscopeIE(InfoExtractor):
+class PeriscopeIE(PeriscopeBaseIE):
      IE_DESC = 'Periscope'
+    IE_NAME = 'periscope'
      _VALID_URL = r'https?://(?:www\.)?periscope\.tv/[^/]+/(?P<id>[^/?#]+)'
      # Alive example URLs can be found here http://onperiscope.com/
      _TESTS = [{
@@ -30,19 +43,26 @@ class PeriscopeIE(InfoExtractor):
          'only_matching': True,
      }]
  
-    def _call_api(self, method, value):
-        return self._download_json(
-            'https://api.periscope.tv/api/v2/%s?broadcast_id=%s' % (method, value), value)
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src=([\'"])(?P<url>(?:https?:)?//(?:www\.)?periscope\.tv/(?:(?!\1).)+)\1', webpage)
+        if mobj:
+            return mobj.group('url')
  
      def _real_extract(self, url):
          token = self._match_id(url)
  
-        broadcast_data = self._call_api('getBroadcastPublic', token)
+        broadcast_data = self._call_api(
+            'getBroadcastPublic', {'broadcast_id': token}, token)
          broadcast = broadcast_data['broadcast']
          status = broadcast['status']
  
-        uploader = broadcast.get('user_display_name') or broadcast_data.get('user', {}).get('display_name')
-        uploader_id = broadcast.get('user_id') or broadcast_data.get('user', {}).get('id')
+        user = broadcast_data.get('user', {})
+
+        uploader = broadcast.get('user_display_name') or user.get('display_name')
+        uploader_id = (broadcast.get('username') or user.get('username') or
+                       broadcast.get('user_id') or user.get('id'))
  
          title = '%s - %s' % (uploader, status) if uploader else status
          state = broadcast.get('state').lower()
@@ -54,7 +74,8 @@ class PeriscopeIE(InfoExtractor):
              'url': broadcast[image],
          } for image in ('image_url', 'image_url_small') if broadcast.get(image)]
  
-        stream = self._call_api('getAccessPublic', token)
+        stream = self._call_api(
+            'getAccessPublic', {'broadcast_id': token}, token)
  
          formats = []
          for format_id in ('replay', 'rtmp', 'hls', 'https_hls'):
@@ -66,7 +87,7 @@ class PeriscopeIE(InfoExtractor):
                  'ext': 'flv' if format_id == 'rtmp' else 'mp4',
              }
              if format_id != 'rtmp':
-                f['protocol'] = 'm3u8_native' if state == 'ended' else 'm3u8'
+                f['protocol'] = 'm3u8_native' if state in ('ended', 'timed_out') else 'm3u8'
              formats.append(f)
          self._sort_formats(formats)
  
@@ -79,3 +100,54 @@ class PeriscopeIE(InfoExtractor):
              'thumbnails': thumbnails,
              'formats': formats,
          }
+
+
+class PeriscopeUserIE(PeriscopeBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?periscope\.tv/(?P<id>[^/]+)/?$'
+    IE_DESC = 'Periscope user videos'
+    IE_NAME = 'periscope:user'
+
+    _TEST = {
+        'url': 'https://www.periscope.tv/LularoeHusbandMike/',
+        'info_dict': {
+            'id': 'LularoeHusbandMike',
+            'title': 'LULAROE HUSBAND MIKE',
+            'description': 'md5:6cf4ec8047768098da58e446e82c82f0',
+        },
+        # Periscope only shows videos in the last 24 hours, so it's possible to
+        # get 0 videos
+        'playlist_mincount': 0,
+    }
+
+    def _real_extract(self, url):
+        user_name = self._match_id(url)
+
+        webpage = self._download_webpage(url, user_name)
+
+        data_store = self._parse_json(
+            unescapeHTML(self._search_regex(
+                r'data-store=(["\'])(?P<data>.+?)\1',
+                webpage, 'data store', default='{}', group='data')),
+            user_name)
+
+        user = list(data_store['UserCache']['users'].values())[0]['user']
+        user_id = user['id']
+        session_id = data_store['SessionToken']['public']['broadcastHistory']['token']['session_id']
+
+        broadcasts = self._call_api(
+            'getUserBroadcastsPublic',
+            {'user_id': user_id, 'session_id': session_id},
+            user_name)['broadcasts']
+
+        broadcast_ids = [
+            broadcast['id'] for broadcast in broadcasts if broadcast.get('id')]
+
+        title = user.get('display_name') or user.get('username') or user_name
+        description = user.get('description')
+
+        entries = [
+            self.url_result(
+                'https://www.periscope.tv/%s/%s' % (user_name, broadcast_id))
+            for broadcast_id in broadcast_ids]
+
+        return self.playlist_result(entries, user_id, title, description)
diff --git a/youtube_dl/extractor/pladform.py b/youtube_dl/extractor/pladform.py
index bc559d1df289fca39b96f5cfc5519bf6acb8bbb3..77e1211d6095cf17464ce09a27b756157b4931e9 100644
--- a/youtube_dl/extractor/pladform.py
+++ b/youtube_dl/extractor/pladform.py
@@ -49,7 +49,7 @@ class PladformIE(InfoExtractor):
      @staticmethod
      def _extract_url(webpage):
          mobj = re.search(
-            r'<iframe[^>]+src="(?P<url>(?:https?:)?//out\.pladform\.ru/player\?.+?)"', webpage)
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//out\.pladform\.ru/player\?.+?)\1', webpage)
          if mobj:
              return mobj.group('url')
  
diff --git a/youtube_dl/extractor/planetaplay.py b/youtube_dl/extractor/planetaplay.py
deleted file mode 100644
index 06505e9..0000000
--- a/youtube_dl/extractor/planetaplay.py
+++ /dev/null
@@ -1,61 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import ExtractorError
-
-
-class PlanetaPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?planetaplay\.com/\?sng=(?P<id>[0-9]+)'
-    _API_URL = 'http://planetaplay.com/action/playlist/?sng={0:}'
-    _THUMBNAIL_URL = 'http://planetaplay.com/img/thumb/{thumb:}'
-    _TEST = {
-        'url': 'http://planetaplay.com/?sng=3586',
-        'md5': '9d569dceb7251a4e01355d5aea60f9db',
-        'info_dict': {
-            'id': '3586',
-            'ext': 'flv',
-            'title': 'md5:e829428ee28b1deed00de90de49d1da1',
-        },
-        'skip': 'Not accessible from Travis CI server',
-    }
-
-    _SONG_FORMATS = {
-        'lq': (0, 'http://www.planetaplay.com/videoplayback/{med_hash:}'),
-        'hq': (1, 'http://www.planetaplay.com/videoplayback/hi/{med_hash:}'),
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        response = self._download_json(
-            self._API_URL.format(video_id), video_id)['response']
-        try:
-            data = response.get('data')[0]
-        except IndexError:
-            raise ExtractorError(
-                '%s: failed to get the playlist' % self.IE_NAME, expected=True)
-
-        title = '{song_artists:} - {sng_name:}'.format(**data)
-        thumbnail = self._THUMBNAIL_URL.format(**data)
-
-        formats = []
-        for format_id, (quality, url_template) in self._SONG_FORMATS.items():
-            formats.append({
-                'format_id': format_id,
-                'url': url_template.format(**data),
-                'quality': quality,
-                'ext': 'flv',
-            })
-
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-            'thumbnail': thumbnail,
-        }
diff --git a/youtube_dl/extractor/played.py b/youtube_dl/extractor/played.py
deleted file mode 100644
index 57c875e..0000000
--- a/youtube_dl/extractor/played.py
+++ /dev/null
@@ -1,60 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-import os.path
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    sanitized_Request,
-    urlencode_postdata,
-)
-
-
-class PlayedIE(InfoExtractor):
-    IE_NAME = 'played.to'
-    _VALID_URL = r'https?://(?:www\.)?played\.to/(?P<id>[a-zA-Z0-9_-]+)'
-
-    _TEST = {
-        'url': 'http://played.to/j2f2sfiiukgt',
-        'md5': 'c2bd75a368e82980e7257bf500c00637',
-        'info_dict': {
-            'id': 'j2f2sfiiukgt',
-            'ext': 'flv',
-            'title': 'youtube-dl_test_video.mp4',
-        },
-        'skip': 'Removed for copyright infringement.',  # oh wow
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        orig_webpage = self._download_webpage(url, video_id)
-
-        m_error = re.search(
-            r'(?s)Reason for deletion:.*?<b class="err"[^>]*>(?P<msg>[^<]+)</b>', orig_webpage)
-        if m_error:
-            raise ExtractorError(m_error.group('msg'), expected=True)
-
-        data = self._hidden_inputs(orig_webpage)
-
-        self._sleep(2, video_id)
-
-        post = urlencode_postdata(data)
-        headers = {
-            b'Content-Type': b'application/x-www-form-urlencoded',
-        }
-        req = sanitized_Request(url, post, headers)
-        webpage = self._download_webpage(
-            req, video_id, note='Downloading video page ...')
-
-        title = os.path.splitext(data['fname'])[0]
-
-        video_url = self._search_regex(
-            r'file: "?(.+?)",', webpage, 'video URL')
-
-        return {
-            'id': video_id,
-            'title': title,
-            'url': video_url,
-        }
diff --git a/youtube_dl/extractor/plays.py b/youtube_dl/extractor/plays.py
index c3c38cf4ac07787e520c7c2c7eac7da1ed2aa8b4..ddfc6f1486c4b49185bf68b3be3ff9ba9e957633 100644
--- a/youtube_dl/extractor/plays.py
+++ b/youtube_dl/extractor/plays.py
@@ -8,30 +8,31 @@ from ..utils import int_or_none
  
  
  class PlaysTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?plays\.tv/video/(?P<id>[0-9a-f]{18})'
-    _TEST = {
-        'url': 'http://plays.tv/video/56af17f56c95335490/when-you-outplay-the-azir-wall',
+    _VALID_URL = r'https?://(?:www\.)?plays\.tv/(?:video|embeds)/(?P<id>[0-9a-f]{18})'
+    _TESTS = [{
+        'url': 'https://plays.tv/video/56af17f56c95335490/when-you-outplay-the-azir-wall',
          'md5': 'dfeac1198506652b5257a62762cec7bc',
          'info_dict': {
              'id': '56af17f56c95335490',
              'ext': 'mp4',
-            'title': 'When you outplay the Azir wall',
+            'title': 'Bjergsen - When you outplay the Azir wall',
              'description': 'Posted by Bjergsen',
          }
-    }
+    }, {
+        'url': 'https://plays.tv/embeds/56af17f56c95335490',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(
+            'https://plays.tv/video/%s' % video_id, video_id)
+
+        info = self._search_json_ld(webpage, video_id,)
  
-        title = self._og_search_title(webpage)
-        content = self._parse_json(
-            self._search_regex(
-                r'R\.bindContent\(({.+?})\);', webpage,
-                'content'), video_id)['content']
          mpd_url, sources = re.search(
              r'(?s)<video[^>]+data-mpd="([^"]+)"[^>]*>(.+?)</video>',
-            content).groups()
+            webpage).groups()
          formats = self._extract_mpd_formats(
              self._proto_relative_url(mpd_url), video_id, mpd_id='DASH')
          for format_id, height, format_url in re.findall(r'<source\s+res="((\d+)h?)"\s+src="([^"]+)"', sources):
@@ -42,10 +43,11 @@ class PlaysTVIE(InfoExtractor):
              })
          self._sort_formats(formats)
  
-        return {
+        info.update({
              'id': video_id,
-            'title': title,
              'description': self._og_search_description(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
+            'thumbnail': info.get('thumbnail') or self._og_search_thumbnail(webpage),
              'formats': formats,
-        }
+        })
+
+        return info
diff --git a/youtube_dl/extractor/playvid.py b/youtube_dl/extractor/playvid.py
index 2eb4fd96dcbc071c1c2ecfb596ab20c4526018bd..79c2db08541e93d1d377c53c3e8adc415f4302e2 100644
--- a/youtube_dl/extractor/playvid.py
+++ b/youtube_dl/extractor/playvid.py
@@ -14,8 +14,8 @@ from ..utils import (
  
  
  class PlayvidIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.playvid\.com/watch(\?v=|/)(?P<id>.+?)(?:#|$)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?playvid\.com/watch(\?v=|/)(?P<id>.+?)(?:#|$)'
+    _TESTS = [{
          'url': 'http://www.playvid.com/watch/RnmBNgtrrJu',
          'md5': 'ffa2f6b2119af359f544388d8c01eb6c',
          'info_dict': {
@@ -24,8 +24,19 @@ class PlayvidIE(InfoExtractor):
              'title': 'md5:9256d01c6317e3f703848b5906880dc8',
              'duration': 82,
              'age_limit': 18,
-        }
-    }
+        },
+        'skip': 'Video removed due to ToS',
+    }, {
+        'url': 'http://www.playvid.com/watch/hwb0GpNkzgH',
+        'md5': '39d49df503ad7b8f23a4432cbf046477',
+        'info_dict': {
+            'id': 'hwb0GpNkzgH',
+            'ext': 'mp4',
+            'title': 'Ellen Euro Cutie Blond Takes a Sexy Survey Get Facial in The Park',
+            'age_limit': 18,
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
diff --git a/youtube_dl/extractor/playwire.py b/youtube_dl/extractor/playwire.py
index 6d138ef25d2d5cec02a012f5a06af085a6c35d26..0bc7431189a0eed819fb85a6fbbdc1558a4b84ed 100644
--- a/youtube_dl/extractor/playwire.py
+++ b/youtube_dl/extractor/playwire.py
@@ -4,9 +4,8 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
-    xpath_text,
+    dict_get,
      float_or_none,
-    int_or_none,
  )
  
  
@@ -23,6 +22,19 @@ class PlaywireIE(InfoExtractor):
              'duration': 145.94,
          },
      }, {
+        # m3u8 in f4m
+        'url': 'http://config.playwire.com/21772/videos/v2/4840492/zeus.json',
+        'info_dict': {
+            'id': '4840492',
+            'ext': 'mp4',
+            'title': 'ITV EL SHOW FULL',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        # Multiple resolutions while bitrates missing
          'url': 'http://cdn.playwire.com/11625/embed/85228.html',
          'only_matching': True,
      }, {
@@ -48,25 +60,10 @@ class PlaywireIE(InfoExtractor):
          thumbnail = content.get('poster')
          src = content['media']['f4m']
  
-        f4m = self._download_xml(src, video_id)
-        base_url = xpath_text(f4m, './{http://ns.adobe.com/f4m/1.0}baseURL', 'base url', fatal=True)
-        formats = []
-        for media in f4m.findall('./{http://ns.adobe.com/f4m/1.0}media'):
-            media_url = media.get('url')
-            if not media_url:
-                continue
-            tbr = int_or_none(media.get('bitrate'))
-            width = int_or_none(media.get('width'))
-            height = int_or_none(media.get('height'))
-            f = {
-                'url': '%s/%s' % (base_url, media.attrib['url']),
-                'tbr': tbr,
-                'width': width,
-                'height': height,
-            }
-            if not (tbr or width or height):
-                f['quality'] = 1 if '-hd.' in media_url else 0
-            formats.append(f)
+        formats = self._extract_f4m_formats(src, video_id, m3u8_id='hls')
+        for a_format in formats:
+            if not dict_get(a_format, ['tbr', 'width', 'height']):
+                a_format['quality'] = 1 if '-hd.' in a_format['url'] else 0
          self._sort_formats(formats)
  
          return {
diff --git a/youtube_dl/extractor/pluralsight.py b/youtube_dl/extractor/pluralsight.py
index df03dd4198c1e0f264c641f516fc6de813b7dd7f..0ffd41ecd3b73bdaaba3b27cd1638cdf0383103e 100644
--- a/youtube_dl/extractor/pluralsight.py
+++ b/youtube_dl/extractor/pluralsight.py
@@ -1,9 +1,9 @@
  from __future__ import unicode_literals
  
-import re
+import collections
  import json
+import os
  import random
-import collections
  
  from .common import InfoExtractor
  from ..compat import (
@@ -11,22 +11,24 @@ from ..compat import (
      compat_urlparse,
  )
  from ..utils import (
+    dict_get,
      ExtractorError,
+    float_or_none,
      int_or_none,
      parse_duration,
      qualities,
-    sanitized_Request,
+    srt_subtitles_timecode,
      urlencode_postdata,
  )
  
  
  class PluralsightBaseIE(InfoExtractor):
-    _API_BASE = 'http://app.pluralsight.com'
+    _API_BASE = 'https://app.pluralsight.com'
  
  
  class PluralsightIE(PluralsightBaseIE):
      IE_NAME = 'pluralsight'
-    _VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/training/player\?'
+    _VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/(?:training/)?player\?'
      _LOGIN_URL = 'https://app.pluralsight.com/id/'
  
      _NETRC_MACHINE = 'pluralsight'
@@ -48,6 +50,9 @@ class PluralsightIE(PluralsightBaseIE):
          # available without pluralsight account
          'url': 'http://app.pluralsight.com/training/player?author=scott-allen&name=angularjs-get-started-m1-introduction&mode=live&clip=0&course=angularjs-get-started',
          'only_matching': True,
+    }, {
+        'url': 'https://app.pluralsight.com/player?course=ccna-intro-networking&author=ross-bagurdes&name=ccna-intro-networking-m06&clip=0',
+        'only_matching': True,
      }]
  
      def _real_initialize(self):
@@ -64,8 +69,8 @@ class PluralsightIE(PluralsightBaseIE):
          login_form = self._hidden_inputs(login_page)
  
          login_form.update({
-            'Username': username.encode('utf-8'),
-            'Password': password.encode('utf-8'),
+            'Username': username,
+            'Password': password,
          })
  
          post_url = self._search_regex(
@@ -75,12 +80,10 @@ class PluralsightIE(PluralsightBaseIE):
          if not post_url.startswith('http'):
              post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
  
-        request = sanitized_Request(
-            post_url, urlencode_postdata(login_form))
-        request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-
          response = self._download_webpage(
-            request, None, 'Logging in as %s' % username)
+            post_url, None, 'Logging in as %s' % username,
+            data=urlencode_postdata(login_form),
+            headers={'Content-Type': 'application/x-www-form-urlencoded'})
  
          error = self._search_regex(
              r'<span[^>]+class="field-validation-error"[^>]*>([^<]+)</span>',
@@ -91,34 +94,78 @@ class PluralsightIE(PluralsightBaseIE):
          if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')):
              raise ExtractorError('Unable to log in')
  
+    def _get_subtitles(self, author, clip_id, lang, name, duration, video_id):
+        captions_post = {
+            'a': author,
+            'cn': clip_id,
+            'lc': lang,
+            'm': name,
+        }
+        captions = self._download_json(
+            '%s/player/retrieve-captions' % self._API_BASE, video_id,
+            'Downloading captions JSON', 'Unable to download captions JSON',
+            fatal=False, data=json.dumps(captions_post).encode('utf-8'),
+            headers={'Content-Type': 'application/json;charset=utf-8'})
+        if captions:
+            return {
+                lang: [{
+                    'ext': 'json',
+                    'data': json.dumps(captions),
+                }, {
+                    'ext': 'srt',
+                    'data': self._convert_subtitles(duration, captions),
+                }]
+            }
+
+    @staticmethod
+    def _convert_subtitles(duration, subs):
+        srt = ''
+        TIME_OFFSET_KEYS = ('displayTimeOffset', 'DisplayTimeOffset')
+        TEXT_KEYS = ('text', 'Text')
+        for num, current in enumerate(subs):
+            current = subs[num]
+            start, text = (
+                float_or_none(dict_get(current, TIME_OFFSET_KEYS)),
+                dict_get(current, TEXT_KEYS))
+            if start is None or text is None:
+                continue
+            end = duration if num == len(subs) - 1 else float_or_none(
+                dict_get(subs[num + 1], TIME_OFFSET_KEYS))
+            if end is None:
+                continue
+            srt += os.linesep.join(
+                (
+                    '%d' % num,
+                    '%s --> %s' % (
+                        srt_subtitles_timecode(start),
+                        srt_subtitles_timecode(end)),
+                    text,
+                    os.linesep,
+                ))
+        return srt
+
      def _real_extract(self, url):
          qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
  
          author = qs.get('author', [None])[0]
          name = qs.get('name', [None])[0]
          clip_id = qs.get('clip', [None])[0]
-        course = qs.get('course', [None])[0]
+        course_name = qs.get('course', [None])[0]
  
-        if any(not f for f in (author, name, clip_id, course,)):
+        if any(not f for f in (author, name, clip_id, course_name,)):
              raise ExtractorError('Invalid URL', expected=True)
  
          display_id = '%s-%s' % (name, clip_id)
  
-        webpage = self._download_webpage(url, display_id)
+        parsed_url = compat_urlparse.urlparse(url)
  
-        modules = self._search_regex(
-            r'moduleCollection\s*:\s*new\s+ModuleCollection\((\[.+?\])\s*,\s*\$rootScope\)',
-            webpage, 'modules', default=None)
+        payload_url = compat_urlparse.urlunparse(parsed_url._replace(
+            netloc='app.pluralsight.com', path='player/api/v1/payload'))
  
-        if modules:
-            collection = self._parse_json(modules, display_id)
-        else:
-            # Webpage may be served in different layout (see
-            # https://github.com/rg3/youtube-dl/issues/7607)
-            collection = self._parse_json(
-                self._search_regex(
-                    r'var\s+initialState\s*=\s*({.+?});\n', webpage, 'initial state'),
-                display_id)['course']['modules']
+        course = self._download_json(
+            payload_url, display_id, headers={'Referer': url})['payload']['course']
+
+        collection = course['modules']
  
          module, clip = None, None
  
@@ -138,6 +185,8 @@ class PluralsightIE(PluralsightBaseIE):
          if not clip:
              raise ExtractorError('Unable to resolve clip')
  
+        title = '%s - %s' % (module['title'], clip['title'])
+
          QUALITIES = {
              'low': {'width': 640, 'height': 480},
              'medium': {'width': 848, 'height': 640},
@@ -157,8 +206,7 @@ class PluralsightIE(PluralsightBaseIE):
  
          # Some courses also offer widescreen resolution for high quality (see
          # https://github.com/rg3/youtube-dl/issues/7766)
-        widescreen = True if re.search(
-            r'courseSupportsWidescreenVideoFormats\s*:\s*true', webpage) else False
+        widescreen = course.get('supportsWideScreenVideoFormats') is True
          best_quality = 'high-widescreen' if widescreen else 'high'
          if widescreen:
              for allowed_quality in ALLOWED_QUALITIES:
@@ -187,22 +235,21 @@ class PluralsightIE(PluralsightBaseIE):
              for quality in qualities_:
                  f = QUALITIES[quality].copy()
                  clip_post = {
-                    'a': author,
-                    'cap': 'false',
-                    'cn': clip_id,
-                    'course': course,
-                    'lc': 'en',
-                    'm': name,
-                    'mt': ext,
-                    'q': '%dx%d' % (f['width'], f['height']),
+                    'author': author,
+                    'includeCaptions': False,
+                    'clipIndex': int(clip_id),
+                    'courseName': course_name,
+                    'locale': 'en',
+                    'moduleName': name,
+                    'mediaType': ext,
+                    'quality': '%dx%d' % (f['width'], f['height']),
                  }
-                request = sanitized_Request(
-                    '%s/training/Player/ViewClip' % self._API_BASE,
-                    json.dumps(clip_post).encode('utf-8'))
-                request.add_header('Content-Type', 'application/json;charset=utf-8')
                  format_id = '%s-%s' % (ext, quality)
-                clip_url = self._download_webpage(
-                    request, display_id, 'Downloading %s URL' % format_id, fatal=False)
+                viewclip = self._download_json(
+                    '%s/video/clips/viewclip' % self._API_BASE, display_id,
+                    'Downloading %s viewclip JSON' % format_id, fatal=False,
+                    data=json.dumps(clip_post).encode('utf-8'),
+                    headers={'Content-Type': 'application/json;charset=utf-8'})
  
                  # Pluralsight tracks multiple sequential calls to ViewClip API and start
                  # to return 429 HTTP errors after some time (see
@@ -214,29 +261,44 @@ class PluralsightIE(PluralsightBaseIE):
                      random.randint(2, 5), display_id,
                      '%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling')
  
-                if not clip_url:
+                if not viewclip:
                      continue
-                f.update({
-                    'url': clip_url,
-                    'ext': ext,
-                    'format_id': format_id,
-                    'quality': quality_key(quality),
-                })
-                formats.append(f)
+
+                clip_urls = viewclip.get('urls')
+                if not isinstance(clip_urls, list):
+                    continue
+
+                for clip_url_data in clip_urls:
+                    clip_url = clip_url_data.get('url')
+                    if not clip_url:
+                        continue
+                    cdn = clip_url_data.get('cdn')
+                    clip_f = f.copy()
+                    clip_f.update({
+                        'url': clip_url,
+                        'ext': ext,
+                        'format_id': '%s-%s' % (format_id, cdn) if cdn else format_id,
+                        'quality': quality_key(quality),
+                        'source_preference': int_or_none(clip_url_data.get('rank')),
+                    })
+                    formats.append(clip_f)
+
          self._sort_formats(formats)
  
-        # TODO: captions
-        # http://www.pluralsight.com/training/Player/ViewClip + cap = true
-        # or
-        # http://www.pluralsight.com/training/Player/Captions
-        # { a = author, cn = clip_id, lc = end, m = name }
+        duration = int_or_none(
+            clip.get('duration')) or parse_duration(clip.get('formattedDuration'))
+
+        # TODO: other languages?
+        subtitles = self.extract_subtitles(
+            author, clip_id, 'en', name, duration, display_id)
  
          return {
              'id': clip.get('clipName') or clip['name'],
-            'title': '%s - %s' % (module['title'], clip['title']),
-            'duration': int_or_none(clip.get('duration')) or parse_duration(clip.get('formattedDuration')),
+            'title': title,
+            'duration': duration,
              'creator': author,
-            'formats': formats
+            'formats': formats,
+            'subtitles': subtitles,
          }
  
  
diff --git a/youtube_dl/extractor/pokemon.py b/youtube_dl/extractor/pokemon.py
new file mode 100644
index 0000000..2d87e7e
--- /dev/null
+++ b/youtube_dl/extractor/pokemon.py
@@ -0,0 +1,58 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    extract_attributes,
+    int_or_none,
+)
+
+
+class PokemonIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/[^/]+/\d+_\d+-(?P<display_id>[^/?#]+))'
+    _TESTS = [{
+        'url': 'http://www.pokemon.com/us/pokemon-episodes/19_01-from-a-to-z/?play=true',
+        'md5': '9fb209ae3a569aac25de0f5afc4ee08f',
+        'info_dict': {
+            'id': 'd0436c00c3ce4071ac6cee8130ac54a1',
+            'ext': 'mp4',
+            'title': 'From A to Z!',
+            'description': 'Bonnie makes a new friend, Ash runs into an old friend, and a terrifying premonition begins to unfold!',
+            'timestamp': 1460478136,
+            'upload_date': '20160412',
+        },
+        'add_id': ['LimelightMedia']
+    }, {
+        'url': 'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.pokemon.com/fr/episodes-pokemon/18_09-un-hiver-inattendu/',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.pokemon.com/de/pokemon-folgen/01_20-bye-bye-smettbo/',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id, display_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(url, video_id or display_id)
+        video_data = extract_attributes(self._search_regex(
+            r'(<[^>]+data-video-id="%s"[^>]*>)' % (video_id if video_id else '[a-z0-9]{32}'),
+            webpage, 'video data element'))
+        video_id = video_data['data-video-id']
+        title = video_data['data-video-title']
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': 'limelight:media:%s' % video_id,
+            'title': title,
+            'description': video_data.get('data-video-summary'),
+            'thumbnail': video_data.get('data-video-poster'),
+            'series': 'Pokémon',
+            'season_number': int_or_none(video_data.get('data-video-season')),
+            'episode': title,
+            'episode_number': int_or_none(video_data.get('data-video-episode')),
+            'ie_key': 'LimelightMedia',
+        }
diff --git a/youtube_dl/extractor/polskieradio.py b/youtube_dl/extractor/polskieradio.py
new file mode 100644
index 0000000..5ff1737
--- /dev/null
+++ b/youtube_dl/extractor/polskieradio.py
@@ -0,0 +1,180 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import itertools
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_str,
+    compat_urllib_parse_unquote,
+    compat_urlparse
+)
+from ..utils import (
+    extract_attributes,
+    int_or_none,
+    strip_or_none,
+    unified_timestamp,
+)
+
+
+class PolskieRadioIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?polskieradio\.pl/\d+/\d+/Artykul/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://www.polskieradio.pl/7/5102/Artykul/1587943,Prof-Andrzej-Nowak-o-historii-nie-da-sie-myslec-beznamietnie',
+        'info_dict': {
+            'id': '1587943',
+            'title': 'Prof. Andrzej Nowak: o historii nie da się myśleć beznamiętnie',
+            'description': 'md5:12f954edbf3120c5e7075e17bf9fc5c5',
+        },
+        'playlist': [{
+            'md5': '2984ee6ce9046d91fc233bc1a864a09a',
+            'info_dict': {
+                'id': '1540576',
+                'ext': 'mp3',
+                'title': 'md5:d4623290d4ac983bf924061c75c23a0d',
+                'timestamp': 1456594200,
+                'upload_date': '20160227',
+                'duration': 2364,
+                'thumbnail': 're:^https?://static\.prsa\.pl/images/.*\.jpg$'
+            },
+        }],
+    }, {
+        'url': 'http://www.polskieradio.pl/265/5217/Artykul/1635803,Euro-2016-nie-ma-miejsca-na-blad-Polacy-graja-ze-Szwajcaria-o-cwiercfinal',
+        'info_dict': {
+            'id': '1635803',
+            'title': 'Euro 2016: nie ma miejsca na błąd. Polacy grają ze Szwajcarią o ćwierćfinał',
+            'description': 'md5:01cb7d0cad58664095d72b51a1ebada2',
+        },
+        'playlist_mincount': 12,
+    }, {
+        'url': 'http://polskieradio.pl/9/305/Artykul/1632955,Bardzo-popularne-slowo-remis',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.polskieradio.pl/7/5102/Artykul/1587943',
+        'only_matching': True,
+    }, {
+        # with mp4 video
+        'url': 'http://www.polskieradio.pl/9/299/Artykul/1634903,Brexit-Leszek-Miller-swiat-sie-nie-zawali-Europa-bedzie-trwac-dalej',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        content = self._search_regex(
+            r'(?s)<div[^>]+class="audio atarticle"[^>]*>(.+?)<script>',
+            webpage, 'content')
+
+        timestamp = unified_timestamp(self._html_search_regex(
+            r'(?s)<span[^>]+id="datetime2"[^>]*>(.+?)</span>',
+            webpage, 'timestamp', fatal=False))
+
+        thumbnail_url = self._og_search_thumbnail(webpage)
+
+        entries = []
+
+        media_urls = set()
+
+        for data_media in re.findall(r'<[^>]+data-media=({[^>]+})', content):
+            media = self._parse_json(data_media, playlist_id, fatal=False)
+            if not media or not media.get('file') or not media.get('desc'):
+                continue
+            media_url = self._proto_relative_url(media['file'], 'http:')
+            if media_url in media_urls:
+                continue
+            media_urls.add(media_url)
+            entries.append({
+                'id': compat_str(media['id']),
+                'url': media_url,
+                'title': compat_urllib_parse_unquote(media['desc']),
+                'duration': int_or_none(media.get('length')),
+                'vcodec': 'none' if media.get('provider') == 'audio' else None,
+                'timestamp': timestamp,
+                'thumbnail': thumbnail_url
+            })
+
+        title = self._og_search_title(webpage).strip()
+        description = strip_or_none(self._og_search_description(webpage))
+
+        return self.playlist_result(entries, playlist_id, title, description)
+
+
+class PolskieRadioCategoryIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?polskieradio\.pl/\d+(?:,[^/]+)?/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.polskieradio.pl/7/5102,HISTORIA-ZYWA',
+        'info_dict': {
+            'id': '5102',
+            'title': 'HISTORIA ŻYWA',
+        },
+        'playlist_mincount': 38,
+    }, {
+        'url': 'http://www.polskieradio.pl/7/4807',
+        'info_dict': {
+            'id': '4807',
+            'title': 'Vademecum 1050. rocznicy Chrztu Polski'
+        },
+        'playlist_mincount': 5
+    }, {
+        'url': 'http://www.polskieradio.pl/7/129,Sygnaly-dnia?ref=source',
+        'only_matching': True
+    }, {
+        'url': 'http://www.polskieradio.pl/37,RedakcjaKatolicka/4143,Kierunek-Krakow',
+        'info_dict': {
+            'id': '4143',
+            'title': 'Kierunek Kraków',
+        },
+        'playlist_mincount': 61
+    }, {
+        'url': 'http://www.polskieradio.pl/10,czworka/214,muzyka',
+        'info_dict': {
+            'id': '214',
+            'title': 'Muzyka',
+        },
+        'playlist_mincount': 61
+    }, {
+        'url': 'http://www.polskieradio.pl/7,Jedynka/5102,HISTORIA-ZYWA',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.polskieradio.pl/8,Dwojka/196,Publicystyka',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if PolskieRadioIE.suitable(url) else super(PolskieRadioCategoryIE, cls).suitable(url)
+
+    def _entries(self, url, page, category_id):
+        content = page
+        for page_num in itertools.count(2):
+            for a_entry, entry_id in re.findall(
+                    r'(?s)<article[^>]+>.*?(<a[^>]+href=["\']/\d+/\d+/Artykul/(\d+)[^>]+>).*?</article>',
+                    content):
+                entry = extract_attributes(a_entry)
+                href = entry.get('href')
+                if not href:
+                    continue
+                yield self.url_result(
+                    compat_urlparse.urljoin(url, href), PolskieRadioIE.ie_key(),
+                    entry_id, entry.get('title'))
+            mobj = re.search(
+                r'<div[^>]+class=["\']next["\'][^>]*>\s*<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1',
+                content)
+            if not mobj:
+                break
+            next_url = compat_urlparse.urljoin(url, mobj.group('url'))
+            content = self._download_webpage(
+                next_url, category_id, 'Downloading page %s' % page_num)
+
+    def _real_extract(self, url):
+        category_id = self._match_id(url)
+        webpage = self._download_webpage(url, category_id)
+        title = self._html_search_regex(
+            r'<title>([^<]+) - [^<]+ - [^<]+</title>',
+            webpage, 'title', fatal=False)
+        return self.playlist_result(
+            self._entries(url, webpage, category_id),
+            category_id, title)
diff --git a/youtube_dl/extractor/porn91.py b/youtube_dl/extractor/porn91.py
index 9894f32620c1692830df023423ae02a6199121b1..073fc3e21db07f05deef1a337aca7685f62b4079 100644
--- a/youtube_dl/extractor/porn91.py
+++ b/youtube_dl/extractor/porn91.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from ..compat import (
diff --git a/youtube_dl/extractor/porncom.py b/youtube_dl/extractor/porncom.py
new file mode 100644
index 0000000..d85e029
--- /dev/null
+++ b/youtube_dl/extractor/porncom.py
@@ -0,0 +1,100 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    int_or_none,
+    js_to_json,
+    parse_filesize,
+    str_to_int,
+)
+
+
+class PornComIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:[a-zA-Z]+\.)?porn\.com/videos/(?:(?P<display_id>[^/]+)-)?(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.porn.com/videos/teen-grabs-a-dildo-and-fucks-her-pussy-live-on-1hottie-i-rec-2603339',
+        'md5': '3f30ce76267533cd12ba999263156de7',
+        'info_dict': {
+            'id': '2603339',
+            'display_id': 'teen-grabs-a-dildo-and-fucks-her-pussy-live-on-1hottie-i-rec',
+            'ext': 'mp4',
+            'title': 'Teen grabs a dildo and fucks her pussy live on 1hottie, I rec',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 551,
+            'view_count': int,
+            'age_limit': 18,
+            'categories': list,
+            'tags': list,
+        },
+    }, {
+        'url': 'http://se.porn.com/videos/marsha-may-rides-seth-on-top-of-his-thick-cock-2658067',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id') or video_id
+
+        webpage = self._download_webpage(url, display_id)
+
+        config = self._parse_json(
+            self._search_regex(
+                r'=\s*({.+?})\s*,\s*[\da-zA-Z_]+\s*=',
+                webpage, 'config', default='{}'),
+            display_id, transform_source=js_to_json, fatal=False)
+
+        if config:
+            title = config['title']
+            formats = [{
+                'url': stream['url'],
+                'format_id': stream.get('id'),
+                'height': int_or_none(self._search_regex(
+                    r'^(\d+)[pP]', stream.get('id') or '', 'height', default=None))
+            } for stream in config['streams'] if stream.get('url')]
+            thumbnail = (compat_urlparse.urljoin(
+                config['thumbCDN'], config['poster'])
+                if config.get('thumbCDN') and config.get('poster') else None)
+            duration = int_or_none(config.get('length'))
+        else:
+            title = self._search_regex(
+                (r'<title>([^<]+)</title>', r'<h1[^>]*>([^<]+)</h1>'),
+                webpage, 'title')
+            formats = [{
+                'url': compat_urlparse.urljoin(url, format_url),
+                'format_id': '%sp' % height,
+                'height': int(height),
+                'filesize_approx': parse_filesize(filesize),
+            } for format_url, height, filesize in re.findall(
+                r'<a[^>]+href="(/download/[^"]+)">MPEG4 (\d+)p<span[^>]*>(\d+\s+[a-zA-Z]+)<',
+                webpage)]
+            thumbnail = None
+            duration = None
+
+        self._sort_formats(formats)
+
+        view_count = str_to_int(self._search_regex(
+            r'class=["\']views["\'][^>]*><p>([\d,.]+)', webpage,
+            'view count', fatal=False))
+
+        def extract_list(kind):
+            s = self._search_regex(
+                r'(?s)<p[^>]*>%s:(.+?)</p>' % kind.capitalize(),
+                webpage, kind, fatal=False)
+            return re.findall(r'<a[^>]+>([^<]+)</a>', s or '')
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'view_count': view_count,
+            'formats': formats,
+            'age_limit': 18,
+            'categories': extract_list('categories'),
+            'tags': extract_list('tags'),
+        }
diff --git a/youtube_dl/extractor/pornhd.py b/youtube_dl/extractor/pornhd.py
index 39b53ecf68c77786f18956040bf7ccac4fd6dbc5..8df12eec0d44c371d99b536b55694cfd2211f9d0 100644
--- a/youtube_dl/extractor/pornhd.py
+++ b/youtube_dl/extractor/pornhd.py
@@ -1,19 +1,32 @@
  from __future__ import unicode_literals
  
  import re
-import json
  
  from .common import InfoExtractor
  from ..utils import (
+    ExtractorError,
      int_or_none,
      js_to_json,
-    qualities,
  )
  
  
  class PornHdIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?pornhd\.com/(?:[a-z]{2,4}/)?videos/(?P<id>\d+)(?:/(?P<display_id>.+))?'
-    _TEST = {
+    _TESTS = [{
+        'url': 'http://www.pornhd.com/videos/9864/selfie-restroom-masturbation-fun-with-chubby-cutie-hd-porn-video',
+        'md5': 'c8b964b1f0a4b5f7f28ae3a5c9f86ad5',
+        'info_dict': {
+            'id': '9864',
+            'display_id': 'selfie-restroom-masturbation-fun-with-chubby-cutie-hd-porn-video',
+            'ext': 'mp4',
+            'title': 'Restroom selfie masturbation',
+            'description': 'md5:3748420395e03e31ac96857a8f125b2b',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'view_count': int,
+            'age_limit': 18,
+        }
+    }, {
+        # removed video
          'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
          'md5': '956b8ca569f7f4d8ec563e2c41598441',
          'info_dict': {
@@ -25,8 +38,9 @@ class PornHdIE(InfoExtractor):
              'thumbnail': 're:^https?://.*\.jpg',
              'view_count': int,
              'age_limit': 18,
-        }
-    }
+        },
+        'skip': 'Not available anymore',
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
@@ -38,28 +52,38 @@ class PornHdIE(InfoExtractor):
          title = self._html_search_regex(
              [r'<span[^>]+class=["\']video-name["\'][^>]*>([^<]+)',
               r'<title>(.+?) - .*?[Pp]ornHD.*?</title>'], webpage, 'title')
-        description = self._html_search_regex(
-            r'<div class="description">([^<]+)</div>', webpage, 'description', fatal=False)
-        view_count = int_or_none(self._html_search_regex(
-            r'(\d+) views\s*</span>', webpage, 'view count', fatal=False))
-        thumbnail = self._search_regex(
-            r"'poster'\s*:\s*'([^']+)'", webpage, 'thumbnail', fatal=False)
  
-        quality = qualities(['sd', 'hd'])
-        sources = json.loads(js_to_json(self._search_regex(
+        sources = self._parse_json(js_to_json(self._search_regex(
              r"(?s)'sources'\s*:\s*(\{.+?\})\s*\}[;,)]",
-            webpage, 'sources')))
+            webpage, 'sources', default='{}')), video_id)
+
+        if not sources:
+            message = self._html_search_regex(
+                r'(?s)<(div|p)[^>]+class="no-video"[^>]*>(?P<value>.+?)</\1',
+                webpage, 'error message', group='value')
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
+
          formats = []
-        for qname, video_url in sources.items():
+        for format_id, video_url in sources.items():
              if not video_url:
                  continue
+            height = int_or_none(self._search_regex(
+                r'^(\d+)[pP]', format_id, 'height', default=None))
              formats.append({
                  'url': video_url,
-                'format_id': qname,
-                'quality': quality(qname),
+                'format_id': format_id,
+                'height': height,
              })
          self._sort_formats(formats)
  
+        description = self._html_search_regex(
+            r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1',
+            webpage, 'description', fatal=False, group='value')
+        view_count = int_or_none(self._html_search_regex(
+            r'(\d+) views\s*<', webpage, 'view count', fatal=False))
+        thumbnail = self._search_regex(
+            r"'poster'\s*:\s*'([^']+)'", webpage, 'thumbnail', fatal=False)
+
          return {
              'id': video_id,
              'display_id': display_id,
diff --git a/youtube_dl/extractor/pornhub.py b/youtube_dl/extractor/pornhub.py
index 407ea08d4350b52666150e2784652535625c5e31..40dbe6967fac2126b7bf6e6a1245768b3c039c8e 100644
--- a/youtube_dl/extractor/pornhub.py
+++ b/youtube_dl/extractor/pornhub.py
@@ -1,3 +1,4 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
  import itertools
@@ -14,6 +15,7 @@ from ..compat import (
  from ..utils import (
      ExtractorError,
      int_or_none,
+    js_to_json,
      orderedSet,
      sanitized_Request,
      str_to_int,
@@ -24,7 +26,15 @@ from ..aes import (
  
  
  class PornHubIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:[a-z]+\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)(?P<id>[0-9a-z]+)'
+    IE_DESC = 'PornHub and Thumbzilla'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            (?:[a-z]+\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)|
+                            (?:www\.)?thumbzilla\.com/video/
+                        )
+                        (?P<id>[\da-z]+)
+                    '''
      _TESTS = [{
          'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
          'md5': '1e19b41231a02eba417839222ac9d58e',
@@ -39,21 +49,58 @@ class PornHubIE(InfoExtractor):
              'dislike_count': int,
              'comment_count': int,
              'age_limit': 18,
-        }
+            'tags': list,
+            'categories': list,
+        },
+    }, {
+        # non-ASCII title
+        'url': 'http://www.pornhub.com/view_video.php?viewkey=1331683002',
+        'info_dict': {
+            'id': '1331683002',
+            'ext': 'mp4',
+            'title': '重庆婷婷女王足交',
+            'uploader': 'cj397186295',
+            'duration': 1753,
+            'view_count': int,
+            'like_count': int,
+            'dislike_count': int,
+            'comment_count': int,
+            'age_limit': 18,
+            'tags': list,
+            'categories': list,
+        },
+        'params': {
+            'skip_download': True,
+        },
      }, {
          'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d',
          'only_matching': True,
      }, {
+        # removed at the request of cam4.com
          'url': 'http://fr.pornhub.com/view_video.php?viewkey=ph55ca2f9760862',
          'only_matching': True,
+    }, {
+        # removed at the request of the copyright owner
+        'url': 'http://www.pornhub.com/view_video.php?viewkey=788152859',
+        'only_matching': True,
+    }, {
+        # removed by uploader
+        'url': 'http://www.pornhub.com/view_video.php?viewkey=ph572716d15a111',
+        'only_matching': True,
+    }, {
+        # private video
+        'url': 'http://www.pornhub.com/view_video.php?viewkey=ph56fd731fce6b7',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.thumbzilla.com/video/ph56c6114abd99a/horny-girlfriend-sex',
+        'only_matching': True,
      }]
  
-    @classmethod
-    def _extract_url(cls, webpage):
-        mobj = re.search(
-            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?pornhub\.com/embed/\d+)\1', webpage)
-        if mobj:
-            return mobj.group('url')
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub\.com/embed/[\da-z]+)',
+            webpage)
  
      def _extract_count(self, pattern, webpage, name):
          return str_to_int(self._search_regex(
@@ -68,27 +115,33 @@ class PornHubIE(InfoExtractor):
          webpage = self._download_webpage(req, video_id)
  
          error_msg = self._html_search_regex(
-            r'(?s)<div class="userMessageSection[^"]*".*?>(.*?)</div>',
-            webpage, 'error message', default=None)
+            r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
+            webpage, 'error message', default=None, group='error')
          if error_msg:
              error_msg = re.sub(r'\s+', ' ', error_msg)
              raise ExtractorError(
                  'PornHub said: %s' % error_msg,
                  expected=True, video_id=video_id)
  
+        # video_title from flashvars contains whitespace instead of non-ASCII (see
+        # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
+        # on that anymore.
+        title = self._html_search_meta(
+            'twitter:title', webpage, default=None) or self._search_regex(
+            (r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
+             r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
+             r'shareTitle\s*=\s*(["\'])(?P<title>.+?)\1'),
+            webpage, 'title', group='title')
+
          flashvars = self._parse_json(
              self._search_regex(
                  r'var\s+flashvars_\d+\s*=\s*({.+?});', webpage, 'flashvars', default='{}'),
              video_id)
          if flashvars:
-            video_title = flashvars.get('video_title')
              thumbnail = flashvars.get('image_url')
              duration = int_or_none(flashvars.get('video_duration'))
          else:
-            video_title, thumbnail, duration = [None] * 3
-
-        if not video_title:
-            video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
+            thumbnail, duration = [None] * 2
  
          video_uploader = self._html_search_regex(
              r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|span class="username)[^>]+>(.+?)<',
@@ -134,10 +187,19 @@ class PornHubIE(InfoExtractor):
              })
          self._sort_formats(formats)
  
+        page_params = self._parse_json(self._search_regex(
+            r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
+            webpage, 'page parameters', group='data', default='{}'),
+            video_id, transform_source=js_to_json, fatal=False)
+        tags = categories = None
+        if page_params:
+            tags = page_params.get('tags', '').split(',')
+            categories = page_params.get('categories', '').split(',')
+
          return {
              'id': video_id,
              'uploader': video_uploader,
-            'title': video_title,
+            'title': title,
              'thumbnail': thumbnail,
              'duration': duration,
              'view_count': view_count,
@@ -146,6 +208,8 @@ class PornHubIE(InfoExtractor):
              'comment_count': comment_count,
              'formats': formats,
              'age_limit': 18,
+            'tags': tags,
+            'categories': categories,
          }
  
  
diff --git a/youtube_dl/extractor/pornotube.py b/youtube_dl/extractor/pornotube.py
index 5398e708b68337b76739282abf6c00e8a39745ab..63816c3588cebe889e77a24a928cc789ef07c7d5 100644
--- a/youtube_dl/extractor/pornotube.py
+++ b/youtube_dl/extractor/pornotube.py
@@ -3,10 +3,7 @@ from __future__ import unicode_literals
  import json
  
  from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    sanitized_Request,
-)
+from ..utils import int_or_none
  
  
  class PornotubeIE(InfoExtractor):
@@ -31,59 +28,55 @@ class PornotubeIE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        # Fetch origin token
-        js_config = self._download_webpage(
-            'http://www.pornotube.com/assets/src/app/config.js', video_id,
-            note='Download JS config')
-        originAuthenticationSpaceKey = self._search_regex(
-            r"constant\('originAuthenticationSpaceKey',\s*'([^']+)'",
-            js_config, 'originAuthenticationSpaceKey')
+        token = self._download_json(
+            'https://api.aebn.net/auth/v2/origins/authenticate',
+            video_id, note='Downloading token',
+            data=json.dumps({'credentials': 'Clip Application'}).encode('utf-8'),
+            headers={
+                'Content-Type': 'application/json',
+                'Origin': 'http://www.pornotube.com',
+            })['tokenKey']
  
-        # Fetch actual token
-        token_req_data = {
-            'authenticationSpaceKey': originAuthenticationSpaceKey,
-            'credentials': 'Clip Application',
-        }
-        token_req = sanitized_Request(
-            'https://api.aebn.net/auth/v1/token/primal',
-            data=json.dumps(token_req_data).encode('utf-8'))
-        token_req.add_header('Content-Type', 'application/json')
-        token_req.add_header('Origin', 'http://www.pornotube.com')
-        token_answer = self._download_json(
-            token_req, video_id, note='Requesting primal token')
-        token = token_answer['tokenKey']
+        video_url = self._download_json(
+            'https://api.aebn.net/delivery/v1/clips/%s/MP4' % video_id,
+            video_id, note='Downloading delivery information',
+            headers={'Authorization': token})['mediaUrl']
  
-        # Get video URL
-        delivery_req = sanitized_Request(
-            'https://api.aebn.net/delivery/v1/clips/%s/MP4' % video_id)
-        delivery_req.add_header('Authorization', token)
-        delivery_info = self._download_json(
-            delivery_req, video_id, note='Downloading delivery information')
-        video_url = delivery_info['mediaUrl']
+        FIELDS = (
+            'title', 'description', 'startSecond', 'endSecond', 'publishDate',
+            'studios{name}', 'categories{name}', 'movieId', 'primaryImageNumber'
+        )
  
-        # Get additional info (title etc.)
-        info_req = sanitized_Request(
-            'https://api.aebn.net/content/v1/clips/%s?expand='
-            'title,description,primaryImageNumber,startSecond,endSecond,'
-            'movie.title,movie.MovieId,movie.boxCoverFront,movie.stars,'
-            'movie.studios,stars.name,studios.name,categories.name,'
-            'clipActive,movieActive,publishDate,orientations' % video_id)
-        info_req.add_header('Authorization', token)
          info = self._download_json(
-            info_req, video_id, note='Downloading metadata')
+            'https://api.aebn.net/content/v2/clips/%s?fields=%s'
+            % (video_id, ','.join(FIELDS)), video_id,
+            note='Downloading metadata',
+            headers={'Authorization': token})
+
+        if isinstance(info, list):
+            info = info[0]
+
+        title = info['title']
  
          timestamp = int_or_none(info.get('publishDate'), scale=1000)
          uploader = info.get('studios', [{}])[0].get('name')
-        movie_id = info['movie']['movieId']
-        thumbnail = 'http://pic.aebn.net/dis/t/%s/%s_%08d.jpg' % (
-            movie_id, movie_id, info['primaryImageNumber'])
-        categories = [c['name'] for c in info.get('categories')]
+        movie_id = info.get('movieId')
+        primary_image_number = info.get('primaryImageNumber')
+        thumbnail = None
+        if movie_id and primary_image_number:
+            thumbnail = 'http://pic.aebn.net/dis/t/%s/%s_%08d.jpg' % (
+                movie_id, movie_id, primary_image_number)
+        start = int_or_none(info.get('startSecond'))
+        end = int_or_none(info.get('endSecond'))
+        duration = end - start if start is not None and end is not None else None
+        categories = [c['name'] for c in info.get('categories', []) if c.get('name')]
  
          return {
              'id': video_id,
              'url': video_url,
-            'title': info['title'],
+            'title': title,
              'description': info.get('description'),
+            'duration': duration,
              'timestamp': timestamp,
              'uploader': uploader,
              'thumbnail': thumbnail,
diff --git a/youtube_dl/extractor/pornovoisines.py b/youtube_dl/extractor/pornovoisines.py

index 6b51e5c5400ee59859eb0d29cb740a31f34f3a96..58f557e3995f25a3787018150c953cb088e4fe81 100644 (file)
--- a/youtube_dl/extractor/pornovoisines.py
+++ b/youtube_dl/extractor/pornovoisines.py
@@ -2,7 +2,6 @@
  from __future__ import unicode_literals
  
  import re
-import random
  
  from .common import InfoExtractor
  from ..utils import (
@@ -13,61 +12,69 @@ from ..utils import (
  
  
  class PornoVoisinesIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?pornovoisines\.com/showvideo/(?P<id>\d+)/(?P<display_id>[^/]+)'
-
-    _VIDEO_URL_TEMPLATE = 'http://stream%d.pornovoisines.com' \
-        '/static/media/video/transcoded/%s-640x360-1000-trscded.mp4'
-
-    _SERVER_NUMBERS = (1, 2)
+    _VALID_URL = r'https?://(?:www\.)?pornovoisines\.com/videos/show/(?P<id>\d+)/(?P<display_id>[^/.]+)'
  
      _TEST = {
-        'url': 'http://www.pornovoisines.com/showvideo/1285/recherche-appartement/',
-        'md5': '5ac670803bc12e9e7f9f662ce64cf1d1',
+        'url': 'http://www.pornovoisines.com/videos/show/919/recherche-appartement.html',
+        'md5': '6f8aca6a058592ab49fe701c8ba8317b',
          'info_dict': {
-            'id': '1285',
+            'id': '919',
              'display_id': 'recherche-appartement',
              'ext': 'mp4',
              'title': 'Recherche appartement',
-            'description': 'md5:819ea0b785e2a04667a1a01cdc89594e',
+            'description': 'md5:fe10cb92ae2dd3ed94bb4080d11ff493',
              'thumbnail': 're:^https?://.*\.jpg$',
              'upload_date': '20140925',
              'duration': 120,
              'view_count': int,
              'average_rating': float,
-            'categories': ['Débutantes', 'Scénario', 'Sodomie'],
+            'categories': ['Débutante', 'Débutantes', 'Scénario', 'Sodomie'],
              'age_limit': 18,
+            'subtitles': {
+                'fr': [{
+                    'ext': 'vtt',
+                }]
+            },
          }
      }
  
-    @classmethod
-    def build_video_url(cls, num):
-        return cls._VIDEO_URL_TEMPLATE % (random.choice(cls._SERVER_NUMBERS), num)
-
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          video_id = mobj.group('id')
          display_id = mobj.group('display_id')
  
-        webpage = self._download_webpage(url, video_id)
+        settings_url = self._download_json(
+            'http://www.pornovoisines.com/api/video/%s/getsettingsurl/' % video_id,
+            video_id, note='Getting settings URL')['video_settings_url']
+        settings = self._download_json(settings_url, video_id)['data']
+
+        formats = []
+        for kind, data in settings['variants'].items():
+            if kind == 'HLS':
+                formats.extend(self._extract_m3u8_formats(
+                    data, video_id, ext='mp4', entry_protocol='m3u8_native', m3u8_id='hls'))
+            elif kind == 'MP4':
+                for item in data:
+                    formats.append({
+                        'url': item['url'],
+                        'height': item.get('height'),
+                        'tbr': item.get('bitrate'),
+                    })
+        self._sort_formats(formats)
  
-        video_url = self.build_video_url(video_id)
+        webpage = self._download_webpage(url, video_id)
  
-        title = self._html_search_regex(
-            r'<h1>(.+?)</h1>', webpage, 'title', flags=re.DOTALL)
-        description = self._html_search_regex(
-            r'<article id="descriptif">(.+?)</article>',
-            webpage, 'description', fatal=False, flags=re.DOTALL)
+        title = self._og_search_title(webpage)
+        description = self._og_search_description(webpage)
  
-        thumbnail = self._search_regex(
-            r'<div id="mediaspace%s">\s*<img src="/?([^"]+)"' % video_id,
-            webpage, 'thumbnail', fatal=False)
-        if thumbnail:
-            thumbnail = 'http://www.pornovoisines.com/%s' % thumbnail
+        # The webpage has a bug - there's no space between "thumb" and src=
+        thumbnail = self._html_search_regex(
+            r'<img[^>]+class=([\'"])thumb\1[^>]*src=([\'"])(?P<url>(?:(?!\2).)+)\2',
+            webpage, 'thumbnail', fatal=False, group='url')
  
          upload_date = unified_strdate(self._search_regex(
-            r'Publié le ([\d-]+)', webpage, 'upload date', fatal=False))
-        duration = int_or_none(self._search_regex(
-            'Durée (\d+)', webpage, 'duration', fatal=False))
+            r'Le\s*<b>([\d/]+)', webpage, 'upload date', fatal=False))
+        duration = settings.get('main', {}).get('duration')
          view_count = int_or_none(self._search_regex(
              r'(\d+) vues', webpage, 'view count', fatal=False))
          average_rating = self._search_regex(
@@ -75,15 +82,19 @@ class PornoVoisinesIE(InfoExtractor):
          if average_rating:
              average_rating = float_or_none(average_rating.replace(',', '.'))
  
-        categories = self._html_search_meta(
-            'keywords', webpage, 'categories', fatal=False)
+        categories = self._html_search_regex(
+            r'(?s)Catégories\s*:\s*<b>(.+?)</b>', webpage, 'categories', fatal=False)
          if categories:
              categories = [category.strip() for category in categories.split(',')]
  
+        subtitles = {'fr': [{
+            'url': subtitle,
+        } for subtitle in settings.get('main', {}).get('vtt_tracks', {}).values()]}
+
          return {
              'id': video_id,
              'display_id': display_id,
-            'url': video_url,
+            'formats': formats,
              'title': title,
              'description': description,
              'thumbnail': thumbnail,
@@ -93,4 +104,5 @@ class PornoVoisinesIE(InfoExtractor):
              'average_rating': average_rating,
              'categories': categories,
              'age_limit': 18,
+            'subtitles': subtitles,
          }
diff --git a/youtube_dl/extractor/pornoxo.py b/youtube_dl/extractor/pornoxo.py

index 202f58673ae4f1dd77caee159f37dc24be9aad64..3c9087f2dfe3caa30c879f4905e857a046fd789c 100644 (file)
--- a/youtube_dl/extractor/pornoxo.py
+++ b/youtube_dl/extractor/pornoxo.py
@@ -2,13 +2,13 @@ from __future__ import unicode_literals
  
  import re
  
-from .common import InfoExtractor
+from .jwplatform import JWPlatformBaseIE
  from ..utils import (
      str_to_int,
  )
  
  
-class PornoXOIE(InfoExtractor):
+class PornoXOIE(JWPlatformBaseIE):
      _VALID_URL = r'https?://(?:www\.)?pornoxo\.com/videos/(?P<id>\d+)/(?P<display_id>[^/]+)\.html'
      _TEST = {
          'url': 'http://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary.html',
@@ -17,7 +17,8 @@ class PornoXOIE(InfoExtractor):
              'id': '7564',
              'ext': 'flv',
              'title': 'Striptease From Sexy Secretary!',
-            'description': 'Striptease From Sexy Secretary!',
+            'display_id': 'striptease-from-sexy-secretary',
+            'description': 'md5:0ee35252b685b3883f4a1d38332f9980',
              'categories': list,  # NSFW
              'thumbnail': 're:https?://.*\.jpg$',
              'age_limit': 18,
@@ -26,23 +27,14 @@ class PornoXOIE(InfoExtractor):
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id, display_id = mobj.groups()
  
          webpage = self._download_webpage(url, video_id)
-
-        video_url = self._html_search_regex(
-            r'\'file\'\s*:\s*"([^"]+)"', webpage, 'video_url')
+        video_data = self._extract_jwplayer_data(webpage, video_id, require_title=False)
  
          title = self._html_search_regex(
              r'<title>([^<]+)\s*-\s*PornoXO', webpage, 'title')
  
-        description = self._html_search_regex(
-            r'<meta name="description" content="([^"]+)\s*featuring',
-            webpage, 'description', fatal=False)
-
-        thumbnail = self._html_search_regex(
-            r'\'image\'\s*:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False)
-
          view_count = str_to_int(self._html_search_regex(
              r'[vV]iews:\s*([0-9,]+)', webpage, 'view count', fatal=False))
  
@@ -53,13 +45,14 @@ class PornoXOIE(InfoExtractor):
              None if categories_str is None
              else categories_str.split(','))
  
-        return {
+        video_data.update({
              'id': video_id,
-            'url': video_url,
              'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
+            'display_id': display_id,
+            'description': self._html_search_meta('description', webpage),
              'categories': categories,
              'view_count': view_count,
              'age_limit': 18,
-        }
+        })
+
+        return video_data
diff --git a/youtube_dl/extractor/presstv.py b/youtube_dl/extractor/presstv.py

new file mode 100644 (file)

index 0000000..2da93ed
--- /dev/null
+++ b/youtube_dl/extractor/presstv.py
@@ -0,0 +1,74 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import remove_start
+
+
+class PressTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?presstv\.ir/[^/]+/(?P<y>\d+)/(?P<m>\d+)/(?P<d>\d+)/(?P<id>\d+)/(?P<display_id>[^/]+)?'
+
+    _TEST = {
+        'url': 'http://www.presstv.ir/Detail/2016/04/09/459911/Australian-sewerage-treatment-facility-/',
+        'md5': '5d7e3195a447cb13e9267e931d8dd5a5',
+        'info_dict': {
+            'id': '459911',
+            'display_id': 'Australian-sewerage-treatment-facility-',
+            'ext': 'mp4',
+            'title': 'Organic mattresses used to clean waste water',
+            'upload_date': '20160409',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'description': 'md5:20002e654bbafb6908395a5c0cfcd125'
+        }
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id') or video_id
+
+        webpage = self._download_webpage(url, display_id)
+
+        # extract video URL from webpage
+        video_url = self._hidden_inputs(webpage)['inpPlayback']
+
+        # build list of available formats
+        # specified in http://www.presstv.ir/Scripts/playback.js
+        base_url = 'http://192.99.219.222:82/presstv'
+        _formats = [
+            (180, '_low200.mp4'),
+            (360, '_low400.mp4'),
+            (720, '_low800.mp4'),
+            (1080, '.mp4')
+        ]
+
+        formats = [{
+            'url': base_url + video_url[:-4] + extension,
+            'format_id': '%dp' % height,
+            'height': height,
+        } for height, extension in _formats]
+
+        # extract video metadata
+        title = remove_start(
+            self._html_search_meta('title', webpage, fatal=True), 'PressTV-')
+
+        thumbnail = self._og_search_thumbnail(webpage)
+        description = self._og_search_description(webpage)
+
+        upload_date = '%04d%02d%02d' % (
+            int(mobj.group('y')),
+            int(mobj.group('m')),
+            int(mobj.group('d')),
+        )
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'formats': formats,
+            'thumbnail': thumbnail,
+            'upload_date': upload_date,
+            'description': description
+        }
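Review note on presstv.py: the format list is derived purely from the playback path found in the hidden `inpPlayback` input, by swapping the trailing `.mp4` for a per-quality suffix. A minimal standalone sketch of that mapping follows; the helper name `build_presstv_formats` and the sample path in the usage line are hypothetical, while the suffix table and base URL are copied from the diff above.

```python
# Standalone sketch of the URL construction in PressTVIE._real_extract.
# `build_presstv_formats` is a hypothetical helper, not part of youtube-dl.

BASE_URL = 'http://192.99.219.222:82/presstv'

# (height, suffix) pairs as listed in the extractor
VARIANTS = [
    (180, '_low200.mp4'),
    (360, '_low400.mp4'),
    (720, '_low800.mp4'),
    (1080, '.mp4'),
]


def build_presstv_formats(playback_path, base_url=BASE_URL):
    # Drop the trailing '.mp4' (4 characters), mirroring video_url[:-4]
    stem = playback_path[:-4]
    return [{
        'url': base_url + stem + suffix,
        'format_id': '%dp' % height,
        'height': height,
    } for height, suffix in VARIANTS]


formats = build_presstv_formats('/x/clip.mp4')  # sample path is made up
```

Note that the slice assumes the playback path always ends in a four-character extension; a path without one would be silently truncated.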
diff --git a/youtube_dl/extractor/promptfile.py b/youtube_dl/extractor/promptfile.py

index f93bd19ff6dde40c87672b4fd18a3f1aab11382e..d40cca06f989b7c99329e1650497a06e9a6390e4 100644 (file)
--- a/youtube_dl/extractor/promptfile.py
+++ b/youtube_dl/extractor/promptfile.py
@@ -7,7 +7,6 @@ from .common import InfoExtractor
  from ..utils import (
      determine_ext,
      ExtractorError,
-    sanitized_Request,
      urlencode_postdata,
  )
  
@@ -15,12 +14,12 @@ from ..utils import (
  class PromptFileIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?promptfile\.com/l/(?P<id>[0-9A-Z\-]+)'
      _TEST = {
-        'url': 'http://www.promptfile.com/l/D21B4746E9-F01462F0FF',
-        'md5': 'd1451b6302da7215485837aaea882c4c',
+        'url': 'http://www.promptfile.com/l/86D1CE8462-576CAAE416',
+        'md5': '5a7e285a26e0d66d9a263fae91bc92ce',
          'info_dict': {
-            'id': 'D21B4746E9-F01462F0FF',
+            'id': '86D1CE8462-576CAAE416',
              'ext': 'mp4',
-            'title': 'Birds.mp4',
+            'title': 'oceans.mp4',
              'thumbnail': 're:^https?://.*\.jpg$',
          }
      }
@@ -33,14 +32,23 @@ class PromptFileIE(InfoExtractor):
              raise ExtractorError('Video %s does not exist' % video_id,
                                   expected=True)
  
+        chash = self._search_regex(
+            r'val\("([^"]*)"\s*\+\s*\$\("#chash"\)', webpage, 'chash')
          fields = self._hidden_inputs(webpage)
-        post = urlencode_postdata(fields)
-        req = sanitized_Request(url, post)
-        req.add_header('Content-type', 'application/x-www-form-urlencoded')
+        keys = list(fields.keys())
+        chash_key = keys[0] if len(keys) == 1 else next(
+            key for key in keys if key.startswith('cha'))
+        fields[chash_key] = chash + fields[chash_key]
+
          webpage = self._download_webpage(
-            req, video_id, 'Downloading video page')
+            url, video_id, 'Downloading video page',
+            data=urlencode_postdata(fields),
+            headers={'Content-type': 'application/x-www-form-urlencoded'})
  
-        url = self._html_search_regex(r'url:\s*\'([^\']+)\'', webpage, 'URL')
+        video_url = self._search_regex(
+            (r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>\s*Download File',
+             r'<a[^>]+href=(["\'])(?P<url>https?://(?:www\.)?promptfile\.com/file/(?:(?!\1).)+)\1'),
+            webpage, 'video url', group='url')
          title = self._html_search_regex(
              r'<span.+title="([^"]+)">', webpage, 'title')
          thumbnail = self._html_search_regex(
@@ -49,7 +57,7 @@ class PromptFileIE(InfoExtractor):
  
          formats = [{
              'format_id': 'sd',
-            'url': url,
+            'url': video_url,
              'ext': determine_ext(title),
          }]
          self._sort_formats(formats)
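Review note on promptfile.py: before re-posting the form, the extractor picks one hidden field (the only one, or else the first whose name starts with `cha`) and prepends the string scraped from the page script to its value. A minimal sketch of that step in isolation, assuming the hidden inputs arrive as a plain dict; the helper name `prefix_chash` and the sample field names are hypothetical.

```python
# Standalone sketch of the chash handling in PromptFileIE._real_extract.
# `prefix_chash` is a hypothetical helper, not part of youtube-dl.


def prefix_chash(fields, chash_prefix):
    keys = list(fields.keys())
    if len(keys) == 1:
        # A single hidden input is assumed to be the chash field
        chash_key = keys[0]
    else:
        # Otherwise take the first field whose name starts with 'cha';
        # next() raises StopIteration if no such field exists
        chash_key = next(key for key in keys if key.startswith('cha'))
    patched = dict(fields)  # work on a copy, leave the caller's dict untouched
    patched[chash_key] = chash_prefix + patched[chash_key]
    return patched


single = prefix_chash({'chash': 'abc'}, 'xy')
multi = prefix_chash({'token': 't', 'chash_x': 'abc'}, 'xy')
```

Unlike the extractor, the sketch copies the dict instead of mutating it in place; that is a design choice for the example only.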
diff --git a/youtube_dl/extractor/prosiebensat1.py b/youtube_dl/extractor/prosiebensat1.py

index 07d49d489d6779b0f6bb7bd12bc610497c576c2e..7cc07a2ad5b88c51aa9f5d339839fd743727e17e 100644 (file)
--- a/youtube_dl/extractor/prosiebensat1.py
+++ b/youtube_dl/extractor/prosiebensat1.py
@@ -1,11 +1,11 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
  
  from hashlib import sha1
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlencode
+from ..compat import compat_str
  from ..utils import (
      ExtractorError,
      determine_ext,
@@ -15,10 +15,124 @@ from ..utils import (
  )
  
  
-class ProSiebenSat1IE(InfoExtractor):
+class ProSiebenSat1BaseIE(InfoExtractor):
+    def _extract_video_info(self, url, clip_id):
+        client_location = url
+
+        video = self._download_json(
+            'http://vas.sim-technik.de/vas/live/v2/videos',
+            clip_id, 'Downloading videos JSON', query={
+                'access_token': self._TOKEN,
+                'client_location': client_location,
+                'client_name': self._CLIENT_NAME,
+                'ids': clip_id,
+            })[0]
+
+        if video.get('is_protected') is True:
+            raise ExtractorError('This video is DRM protected.', expected=True)
+
+        duration = float_or_none(video.get('duration'))
+        source_ids = [compat_str(source['id']) for source in video['sources']]
+
+        client_id = self._SALT[:2] + sha1(''.join([clip_id, self._SALT, self._TOKEN, client_location, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
+
+        sources = self._download_json(
+            'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources' % clip_id,
+            clip_id, 'Downloading sources JSON', query={
+                'access_token': self._TOKEN,
+                'client_id': client_id,
+                'client_location': client_location,
+                'client_name': self._CLIENT_NAME,
+            })
+        server_id = sources['server_id']
+
+        def fix_bitrate(bitrate):
+            bitrate = int_or_none(bitrate)
+            if not bitrate:
+                return None
+            return (bitrate // 1000) if bitrate % 1000 == 0 else bitrate
+
+        formats = []
+        for source_id in source_ids:
+            client_id = self._SALT[:2] + sha1(''.join([self._SALT, clip_id, self._TOKEN, server_id, client_location, source_id, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
+            urls = self._download_json(
+                'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url' % clip_id,
+                clip_id, 'Downloading urls JSON', fatal=False, query={
+                    'access_token': self._TOKEN,
+                    'client_id': client_id,
+                    'client_location': client_location,
+                    'client_name': self._CLIENT_NAME,
+                    'server_id': server_id,
+                    'source_ids': source_id,
+                })
+            if not urls:
+                continue
+            if urls.get('status_code') != 0:
+                raise ExtractorError('This video is unavailable', expected=True)
+            urls_sources = urls['sources']
+            if isinstance(urls_sources, dict):
+                urls_sources = urls_sources.values()
+            for source in urls_sources:
+                source_url = source.get('url')
+                if not source_url:
+                    continue
+                protocol = source.get('protocol')
+                mimetype = source.get('mimetype')
+                if mimetype == 'application/f4m+xml' or 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
+                    formats.extend(self._extract_f4m_formats(
+                        source_url, clip_id, f4m_id='hds', fatal=False))
+                elif mimetype == 'application/x-mpegURL':
+                    formats.extend(self._extract_m3u8_formats(
+                        source_url, clip_id, 'mp4', 'm3u8_native',
+                        m3u8_id='hls', fatal=False))
+                else:
+                    tbr = fix_bitrate(source['bitrate'])
+                    if protocol in ('rtmp', 'rtmpe'):
+                        mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
+                        if not mobj:
+                            continue
+                        path = mobj.group('path')
+                        mp4colon_index = path.rfind('mp4:')
+                        app = path[:mp4colon_index]
+                        play_path = path[mp4colon_index:]
+                        formats.append({
+                            'url': '%s/%s' % (mobj.group('url'), app),
+                            'app': app,
+                            'play_path': play_path,
+                            'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
+                            'page_url': 'http://www.prosieben.de',
+                            'tbr': tbr,
+                            'ext': 'flv',
+                            'format_id': 'rtmp%s' % ('-%d' % tbr if tbr else ''),
+                        })
+                    else:
+                        formats.append({
+                            'url': source_url,
+                            'tbr': tbr,
+                            'format_id': 'http%s' % ('-%d' % tbr if tbr else ''),
+                        })
+        self._sort_formats(formats)
+
+        return {
+            'duration': duration,
+            'formats': formats,
+        }
+
+
+class ProSiebenSat1IE(ProSiebenSat1BaseIE):
      IE_NAME = 'prosiebensat1'
      IE_DESC = 'ProSiebenSat.1 Digital'
-    _VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany|7tv)\.(?:de|at|ch)|ran\.de|fem\.com)/(?P<id>.+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            (?:
+                                prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
+                            )\.(?:de|at|ch)|
+                            ran\.de|fem\.com|advopedia\.de
+                        )
+                        /(?P<id>.+)
+                    '''
  
      _TESTS = [
          {
@@ -71,6 +185,7 @@ class ProSiebenSat1IE(InfoExtractor):
                  # rtmp download
                  'skip_download': True,
              },
+            'skip': 'This video is unavailable',
          },
          {
              'url': 'http://www.sixx.de/stars-style/video/sexy-laufen-in-ugg-boots-clip',
@@ -86,6 +201,7 @@ class ProSiebenSat1IE(InfoExtractor):
                  # rtmp download
                  'skip_download': True,
              },
+            'skip': 'This video is unavailable',
          },
          {
              'url': 'http://www.sat1.de/film/der-ruecktritt/video/im-interview-kai-wiesinger-clip',
@@ -101,6 +217,7 @@ class ProSiebenSat1IE(InfoExtractor):
                  # rtmp download
                  'skip_download': True,
              },
+            'skip': 'This video is unavailable',
          },
          {
              'url': 'http://www.kabeleins.de/tv/rosins-restaurants/videos/jagd-auf-fertigkost-im-elsthal-teil-2-ganze-folge',
@@ -116,6 +233,7 @@ class ProSiebenSat1IE(InfoExtractor):
                  # rtmp download
                  'skip_download': True,
              },
+            'skip': 'This video is unavailable',
          },
          {
              'url': 'http://www.ran.de/fussball/bundesliga/video/schalke-toennies-moechte-raul-zurueck-ganze-folge',
@@ -131,6 +249,7 @@ class ProSiebenSat1IE(InfoExtractor):
                  # rtmp download
                  'skip_download': True,
              },
+            'skip': 'This video is unavailable',
          },
          {
              'url': 'http://www.the-voice-of-germany.de/video/31-andreas-kuemmert-rocket-man-clip',
@@ -181,8 +300,29 @@ class ProSiebenSat1IE(InfoExtractor):
                  'skip_download': True,
              },
          },
+        {
+            # geo restricted to Germany
+            'url': 'http://www.kabeleinsdoku.de/tv/mayday-alarm-im-cockpit/video/102-notlandung-im-hudson-river-ganze-folge',
+            'only_matching': True,
+        },
+        {
+            # geo restricted to Germany
+            'url': 'http://www.sat1gold.de/tv/edel-starck/video/11-staffel-1-episode-1-partner-wider-willen-ganze-folge',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://www.sat1gold.de/tv/edel-starck/playlist/die-gesamte-1-staffel',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://www.advopedia.de/videos/lenssen-klaert-auf/lenssen-klaert-auf-folge-8-staffel-3-feiertage-und-freie-tage',
+            'only_matching': True,
+        },
      ]
  
+    _TOKEN = 'prosieben'
+    _SALT = '01!8d8F_)r9]4s[qeuXfP%'
+    _CLIENT_NAME = 'kolibri-2.0.19-splec4'
      _CLIPID_REGEXES = [
          r'"clip_id"\s*:\s+"(\d+)"',
          r'clipid: "(\d+)"',
@@ -227,134 +367,50 @@ class ProSiebenSat1IE(InfoExtractor):
      ]
  
      def _extract_clip(self, url, webpage):
-        clip_id = self._html_search_regex(self._CLIPID_REGEXES, webpage, 'clip id')
-
-        access_token = 'prosieben'
-        client_name = 'kolibri-2.0.19-splec4'
-        client_location = url
-
-        videos_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos?%s' % compat_urllib_parse_urlencode({
-            'access_token': access_token,
-            'client_location': client_location,
-            'client_name': client_name,
-            'ids': clip_id,
-        })
-
-        video = self._download_json(videos_api_url, clip_id, 'Downloading videos JSON')[0]
-
-        if video.get('is_protected') is True:
-            raise ExtractorError('This video is DRM protected.', expected=True)
-
-        duration = float_or_none(video.get('duration'))
-        source_ids = [source['id'] for source in video['sources']]
-        source_ids_str = ','.join(map(str, source_ids))
-
-        g = '01!8d8F_)r9]4s[qeuXfP%'
-
-        client_id = g[:2] + sha1(''.join([clip_id, g, access_token, client_location, g, client_name])
-                                 .encode('utf-8')).hexdigest()
-
-        sources_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources?%s' % (clip_id, compat_urllib_parse_urlencode({
-            'access_token': access_token,
-            'client_id': client_id,
-            'client_location': client_location,
-            'client_name': client_name,
-        }))
-
-        sources = self._download_json(sources_api_url, clip_id, 'Downloading sources JSON')
-        server_id = sources['server_id']
-
-        client_id = g[:2] + sha1(''.join([g, clip_id, access_token, server_id,
-                                          client_location, source_ids_str, g, client_name])
-                                 .encode('utf-8')).hexdigest()
-
-        url_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url?%s' % (clip_id, compat_urllib_parse_urlencode({
-            'access_token': access_token,
-            'client_id': client_id,
-            'client_location': client_location,
-            'client_name': client_name,
-            'server_id': server_id,
-            'source_ids': source_ids_str,
-        }))
-
-        urls = self._download_json(url_api_url, clip_id, 'Downloading urls JSON')
-
+        clip_id = self._html_search_regex(
+            self._CLIPID_REGEXES, webpage, 'clip id')
          title = self._html_search_regex(self._TITLE_REGEXES, webpage, 'title')
-        description = self._html_search_regex(self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
+        info = self._extract_video_info(url, clip_id)
+        description = self._html_search_regex(
+            self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
          thumbnail = self._og_search_thumbnail(webpage)
-
          upload_date = unified_strdate(self._html_search_regex(
              self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))
  
-        formats = []
-
-        urls_sources = urls['sources']
-        if isinstance(urls_sources, dict):
-            urls_sources = urls_sources.values()
-
-        def fix_bitrate(bitrate):
-            bitrate = int_or_none(bitrate)
-            if not bitrate:
-                return None
-            return (bitrate // 1000) if bitrate % 1000 == 0 else bitrate
-
-        for source in urls_sources:
-            protocol = source['protocol']
-            source_url = source['url']
-            if protocol == 'rtmp' or protocol == 'rtmpe':
-                mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
-                if not mobj:
-                    continue
-                path = mobj.group('path')
-                mp4colon_index = path.rfind('mp4:')
-                app = path[:mp4colon_index]
-                play_path = path[mp4colon_index:]
-                formats.append({
-                    'url': '%s/%s' % (mobj.group('url'), app),
-                    'app': app,
-                    'play_path': play_path,
-                    'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
-                    'page_url': 'http://www.prosieben.de',
-                    'vbr': fix_bitrate(source['bitrate']),
-                    'ext': 'mp4',
-                    'format_id': '%s_%s' % (source['cdn'], source['bitrate']),
-                })
-            elif 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
-                formats.extend(self._extract_f4m_formats(source_url, clip_id))
-            else:
-                formats.append({
-                    'url': source_url,
-                    'vbr': fix_bitrate(source['bitrate']),
-                })
-
-        self._sort_formats(formats)
-
-        return {
+        info.update({
              'id': clip_id,
              'title': title,
              'description': description,
              'thumbnail': thumbnail,
              'upload_date': upload_date,
-            'duration': duration,
-            'formats': formats,
-        }
+        })
+        return info
  
      def _extract_playlist(self, url, webpage):
          playlist_id = self._html_search_regex(
              self._PLAYLIST_ID_REGEXES, webpage, 'playlist id')
-        for regex in self._PLAYLIST_CLIP_REGEXES:
-            playlist_clips = re.findall(regex, webpage)
-            if playlist_clips:
-                title = self._html_search_regex(
-                    self._TITLE_REGEXES, webpage, 'title')
-                description = self._html_search_regex(
-                    self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
-                entries = [
-                    self.url_result(
-                        re.match('(.+?//.+?)/', url).group(1) + clip_path,
-                        'ProSiebenSat1')
-                    for clip_path in playlist_clips]
-                return self.playlist_result(entries, playlist_id, title, description)
+        playlist = self._parse_json(
+            self._search_regex(
+                r'var\s+contentResources\s*=\s*(\[.+?\]);\s*</script',
+                webpage, 'playlist'),
+            playlist_id)
+        entries = []
+        for item in playlist:
+            clip_id = item.get('id') or item.get('upc')
+            if not clip_id:
+                continue
+            info = self._extract_video_info(url, clip_id)
+            info.update({
+                'id': clip_id,
+                'title': item.get('title') or item.get('teaser', {}).get('headline'),
+                'description': item.get('teaser', {}).get('description'),
+                'thumbnail': item.get('poster'),
+                'duration': float_or_none(item.get('duration')),
+                'series': item.get('tvShowTitle'),
+                'uploader': item.get('broadcastPublisher'),
+            })
+            entries.append(info)
+        return self.playlist_result(entries, playlist_id)
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
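Review note on prosiebensat1.py: `_extract_video_info` signs two requests with a `client_id` built as the first two characters of the salt followed by the SHA-1 hex digest of a fixed concatenation of request values. A minimal sketch of both derivations, with the field order taken from the diff above; the function names and the sample values in the usage lines are hypothetical.

```python
# Standalone sketch of the client_id derivation in ProSiebenSat1BaseIE.
# Function names and sample inputs are hypothetical; the field order
# follows the two sha1(...) calls in the diff.
from hashlib import sha1


def _sign(salt, parts):
    # salt[:2] + hex SHA-1 of the UTF-8 encoded concatenation
    return salt[:2] + sha1(''.join(parts).encode('utf-8')).hexdigest()


def sources_client_id(clip_id, salt, token, client_location, client_name):
    # Used for the .../videos/<clip_id>/sources request
    return _sign(salt, [
        clip_id, salt, token, client_location, salt, client_name])


def url_client_id(clip_id, salt, token, server_id, client_location,
                  source_id, client_name):
    # Used for each .../videos/<clip_id>/sources/url request
    return _sign(salt, [
        salt, clip_id, token, server_id, client_location,
        source_id, salt, client_name])


cid = sources_client_id(
    '123', 'ab!salt', 'tok', 'http://example.com/v', 'client')
uid = url_client_id(
    '123', 'ab!salt', 'tok', 'srv', 'http://example.com/v', '7', 'client')
```

Because every part must be a text string for `''.join`, numeric source IDs have to be converted first, which is why the diff wraps them in `compat_str`.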
diff --git a/youtube_dl/extractor/puls4.py b/youtube_dl/extractor/puls4.py

index cce84b9e4d95e53731f01d334830faac9f1e008d..1c54af0022f087788d6bb11a25639f1a184b42b8 100644 (file)
--- a/youtube_dl/extractor/puls4.py
+++ b/youtube_dl/extractor/puls4.py
@@ -1,88 +1,51 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
-from .common import InfoExtractor
+from .prosiebensat1 import ProSiebenSat1BaseIE
  from ..utils import (
-    ExtractorError,
      unified_strdate,
-    int_or_none,
+    parse_duration,
+    compat_str,
  )
  
  
-class Puls4IE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?puls4\.com/video/[^/]+/play/(?P<id>[0-9]+)'
+class Puls4IE(ProSiebenSat1BaseIE):
+    _VALID_URL = r'https?://(?:www\.)?puls4\.com/(?P<id>(?:[^/]+/)*?videos/[^?#]+)'
      _TESTS = [{
-        'url': 'http://www.puls4.com/video/pro-und-contra/play/2716816',
-        'md5': '49f6a6629747eeec43cef6a46b5df81d',
+        'url': 'http://www.puls4.com/2-minuten-2-millionen/staffel-3/videos/2min2miotalk/Tobias-Homberger-von-myclubs-im-2min2miotalk-118118',
+        'md5': 'fd3c6b0903ac72c9d004f04bc6bb3e03',
          'info_dict': {
-            'id': '2716816',
-            'ext': 'mp4',
-            'title': 'Pro und Contra vom 23.02.2015',
-            'description': 'md5:293e44634d9477a67122489994675db6',
-            'duration': 2989,
-            'upload_date': '20150224',
+            'id': '118118',
+            'ext': 'flv',
+            'title': 'Tobias Homberger von myclubs im #2min2miotalk',
+            'description': 'md5:f9def7c5e8745d6026d8885487d91955',
+            'upload_date': '20160830',
              'uploader': 'PULS_4',
          },
-        'skip': 'Only works from Germany',
-    }, {
-        'url': 'http://www.puls4.com/video/kult-spielfilme/play/1298106',
-        'md5': '6a48316c8903ece8dab9b9a7bf7a59ec',
-        'info_dict': {
-            'id': '1298106',
-            'ext': 'mp4',
-            'title': 'Lucky Fritz',
-        },
-        'skip': 'Only works from Germany',
      }]
+    _TOKEN = 'puls4'
+    _SALT = '01!kaNgaiNgah1Ie4AeSha'
+    _CLIENT_NAME = ''
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        error_message = self._html_search_regex(
-            r'<div class="message-error">(.+?)</div>',
-            webpage, 'error message', default=None)
-        if error_message:
-            raise ExtractorError(
-                '%s returned error: %s' % (self.IE_NAME, error_message), expected=True)
-
-        real_url = self._html_search_regex(
-            r'\"fsk-button\".+?href=\"([^"]+)',
-            webpage, 'fsk_button', default=None)
-        if real_url:
-            webpage = self._download_webpage(real_url, video_id)
-
-        player = self._search_regex(
-            r'p4_video_player(?:_iframe)?\("video_\d+_container"\s*,(.+?)\);\s*\}',
-            webpage, 'player')
-
-        player_json = self._parse_json(
-            '[%s]' % player, video_id,
-            transform_source=lambda s: s.replace('undefined,', ''))
-
-        formats = None
-        result = None
-
-        for v in player_json:
-            if isinstance(v, list) and not formats:
-                formats = [{
-                    'url': f['url'],
-                    'format': 'hd' if f.get('hd') else 'sd',
-                    'width': int_or_none(f.get('size_x')),
-                    'height': int_or_none(f.get('size_y')),
-                    'tbr': int_or_none(f.get('bitrate')),
-                } for f in v]
-                self._sort_formats(formats)
-            elif isinstance(v, dict) and not result:
-                result = {
-                    'id': video_id,
-                    'title': v['videopartname'].strip(),
-                    'description': v.get('videotitle'),
-                    'duration': int_or_none(v.get('videoduration') or v.get('episodeduration')),
-                    'upload_date': unified_strdate(v.get('clipreleasetime')),
-                    'uploader': v.get('channel'),
-                }
-
-        result['formats'] = formats
-
-        return result
+        path = self._match_id(url)
+        content_path = self._download_json(
+            'http://www.puls4.com/api/json-fe/page/' + path, path)['content'][0]['url']
+        media = self._download_json(
+            'http://www.puls4.com' + content_path,
+            content_path)['mediaCurrent']
+        player_content = media['playerContent']
+        info = self._extract_video_info(url, player_content['id'])
+        info.update({
+            'id': compat_str(media['objectId']),
+            'title': player_content['title'],
+            'description': media.get('description'),
+            'thumbnail': media.get('previewLink'),
+            'upload_date': unified_strdate(media.get('date')),
+            'duration': parse_duration(player_content.get('duration')),
+            'episode': player_content.get('episodePartName'),
+            'show': media.get('channel'),
+            'season_id': player_content.get('seasonId'),
+            'uploader': player_content.get('sourceCompany'),
+        })
+        return info
diff --git a/youtube_dl/extractor/pyvideo.py b/youtube_dl/extractor/pyvideo.py

index cc0416cb81eb23ed87d1dae0cdf2573a6df8936a..b8ac93a62c4157ae51335efcadf83ca363272f19 100644
--- a/youtube_dl/extractor/pyvideo.py
+++ b/youtube_dl/extractor/pyvideo.py
@@ -1,59 +1,72 @@
  from __future__ import unicode_literals
  
  import re
-import os
  
  from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import int_or_none
  
  
  class PyvideoIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?pyvideo\.org/video/(?P<id>\d+)/(.*)'
-
-    _TESTS = [
-        {
-            'url': 'http://pyvideo.org/video/1737/become-a-logging-expert-in-30-minutes',
-            'md5': '520915673e53a5c5d487c36e0c4d85b5',
-            'info_dict': {
-                'id': '24_4WWkSmNo',
-                'ext': 'webm',
-                'title': 'Become a logging expert in 30 minutes',
-                'description': 'md5:9665350d466c67fb5b1598de379021f7',
-                'upload_date': '20130320',
-                'uploader': 'Next Day Video',
-                'uploader_id': 'NextDayVideo',
-            },
-            'add_ie': ['Youtube'],
+    _VALID_URL = r'https?://(?:www\.)?pyvideo\.org/(?P<category>[^/]+)/(?P<id>[^/?#&.]+)'
+
+    _TESTS = [{
+        'url': 'http://pyvideo.org/pycon-us-2013/become-a-logging-expert-in-30-minutes.html',
+        'info_dict': {
+            'id': 'become-a-logging-expert-in-30-minutes',
          },
-        {
-            'url': 'http://pyvideo.org/video/2542/gloriajw-spotifywitherikbernhardsson182m4v',
-            'md5': '5fe1c7e0a8aa5570330784c847ff6d12',
-            'info_dict': {
-                'id': '2542',
-                'ext': 'm4v',
-                'title': 'Gloriajw-SpotifyWithErikBernhardsson182',
-            },
+        'playlist_count': 2,
+    }, {
+        'url': 'http://pyvideo.org/pygotham-2012/gloriajw-spotifywitherikbernhardsson182m4v.html',
+        'md5': '5fe1c7e0a8aa5570330784c847ff6d12',
+        'info_dict': {
+            'id': '2542',
+            'ext': 'm4v',
+            'title': 'Gloriajw-SpotifyWithErikBernhardsson182.m4v',
          },
-    ]
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
+        category = mobj.group('category')
          video_id = mobj.group('id')
  
-        webpage = self._download_webpage(url, video_id)
+        entries = []
  
-        m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', webpage)
-        if m_youtube is not None:
-            return self.url_result(m_youtube.group(1), 'Youtube')
+        data = self._download_json(
+            'https://raw.githubusercontent.com/pyvideo/data/master/%s/videos/%s.json'
+            % (category, video_id), video_id, fatal=False)
  
-        title = self._html_search_regex(
-            r'<div class="section">\s*<h3(?:\s+class="[^"]*"[^>]*)?>([^>]+?)</h3>',
-            webpage, 'title', flags=re.DOTALL)
-        video_url = self._search_regex(
-            [r'<source src="(.*?)"', r'<dt>Download</dt>.*?<a href="(.+?)"'],
-            webpage, 'video url', flags=re.DOTALL)
+        if data:
+            for video in data['videos']:
+                video_url = video.get('url')
+                if video_url:
+                    if video.get('type') == 'youtube':
+                        entries.append(self.url_result(video_url, 'Youtube'))
+                    else:
+                        entries.append({
+                            'id': compat_str(data.get('id') or video_id),
+                            'url': video_url,
+                            'title': data['title'],
+                            'description': data.get('description') or data.get('summary'),
+                            'thumbnail': data.get('thumbnail_url'),
+                            'duration': int_or_none(data.get('duration')),
+                        })
+        else:
+            webpage = self._download_webpage(url, video_id)
+            title = self._og_search_title(webpage)
+            media_urls = self._search_regex(
+                r'(?s)Media URL:(.+?)</li>', webpage, 'media urls')
+            for m in re.finditer(
+                    r'<a[^>]+href=(["\'])(?P<url>http.+?)\1', media_urls):
+                media_url = m.group('url')
+                if re.match(r'https?://www\.youtube\.com/watch\?v=.*', media_url):
+                    entries.append(self.url_result(media_url, 'Youtube'))
+                else:
+                    entries.append({
+                        'id': video_id,
+                        'url': media_url,
+                        'title': title,
+                    })
  
-        return {
-            'id': video_id,
-            'title': os.path.splitext(title)[0],
-            'url': video_url,
-        }
+        return self.playlist_result(entries, video_id)
diff --git a/youtube_dl/extractor/qqmusic.py b/youtube_dl/extractor/qqmusic.py

index ff0af9543c2b5e5527f406958e9ae5ae4d1adbda..37cb9e2c9dded7c9fa6e1e9eeef4ebeccdf9b4a9 100644
--- a/youtube_dl/extractor/qqmusic.py
+++ b/youtube_dl/extractor/qqmusic.py
@@ -18,7 +18,7 @@ from ..utils import (
  class QQMusicIE(InfoExtractor):
      IE_NAME = 'qqmusic'
      IE_DESC = 'QQ音乐'
-    _VALID_URL = r'https?://y.qq.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
      _TESTS = [{
          'url': 'http://y.qq.com/#type=song&mid=004295Et37taLD',
          'md5': '9ce1c1c8445f561506d2e3cfb0255705',
@@ -172,7 +172,7 @@ class QQPlaylistBaseIE(InfoExtractor):
  class QQMusicSingerIE(QQPlaylistBaseIE):
      IE_NAME = 'qqmusic:singer'
      IE_DESC = 'QQ音乐 - 歌手'
-    _VALID_URL = r'https?://y.qq.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
      _TEST = {
          'url': 'http://y.qq.com/#type=singer&mid=001BLpXF2DyJe2',
          'info_dict': {
@@ -217,7 +217,7 @@ class QQMusicSingerIE(QQPlaylistBaseIE):
  class QQMusicAlbumIE(QQPlaylistBaseIE):
      IE_NAME = 'qqmusic:album'
      IE_DESC = 'QQ音乐 - 专辑'
-    _VALID_URL = r'https?://y.qq.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
  
      _TESTS = [{
          'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1',
diff --git a/youtube_dl/extractor/quickvid.py b/youtube_dl/extractor/quickvid.py

deleted file mode 100644
index f414e23..0000000
--- a/youtube_dl/extractor/quickvid.py
+++ /dev/null
@@ -1,54 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_urlparse,
-)
-from ..utils import (
-    determine_ext,
-    int_or_none,
-)
-
-
-class QuickVidIE(InfoExtractor):
-    _VALID_URL = r'https?://(www\.)?quickvid\.org/watch\.php\?v=(?P<id>[a-zA-Z_0-9-]+)'
-    _TEST = {
-        'url': 'http://quickvid.org/watch.php?v=sUQT3RCG8dx',
-        'md5': 'c0c72dd473f260c06c808a05d19acdc5',
-        'info_dict': {
-            'id': 'sUQT3RCG8dx',
-            'ext': 'mp4',
-            'title': 'Nick Offerman\'s Summer Reading Recap',
-            'thumbnail': 're:^https?://.*\.(?:png|jpg|gif)$',
-            'view_count': int,
-        },
-        'skip': 'Not accessible from Travis CI server',
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._html_search_regex(r'<h2>(.*?)</h2>', webpage, 'title')
-        view_count = int_or_none(self._html_search_regex(
-            r'(?s)<div id="views">(.*?)</div>',
-            webpage, 'view count', fatal=False))
-        video_code = self._search_regex(
-            r'(?s)<video id="video"[^>]*>(.*?)</video>', webpage, 'video code')
-        formats = [
-            {
-                'url': compat_urlparse.urljoin(url, src),
-                'format_id': determine_ext(src, None),
-            } for src in re.findall('<source\s+src="([^"]+)"', video_code)
-        ]
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-            'thumbnail': self._og_search_thumbnail(webpage),
-            'view_count': view_count,
-        }
diff --git a/youtube_dl/extractor/r7.py b/youtube_dl/extractor/r7.py

index 976c8feec657f8de731d3ffadfa09189ed1628cf..069dbfaed0638e396d024ec81d5142d18f9ad90f 100644
--- a/youtube_dl/extractor/r7.py
+++ b/youtube_dl/extractor/r7.py
@@ -2,22 +2,19 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import (
-    js_to_json,
-    unescapeHTML,
-    int_or_none,
-)
+from ..utils import int_or_none
  
  
  class R7IE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://
+    _VALID_URL = r'''(?x)
+                        https?://
                          (?:
                              (?:[a-zA-Z]+)\.r7\.com(?:/[^/]+)+/idmedia/|
                              noticias\.r7\.com(?:/[^/]+)+/[^/]+-|
                              player\.r7\.com/video/i/
                          )
                          (?P<id>[\da-f]{24})
-                        '''
+                    '''
      _TESTS = [{
          'url': 'http://videos.r7.com/policiais-humilham-suspeito-a-beira-da-morte-morre-com-dignidade-/idmedia/54e7050b0cf2ff57e0279389.html',
          'md5': '403c4e393617e8e8ddc748978ee8efde',
@@ -25,6 +22,7 @@ class R7IE(InfoExtractor):
              'id': '54e7050b0cf2ff57e0279389',
              'ext': 'mp4',
              'title': 'Policiais humilham suspeito à beira da morte: "Morre com dignidade"',
+            'description': 'md5:01812008664be76a6479aa58ec865b72',
              'thumbnail': 're:^https?://.*\.jpg$',
              'duration': 98,
              'like_count': int,
@@ -44,45 +42,72 @@ class R7IE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        webpage = self._download_webpage(
-            'http://player.r7.com/video/i/%s' % video_id, video_id)
+        video = self._download_json(
+            'http://player-api.r7.com/video/i/%s' % video_id, video_id)
  
-        item = self._parse_json(js_to_json(self._search_regex(
-            r'(?s)var\s+item\s*=\s*({.+?});', webpage, 'player')), video_id)
-
-        title = unescapeHTML(item['title'])
-        thumbnail = item.get('init', {}).get('thumbUri')
-        duration = None
-
-        statistics = item.get('statistics', {})
-        like_count = int_or_none(statistics.get('likes'))
-        view_count = int_or_none(statistics.get('views'))
+        title = video['title']
  
          formats = []
-        for format_key, format_dict in item['playlist'][0].items():
-            src = format_dict.get('src')
-            if not src:
-                continue
-            format_id = format_dict.get('format') or format_key
-            if duration is None:
-                duration = format_dict.get('duration')
-            if '.f4m' in src:
-                formats.extend(self._extract_f4m_formats(src, video_id, preference=-1))
-            elif src.endswith('.m3u8'):
-                formats.extend(self._extract_m3u8_formats(src, video_id, 'mp4', preference=-2))
-            else:
-                formats.append({
-                    'url': src,
-                    'format_id': format_id,
-                })
+        media_url_hls = video.get('media_url_hls')
+        if media_url_hls:
+            formats.extend(self._extract_m3u8_formats(
+                media_url_hls, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+        media_url = video.get('media_url')
+        if media_url:
+            f = {
+                'url': media_url,
+                'format_id': 'http',
+            }
+            # The m3u8 format always matches the http format, so copy metadata from
+            # one to the other
+            m3u8_formats = list(filter(
+                lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+                formats))
+            if len(m3u8_formats) == 1:
+                f_copy = m3u8_formats[0].copy()
+                f_copy.update(f)
+                f_copy['protocol'] = 'http'
+                f = f_copy
+            formats.append(f)
          self._sort_formats(formats)
  
+        description = video.get('description')
+        thumbnail = video.get('thumb')
+        duration = int_or_none(video.get('media_duration'))
+        like_count = int_or_none(video.get('likes'))
+        view_count = int_or_none(video.get('views'))
+
          return {
              'id': video_id,
              'title': title,
+            'description': description,
              'thumbnail': thumbnail,
              'duration': duration,
              'like_count': like_count,
              'view_count': view_count,
              'formats': formats,
          }
+
+
+class R7ArticleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:[a-zA-Z]+)\.r7\.com/(?:[^/]+/)+[^/?#&]+-(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://tv.r7.com/record-play/balanco-geral/videos/policiais-humilham-suspeito-a-beira-da-morte-morre-com-dignidade-16102015',
+        'only_matching': True,
+    }
+
+    @classmethod
+    def suitable(cls, url):
+        return False if R7IE.suitable(url) else super(R7ArticleIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        video_id = self._search_regex(
+            r'<div[^>]+(?:id=["\']player-|class=["\']embed["\'][^>]+id=["\'])([\da-f]{24})',
+            webpage, 'video id')
+
+        return self.url_result('http://player.r7.com/video/i/%s' % video_id, R7IE.ie_key())
diff --git a/youtube_dl/extractor/radiobremen.py b/youtube_dl/extractor/radiobremen.py

index 0cbb15f086f4b3c747f2da80f8af813f8dbf50f0..0aa8d059bf81dffd28df727650b20aafc49302eb 100644
--- a/youtube_dl/extractor/radiobremen.py
+++ b/youtube_dl/extractor/radiobremen.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  
  from __future__ import unicode_literals
  
@@ -13,15 +13,15 @@ class RadioBremenIE(InfoExtractor):
      IE_NAME = 'radiobremen'
  
      _TEST = {
-        'url': 'http://www.radiobremen.de/mediathek/index.html?id=114720',
+        'url': 'http://www.radiobremen.de/mediathek/?id=141876',
          'info_dict': {
-            'id': '114720',
+            'id': '141876',
              'ext': 'mp4',
-            'duration': 1685,
+            'duration': 178,
              'width': 512,
-            'title': 'buten un binnen vom 22. Dezember',
+            'title': 'Druck auf Patrick Öztürk',
              'thumbnail': 're:https?://.*\.jpg$',
-            'description': 'Unter anderem mit diesen Themen: 45 Flüchtlinge sind in Worpswede angekommen +++ Freies Internet für alle: Bremer arbeiten an einem flächendeckenden W-Lan-Netzwerk +++ Aktivisten kämpfen für das Unibad +++ So war das Wetter 2014 +++',
+            'description': 'Gegen den SPD-Bürgerschaftsabgeordneten Patrick Öztürk wird wegen Beihilfe zum gewerbsmäßigen Betrug ermittelt. Am Donnerstagabend sollte er dem Vorstand des SPD-Unterbezirks Bremerhaven dazu Rede und Antwort stehen.',
          },
      }
  
diff --git a/youtube_dl/extractor/radiocanada.py b/youtube_dl/extractor/radiocanada.py

new file mode 100644
index 0000000..321917a
--- /dev/null
+++ b/youtube_dl/extractor/radiocanada.py
@@ -0,0 +1,170 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    xpath_text,
+    find_xpath_attr,
+    determine_ext,
+    int_or_none,
+    unified_strdate,
+    xpath_element,
+    ExtractorError,
+    determine_protocol,
+    unsmuggle_url,
+)
+
+
+class RadioCanadaIE(InfoExtractor):
+    IE_NAME = 'radiocanada'
+    _VALID_URL = r'(?:radiocanada:|https?://ici\.radio-canada\.ca/widgets/mediaconsole/)(?P<app_code>[^:/]+)[:/](?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7184272',
+        'info_dict': {
+            'id': '7184272',
+            'ext': 'mp4',
+            'title': 'Le parcours du tireur capté sur vidéo',
+            'description': 'Images des caméras de surveillance fournies par la GRC montrant le parcours du tireur d\'Ottawa',
+            'upload_date': '20141023',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        url, smuggled_data = unsmuggle_url(url, {})
+        app_code, video_id = re.match(self._VALID_URL, url).groups()
+
+        metadata = self._download_xml(
+            'http://api.radio-canada.ca/metaMedia/v1/index.ashx',
+            video_id, note='Downloading metadata XML', query={
+                'appCode': app_code,
+                'idMedia': video_id,
+            })
+
+        def get_meta(name):
+            el = find_xpath_attr(metadata, './/Meta', 'name', name)
+            return el.text if el is not None else None
+
+        if get_meta('protectionType'):
+            raise ExtractorError('This video is DRM protected.', expected=True)
+
+        device_types = ['ipad']
+        if app_code != 'toutv':
+            device_types.append('flash')
+        if not smuggled_data:
+            device_types.append('android')
+
+        formats = []
+        # TODO: extract f4m formats
+        # f4m formats can be extracted using the flashhd device_type, but they produce an unplayable file
+        for device_type in device_types:
+            validation_url = 'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx'
+            query = {
+                'appCode': app_code,
+                'idMedia': video_id,
+                'connectionType': 'broadband',
+                'multibitrate': 'true',
+                'deviceType': device_type,
+            }
+            if smuggled_data:
+                validation_url = 'https://services.radio-canada.ca/media/validation/v2/'
+                query.update(smuggled_data)
+            else:
+                query.update({
+                    # paysJ391wsHjbOJwvCs26toz and bypasslock are used to bypass geo-restriction
+                    'paysJ391wsHjbOJwvCs26toz': 'CA',
+                    'bypasslock': 'NZt5K62gRqfc',
+                })
+            v_data = self._download_xml(validation_url, video_id, note='Downloading %s XML' % device_type, query=query, fatal=False)
+            v_url = xpath_text(v_data, 'url')
+            if not v_url:
+                continue
+            if v_url == 'null':
+                raise ExtractorError('%s said: %s' % (
+                    self.IE_NAME, xpath_text(v_data, 'message')), expected=True)
+            ext = determine_ext(v_url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    v_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    v_url, video_id, f4m_id='hds', fatal=False))
+            else:
+                ext = determine_ext(v_url)
+                bitrates = xpath_element(v_data, 'bitrates')
+                for url_e in bitrates.findall('url'):
+                    tbr = int_or_none(url_e.get('bitrate'))
+                    if not tbr:
+                        continue
+                    f_url = re.sub(r'\d+\.%s' % ext, '%d.%s' % (tbr, ext), v_url)
+                    protocol = determine_protocol({'url': f_url})
+                    formats.append({
+                        'format_id': '%s-%d' % (protocol, tbr),
+                        'url': f_url,
+                        'ext': 'flv' if protocol == 'rtmp' else ext,
+                        'protocol': protocol,
+                        'width': int_or_none(url_e.get('width')),
+                        'height': int_or_none(url_e.get('height')),
+                        'tbr': tbr,
+                    })
+                    if protocol == 'rtsp':
+                        base_url = self._search_regex(
+                            r'rtsp://([^?]+)', f_url, 'base url', default=None)
+                        if base_url:
+                            base_url = 'http://' + base_url
+                            formats.extend(self._extract_m3u8_formats(
+                                base_url + '/playlist.m3u8', video_id, 'mp4',
+                                'm3u8_native', m3u8_id='hls', fatal=False))
+                            formats.extend(self._extract_f4m_formats(
+                                base_url + '/manifest.f4m', video_id,
+                                f4m_id='hds', fatal=False))
+        self._sort_formats(formats)
+
+        subtitles = {}
+        closed_caption_url = get_meta('closedCaption') or get_meta('closedCaptionHTML5')
+        if closed_caption_url:
+            subtitles['fr'] = [{
+                'url': closed_caption_url,
+                'ext': determine_ext(closed_caption_url, 'vtt'),
+            }]
+
+        return {
+            'id': video_id,
+            'title': get_meta('Title'),
+            'description': get_meta('Description') or get_meta('ShortDescription'),
+            'thumbnail': get_meta('imageHR') or get_meta('imageMR') or get_meta('imageBR'),
+            'duration': int_or_none(get_meta('length')),
+            'series': get_meta('Emission'),
+            'season_number': int_or_none(get_meta('SrcSaison')),
+            'episode_number': int_or_none(get_meta('SrcEpisode')),
+            'upload_date': unified_strdate(get_meta('Date')),
+            'subtitles': subtitles,
+            'formats': formats,
+        }
+
+
+class RadioCanadaAudioVideoIE(InfoExtractor):
+    IE_NAME = 'radiocanada:audiovideo'
+    _VALID_URL = r'https?://ici\.radio-canada\.ca/audio-video/media-(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://ici.radio-canada.ca/audio-video/media-7527184/barack-obama-au-vietnam',
+        'info_dict': {
+            'id': '7527184',
+            'ext': 'mp4',
+            'title': 'Barack Obama au Vietnam',
+            'description': 'Les États-Unis lèvent l\'embargo sur la vente d\'armes qui datait de la guerre du Vietnam',
+            'upload_date': '20160523',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        return self.url_result('radiocanada:medianet:%s' % self._match_id(url))
diff --git a/youtube_dl/extractor/radiojavan.py b/youtube_dl/extractor/radiojavan.py

index 884c284206cb73303a6631695872ce79ecf7795e..ec4fa6e602ea779dd6d3a530ea6cfb639eee3cf4 100644
--- a/youtube_dl/extractor/radiojavan.py
+++ b/youtube_dl/extractor/radiojavan.py
@@ -3,7 +3,7 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import(
+from ..utils import (
      unified_strdate,
      str_to_int,
  )
diff --git a/youtube_dl/extractor/rai.py b/youtube_dl/extractor/rai.py

index e36ce1aa1940deafd5a633bec814e7462008c3b1..dc640b1bcb58ddb79c89e5f2346a5bc5c63a3547 100644
--- a/youtube_dl/extractor/rai.py
+++ b/youtube_dl/extractor/rai.py
@@ -1,47 +1,141 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urlparse,
-)
+from ..compat import compat_urlparse
  from ..utils import (
-    ExtractorError,
      determine_ext,
+    ExtractorError,
+    find_xpath_attr,
+    fix_xml_ampersands,
+    int_or_none,
      parse_duration,
      unified_strdate,
-    int_or_none,
+    update_url_query,
      xpath_text,
  )
  
  
-class RaiTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/(?:[^/]+/)+media/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html'
+class RaiBaseIE(InfoExtractor):
+    def _extract_relinker_formats(self, relinker_url, video_id):
+        formats = []
+
+        for platform in ('mon', 'flash', 'native'):
+            relinker = self._download_xml(
+                relinker_url, video_id,
+                note='Downloading XML metadata for platform %s' % platform,
+                transform_source=fix_xml_ampersands,
+                query={'output': 45, 'pl': platform},
+                headers=self.geo_verification_headers())
+
+            media_url = find_xpath_attr(relinker, './url', 'type', 'content').text
+            if media_url == 'http://download.rai.it/video_no_available.mp4':
+                self.raise_geo_restricted()
+
+            ext = determine_ext(media_url)
+            if (ext == 'm3u8' and platform != 'mon') or (ext == 'f4m' and platform != 'flash'):
+                continue
+
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    media_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif ext == 'f4m':
+                manifest_url = update_url_query(
+                    media_url.replace('manifest#live_hds.f4m', 'manifest.f4m'),
+                    {'hdcore': '3.7.0', 'plugin': 'aasp-3.7.0.39.44'})
+                formats.extend(self._extract_f4m_formats(
+                    manifest_url, video_id, f4m_id='hds', fatal=False))
+            else:
+                bitrate = int_or_none(xpath_text(relinker, 'bitrate')) or 0
+                formats.append({
+                    'url': media_url,
+                    'tbr': bitrate if bitrate > 0 else None,
+                    'format_id': 'http-%d' % bitrate if bitrate > 0 else 'http',
+                })
+
+        return formats
+
+    def _extract_from_content_id(self, content_id, base_url):
+        media = self._download_json(
+            'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-%s.html?json' % content_id,
+            content_id, 'Downloading video JSON')
+
+        thumbnails = []
+        for image_type in ('image', 'image_medium', 'image_300'):
+            thumbnail_url = media.get(image_type)
+            if thumbnail_url:
+                thumbnails.append({
+                    'url': compat_urlparse.urljoin(base_url, thumbnail_url),
+                })
+
+        formats = []
+        media_type = media['type']
+        if 'Audio' in media_type:
+            formats.append({
+                'format_id': media.get('formatoAudio'),
+                'url': media['audioUrl'],
+                'ext': media.get('formatoAudio'),
+            })
+        elif 'Video' in media_type:
+            formats.extend(self._extract_relinker_formats(media['mediaUri'], content_id))
+            self._sort_formats(formats)
+        else:
+            raise ExtractorError('not a media file')
+
+        subtitles = {}
+        captions = media.get('subtitlesUrl')
+        if captions:
+            STL_EXT = '.stl'
+            SRT_EXT = '.srt'
+            if captions.endswith(STL_EXT):
+                captions = captions[:-len(STL_EXT)] + SRT_EXT
+            subtitles['it'] = [{
+                'ext': 'srt',
+                'url': captions,
+            }]
+
+        return {
+            'id': content_id,
+            'title': media['name'],
+            'description': media.get('desc'),
+            'thumbnails': thumbnails,
+            'uploader': media.get('author'),
+            'upload_date': unified_strdate(media.get('date')),
+            'duration': parse_duration(media.get('length')),
+            'formats': formats,
+            'subtitles': subtitles,
+        }
+
+
+class RaiTVIE(RaiBaseIE):
+    _VALID_URL = r'https?://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/(?:[^/]+/)+(?:media|ondemand)/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html'
      _TESTS = [
          {
              'url': 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-cb27157f-9dd0-4aee-b788-b1f67643a391.html',
-            'md5': '96382709b61dd64a6b88e0f791e6df4c',
+            'md5': '8970abf8caf8aef4696e7b1f2adfc696',
              'info_dict': {
                  'id': 'cb27157f-9dd0-4aee-b788-b1f67643a391',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Report del 07/04/2014',
                  'description': 'md5:f27c544694cacb46a078db84ec35d2d9',
                  'upload_date': '20140407',
                  'duration': 6160,
+                'thumbnail': 're:^https?://.*\.jpg$',
              }
          },
          {
+            # no m3u8 stream
              'url': 'http://www.raisport.rai.it/dl/raiSport/media/rassegna-stampa-04a9f4bd-b563-40cf-82a6-aad3529cb4a9.html',
-            'md5': 'd9751b78eac9710d62c2447b224dea39',
+            # HDS download, MD5 is unstable
              'info_dict': {
                  'id': '04a9f4bd-b563-40cf-82a6-aad3529cb4a9',
                  'ext': 'flv',
                  'title': 'TG PRIMO TEMPO',
                  'upload_date': '20140612',
                  'duration': 1758,
+                'thumbnail': 're:^https?://.*\.jpg$',
              },
+            'skip': 'Geo-restricted to Italy',
          },
          {
              'url': 'http://www.rainews.it/dl/rainews/media/state-of-the-net-Antonella-La-Carpia-regole-virali-7aafdea9-0e5d-49d5-88a6-7e65da67ae13.html',
@@ -67,127 +161,70 @@ class RaiTVIE(InfoExtractor):
          },
          {
              'url': 'http://www.ilcandidato.rai.it/dl/ray/media/Il-Candidato---Primo-episodio-Le-Primarie-28e5525a-b495-45e8-a7c3-bc48ba45d2b6.html',
-            'md5': '496ab63e420574447f70d02578333437',
+            'md5': 'e57493e1cb8bc7c564663f363b171847',
              'info_dict': {
                  'id': '28e5525a-b495-45e8-a7c3-bc48ba45d2b6',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Il Candidato - Primo episodio: "Le Primarie"',
                  'description': 'md5:364b604f7db50594678f483353164fb8',
                  'upload_date': '20140923',
                  'duration': 386,
+                'thumbnail': 're:^https?://.*\.jpg$',
              }
          },
      ]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        media = self._download_json(
-            'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-%s.html?json' % video_id,
-            video_id, 'Downloading video JSON')
-
-        thumbnails = []
-        for image_type in ('image', 'image_medium', 'image_300'):
-            thumbnail_url = media.get(image_type)
-            if thumbnail_url:
-                thumbnails.append({
-                    'url': thumbnail_url,
-                })
-
-        subtitles = []
-        formats = []
-        media_type = media['type']
-        if 'Audio' in media_type:
-            formats.append({
-                'format_id': media.get('formatoAudio'),
-                'url': media['audioUrl'],
-                'ext': media.get('formatoAudio'),
-            })
-        elif 'Video' in media_type:
-            def fix_xml(xml):
-                return xml.replace(' tag elementi', '').replace('>/', '</')
-
-            relinker = self._download_xml(
-                media['mediaUri'] + '&output=43',
-                video_id, transform_source=fix_xml)
-
-            has_subtitle = False
-
-            for element in relinker.findall('element'):
-                media_url = xpath_text(element, 'url')
-                ext = determine_ext(media_url)
-                content_type = xpath_text(element, 'content-type')
-                if ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        media_url, video_id, 'mp4', 'm3u8_native',
-                        m3u8_id='hls', fatal=False))
-                elif ext == 'f4m':
-                    formats.extend(self._extract_f4m_formats(
-                        media_url + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44',
-                        video_id, f4m_id='hds', fatal=False))
-                elif ext == 'stl':
-                    has_subtitle = True
-                elif content_type.startswith('video/'):
-                    bitrate = int_or_none(xpath_text(element, 'bitrate'))
-                    formats.append({
-                        'url': media_url,
-                        'tbr': bitrate if bitrate > 0 else None,
-                        'format_id': 'http-%d' % bitrate if bitrate > 0 else 'http',
-                    })
-                elif content_type.startswith('image/'):
-                    thumbnails.append({
-                        'url': media_url,
-                    })
-
-            self._sort_formats(formats)
  
-            if has_subtitle:
-                webpage = self._download_webpage(url, video_id)
-                subtitles = self._get_subtitles(video_id, webpage)
-        else:
-            raise ExtractorError('not a media file')
+        return self._extract_from_content_id(video_id, url)
  
-        return {
-            'id': video_id,
-            'title': media['name'],
-            'description': media.get('desc'),
-            'thumbnails': thumbnails,
-            'uploader': media.get('author'),
-            'upload_date': unified_strdate(media.get('date')),
-            'duration': parse_duration(media.get('length')),
-            'formats': formats,
-            'subtitles': subtitles,
-        }
  
-    def _get_subtitles(self, video_id, webpage):
-        subtitles = {}
-        m = re.search(r'<meta name="closedcaption" content="(?P<captions>[^"]+)"', webpage)
-        if m:
-            captions = m.group('captions')
-            STL_EXT = '.stl'
-            SRT_EXT = '.srt'
-            if captions.endswith(STL_EXT):
-                captions = captions[:-len(STL_EXT)] + SRT_EXT
-            subtitles['it'] = [{
-                'ext': 'srt',
-                'url': 'http://www.rai.tv%s' % compat_urllib_parse.quote(captions),
-            }]
-        return subtitles
-
-
-class RaiIE(InfoExtractor):
+class RaiIE(RaiBaseIE):
      _VALID_URL = r'https?://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html'
      _TESTS = [
          {
              'url': 'http://www.report.rai.it/dl/Report/puntata/ContentItem-0c7a664b-d0f4-4b2c-8835-3f82e46f433e.html',
-            'md5': 'e0e7a8a131e249d1aa0ebf270d1d8db7',
+            'md5': '2dd727e61114e1ee9c47f0da6914e178',
              'info_dict': {
                  'id': '59d69d28-6bb6-409d-a4b5-ed44096560af',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Il pacco',
                  'description': 'md5:4b1afae1364115ce5d78ed83cd2e5b3a',
                  'upload_date': '20141221',
              },
-        }
+        },
+        {
+            # Direct relinker URL
+            'url': 'http://www.rai.tv/dl/RaiTV/dirette/PublishingBlock-1912dbbf-3f96-44c3-b4cf-523681fbacbc.html?channel=EuroNews',
+            # HDS live stream, MD5 is unstable
+            'info_dict': {
+                'id': '1912dbbf-3f96-44c3-b4cf-523681fbacbc',
+                'ext': 'flv',
+                'title': 'EuroNews',
+            },
+            'skip': 'Geo-restricted to Italy',
+        },
+        {
+            # Embedded content item ID
+            'url': 'http://www.tg1.rai.it/dl/tg1/2010/edizioni/ContentSet-9b6e0cba-4bef-4aef-8cf0-9f7f665b7dfb-tg1.html?item=undefined',
+            'md5': '84c1135ce960e8822ae63cec34441d63',
+            'info_dict': {
+                'id': '0960e765-62c8-474a-ac4b-7eb3e2be39c8',
+                'ext': 'mp4',
+                'title': 'TG1 ore 20:00 del 02/07/2016',
+                'upload_date': '20160702',
+            },
+        },
+        {
+            'url': 'http://www.rainews.it/dl/rainews/live/ContentItem-3156f2f2-dc70-4953-8e2f-70d7489d4ce9.html',
+            # HDS live stream, MD5 is unstable
+            'info_dict': {
+                'id': '3156f2f2-dc70-4953-8e2f-70d7489d4ce9',
+                'ext': 'flv',
+                'title': 'La diretta di Rainews24',
+            },
+        },
      ]
  
      @classmethod
@@ -201,7 +238,30 @@ class RaiIE(InfoExtractor):
          iframe_url = self._search_regex(
              [r'<iframe[^>]+src="([^"]*/dl/[^"]+\?iframe\b[^"]*)"',
               r'drawMediaRaiTV\(["\'](.+?)["\']'],
-            webpage, 'iframe')
-        if not iframe_url.startswith('http'):
-            iframe_url = compat_urlparse.urljoin(url, iframe_url)
-        return self.url_result(iframe_url)
+            webpage, 'iframe', default=None)
+        if iframe_url:
+            if not iframe_url.startswith('http'):
+                iframe_url = compat_urlparse.urljoin(url, iframe_url)
+            return self.url_result(iframe_url)
+
+        content_item_id = self._search_regex(
+            r'initEdizione\((?P<q1>[\'"])ContentItem-(?P<content_id>[^\'"]+)(?P=q1)',
+            webpage, 'content item ID', group='content_id', default=None)
+        if content_item_id:
+            return self._extract_from_content_id(content_item_id, url)
+
+        relinker_url = compat_urlparse.urljoin(url, self._search_regex(
+            r'(?:var\s+videoURL|mediaInfo\.mediaUri)\s*=\s*(?P<q1>[\'"])(?P<url>(https?:)?//mediapolis\.rai\.it/relinker/relinkerServlet\.htm\?cont=\d+)(?P=q1)',
+            webpage, 'relinker URL', group='url'))
+        formats = self._extract_relinker_formats(relinker_url, video_id)
+        self._sort_formats(formats)
+
+        title = self._search_regex(
+            r'var\s+videoTitolo\s*=\s*([\'"])(?P<title>[^\'"]+)\1',
+            webpage, 'title', group='title', default=None) or self._og_search_title(webpage)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+        }
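Review note on the plain-HTTP branch of `_extract_relinker_formats` in the rai.py diff above: `int_or_none` returns `None` when the relinker XML carries no `bitrate` element, and on Python 3 the comparison `None > 0` raises `TypeError`. A minimal sketch of a guarded version follows. `http_format` is a hypothetical helper name used only for illustration and is not part of the commit.

```python
def http_format(media_url, bitrate):
    """Build a plain-HTTP format dict; `bitrate` may be None or an int."""
    # Treat a missing, zero or negative bitrate as unknown.
    has_bitrate = bitrate is not None and bitrate > 0
    return {
        'url': media_url,
        'tbr': bitrate if has_bitrate else None,
        'format_id': 'http-%d' % bitrate if has_bitrate else 'http',
    }
```

With this guard a relinker entry without a bitrate still yields a usable `http` format instead of aborting the extraction.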
diff --git a/youtube_dl/extractor/rbmaradio.py b/youtube_dl/extractor/rbmaradio.py
index 7932af6ef7c599fdcce5c95bcdbf4e77f162d45d..471928ef86b5d434953fc694eef0bb7da8edd334 100644
--- a/youtube_dl/extractor/rbmaradio.py
+++ b/youtube_dl/extractor/rbmaradio.py
@@ -1,55 +1,71 @@
-# encoding: utf-8
  from __future__ import unicode_literals
  
-import json
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
-    ExtractorError,
+    clean_html,
+    int_or_none,
+    unified_timestamp,
+    update_url_query,
  )
  
  
  class RBMARadioIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?rbmaradio\.com/shows/(?P<videoID>[^/]+)$'
+    _VALID_URL = r'https?://(?:www\.)?rbmaradio\.com/shows/(?P<show_id>[^/]+)/episodes/(?P<id>[^/?#&]+)'
      _TEST = {
-        'url': 'http://www.rbmaradio.com/shows/ford-lopatin-live-at-primavera-sound-2011',
+        'url': 'https://www.rbmaradio.com/shows/main-stage/episodes/ford-lopatin-live-at-primavera-sound-2011',
          'md5': '6bc6f9bcb18994b4c983bc3bf4384d95',
          'info_dict': {
              'id': 'ford-lopatin-live-at-primavera-sound-2011',
              'ext': 'mp3',
-            'uploader_id': 'ford-lopatin',
-            'location': 'Spain',
-            'description': 'Joel Ford and Daniel ’Oneohtrix Point Never’ Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.',
-            'uploader': 'Ford & Lopatin',
-            'title': 'Live at Primavera Sound 2011',
+            'title': 'Main Stage - Ford & Lopatin',
+            'description': 'md5:4f340fb48426423530af5a9d87bd7b91',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 2452,
+            'timestamp': 1307103164,
+            'upload_date': '20110603',
          },
      }
  
      def _real_extract(self, url):
-        m = re.match(self._VALID_URL, url)
-        video_id = m.group('videoID')
+        mobj = re.match(self._VALID_URL, url)
+        show_id = mobj.group('show_id')
+        episode_id = mobj.group('id')
  
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(url, episode_id)
  
-        json_data = self._search_regex(r'window\.gon.*?gon\.show=(.+?);$',
-                                       webpage, 'json data', flags=re.MULTILINE)
+        episode = self._parse_json(
+            self._search_regex(
+                r'__INITIAL_STATE__\s*=\s*({.+?})\s*</script>',
+                webpage, 'json data'),
+            episode_id)['episodes'][show_id][episode_id]
  
-        try:
-            data = json.loads(json_data)
-        except ValueError as e:
-            raise ExtractorError('Invalid JSON: ' + str(e))
+        title = episode['title']
  
-        video_url = data['akamai_url'] + '&cbr=256'
+        show_title = episode.get('showTitle')
+        if show_title:
+            title = '%s - %s' % (show_title, title)
+
+        formats = [{
+            'url': update_url_query(episode['audioURL'], query={'cbr': abr}),
+            'format_id': compat_str(abr),
+            'abr': abr,
+            'vcodec': 'none',
+        } for abr in (96, 128, 256)]
+
+        description = clean_html(episode.get('longTeaser'))
+        thumbnail = self._proto_relative_url(episode.get('imageURL', {}).get('landscape'))
+        duration = int_or_none(episode.get('duration'))
+        timestamp = unified_timestamp(episode.get('publishedAt'))
  
          return {
-            'id': video_id,
-            'url': video_url,
-            'title': data['title'],
-            'description': data.get('teaser_text'),
-            'location': data.get('country_of_origin'),
-            'uploader': data.get('host', {}).get('name'),
-            'uploader_id': data.get('host', {}).get('slug'),
-            'thumbnail': data.get('image', {}).get('large_url_2x'),
-            'duration': data.get('duration'),
+            'id': episode_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'timestamp': timestamp,
+            'formats': formats,
          }
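Review note on the rbmaradio.py diff above: the new extractor emits one audio format per constant bitrate by setting a `cbr` query parameter on `audioURL`. The sketch below reproduces that list comprehension with the standard library only. `with_query` is a hypothetical stand-in for `update_url_query`, written under the assumption that the helper merges new parameters into the existing query string; `build_formats` and the example URL are likewise illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def with_query(url, query):
    """Return `url` with the parameters in `query` merged into its query string."""
    parts = urlparse(url)
    merged = parse_qs(parts.query)  # existing parameters, as lists of values
    merged.update(query)            # new parameters override existing ones
    return urlunparse(parts._replace(query=urlencode(merged, doseq=True)))


def build_formats(audio_url):
    """One audio-only format per bitrate, mirroring the comprehension in the diff."""
    return [{
        'url': with_query(audio_url, {'cbr': abr}),
        'format_id': str(abr),
        'abr': abr,
        'vcodec': 'none',
    } for abr in (96, 128, 256)]
```

Merging into the parsed query, rather than appending `'&cbr=256'` as the removed code did, keeps the URL valid whether or not it already has a query string.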
diff --git a/youtube_dl/extractor/rds.py b/youtube_dl/extractor/rds.py
index 796adfdf9dab7f07481328026bb21591a7aa5612..bf200ea4d3f8b17f171bcce01c930b5d183fcc2e 100644
--- a/youtube_dl/extractor/rds.py
+++ b/youtube_dl/extractor/rds.py
@@ -1,23 +1,23 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  from ..utils import (
      parse_duration,
      parse_iso8601,
+    js_to_json,
  )
+from ..compat import compat_str
  
  
  class RDSIE(InfoExtractor):
      IE_DESC = 'RDS.ca'
-    _VALID_URL = r'https?://(?:www\.)?rds\.ca/vid(?:[eé]|%C3%A9)os/(?:[^/]+/)*(?P<display_id>[^/]+)-(?P<id>\d+\.\d+)'
+    _VALID_URL = r'https?://(?:www\.)?rds\.ca/vid(?:[eé]|%C3%A9)os/(?:[^/]+/)*(?P<id>[^/]+)-\d+\.\d+'
  
      _TESTS = [{
          'url': 'http://www.rds.ca/videos/football/nfl/fowler-jr-prend-la-direction-de-jacksonville-3.1132799',
          'info_dict': {
-            'id': '3.1132799',
+            'id': '604333',
              'display_id': 'fowler-jr-prend-la-direction-de-jacksonville',
              'ext': 'mp4',
              'title': 'Fowler Jr. prend la direction de Jacksonville',
@@ -33,22 +33,17 @@ class RDSIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        display_id = self._match_id(url)
  
          webpage = self._download_webpage(url, display_id)
  
-        # TODO: extract f4m from 9c9media.com
-        video_url = self._search_regex(
-            r'<span[^>]+itemprop="contentURL"[^>]+content="([^"]+)"',
-            webpage, 'video url')
-
-        title = self._og_search_title(webpage) or self._html_search_meta(
+        item = self._parse_json(self._search_regex(r'(?s)itemToPush\s*=\s*({.+?});', webpage, 'item'), display_id, js_to_json)
+        video_id = compat_str(item['id'])
+        title = item.get('title') or self._og_search_title(webpage) or self._html_search_meta(
              'title', webpage, 'title', fatal=True)
          description = self._og_search_description(webpage) or self._html_search_meta(
              'description', webpage, 'description')
-        thumbnail = self._og_search_thumbnail(webpage) or self._search_regex(
+        thumbnail = item.get('urlImageBig') or self._og_search_thumbnail(webpage) or self._search_regex(
              [r'<link[^>]+itemprop="thumbnailUrl"[^>]+href="([^"]+)"',
               r'<span[^>]+itemprop="thumbnailUrl"[^>]+content="([^"]+)"'],
              webpage, 'thumbnail', fatal=False)
@@ -61,13 +56,15 @@ class RDSIE(InfoExtractor):
          age_limit = self._family_friendly_search(webpage)
  
          return {
+            '_type': 'url_transparent',
              'id': video_id,
              'display_id': display_id,
-            'url': video_url,
+            'url': '9c9media:rds_web:%s' % video_id,
              'title': title,
              'description': description,
              'thumbnail': thumbnail,
              'timestamp': timestamp,
              'duration': duration,
              'age_limit': age_limit,
+            'ie_key': 'NineCNineMedia',
          }
diff --git a/youtube_dl/extractor/redtube.py b/youtube_dl/extractor/redtube.py
index 7ba41ba593295cdc7d2e28e6b64702321ed1ef08..c367a6ae74f3a7b63dd50f035a2d380f76dc3719 100644
--- a/youtube_dl/extractor/redtube.py
+++ b/youtube_dl/extractor/redtube.py
@@ -1,35 +1,82 @@
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
-from ..utils import ExtractorError
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    str_to_int,
+    unified_strdate,
+)
  
  
  class RedTubeIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?redtube\.com/(?P<id>[0-9]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:(?:www\.)?redtube\.com/|embed\.redtube\.com/\?.*?\bid=)(?P<id>[0-9]+)'
+    _TESTS = [{
          'url': 'http://www.redtube.com/66418',
          'md5': '7b8c22b5e7098a3e1c09709df1126d2d',
          'info_dict': {
              'id': '66418',
              'ext': 'mp4',
              'title': 'Sucked on a toilet',
+            'upload_date': '20120831',
+            'duration': 596,
+            'view_count': int,
              'age_limit': 18,
          }
-    }
+    }, {
+        'url': 'http://embed.redtube.com/?bgcolor=000000&id=1443286',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//embed\.redtube\.com/\?.*?\bid=\d+)',
+            webpage)
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(
+            'http://www.redtube.com/%s' % video_id, video_id)
  
          if any(s in webpage for s in ['video-deleted-info', '>This video has been removed']):
              raise ExtractorError('Video %s has been removed' % video_id, expected=True)
  
-        video_url = self._html_search_regex(
-            r'<source src="(.+?)" type="video/mp4">', webpage, 'video URL')
-        video_title = self._html_search_regex(
-            r'<h1 class="videoTitle[^"]*">(.+?)</h1>',
-            webpage, 'title')
-        video_thumbnail = self._og_search_thumbnail(webpage)
+        title = self._html_search_regex(
+            (r'<h1 class="videoTitle[^"]*">(?P<title>.+?)</h1>',
+             r'videoTitle\s*:\s*(["\'])(?P<title>)\1'),
+            webpage, 'title', group='title')
+
+        formats = []
+        sources = self._parse_json(
+            self._search_regex(
+                r'sources\s*:\s*({.+?})', webpage, 'source', default='{}'),
+            video_id, fatal=False)
+        if sources and isinstance(sources, dict):
+            for format_id, format_url in sources.items():
+                if format_url:
+                    formats.append({
+                        'url': format_url,
+                        'format_id': format_id,
+                        'height': int_or_none(format_id),
+                    })
+        else:
+            video_url = self._html_search_regex(
+                r'<source src="(.+?)" type="video/mp4">', webpage, 'video URL')
+            formats.append({'url': video_url})
+        self._sort_formats(formats)
+
+        thumbnail = self._og_search_thumbnail(webpage)
+        upload_date = unified_strdate(self._search_regex(
+            r'<span[^>]+class="added-time"[^>]*>ADDED ([^<]+)<',
+            webpage, 'upload date', fatal=False))
+        duration = int_or_none(self._search_regex(
+            r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
+        view_count = str_to_int(self._search_regex(
+            r'<span[^>]*>VIEWS</span></td>\s*<td>([\d,.]+)',
+            webpage, 'view count', fatal=False))
  
          # No self-labeling, but they describe themselves as
          # "Home of Videos Porno"
@@ -37,9 +84,12 @@ class RedTubeIE(InfoExtractor):
  
          return {
              'id': video_id,
-            'url': video_url,
              'ext': 'mp4',
-            'title': video_title,
-            'thumbnail': video_thumbnail,
+            'title': title,
+            'thumbnail': thumbnail,
+            'upload_date': upload_date,
+            'duration': duration,
+            'view_count': view_count,
              'age_limit': age_limit,
+            'formats': formats,
          }
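Review note on the title fallback in the redtube.py diff above: a named group only captures what its sub-pattern matches, so a quote-delimited capture needs a body that consumes characters up to the closing quote. The sketch below contrasts an empty group with one possible non-empty form. The `FILLED` pattern and the sample string are assumptions made for illustration, not taken from the commit.

```python
import re

# A named group with no sub-pattern can only match the empty string.
EMPTY = r'videoTitle\s*:\s*(["\'])(?P<title>)\1'
# One possible form: consume any character that is not the opening quote.
FILLED = r'videoTitle\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1'

sample = 'videoTitle: "Some title",'

empty_match = re.search(EMPTY, sample)
filled_match = re.search(FILLED, sample)
```

Here `empty_match` is `None`, because the pattern demands the closing quote immediately after the opening one, while `filled_match.group('title')` holds the text between the quotes.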
diff --git a/youtube_dl/extractor/rentv.py b/youtube_dl/extractor/rentv.py
new file mode 100644
index 0000000..422c02c
--- /dev/null
+++ b/youtube_dl/extractor/rentv.py
@@ -0,0 +1,76 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .jwplatform import JWPlatformBaseIE
+from ..compat import compat_str
+
+
+class RENTVIE(JWPlatformBaseIE):
+    _VALID_URL = r'(?:rentv:|https?://(?:www\.)?ren\.tv/(?:player|video/epizod)/)(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://ren.tv/video/epizod/118577',
+        'md5': 'd91851bf9af73c0ad9b2cdf76c127fbb',
+        'info_dict': {
+            'id': '118577',
+            'ext': 'mp4',
+            'title': 'Документальный спецпроект: "Промывка мозгов. Технологии XXI века"'
+        }
+    }, {
+        'url': 'http://ren.tv/player/118577',
+        'only_matching': True,
+    }, {
+        'url': 'rentv:118577',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage('http://ren.tv/player/' + video_id, video_id)
+        jw_config = self._parse_json(self._search_regex(
+            r'config\s*=\s*({.+});', webpage, 'jw config'), video_id)
+        return self._parse_jwplayer_data(jw_config, video_id, m3u8_id='hls')
+
+
+class RENTVArticleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?ren\.tv/novosti/\d{4}-\d{2}-\d{2}/(?P<id>[^/?#]+)'
+    _TESTS = [{
+        'url': 'http://ren.tv/novosti/2016-10-26/video-mikroavtobus-popavshiy-v-dtp-s-gruzovikami-v-podmoskove-prevratilsya-v',
+        'md5': 'ebd63c4680b167693745ab91343df1d6',
+        'info_dict': {
+            'id': '136472',
+            'ext': 'mp4',
+            'title': 'Видео: микроавтобус, попавший в ДТП с грузовиками в Подмосковье, превратился в груду металла',
+            'description': 'Жертвами столкновения двух фур и микроавтобуса, по последним данным, стали семь человек.',
+        }
+    }, {
+        # TODO: invalid m3u8
+        'url': 'http://ren.tv/novosti/2015-09-25/sluchaynyy-prohozhiy-poymal-avtougonshchika-v-murmanske-video',
+        'info_dict': {
+            'id': 'playlist',
+            'ext': 'mp4',
+            'title': 'Случайный прохожий поймал автоугонщика в Мурманске. ВИДЕО | РЕН ТВ',
+            'uploader': 'ren.tv',
+        },
+        'params': {
+            # m3u8 downloads
+            'skip_download': True,
+        },
+        'skip': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        drupal_settings = self._parse_json(self._search_regex(
+            r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
+            webpage, 'drupal settings'), display_id)
+
+        entries = []
+        for config_profile in drupal_settings.get('ren_jwplayer', {}).values():
+            media_id = config_profile.get('mediaid')
+            if not media_id:
+                continue
+            media_id = compat_str(media_id)
+            entries.append(self.url_result('rentv:' + media_id, 'RENTV', media_id))
+        return self.playlist_result(entries, display_id)
diff --git a/youtube_dl/extractor/reuters.py b/youtube_dl/extractor/reuters.py
new file mode 100644
index 0000000..961d504
--- /dev/null
+++ b/youtube_dl/extractor/reuters.py
@@ -0,0 +1,69 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    js_to_json,
+    int_or_none,
+    unescapeHTML,
+)
+
+
+class ReutersIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?reuters\.com/.*?\?.*?videoId=(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.reuters.com/video/2016/05/20/san-francisco-police-chief-resigns?videoId=368575562',
+        'md5': '8015113643a0b12838f160b0b81cc2ee',
+        'info_dict': {
+            'id': '368575562',
+            'ext': 'mp4',
+            'title': 'San Francisco police chief resigns',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(
+            'http://www.reuters.com/assets/iframe/yovideo?videoId=%s' % video_id, video_id)
+        video_data = js_to_json(self._search_regex(
+            r'(?s)Reuters\.yovideo\.drawPlayer\(({.*?})\);',
+            webpage, 'video data'))
+
+        def get_json_value(key, fatal=False):
+            return self._search_regex('"%s"\s*:\s*"([^"]+)"' % key, video_data, key, fatal=fatal)
+
+        title = unescapeHTML(get_json_value('title', fatal=True))
+        mmid, fid = re.search(r',/(\d+)\?f=(\d+)', get_json_value('flv', fatal=True)).groups()
+
+        mas_data = self._download_json(
+            'http://mas-e.cds1.yospace.com/mas/%s/%s?trans=json' % (mmid, fid),
+            video_id, transform_source=js_to_json)
+        formats = []
+        for f in mas_data:
+            f_url = f.get('url')
+            if not f_url:
+                continue
+            method = f.get('method')
+            if method == 'hls':
+                formats.extend(self._extract_m3u8_formats(
+                    f_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+            else:
+                container = f.get('container')
+                ext = '3gp' if method == 'mobile' else container
+                formats.append({
+                    'format_id': ext,
+                    'url': f_url,
+                    'ext': ext,
+                    'container': container if method != 'mobile' else None,
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': get_json_value('thumb'),
+            'duration': int_or_none(get_json_value('seconds')),
+            'formats': formats,
+        }
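Review note on the reuters.py diff above: the `mmid` and `fid` values are unpacked directly from `re.search(...).groups()`, which raises `AttributeError` if the `flv` value ever stops matching the expected shape. A minimal sketch of a defensive variant follows. `parse_flv_ids` is a hypothetical helper name and the sample values in the usage note are invented for illustration.

```python
import re


def parse_flv_ids(flv_value):
    """Return (mmid, fid) from a value ending in ',/<mmid>?f=<fid>', or None."""
    match = re.search(r',/(\d+)\?f=(\d+)', flv_value)
    if match is None:
        # Let the caller decide how to report the failure.
        return None
    return match.groups()
```

A caller could then raise a descriptive extractor error when `parse_flv_ids('...')` returns `None` instead of surfacing a bare `AttributeError`; for an input such as `'prefix,/123?f=456'` it returns `('123', '456')`.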
diff --git a/youtube_dl/extractor/reverbnation.py b/youtube_dl/extractor/reverbnation.py
index 3c6725aeb42945ce7f4e07b49bcd0d629248fcac..4875009e5cafd68867b67393d36d90625e5f29c8 100644
--- a/youtube_dl/extractor/reverbnation.py
+++ b/youtube_dl/extractor/reverbnation.py
@@ -1,29 +1,29 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..utils import str_or_none
+from ..utils import (
+    qualities,
+    str_or_none,
+)
  
  
  class ReverbNationIE(InfoExtractor):
      _VALID_URL = r'^https?://(?:www\.)?reverbnation\.com/.*?/song/(?P<id>\d+).*?$'
      _TESTS = [{
          'url': 'http://www.reverbnation.com/alkilados/song/16965047-mona-lisa',
-        'md5': '3da12ebca28c67c111a7f8b262d3f7a7',
+        'md5': 'c0aaf339bcee189495fdf5a8c8ba8645',
          'info_dict': {
              'id': '16965047',
              'ext': 'mp3',
              'title': 'MONA LISA',
              'uploader': 'ALKILADOS',
              'uploader_id': '216429',
-            'thumbnail': 're:^https://gp1\.wac\.edgecastcdn\.net/.*?\.jpg$'
+            'thumbnail': 're:^https?://.*\.jpg',
          },
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        song_id = mobj.group('id')
+        song_id = self._match_id(url)
  
          api_res = self._download_json(
              'https://api.reverbnation.com/song/%s' % song_id,
@@ -31,14 +31,23 @@ class ReverbNationIE(InfoExtractor):
              note='Downloading information of song %s' % song_id
          )
  
+        THUMBNAILS = ('thumbnail', 'image')
+        quality = qualities(THUMBNAILS)
+        thumbnails = []
+        for thumb_key in THUMBNAILS:
+            if api_res.get(thumb_key):
+                thumbnails.append({
+                    'url': api_res[thumb_key],
+                    'preference': quality(thumb_key)
+                })
+
          return {
              'id': song_id,
-            'title': api_res.get('name'),
-            'url': api_res.get('url'),
+            'title': api_res['name'],
+            'url': api_res['url'],
              'uploader': api_res.get('artist', {}).get('name'),
              'uploader_id': str_or_none(api_res.get('artist', {}).get('id')),
-            'thumbnail': self._proto_relative_url(
-                api_res.get('image', api_res.get('thumbnail'))),
+            'thumbnails': thumbnails,
              'ext': 'mp3',
              'vcodec': 'none',
          }
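Review note on the reverbnation.py diff above: the thumbnail list is ranked with `qualities(THUMBNAILS)`, so a key that appears later in the tuple gets a higher `preference`. The sketch below re-implements that ranking locally, assuming the helper returns the key's index in the tuple and -1 for unknown keys. `collect_thumbnails` and the example URLs are hypothetical names and values used only for illustration.

```python
def qualities(quality_ids):
    """Return a function ranking an id by its position in `quality_ids`."""
    def rank(quality_id):
        try:
            return quality_ids.index(quality_id)
        except ValueError:
            return -1  # unknown ids sort below every known one
    return rank


THUMBNAILS = ('thumbnail', 'image')


def collect_thumbnails(api_res):
    """Build the thumbnails list from an API response dict, skipping empty keys."""
    quality = qualities(THUMBNAILS)
    return [
        {'url': api_res[key], 'preference': quality(key)}
        for key in THUMBNAILS if api_res.get(key)
    ]
```

Under this assumption the `image` entry outranks `thumbnail`, which matches the removed code's behaviour of trying `image` first.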
diff --git a/youtube_dl/extractor/revision3.py b/youtube_dl/extractor/revision3.py
index 99979ebe1a9fe82099076b46b576ef38a58bca8c..833d8a2f0d3813014224e39a8d2d41fb0e51d515 100644
--- a/youtube_dl/extractor/revision3.py
+++ b/youtube_dl/extractor/revision3.py
@@ -13,8 +13,64 @@ from ..utils import (
  )
  
  
+class Revision3EmbedIE(InfoExtractor):
+    IE_NAME = 'revision3:embed'
+    _VALID_URL = r'(?:revision3:(?:(?P<playlist_type>[^:]+):)?|https?://(?:(?:(?:www|embed)\.)?(?:revision3|animalist)|(?:(?:api|embed)\.)?seekernetwork)\.com/player/embed\?videoId=)(?P<playlist_id>\d+)'
+    _TEST = {
+        'url': 'http://api.seekernetwork.com/player/embed?videoId=67558',
+        'md5': '83bcd157cab89ad7318dd7b8c9cf1306',
+        'info_dict': {
+            'id': '67558',
+            'ext': 'mp4',
+            'title': 'The Pros & Cons Of Zoos',
+            'description': 'Zoos are often depicted as a terrible place for animals to live, but is there any truth to this?',
+            'uploader_id': 'dnews',
+            'uploader': 'DNews',
+        }
+    }
+    _API_KEY = 'ba9c741bce1b9d8e3defcc22193f3651b8867e62'
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        playlist_id = mobj.group('playlist_id')
+        playlist_type = mobj.group('playlist_type') or 'video_id'
+        video_data = self._download_json(
+            'http://revision3.com/api/getPlaylist.json', playlist_id, query={
+                'api_key': self._API_KEY,
+                'codecs': 'h264,vp8,theora',
+                playlist_type: playlist_id,
+            })['items'][0]
+
+        formats = []
+        for vcodec, media in video_data['media'].items():
+            for quality_id, quality in media.items():
+                if quality_id == 'hls':
+                    formats.extend(self._extract_m3u8_formats(
+                        quality['url'], playlist_id, 'mp4',
+                        'm3u8_native', m3u8_id='hls', fatal=False))
+                else:
+                    formats.append({
+                        'url': quality['url'],
+                        'format_id': '%s-%s' % (vcodec, quality_id),
+                        'tbr': int_or_none(quality.get('bitrate')),
+                        'vcodec': vcodec,
+                    })
+        self._sort_formats(formats)
+
+        return {
+            'id': playlist_id,
+            'title': unescapeHTML(video_data['title']),
+            'description': unescapeHTML(video_data.get('summary')),
+            'uploader': video_data.get('show', {}).get('name'),
+            'uploader_id': video_data.get('show', {}).get('slug'),
+            'duration': int_or_none(video_data.get('duration')),
+            'formats': formats,
+        }
+
+
  class Revision3IE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:revision3|testtube|animalist)\.com)/(?P<id>[^/]+(?:/[^/?#]+)?)'
+    IE_NAME = 'revision'
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:revision3|animalist)\.com)/(?P<id>[^/]+(?:/[^/?#]+)?)'
      _TESTS = [{
          'url': 'http://www.revision3.com/technobuffalo/5-google-predictions-for-2016',
          'md5': 'd94a72d85d0a829766de4deb8daaf7df',
@@ -32,52 +88,14 @@ class Revision3IE(InfoExtractor):
          }
      }, {
          # Show
-        'url': 'http://testtube.com/brainstuff',
-        'info_dict': {
-            'id': '251',
-            'title': 'BrainStuff',
-            'description': 'Whether the topic is popcorn or particle physics, you can count on the HowStuffWorks team to explore-and explain-the everyday science in the world around us on BrainStuff.',
-        },
-        'playlist_mincount': 93,
-    }, {
-        'url': 'https://testtube.com/dnews/5-weird-ways-plants-can-eat-animals?utm_source=FB&utm_medium=DNews&utm_campaign=DNewsSocial',
-        'info_dict': {
-            'id': '58227',
-            'display_id': 'dnews/5-weird-ways-plants-can-eat-animals',
-            'duration': 275,
-            'ext': 'webm',
-            'title': '5 Weird Ways Plants Can Eat Animals',
-            'description': 'Why have some plants evolved to eat meat?',
-            'upload_date': '20150120',
-            'timestamp': 1421763300,
-            'uploader': 'DNews',
-            'uploader_id': 'dnews',
-        },
-    }, {
-        'url': 'http://testtube.com/tt-editors-picks/the-israel-palestine-conflict-explained-in-ten-min',
-        'info_dict': {
-            'id': '71618',
-            'ext': 'mp4',
-            'display_id': 'tt-editors-picks/the-israel-palestine-conflict-explained-in-ten-min',
-            'title': 'The Israel-Palestine Conflict Explained in Ten Minutes',
-            'description': 'If you\'d like to learn about the struggle between Israelis and Palestinians, this video is a great place to start',
-            'uploader': 'Editors\' Picks',
-            'uploader_id': 'tt-editors-picks',
-            'timestamp': 1453309200,
-            'upload_date': '20160120',
-        },
-        'add_ie': ['Youtube'],
+        'url': 'http://revision3.com/variant',
+        'only_matching': True,
      }, {
          # Tag
-        'url': 'http://testtube.com/tech-news',
-        'info_dict': {
-            'id': '21018',
-            'title': 'tech news',
-        },
-        'playlist_mincount': 9,
+        'url': 'http://revision3.com/vr',
+        'only_matching': True,
      }]
      _PAGE_DATA_TEMPLATE = 'http://www.%s/apiProxy/ddn/%s?domain=%s'
-    _API_KEY = 'ba9c741bce1b9d8e3defcc22193f3651b8867e62'
  
      def _real_extract(self, url):
          domain, display_id = re.match(self._VALID_URL, url).groups()
@@ -119,33 +137,9 @@ class Revision3IE(InfoExtractor):
                  })
                  return info
  
-            video_data = self._download_json(
-                'http://revision3.com/api/getPlaylist.json?api_key=%s&codecs=h264,vp8,theora&video_id=%s' % (self._API_KEY, video_id),
-                video_id)['items'][0]
-
-            formats = []
-            for vcodec, media in video_data['media'].items():
-                for quality_id, quality in media.items():
-                    if quality_id == 'hls':
-                        formats.extend(self._extract_m3u8_formats(
-                            quality['url'], video_id, 'mp4',
-                            'm3u8_native', m3u8_id='hls', fatal=False))
-                    else:
-                        formats.append({
-                            'url': quality['url'],
-                            'format_id': '%s-%s' % (vcodec, quality_id),
-                            'tbr': int_or_none(quality.get('bitrate')),
-                            'vcodec': vcodec,
-                        })
-            self._sort_formats(formats)
-
              info.update({
-                'title': unescapeHTML(video_data['title']),
-                'description': unescapeHTML(video_data.get('summary')),
-                'uploader': video_data.get('show', {}).get('name'),
-                'uploader_id': video_data.get('show', {}).get('slug'),
-                'duration': int_or_none(video_data.get('duration')),
-                'formats': formats,
+                '_type': 'url_transparent',
+                'url': 'revision3:%s' % video_id,
              })
              return info
          else:
diff --git a/youtube_dl/extractor/rmcdecouverte.py b/youtube_dl/extractor/rmcdecouverte.py

new file mode 100644 (file)

index 0000000..2340dae
--- /dev/null
+++ b/youtube_dl/extractor/rmcdecouverte.py
@@ -0,0 +1,39 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .brightcove import BrightcoveLegacyIE
+from ..compat import (
+    compat_parse_qs,
+    compat_urlparse,
+)
+
+
+class RMCDecouverteIE(InfoExtractor):
+    _VALID_URL = r'https?://rmcdecouverte\.bfmtv\.com/mediaplayer-replay.*?\bid=(?P<id>\d+)'
+
+    _TEST = {
+        'url': 'http://rmcdecouverte.bfmtv.com/mediaplayer-replay/?id=1430&title=LES%20HEROS%20DU%2088e%20ETAGE',
+        'info_dict': {
+            'id': '5111223049001',
+            'ext': 'mp4',
+            'title': ': LES HEROS DU 88e ETAGE',
+            'description': 'Découvrez comment la bravoure de deux hommes dans la Tour Nord du World Trade Center a sauvé  la vie d\'innombrables personnes le 11 septembre 2001.',
+            'uploader_id': '1969646226001',
+            'upload_date': '20160904',
+            'timestamp': 1472951103,
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+        'skip': 'Only works from France',
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1969646226001/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
+        brightcove_id = compat_parse_qs(compat_urlparse.urlparse(brightcove_legacy_url).query)['@videoPlayer'][0]
+        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
diff --git a/youtube_dl/extractor/rockstargames.py b/youtube_dl/extractor/rockstargames.py

new file mode 100644 (file)

index 0000000..48128e2
--- /dev/null
+++ b/youtube_dl/extractor/rockstargames.py
@@ -0,0 +1,69 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+)
+
+
+class RockstarGamesIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?rockstargames\.com/videos(?:/video/|#?/?\?.*\bvideo=)(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.rockstargames.com/videos/video/11544/',
+        'md5': '03b5caa6e357a4bd50e3143fc03e5733',
+        'info_dict': {
+            'id': '11544',
+            'ext': 'mp4',
+            'title': 'Further Adventures in Finance and Felony Trailer',
+            'description': 'md5:6d31f55f30cb101b5476c4a379e324a3',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'timestamp': 1464876000,
+            'upload_date': '20160602',
+        }
+    }, {
+        'url': 'http://www.rockstargames.com/videos#/?video=48',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'https://www.rockstargames.com/videoplayer/videos/get-video.json',
+            video_id, query={
+                'id': video_id,
+                'locale': 'en_us',
+            })['video']
+
+        title = video['title']
+
+        formats = []
+        for video_file in video['files_processed']['video/mp4']:
+            if not video_file.get('src'):
+                continue
+            resolution = video_file.get('resolution')
+            height = int_or_none(self._search_regex(
+                r'^(\d+)[pP]$', resolution or '', 'height', default=None))
+            formats.append({
+                'url': self._proto_relative_url(video_file['src']),
+                'format_id': resolution,
+                'height': height,
+            })
+
+        if not formats:
+            youtube_id = video.get('youtube_id')
+            if youtube_id:
+                return self.url_result(youtube_id, 'Youtube')
+
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video.get('description'),
+            'thumbnail': self._proto_relative_url(video.get('screencap')),
+            'timestamp': parse_iso8601(video.get('created')),
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/roosterteeth.py b/youtube_dl/extractor/roosterteeth.py

new file mode 100644 (file)

index 0000000..f5b2f56
--- /dev/null
+++ b/youtube_dl/extractor/roosterteeth.py
@@ -0,0 +1,148 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    strip_or_none,
+    unescapeHTML,
+    urlencode_postdata,
+)
+
+
+class RoosterTeethIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/episode/(?P<id>[^/?#&]+)'
+    _LOGIN_URL = 'https://roosterteeth.com/login'
+    _NETRC_MACHINE = 'roosterteeth'
+    _TESTS = [{
+        'url': 'http://roosterteeth.com/episode/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
+        'md5': 'e2bd7764732d785ef797700a2489f212',
+        'info_dict': {
+            'id': '26576',
+            'display_id': 'million-dollars-but-season-2-million-dollars-but-the-game-announcement',
+            'ext': 'mp4',
+            'title': 'Million Dollars, But...: Million Dollars, But... The Game Announcement',
+            'description': 'md5:0cc3b21986d54ed815f5faeccd9a9ca5',
+            'thumbnail': 're:^https?://.*\.png$',
+            'series': 'Million Dollars, But...',
+            'episode': 'Million Dollars, But... The Game Announcement',
+            'comment_count': int,
+        },
+    }, {
+        'url': 'http://achievementhunter.roosterteeth.com/episode/off-topic-the-achievement-hunter-podcast-2016-i-didn-t-think-it-would-pass-31',
+        'only_matching': True,
+    }, {
+        'url': 'http://funhaus.roosterteeth.com/episode/funhaus-shorts-2016-austin-sucks-funhaus-shorts',
+        'only_matching': True,
+    }, {
+        'url': 'http://screwattack.roosterteeth.com/episode/death-battle-season-3-mewtwo-vs-shadow',
+        'only_matching': True,
+    }, {
+        'url': 'http://theknow.roosterteeth.com/episode/the-know-game-news-season-1-boring-steam-sales-are-better',
+        'only_matching': True,
+    }, {
+        # only available for FIRST members
+        'url': 'http://roosterteeth.com/episode/rt-docs-the-world-s-greatest-head-massage-the-world-s-greatest-head-massage-an-asmr-journey-part-one',
+        'only_matching': True,
+    }]
+
+    def _login(self):
+        (username, password) = self._get_login_info()
+        if username is None:
+            return
+
+        login_page = self._download_webpage(
+            self._LOGIN_URL, None,
+            note='Downloading login page',
+            errnote='Unable to download login page')
+
+        login_form = self._hidden_inputs(login_page)
+
+        login_form.update({
+            'username': username,
+            'password': password,
+        })
+
+        login_request = self._download_webpage(
+            self._LOGIN_URL, None,
+            note='Logging in as %s' % username,
+            data=urlencode_postdata(login_form),
+            headers={
+                'Referer': self._LOGIN_URL,
+            })
+
+        if not any(re.search(p, login_request) for p in (
+                r'href=["\']https?://(?:www\.)?roosterteeth\.com/logout"',
+                r'>Sign Out<')):
+            error = self._html_search_regex(
+                r'(?s)<div[^>]+class=(["\']).*?\balert-danger\b.*?\1[^>]*>(?:\s*<button[^>]*>.*?</button>)?(?P<error>.+?)</div>',
+                login_request, 'alert', default=None, group='error')
+            if error:
+                raise ExtractorError('Unable to login: %s' % error, expected=True)
+            raise ExtractorError('Unable to log in')
+
+    def _real_initialize(self):
+        self._login()
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        episode = strip_or_none(unescapeHTML(self._search_regex(
+            (r'videoTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
+             r'<title>(?P<title>[^<]+)</title>'), webpage, 'title',
+            default=None, group='title')))
+
+        title = strip_or_none(self._og_search_title(
+            webpage, default=None)) or episode
+
+        m3u8_url = self._search_regex(
+            r'file\s*:\s*(["\'])(?P<url>http.+?\.m3u8.*?)\1',
+            webpage, 'm3u8 url', default=None, group='url')
+
+        if not m3u8_url:
+            if re.search(r'<div[^>]+class=["\']non-sponsor', webpage):
+                self.raise_login_required(
+                    '%s is only available for FIRST members' % display_id)
+
+            if re.search(r'<div[^>]+class=["\']golive-gate', webpage):
+                self.raise_login_required('%s is not available yet' % display_id)
+
+            raise ExtractorError('Unable to extract m3u8 URL')
+
+        formats = self._extract_m3u8_formats(
+            m3u8_url, display_id, ext='mp4',
+            entry_protocol='m3u8_native', m3u8_id='hls')
+        self._sort_formats(formats)
+
+        description = strip_or_none(self._og_search_description(webpage))
+        thumbnail = self._proto_relative_url(self._og_search_thumbnail(webpage))
+
+        series = self._search_regex(
+            (r'<h2>More ([^<]+)</h2>', r'<a[^>]+>See All ([^<]+) Videos<'),
+            webpage, 'series', fatal=False)
+
+        comment_count = int_or_none(self._search_regex(
+            r'>Comments \((\d+)\)<', webpage,
+            'comment count', fatal=False))
+
+        video_id = self._search_regex(
+            (r'containerId\s*=\s*["\']episode-(\d+)["\']',
+             r'<div[^<]+id=["\']episode-(\d+)'), webpage,
+            'video id', default=display_id)
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'series': series,
+            'episode': episode,
+            'comment_count': comment_count,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/rottentomatoes.py b/youtube_dl/extractor/rottentomatoes.py

index e8bb20a0803700937875355d2f854d1de88cea1a..1d404d20aa8b2223c68cada46e4bfe87613eb6ae 100644 (file)
--- a/youtube_dl/extractor/rottentomatoes.py
+++ b/youtube_dl/extractor/rottentomatoes.py
@@ -1,19 +1,32 @@
  from __future__ import unicode_literals
  
-from .videodetective import VideoDetectiveIE
+from .common import InfoExtractor
+from .internetvideoarchive import InternetVideoArchiveIE
  
  
-# It just uses the same method as videodetective.com,
-# the internetvideoarchive.com is extracted from the og:video property
-class RottenTomatoesIE(VideoDetectiveIE):
-    _VALID_URL = r'https?://www\.rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
+class RottenTomatoesIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
  
      _TEST = {
          'url': 'http://www.rottentomatoes.com/m/toy_story_3/trailers/11028566/',
          'info_dict': {
-            'id': '613340',
+            'id': '11028566',
              'ext': 'mp4',
-            'title': 'TOY STORY 3',
+            'title': 'Toy Story 3',
              'description': 'From the creators of the beloved TOY STORY films, comes a story that will reunite the gang in a whole new way.',
+            'thumbnail': 're:^https?://.*\.jpg$',
          },
      }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        iva_id = self._search_regex(r'publishedid=(\d+)', webpage, 'internet video archive id')
+
+        return {
+            '_type': 'url_transparent',
+            'url': 'http://video.internetvideoarchive.net/player/6/configuration.ashx?domain=www.videodetective.com&customerid=69249&playerid=641&publishedid=' + iva_id,
+            'ie_key': InternetVideoArchiveIE.ie_key(),
+            'id': video_id,
+            'title': self._og_search_title(webpage),
+        }
diff --git a/youtube_dl/extractor/roxwel.py b/youtube_dl/extractor/roxwel.py

index 41638c1d01e2e76398d60ae5ef869d93845a59bc..65284643b4de287c7e77b9e2b571822c5a020606 100644 (file)
--- a/youtube_dl/extractor/roxwel.py
+++ b/youtube_dl/extractor/roxwel.py
@@ -7,7 +7,7 @@ from ..utils import unified_strdate, determine_ext
  
  
  class RoxwelIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.roxwel\.com/player/(?P<filename>.+?)(\.|\?|$)'
+    _VALID_URL = r'https?://(?:www\.)?roxwel\.com/player/(?P<filename>.+?)(\.|\?|$)'
  
      _TEST = {
          'url': 'http://www.roxwel.com/player/passionpittakeawalklive.html',
diff --git a/youtube_dl/extractor/rozhlas.py b/youtube_dl/extractor/rozhlas.py

new file mode 100644 (file)

index 0000000..f8eda8d
--- /dev/null
+++ b/youtube_dl/extractor/rozhlas.py
@@ -0,0 +1,50 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    remove_start,
+)
+
+
+class RozhlasIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?prehravac\.rozhlas\.cz/audio/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://prehravac.rozhlas.cz/audio/3421320',
+        'md5': '504c902dbc9e9a1fd50326eccf02a7e2',
+        'info_dict': {
+            'id': '3421320',
+            'ext': 'mp3',
+            'title': 'Echo Pavla Klusáka (30.06.2015 21:00)',
+            'description': 'Osmdesátiny Terryho Rileyho jsou skvělou příležitostí proletět se elektronickými i akustickými díly zakladatatele minimalismu, který je aktivní už přes padesát let'
+        }
+    }, {
+        'url': 'http://prehravac.rozhlas.cz/audio/3421320/embed',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        audio_id = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'http://prehravac.rozhlas.cz/audio/%s' % audio_id, audio_id)
+
+        title = self._html_search_regex(
+            r'<h3>(.+?)</h3>\s*<p[^>]*>.*?</p>\s*<div[^>]+id=["\']player-track',
+            webpage, 'title', default=None) or remove_start(
+            self._og_search_title(webpage), 'Radio Wave - ')
+        description = self._html_search_regex(
+            r'<p[^>]+title=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>.*?</p>\s*<div[^>]+id=["\']player-track',
+            webpage, 'description', fatal=False, group='url')
+        duration = int_or_none(self._search_regex(
+            r'data-duration=["\'](\d+)', webpage, 'duration', default=None))
+
+        return {
+            'id': audio_id,
+            'url': 'http://media.rozhlas.cz/_audio/%s.mp3' % audio_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'vcodec': 'none',
+        }
diff --git a/youtube_dl/extractor/rtbf.py b/youtube_dl/extractor/rtbf.py

index e42b319a3e224aa6b078cad7756e5c44b7f620d8..28cc5522d89083cec2ad7631d51fb0aa0798ccbd 100644 (file)
--- a/youtube_dl/extractor/rtbf.py
+++ b/youtube_dl/extractor/rtbf.py
@@ -4,12 +4,18 @@ from __future__ import unicode_literals
  from .common import InfoExtractor
  from ..utils import (
      int_or_none,
-    unescapeHTML,
+    ExtractorError,
  )
  
  
  class RTBFIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?rtbf\.be/(?:video/[^?]+\?.*\bid=|ouftivi/(?:[^/]+/)*[^?]+\?.*\bvideoId=)(?P<id>\d+)'
+    _VALID_URL = r'''(?x)
+        https?://(?:www\.)?rtbf\.be/
+        (?:
+            video/[^?]+\?.*\bid=|
+            ouftivi/(?:[^/]+/)*[^?]+\?.*\bvideoId=|
+            auvio/[^/]+\?.*id=
+        )(?P<id>\d+)'''
      _TESTS = [{
          'url': 'https://www.rtbf.be/video/detail_les-diables-au-coeur-episode-2?id=1921274',
          'md5': '799f334ddf2c0a582ba80c44655be570',
@@ -17,7 +23,11 @@ class RTBFIE(InfoExtractor):
              'id': '1921274',
              'ext': 'mp4',
              'title': 'Les Diables au coeur (épisode 2)',
+            'description': 'Football - Diables Rouges',
              'duration': 3099,
+            'upload_date': '20140425',
+            'timestamp': 1398456336,
+            'uploader': 'rtbfsport',
          }
      }, {
          # geo restricted
@@ -26,45 +36,63 @@ class RTBFIE(InfoExtractor):
      }, {
          'url': 'http://www.rtbf.be/ouftivi/niouzz?videoId=2055858',
          'only_matching': True,
+    }, {
+        'url': 'http://www.rtbf.be/auvio/detail_jeudi-en-prime-siegfried-bracke?id=2102996',
+        'only_matching': True,
      }]
-
+    _IMAGE_HOST = 'http://ds1.ds.static.rtbf.be'
+    _PROVIDERS = {
+        'YOUTUBE': 'Youtube',
+        'DAILYMOTION': 'Dailymotion',
+        'VIMEO': 'Vimeo',
+    }
      _QUALITIES = [
-        ('mobile', 'mobile'),
-        ('web', 'SD'),
-        ('url', 'MD'),
+        ('mobile', 'SD'),
+        ('web', 'MD'),
          ('high', 'HD'),
      ]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
+        data = self._download_json(
+            'http://www.rtbf.be/api/media/video?method=getVideoDetail&args[]=%s' % video_id, video_id)
  
-        webpage = self._download_webpage(
-            'http://www.rtbf.be/video/embed?id=%s' % video_id, video_id)
+        error = data.get('error')
+        if error:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
  
-        data = self._parse_json(
-            unescapeHTML(self._search_regex(
-                r'data-media="([^"]+)"', webpage, 'data video')),
-            video_id)
+        data = data['data']
+
+        provider = data.get('provider')
+        if provider in self._PROVIDERS:
+            return self.url_result(data['url'], self._PROVIDERS[provider])
  
-        if data.get('provider').lower() == 'youtube':
-            video_url = data.get('downloadUrl') or data.get('url')
-            return self.url_result(video_url, 'Youtube')
          formats = []
          for key, format_id in self._QUALITIES:
-            format_url = data['sources'].get(key)
+            format_url = data.get(key + 'Url')
              if format_url:
                  formats.append({
                      'format_id': format_id,
                      'url': format_url,
                  })
  
+        thumbnails = []
+        for thumbnail_id, thumbnail_url in data.get('thumbnail', {}).items():
+            if thumbnail_id != 'default':
+                thumbnails.append({
+                    'url': self._IMAGE_HOST + thumbnail_url,
+                    'id': thumbnail_id,
+                })
+
          return {
              'id': video_id,
              'formats': formats,
              'title': data['title'],
              'description': data.get('description') or data.get('subtitle'),
-            'thumbnail': data.get('thumbnail'),
+            'thumbnails': thumbnails,
              'duration': data.get('duration') or data.get('realDuration'),
              'timestamp': int_or_none(data.get('created')),
              'view_count': int_or_none(data.get('viewCount')),
+            'uploader': data.get('channel'),
+            'tags': data.get('tags'),
          }
diff --git a/youtube_dl/extractor/rte.py b/youtube_dl/extractor/rte.py

index 9c89974e7bf4f21d5c7a6f5f6eeacb500e761a28..ebe563ebb89e86e28a6bf55669cd066aca44d851 100644 (file)
--- a/youtube_dl/extractor/rte.py
+++ b/youtube_dl/extractor/rte.py
@@ -39,9 +39,14 @@ class RteIE(InfoExtractor):
          duration = float_or_none(self._html_search_meta(
              'duration', webpage, 'duration', fatal=False), 1000)
  
-        thumbnail_id = self._search_regex(
-            r'<meta name="thumbnail" content="uri:irus:(.*?)" />', webpage, 'thumbnail')
-        thumbnail = 'http://img.rasset.ie/' + thumbnail_id + '.jpg'
+        thumbnail = None
+        thumbnail_meta = self._html_search_meta('thumbnail', webpage)
+        if thumbnail_meta:
+            thumbnail_id = self._search_regex(
+                r'uri:irus:(.+)', thumbnail_meta,
+                'thumbnail id', fatal=False)
+            if thumbnail_id:
+                thumbnail = 'http://img.rasset.ie/%s.jpg' % thumbnail_id
  
          feeds_url = self._html_search_meta('feeds-prefix', webpage, 'feeds url') + video_id
          json_string = self._download_json(feeds_url, video_id)
diff --git a/youtube_dl/extractor/rtl2.py b/youtube_dl/extractor/rtl2.py

index de004671d564eb455e45361666fd304f8ca040a6..cb4ee88033ba1d761faac452de724a6c44f08503 100644 (file)
--- a/youtube_dl/extractor/rtl2.py
+++ b/youtube_dl/extractor/rtl2.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/rtlnl.py b/youtube_dl/extractor/rtlnl.py

index 543d94417f6d52f9cb5f4bd4508d5bcf3d4b984e..f0250af8a4c0e06e2642c9cced9f6b330dc1d4f6 100644 (file)
--- a/youtube_dl/extractor/rtlnl.py
+++ b/youtube_dl/extractor/rtlnl.py
@@ -14,24 +14,25 @@ class RtlNlIE(InfoExtractor):
      _VALID_URL = r'''(?x)
          https?://(?:www\.)?
          (?:
-            rtlxl\.nl/\#!/[^/]+/|
+            rtlxl\.nl/[^\#]*\#!/[^/]+/|
              rtl\.nl/system/videoplayer/(?:[^/]+/)+(?:video_)?embed\.html\b.+?\buuid=
          )
          (?P<id>[0-9a-f-]+)'''
  
      _TESTS = [{
-        'url': 'http://www.rtlxl.nl/#!/rtl-nieuws-132237/6e4203a6-0a5e-3596-8424-c599a59e0677',
-        'md5': 'cc16baa36a6c169391f0764fa6b16654',
+        'url': 'http://www.rtlxl.nl/#!/rtl-nieuws-132237/82b1aad1-4a14-3d7b-b554-b0aed1b2c416',
+        'md5': '473d1946c1fdd050b2c0161a4b13c373',
          'info_dict': {
-            'id': '6e4203a6-0a5e-3596-8424-c599a59e0677',
+            'id': '82b1aad1-4a14-3d7b-b554-b0aed1b2c416',
              'ext': 'mp4',
-            'title': 'RTL Nieuws - Laat',
-            'description': 'md5:6b61f66510c8889923b11f2778c72dc5',
-            'timestamp': 1408051800,
-            'upload_date': '20140814',
-            'duration': 576.880,
+            'title': 'RTL Nieuws',
+            'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
+            'timestamp': 1461951000,
+            'upload_date': '20160429',
+            'duration': 1167.96,
          },
      }, {
+        # best format available a3t
          'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
          'md5': 'dea7474214af1271d91ef332fb8be7ea',
          'info_dict': {
@@ -39,18 +40,19 @@ class RtlNlIE(InfoExtractor):
              'ext': 'mp4',
              'timestamp': 1424039400,
              'title': 'RTL Nieuws - Nieuwe beelden Kopenhagen: chaos direct na aanslag',
-            'thumbnail': 're:^https?://screenshots\.rtl\.nl/system/thumb/sz=[0-9]+x[0-9]+/uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed$',
+            'thumbnail': 're:^https?://screenshots\.rtl\.nl/(?:[^/]+/)*sz=[0-9]+x[0-9]+/uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed$',
              'upload_date': '20150215',
              'description': 'Er zijn nieuwe beelden vrijgegeven die vlak na de aanslag in Kopenhagen zijn gemaakt. Op de video is goed te zien hoe omstanders zich bekommeren om één van de slachtoffers, terwijl de eerste agenten ter plaatse komen.',
          }
      }, {
          # empty synopsis and missing episodes (see https://github.com/rg3/youtube-dl/issues/6275)
+        # best format available nettv
          'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a/autoplay=false',
          'info_dict': {
              'id': 'f536aac0-1dc3-4314-920e-3bd1c5b3811a',
              'ext': 'mp4',
              'title': 'RTL Nieuws - Meer beelden van overval juwelier',
-            'thumbnail': 're:^https?://screenshots\.rtl\.nl/system/thumb/sz=[0-9]+x[0-9]+/uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a$',
+            'thumbnail': 're:^https?://screenshots\.rtl\.nl/(?:[^/]+/)*sz=[0-9]+x[0-9]+/uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a$',
              'timestamp': 1437233400,
              'upload_date': '20150718',
              'duration': 30.474,
@@ -65,6 +67,9 @@ class RtlNlIE(InfoExtractor):
      }, {
          'url': 'http://www.rtl.nl/system/videoplayer/derden/embed.html#!/uuid=bb0353b0-d6a4-1dad-90e9-18fe75b8d1f0',
          'only_matching': True,
+    }, {
+        'url': 'http://rtlxl.nl/?_ga=1.204735956.572365465.1466978370#!/rtl-nieuws-132237/3c487912-023b-49ac-903e-2c5d79f8410f',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -94,22 +99,46 @@ class RtlNlIE(InfoExtractor):
          videopath = material['videopath']
          m3u8_url = meta.get('videohost', 'http://manifest.us.rtl.nl') + videopath
  
-        formats = self._extract_m3u8_formats(m3u8_url, uuid, ext='mp4')
+        formats = self._extract_m3u8_formats(
+            m3u8_url, uuid, 'mp4', m3u8_id='hls', fatal=False)
  
          video_urlpart = videopath.split('/adaptive/')[1][:-5]
          PG_URL_TEMPLATE = 'http://pg.us.rtl.nl/rtlxl/network/%s/progressive/%s.mp4'
  
-        formats.extend([
-            {
-                'url': PG_URL_TEMPLATE % ('a2m', video_urlpart),
-                'format_id': 'pg-sd',
-            },
-            {
-                'url': PG_URL_TEMPLATE % ('a3m', video_urlpart),
-                'format_id': 'pg-hd',
-                'quality': 0,
+        PG_FORMATS = (
+            ('a2t', 512, 288),
+            ('a3t', 704, 400),
+            ('nettv', 1280, 720),
+        )
+
+        def pg_format(format_id, width, height):
+            return {
+                'url': PG_URL_TEMPLATE % (format_id, video_urlpart),
+                'format_id': 'pg-%s' % format_id,
+                'protocol': 'http',
+                'width': width,
+                'height': height,
              }
-        ])
+
+        if not formats:
+            formats = [pg_format(*pg_tuple) for pg_tuple in PG_FORMATS]
+        else:
+            pg_formats = []
+            for format_id, width, height in PG_FORMATS:
+                try:
+                    # Find hls format with the same width and height corresponding
+                    # to progressive format and copy metadata from it.
+                    f = next(f for f in formats if f.get('height') == height)
+                    # hls formats may have invalid width
+                    f['width'] = width
+                    f_copy = f.copy()
+                    f_copy.update(pg_format(format_id, width, height))
+                    pg_formats.append(f_copy)
+                except StopIteration:
+                    # Missing hls format does mean that no progressive format with
+                    # such width and height exists either.
+                    pass
+            formats.extend(pg_formats)
          self._sort_formats(formats)
  
          thumbnails = []
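
Review note: the hunk above pairs each progressive (`pg-*`) format with the HLS format of the same height and copies its metadata. The sketch below restates that logic outside the extractor so it can be run on its own. `add_progressive_formats` and the sample input are hypothetical names used for illustration; `PG_FORMATS`, `PG_URL_TEMPLATE` and the dict keys are taken from the diff.

```python
PG_URL_TEMPLATE = 'http://pg.us.rtl.nl/rtlxl/network/%s/progressive/%s.mp4'
PG_FORMATS = (
    ('a2t', 512, 288),
    ('a3t', 704, 400),
    ('nettv', 1280, 720),
)


def pg_format(format_id, width, height, video_urlpart):
    # Same fields as the nested pg_format() in the diff, with video_urlpart
    # passed explicitly instead of being captured from the enclosing scope.
    return {
        'url': PG_URL_TEMPLATE % (format_id, video_urlpart),
        'format_id': 'pg-%s' % format_id,
        'protocol': 'http',
        'width': width,
        'height': height,
    }


def add_progressive_formats(hls_formats, video_urlpart):
    """Hypothetical helper: return the HLS formats plus matching progressive ones."""
    if not hls_formats:
        # No HLS metadata available: offer every known progressive format.
        return [pg_format(fid, w, h, video_urlpart) for fid, w, h in PG_FORMATS]
    pg_formats = []
    for format_id, width, height in PG_FORMATS:
        match = next((f for f in hls_formats if f.get('height') == height), None)
        if match is None:
            # No HLS format of this height, so skip the progressive one too.
            continue
        match['width'] = width  # HLS formats may report an invalid width
        f_copy = dict(match)
        f_copy.update(pg_format(format_id, width, height, video_urlpart))
        pg_formats.append(f_copy)
    return hls_formats + pg_formats
```

Using `next(..., None)` instead of catching `StopIteration` is a stylistic choice of this sketch, not what the commit does; the resulting format list should be the same.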
diff --git a/youtube_dl/extractor/rtve.py b/youtube_dl/extractor/rtve.py

index 79af477158630503078d86b117f960a36f5f1f73..6a43b036e924470055aea3910d1c5ea807483fdb 100644
--- a/youtube_dl/extractor/rtve.py
+++ b/youtube_dl/extractor/rtve.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import base64
@@ -6,6 +6,9 @@ import re
  import time
  
  from .common import InfoExtractor
+from ..compat import (
+    compat_struct_unpack,
+)
  from ..utils import (
      ExtractorError,
      float_or_none,
@@ -13,7 +16,6 @@ from ..utils import (
      remove_start,
      sanitized_Request,
      std_headers,
-    struct_unpack,
  )
  
  
@@ -21,7 +23,7 @@ def _decrypt_url(png):
      encrypted_data = base64.b64decode(png.encode('utf-8'))
      text_index = encrypted_data.find(b'tEXt')
      text_chunk = encrypted_data[text_index - 4:]
-    length = struct_unpack('!I', text_chunk[:4])[0]
+    length = compat_struct_unpack('!I', text_chunk[:4])[0]
      # Use bytearray to get integers when iterating in both python 2.x and 3.x
      data = bytearray(text_chunk[8:8 + length])
      data = [chr(b) for b in data if b != 0]
@@ -62,7 +64,7 @@ def _decrypt_url(png):
  class RTVEALaCartaIE(InfoExtractor):
      IE_NAME = 'rtve.es:alacarta'
      IE_DESC = 'RTVE a la carta'
-    _VALID_URL = r'https?://www\.rtve\.es/(m/)?alacarta/videos/[^/]+/[^/]+/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?rtve\.es/(m/)?(alacarta/videos|filmoteca)/[^/]+/[^/]+/(?P<id>\d+)'
  
      _TESTS = [{
          'url': 'http://www.rtve.es/alacarta/videos/balonmano/o-swiss-cup-masculina-final-espana-suecia/2491869/',
@@ -85,6 +87,9 @@ class RTVEALaCartaIE(InfoExtractor):
      }, {
          'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve',
          'only_matching': True,
+    }, {
+        'url': 'http://www.rtve.es/filmoteca/no-do/not-1-introduccion-primer-noticiario-espanol/1465256/',
+        'only_matching': True,
      }]
  
      def _real_initialize(self):
@@ -108,9 +113,9 @@ class RTVEALaCartaIE(InfoExtractor):
          png = self._download_webpage(png_request, video_id, 'Downloading url information')
          video_url = _decrypt_url(png)
          if not video_url.endswith('.f4m'):
-            video_url = video_url.replace(
-                'resources/', 'auth/resources/'
-            ).replace('.net.rtve', '.multimedia.cdn.rtve')
+            if '?' not in video_url:
+                video_url = video_url.replace('resources/', 'auth/resources/')
+            video_url = video_url.replace('.net.rtve', '.multimedia.cdn.rtve')
  
          subtitles = None
          if info.get('sbtFile') is not None:
@@ -179,7 +184,7 @@ class RTVEInfantilIE(InfoExtractor):
  class RTVELiveIE(InfoExtractor):
      IE_NAME = 'rtve.es:live'
      IE_DESC = 'RTVE.es live streams'
-    _VALID_URL = r'https?://www\.rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
+    _VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
  
      _TESTS = [{
          'url': 'http://www.rtve.es/directo/la-1/',
@@ -217,3 +222,34 @@ class RTVELiveIE(InfoExtractor):
              'formats': formats,
              'is_live': True,
          }
+
+
+class RTVETelevisionIE(InfoExtractor):
+    IE_NAME = 'rtve.es:television'
+    _VALID_URL = r'https?://(?:www\.)?rtve\.es/television/[^/]+/[^/]+/(?P<id>\d+).shtml'
+
+    _TEST = {
+        'url': 'http://www.rtve.es/television/20160628/revolucion-del-movil/1364141.shtml',
+        'info_dict': {
+            'id': '3069778',
+            'ext': 'mp4',
+            'title': 'Documentos TV - La revolución del móvil',
+            'duration': 3496.948,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        page_id = self._match_id(url)
+        webpage = self._download_webpage(url, page_id)
+
+        alacarta_url = self._search_regex(
+            r'data-location="alacarta_videos"[^<]+url&quot;:&quot;(http://www\.rtve\.es/alacarta.+?)&',
+            webpage, 'alacarta url', default=None)
+        if alacarta_url is None:
+            raise ExtractorError(
+                'The webpage doesn\'t contain any video', expected=True)
+
+        return self.url_result(alacarta_url, ie=RTVEALaCartaIE.ie_key())
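
Review note: the `@@ -108,9 +113,9 @@` hunk changes the URL rewrite so that `auth/resources/` is only injected when the decrypted URL has no query string. A minimal sketch of that rule follows; `rewrite_video_url` is a hypothetical name, and the hostnames in the usage lines are made up for illustration rather than taken from real RTVE responses.

```python
def rewrite_video_url(video_url):
    """Hypothetical stand-alone version of the rewrite in RTVEALaCartaIE."""
    if video_url.endswith('.f4m'):
        # f4m manifests are left untouched, as in the extractor.
        return video_url
    if '?' not in video_url:
        # Only URLs without a query string get the auth/ prefix.
        video_url = video_url.replace('resources/', 'auth/resources/')
    return video_url.replace('.net.rtve', '.multimedia.cdn.rtve')


plain = rewrite_video_url('http://www.net.rtve.es/resources/x.mp4')
signed = rewrite_video_url('http://www.net.rtve.es/resources/x.mp4?idasset=1')
```

Under these assumptions `plain` gains the `auth/` prefix while `signed` only has its host rewritten.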
diff --git a/youtube_dl/extractor/rtvnh.py b/youtube_dl/extractor/rtvnh.py

index 4896d09d666e687010ae3cb6ebe0e2bfaec537d6..f6454c6b0082ed431fa74de49dd5881d3b0b7a0f 100644
--- a/youtube_dl/extractor/rtvnh.py
+++ b/youtube_dl/extractor/rtvnh.py
@@ -9,7 +9,7 @@ class RTVNHIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?rtvnh\.nl/video/(?P<id>[0-9]+)'
      _TEST = {
          'url': 'http://www.rtvnh.nl/video/131946',
-        'md5': '6e1d0ab079e2a00b6161442d3ceacfc1',
+        'md5': 'cdbec9f44550763c8afc96050fa747dc',
          'info_dict': {
              'id': '131946',
              'ext': 'mp4',
@@ -29,15 +29,29 @@ class RTVNHIE(InfoExtractor):
              raise ExtractorError(
                  '%s returned error code %d' % (self.IE_NAME, status), expected=True)
  
-        formats = self._extract_smil_formats(
-            'http://www.rtvnh.nl/video/smil?m=' + video_id, video_id, fatal=False)
-
-        for item in meta['source']['fb']:
-            if item.get('type') == 'hls':
-                formats.extend(self._extract_m3u8_formats(
-                    item['file'], video_id, ext='mp4', entry_protocol='m3u8_native'))
-            elif item.get('type') == '':
-                formats.append({'url': item['file']})
+        formats = []
+        rtmp_formats = self._extract_smil_formats(
+            'http://www.rtvnh.nl/video/smil?m=' + video_id, video_id)
+        formats.extend(rtmp_formats)
+
+        for rtmp_format in rtmp_formats:
+            rtmp_url = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
+            rtsp_format = rtmp_format.copy()
+            del rtsp_format['play_path']
+            del rtsp_format['ext']
+            rtsp_format.update({
+                'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
+                'url': rtmp_url.replace('rtmp://', 'rtsp://'),
+                'protocol': 'rtsp',
+            })
+            formats.append(rtsp_format)
+            http_base_url = rtmp_url.replace('rtmp://', 'http://')
+            formats.extend(self._extract_m3u8_formats(
+                http_base_url + '/playlist.m3u8', video_id, 'mp4',
+                'm3u8_native', m3u8_id='hls', fatal=False))
+            formats.extend(self._extract_f4m_formats(
+                http_base_url + '/manifest.f4m',
+                video_id, f4m_id='hds', fatal=False))
          self._sort_formats(formats)
  
          return {
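
Review note: the new RTVNH code derives RTSP, HLS and HDS locations from each RTMP format by swapping the URL scheme. The sketch below shows only the string handling; `derive_from_rtmp` is a hypothetical helper and the sample RTMP entry is invented, so the real SMIL output may carry more fields.

```python
def derive_from_rtmp(rtmp_format):
    """Hypothetical helper mirroring the loop body added in the diff."""
    rtmp_url = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])

    # RTSP variant: same metadata, minus the RTMP-only keys.
    rtsp_format = dict(rtmp_format)
    del rtsp_format['play_path']
    del rtsp_format['ext']
    rtsp_format.update({
        'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
        'url': rtmp_url.replace('rtmp://', 'rtsp://'),
        'protocol': 'rtsp',
    })

    # HLS and HDS manifests are assumed to live under the same path over HTTP.
    http_base_url = rtmp_url.replace('rtmp://', 'http://')
    return rtsp_format, http_base_url + '/playlist.m3u8', http_base_url + '/manifest.f4m'


sample = {
    'url': 'rtmp://media.example.invalid/vod',
    'play_path': 'mp4:clip.mp4',
    'ext': 'flv',
    'format_id': 'rtmp-500',
}
rtsp, hls_url, hds_url = derive_from_rtmp(sample)
```

In the extractor the two manifest URLs are fetched with `fatal=False`, so a host that does not serve them simply contributes no extra formats.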
diff --git a/youtube_dl/extractor/rudo.py b/youtube_dl/extractor/rudo.py

new file mode 100644

index 0000000..9a330c1
--- /dev/null
+++ b/youtube_dl/extractor/rudo.py
@@ -0,0 +1,53 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .jwplatform import JWPlatformBaseIE
+from ..utils import (
+    js_to_json,
+    get_element_by_class,
+    unified_strdate,
+)
+
+
+class RudoIE(JWPlatformBaseIE):
+    _VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
+
+    _TEST = {
+        'url': 'http://rudo.video/vod/oTzw0MGnyG',
+        'md5': '2a03a5b32dd90a04c83b6d391cf7b415',
+        'info_dict': {
+            'id': 'oTzw0MGnyG',
+            'ext': 'mp4',
+            'title': 'Comentario Tomás Mosciatti',
+            'upload_date': '20160617',
+        },
+    }
+
+    @classmethod
+    def _extract_url(self, webpage):
+        mobj = re.search(
+            '<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id, encoding='iso-8859-1')
+
+        jwplayer_data = self._parse_json(self._search_regex(
+            r'(?s)playerInstance\.setup\(({.+?})\)', webpage, 'jwplayer data'), video_id,
+            transform_source=lambda s: js_to_json(re.sub(r'encodeURI\([^)]+\)', '""', s)))
+
+        info_dict = self._parse_jwplayer_data(
+            jwplayer_data, video_id, require_title=False, m3u8_id='hls', mpd_id='dash')
+
+        info_dict.update({
+            'title': self._og_search_title(webpage),
+            'upload_date': unified_strdate(get_element_by_class('date', webpage)),
+        })
+
+        return info_dict
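
Review note: `RudoIE` replaces every `encodeURI(...)` call with an empty string literal before the player setup object is parsed, since a function call is not valid JSON. A small sketch of that step, assuming a setup object whose keys are already quoted so that plain `json.loads` can stand in for `js_to_json`; the sample string is invented.

```python
import json
import re


def strip_encode_uri(js_object):
    # Same substitution as the transform_source lambda in the diff.
    return re.sub(r'encodeURI\([^)]+\)', '""', js_object)


setup = '{"file": "http://example.invalid/v.mp4", "link": encodeURI(window.location.href)}'
data = json.loads(strip_encode_uri(setup))
```

Because the pattern stops at the first `)`, an argument that itself contains a call with parentheses would not be fully removed.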
diff --git a/youtube_dl/extractor/ruhd.py b/youtube_dl/extractor/ruhd.py

index 1f7c262993c8ce7e0d602f612fc6316e80052f66..ce631b46c30bcd2eda03c798d61bed616f41e0b4 100644
--- a/youtube_dl/extractor/ruhd.py
+++ b/youtube_dl/extractor/ruhd.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/rutube.py b/youtube_dl/extractor/rutube.py

index 9ca4ae147cb1e3c430de3abd9fd0927aaee2ed5a..fd1df925ba46bcecf87e192d2331da5e77d0b4bc 100644
--- a/youtube_dl/extractor/rutube.py
+++ b/youtube_dl/extractor/rutube.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -88,7 +88,7 @@ class RutubeIE(InfoExtractor):
  class RutubeEmbedIE(InfoExtractor):
      IE_NAME = 'rutube:embed'
      IE_DESC = 'Rutube embedded videos'
-    _VALID_URL = 'https?://rutube\.ru/(?:video|play)/embed/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://rutube\.ru/(?:video|play)/embed/(?P<id>[0-9]+)'
  
      _TESTS = [{
          'url': 'http://rutube.ru/video/embed/6722881?vk_puid37=&vk_puid38=',
diff --git a/youtube_dl/extractor/rutv.py b/youtube_dl/extractor/rutv.py

index a2379eb04c2e6744a49f315ebee2a0c9fb0170f6..a5e672c0a674e3461c261e0b6b2ca7ca9435ea30 100644
--- a/youtube_dl/extractor/rutv.py
+++ b/youtube_dl/extractor/rutv.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/ruutu.py b/youtube_dl/extractor/ruutu.py

index ffea438cc4645c267c87b54a761394e0c1eca247..2fce4e81b7f44c4c70ff5e6e775a4743032a231b 100644
--- a/youtube_dl/extractor/ruutu.py
+++ b/youtube_dl/extractor/ruutu.py
@@ -12,7 +12,7 @@ from ..utils import (
  
  
  class RuutuIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?ruutu\.fi/video/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:ruutu|supla)\.fi/(?:video|supla)/(?P<id>\d+)'
      _TESTS = [
          {
              'url': 'http://www.ruutu.fi/video/2058907',
@@ -34,12 +34,24 @@ class RuutuIE(InfoExtractor):
                  'id': '2057306',
                  'ext': 'mp4',
                  'title': 'Superpesis: katso koko kausi Ruudussa',
-                'description': 'md5:da2736052fef3b2bd5e0005e63c25eac',
+                'description': 'md5:bfb7336df2a12dc21d18fa696c9f8f23',
                  'thumbnail': 're:^https?://.*\.jpg$',
                  'duration': 40,
                  'age_limit': 0,
              },
          },
+        {
+            'url': 'http://www.supla.fi/supla/2231370',
+            'md5': 'df14e782d49a2c0df03d3be2a54ef949',
+            'info_dict': {
+                'id': '2231370',
+                'ext': 'mp4',
+                'title': 'Osa 1: Mikael Jungner',
+                'description': 'md5:7d90f358c47542e3072ff65d7b1bcffe',
+                'thumbnail': 're:^https?://.*\.jpg$',
+                'age_limit': 0,
+            },
+        },
      ]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/safari.py b/youtube_dl/extractor/safari.py

index 6ba91f202baadbfd72160cc739efde868a60d421..c3aec1edde5e9d02efb377fa39941ae01d2f04b4 100644
--- a/youtube_dl/extractor/safari.py
+++ b/youtube_dl/extractor/safari.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -75,7 +75,7 @@ class SafariBaseIE(InfoExtractor):
  class SafariIE(SafariBaseIE):
      IE_NAME = 'safari'
      IE_DESC = 'safaribooksonline.com online video'
-    _VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/library/view/[^/]+/(?P<course_id>[^/]+)/(?P<part>part\d+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/library/view/[^/]+/(?P<course_id>[^/]+)/(?P<part>[^/?#&]+)\.html'
  
      _TESTS = [{
          'url': 'https://www.safaribooksonline.com/library/view/hadoop-fundamentals-livelessons/9780133392838/part00.html',
@@ -92,6 +92,9 @@ class SafariIE(SafariBaseIE):
          # non-digits in course id
          'url': 'https://www.safaribooksonline.com/library/view/create-a-nodejs/100000006A0210/part00.html',
          'only_matching': True,
+    }, {
+        'url': 'https://www.safaribooksonline.com/library/view/learning-path-red/9780134664057/RHCE_Introduction.html',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -100,13 +103,13 @@ class SafariIE(SafariBaseIE):
  
          webpage = self._download_webpage(url, video_id)
          reference_id = self._search_regex(
-            r'data-reference-id=(["\'])(?P<id>.+?)\1',
+            r'data-reference-id=(["\'])(?P<id>(?:(?!\1).)+)\1',
              webpage, 'kaltura reference id', group='id')
          partner_id = self._search_regex(
-            r'data-partner-id=(["\'])(?P<id>.+?)\1',
+            r'data-partner-id=(["\'])(?P<id>(?:(?!\1).)+)\1',
              webpage, 'kaltura widget id', group='id')
          ui_id = self._search_regex(
-            r'data-ui-id=(["\'])(?P<id>.+?)\1',
+            r'data-ui-id=(["\'])(?P<id>(?:(?!\1).)+)\1',
              webpage, 'kaltura uiconf id', group='id')
  
          query = {
@@ -132,12 +135,15 @@ class SafariIE(SafariBaseIE):
  
  class SafariApiIE(SafariBaseIE):
      IE_NAME = 'safari:api'
-    _VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>part\d+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>[^/?#&]+)\.html'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'https://www.safaribooksonline.com/api/v1/book/9780133392838/chapter/part00.html',
          'only_matching': True,
-    }
+    }, {
+        'url': 'https://www.safaribooksonline.com/api/v1/book/9780134664057/chapter/RHCE_Introduction.html',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
@@ -151,7 +157,14 @@ class SafariCourseIE(SafariBaseIE):
      IE_NAME = 'safari:course'
      IE_DESC = 'safaribooksonline.com online courses'
  
-    _VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/(?:library/view/[^/]+|api/v1/book)/(?P<id>[^/]+)/?(?:[#?]|$)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            (?:www\.)?safaribooksonline\.com/(?:library/view/[^/]+|api/v1/book)|
+                            techbus\.safaribooksonline\.com
+                        )
+                        /(?P<id>[^/]+)/?(?:[#?]|$)
+                    '''
  
      _TESTS = [{
          'url': 'https://www.safaribooksonline.com/library/view/hadoop-fundamentals-livelessons/9780133392838/',
@@ -164,6 +177,9 @@ class SafariCourseIE(SafariBaseIE):
      }, {
          'url': 'https://www.safaribooksonline.com/api/v1/book/9781449396459/?override_format=json',
          'only_matching': True,
+    }, {
+        'url': 'http://techbus.safaribooksonline.com/9780134426365',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
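
Review note: the `SafariIE` hunk replaces `.+?` with `(?:(?!\1).)+` inside the quoted-attribute patterns. One visible effect is on empty attribute values: the lazy version is forced to consume the closing quote and runs on to the next attribute, while the tempered version refuses to match. The sketch below compares both on invented HTML; `find_id` is a hypothetical helper.

```python
import re

OLD = r'data-reference-id=(["\'])(?P<id>.+?)\1'
NEW = r'data-reference-id=(["\'])(?P<id>(?:(?!\1).)+)\1'


def find_id(pattern, html):
    """Return the captured id, or None when the pattern does not match."""
    mobj = re.search(pattern, html)
    return mobj.group('id') if mobj else None


html_ok = '<div data-reference-id="abc123" data-partner-id="42">'
html_empty = '<div data-reference-id="" data-partner-id="42">'

# Both patterns agree on a well-formed attribute.
same = find_id(OLD, html_ok) == find_id(NEW, html_ok) == 'abc123'
# On an empty value the old pattern leaks into the next attribute.
leaked = find_id(OLD, html_empty)
rejected = find_id(NEW, html_empty)
```

With the new pattern a missing value surfaces as a failed match, which `_search_regex` reports as an extraction error rather than returning a bogus id.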
diff --git a/youtube_dl/extractor/sandia.py b/youtube_dl/extractor/sandia.py

index 759898a492f43c67179409c563be42e864deae5f..96e43af849bc9e6b90bbba71a38e7155ef864268 100644
--- a/youtube_dl/extractor/sandia.py
+++ b/youtube_dl/extractor/sandia.py
@@ -1,18 +1,12 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import itertools
  import json
-import re
  
  from .common import InfoExtractor
-from ..compat import compat_urlparse
  from ..utils import (
      int_or_none,
-    js_to_json,
      mimetype2ext,
-    sanitized_Request,
-    unified_strdate,
  )
  
  
@@ -27,7 +21,8 @@ class SandiaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Xyce Software Training - Section 1',
              'description': 're:(?s)SAND Number: SAND 2013-7800.{200,}',
-            'upload_date': '20120904',
+            'upload_date': '20120409',
+            'timestamp': 1333983600,
              'duration': 7794,
          }
      }
@@ -35,81 +30,36 @@ class SandiaIE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        req = sanitized_Request(url)
-        req.add_header('Cookie', 'MediasitePlayerCaps=ClientPlugins=4')
-        webpage = self._download_webpage(req, video_id)
+        presentation_data = self._download_json(
+            'http://digitalops.sandia.gov/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
+            video_id, data=json.dumps({
+                'getPlayerOptionsRequest': {
+                    'ResourceId': video_id,
+                    'QueryString': '',
+                }
+            }), headers={
+                'Content-Type': 'application/json; charset=utf-8',
+            })['d']['Presentation']
  
-        js_path = self._search_regex(
-            r'<script type="text/javascript" src="(/Mediasite/FileServer/Presentation/[^"]+)"',
-            webpage, 'JS code URL')
-        js_url = compat_urlparse.urljoin(url, js_path)
-
-        js_code = self._download_webpage(
-            js_url, video_id, note='Downloading player')
-
-        def extract_str(key, **args):
-            return self._search_regex(
-                r'Mediasite\.PlaybackManifest\.%s\s*=\s*(.+);\s*?\n' % re.escape(key),
-                js_code, key, **args)
-
-        def extract_data(key, **args):
-            data_json = extract_str(key, **args)
-            if data_json is None:
-                return data_json
-            return self._parse_json(
-                data_json, video_id, transform_source=js_to_json)
+        title = presentation_data['Title']
  
          formats = []
-        for i in itertools.count():
-            fd = extract_data('VideoUrls[%d]' % i, default=None)
-            if fd is None:
-                break
-            formats.append({
-                'format_id': '%s' % i,
-                'format_note': fd['MimeType'].partition('/')[2],
-                'ext': mimetype2ext(fd['MimeType']),
-                'url': fd['Location'],
-                'protocol': 'f4m' if fd['MimeType'] == 'video/x-mp4-fragmented' else None,
-            })
+        for stream in presentation_data.get('Streams', []):
+            for fd in stream.get('VideoUrls', []):
+                formats.append({
+                    'format_id': fd['MediaType'],
+                    'format_note': fd['MimeType'].partition('/')[2],
+                    'ext': mimetype2ext(fd['MimeType']),
+                    'url': fd['Location'],
+                    'protocol': 'f4m' if fd['MimeType'] == 'video/x-mp4-fragmented' else None,
+                })
          self._sort_formats(formats)
  
-        slide_baseurl = compat_urlparse.urljoin(
-            url, extract_data('SlideBaseUrl'))
-        slide_template = slide_baseurl + re.sub(
-            r'\{0:D?([0-9+])\}', r'%0\1d', extract_data('SlideImageFileNameTemplate'))
-        slides = []
-        last_slide_time = 0
-        for i in itertools.count(1):
-            sd = extract_str('Slides[%d]' % i, default=None)
-            if sd is None:
-                break
-            timestamp = int_or_none(self._search_regex(
-                r'^Mediasite\.PlaybackManifest\.CreateSlide\("[^"]*"\s*,\s*([0-9]+),',
-                sd, 'slide %s timestamp' % i, fatal=False))
-            slides.append({
-                'url': slide_template % i,
-                'duration': timestamp - last_slide_time,
-            })
-            last_slide_time = timestamp
-        formats.append({
-            'format_id': 'slides',
-            'protocol': 'slideshow',
-            'url': json.dumps(slides),
-            'preference': -10000,  # Downloader not yet written
-        })
-        self._sort_formats(formats)
-
-        title = extract_data('Title')
-        description = extract_data('Description', fatal=False)
-        duration = int_or_none(extract_data(
-            'Duration', fatal=False), scale=1000)
-        upload_date = unified_strdate(extract_data('AirDate', fatal=False))
-
          return {
              'id': video_id,
              'title': title,
-            'description': description,
+            'description': presentation_data.get('Description'),
              'formats': formats,
-            'upload_date': upload_date,
-            'duration': duration,
+            'timestamp': int_or_none(presentation_data.get('UnixTime'), 1000),
+            'duration': int_or_none(presentation_data.get('Duration'), 1000),
          }
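
Review note: the rewritten `SandiaIE` reads everything from one `GetPlayerOptions` JSON response, walking `Streams` and `VideoUrls` and scaling millisecond values down to seconds. The sketch below reproduces that traversal on a hand-written payload. The payload values, `scaled_int` (a simplified stand-in for `int_or_none` with a scale) and `build_info` are assumptions for illustration; only the key names come from the diff.

```python
def scaled_int(value, scale=1):
    """Simplified stand-in for int_or_none(value, scale)."""
    if value is None:
        return None
    try:
        return int(value) // scale
    except (TypeError, ValueError):
        return None


def build_info(presentation):
    """Hypothetical helper: map a Presentation dict to an info dict."""
    formats = []
    for stream in presentation.get('Streams', []):
        for fd in stream.get('VideoUrls', []):
            formats.append({
                'format_id': fd['MediaType'],
                'format_note': fd['MimeType'].partition('/')[2],
                'url': fd['Location'],
                'protocol': 'f4m' if fd['MimeType'] == 'video/x-mp4-fragmented' else None,
            })
    return {
        'title': presentation['Title'],
        'description': presentation.get('Description'),
        'formats': formats,
        'timestamp': scaled_int(presentation.get('UnixTime'), 1000),
        'duration': scaled_int(presentation.get('Duration'), 1000),
    }


# Invented payload shaped like the keys the extractor reads.
payload = {
    'Title': 'Demo',
    'UnixTime': 1333983600000,
    'Duration': 7794000,
    'Streams': [{'VideoUrls': [
        {'MediaType': 'MP4', 'MimeType': 'video/mp4',
         'Location': 'http://example.invalid/a.mp4'},
        {'MediaType': 'Smooth', 'MimeType': 'video/x-mp4-fragmented',
         'Location': 'http://example.invalid/a.f4m'},
    ]}],
}
info = build_info(payload)
```

The `ext` field is omitted here because `mimetype2ext` belongs to youtube-dl's utilities and is not reimplemented in this sketch.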
diff --git a/youtube_dl/extractor/sapo.py b/youtube_dl/extractor/sapo.py

index 172cc12752d64ce326ba74ced1223628e5fff76a..49a9b313a87a5bf9b80fa8a3b8d78c104722cd5b 100644
--- a/youtube_dl/extractor/sapo.py
+++ b/youtube_dl/extractor/sapo.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/sbs.py b/youtube_dl/extractor/sbs.py

index 2f96477ca9f9cef7e684dfec1c3ff7fe5ac4fecf..43131fb7e5ce82d69d25bf639ce6c2bffe35182a 100644
--- a/youtube_dl/extractor/sbs.py
+++ b/youtube_dl/extractor/sbs.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -24,6 +24,9 @@ class SBSIE(InfoExtractor):
              'description': 'md5:f250a9856fca50d22dec0b5b8015f8a5',
              'thumbnail': 're:http://.*\.jpg',
              'duration': 308,
+            'timestamp': 1408613220,
+            'upload_date': '20140821',
+            'uploader': 'SBSC',
          },
      }, {
          'url': 'http://www.sbs.com.au/ondemand/video/320403011771/Dingo-Conservation-The-Feed',
@@ -57,6 +60,7 @@ class SBSIE(InfoExtractor):
  
          return {
              '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
              'id': video_id,
-            'url': smuggle_url(theplatform_url, {'force_smil_url': True}),
+            'url': smuggle_url(self._proto_relative_url(theplatform_url), {'force_smil_url': True}),
          }
diff --git a/youtube_dl/extractor/scivee.py b/youtube_dl/extractor/scivee.py

index 3bf93c870b2bc30c3baf9567a64d06171558f06b..b1ca12fdee1c012de789ebfaf15f03f04e73f768 100644
--- a/youtube_dl/extractor/scivee.py
+++ b/youtube_dl/extractor/scivee.py
@@ -18,6 +18,7 @@ class SciVeeIE(InfoExtractor):
              'title': 'Adam Arkin at the 2014 DOE JGI Genomics of Energy & Environment Meeting',
              'description': 'md5:81f1710638e11a481358fab1b11059d7',
          },
+        'skip': 'Not accessible from Travis CI server',
      }
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/screencast.py b/youtube_dl/extractor/screencast.py

index dfd897ba3a3f0a7297164fb315e4543bb597d678..ed9de964841e52c1e5753556d6b9e53339ba23c3 100644
--- a/youtube_dl/extractor/screencast.py
+++ b/youtube_dl/extractor/screencast.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -12,7 +12,7 @@ from ..utils import (
  
  
  class ScreencastIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.screencast\.com/t/(?P<id>[a-zA-Z0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?screencast\.com/t/(?P<id>[a-zA-Z0-9]+)'
      _TESTS = [{
          'url': 'http://www.screencast.com/t/3ZEjQXlT',
          'md5': '917df1c13798a3e96211dd1561fded83',
@@ -53,8 +53,10 @@ class ScreencastIE(InfoExtractor):
              'description': 'md5:7b9f393bc92af02326a5c5889639eab0',
              'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
          }
-    },
-    ]
+    }, {
+        'url': 'http://screencast.com/t/aAB3iowa',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -94,8 +96,9 @@ class ScreencastIE(InfoExtractor):
          title = self._og_search_title(webpage, default=None)
          if title is None:
              title = self._html_search_regex(
-                [r'<b>Title:</b> ([^<]*)</div>',
-                 r'class="tabSeperator">></span><span class="tabText">(.*?)<'],
+                [r'<b>Title:</b> ([^<]+)</div>',
+                 r'class="tabSeperator">></span><span class="tabText">(.+?)<',
+                 r'<title>([^<]+)</title>'],
                  webpage, 'title')
          thumbnail = self._og_search_thumbnail(webpage)
          description = self._og_search_description(webpage, default=None)
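
Review note: the title patterns in `ScreencastIE` change from `*` to `+` quantifiers and gain a `<title>` fallback, presumably so that an empty first match no longer wins over a later pattern. The sketch below imitates the "first pattern that matches" behaviour with plain `re`; `first_match` is a hypothetical helper that only strips whitespace and does not reproduce the HTML cleaning done by `_html_search_regex`. The sample page is invented.

```python
import re

TITLE_PATTERNS = [
    r'<b>Title:</b> ([^<]+)</div>',
    r'class="tabSeperator">></span><span class="tabText">(.+?)<',
    r'<title>([^<]+)</title>',
]


def first_match(patterns, text):
    """Return group 1 of the first pattern that matches, else None."""
    for pattern in patterns:
        mobj = re.search(pattern, text)
        if mobj:
            return mobj.group(1).strip()
    return None


# The "Title:" block is present but empty, so only <title> can supply a value.
page = '<title>My clip</title><div><b>Title:</b> </div>'
title = first_match(TITLE_PATTERNS, page)
```

With the previous `([^<]*)` form, the first pattern would have matched the empty `Title:` block and returned an empty string.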
diff --git a/youtube_dl/extractor/screencastomatic.py b/youtube_dl/extractor/screencastomatic.py

index 05337421ca4210af5a9a797f22c112bb663a0960..7a88a42cd84dbfd9f343567dffb5f462c10329b7 100644
--- a/youtube_dl/extractor/screencastomatic.py
+++ b/youtube_dl/extractor/screencastomatic.py
@@ -1,15 +1,11 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-from .common import InfoExtractor
-from ..compat import compat_urlparse
-from ..utils import (
-    ExtractorError,
-    js_to_json,
-)
+from .jwplatform import JWPlatformBaseIE
+from ..utils import js_to_json
  
  
-class ScreencastOMaticIE(InfoExtractor):
+class ScreencastOMaticIE(JWPlatformBaseIE):
      _VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
      _TEST = {
          'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',
@@ -20,6 +16,7 @@ class ScreencastOMaticIE(InfoExtractor):
              'title': 'Welcome to 3-4 Philosophy @ DECV!',
              'thumbnail': 're:^https?://.*\.jpg$',
              'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
+            'duration': 369.163,
          }
      }
  
@@ -27,23 +24,14 @@ class ScreencastOMaticIE(InfoExtractor):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
-        setup_js = self._search_regex(
-            r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);",
-            webpage, 'setup code')
-        data = self._parse_json(setup_js, video_id, transform_source=js_to_json)
-        try:
-            video_data = next(
-                m for m in data['modes'] if m.get('type') == 'html5')
-        except StopIteration:
-            raise ExtractorError('Could not find any video entries!')
-        video_url = compat_urlparse.urljoin(url, video_data['config']['file'])
-        thumbnail = data.get('image')
+        jwplayer_data = self._parse_json(
+            self._search_regex(
+                r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);", webpage, 'setup code'),
+            video_id, transform_source=js_to_json)
  
-        return {
-            'id': video_id,
+        info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
+        info_dict.update({
              'title': self._og_search_title(webpage),
              'description': self._og_search_description(webpage),
-            'url': video_url,
-            'ext': 'mp4',
-            'thumbnail': thumbnail,
-        }
+        })
+        return info_dict
diff --git a/youtube_dl/extractor/screenjunkies.py b/youtube_dl/extractor/screenjunkies.py

index dd0a6ba19d4ef3b9397af6d977277256cbc0e1e9..02e574cd89a79b2ad76c0eea8c76792b15f48d7b 100644
--- a/youtube_dl/extractor/screenjunkies.py
+++ b/youtube_dl/extractor/screenjunkies.py
@@ -11,7 +11,7 @@ from ..utils import (
  
  
  class ScreenJunkiesIE(InfoExtractor):
-    _VALID_URL = r'https?://www.screenjunkies.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
+    _VALID_URL = r'https?://(?:www\.)?screenjunkies\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
      _TESTS = [{
          'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915',
          'md5': '5c2b686bec3d43de42bde9ec047536b0',
diff --git a/youtube_dl/extractor/screenwavemedia.py b/youtube_dl/extractor/screenwavemedia.py

index 44b0bbee68953a199c67e420fe1928048be5f2cf..7d77e8825d7420185ed1f9f707efb3462f19cab1 100644
--- a/youtube_dl/extractor/screenwavemedia.py
+++ b/youtube_dl/extractor/screenwavemedia.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -12,7 +12,7 @@ from ..utils import (
  
  
  class ScreenwaveMediaIE(InfoExtractor):
-    _VALID_URL = r'https?://player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?.*\bid=(?P<id>[A-Za-z0-9-]+)'
+    _VALID_URL = r'(?:https?:)?//player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?.*\bid=(?P<id>[A-Za-z0-9-]+)'
      EMBED_PATTERN = r'src=(["\'])(?P<url>(?:https?:)?//player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?.*\bid=.+?)\1'
      _TESTS = [{
          'url': 'http://player.screenwavemedia.com/play/play.php?playerdiv=videoarea&companiondiv=squareAd&id=Cinemassacre-19911',
diff --git a/youtube_dl/extractor/seeker.py b/youtube_dl/extractor/seeker.py

new file mode 100644

index 0000000..3b9c65e
--- /dev/null
+++ b/youtube_dl/extractor/seeker.py
@@ -0,0 +1,57 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class SeekerIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?seeker\.com/(?P<display_id>.*)-(?P<article_id>\d+)\.html'
+    _TESTS = [{
+        # player.loadRevision3Item
+        'url': 'http://www.seeker.com/should-trump-be-required-to-release-his-tax-returns-1833805621.html',
+        'md5': '30c1dc4030cc715cf05b423d0947ac18',
+        'info_dict': {
+            'id': '76243',
+            'ext': 'webm',
+            'title': 'Should Trump Be Required To Release His Tax Returns?',
+            'description': 'Donald Trump has been secretive about his "big," "beautiful" tax returns. So what can we learn if he decides to release them?',
+            'uploader': 'Seeker Daily',
+            'uploader_id': 'seekerdaily',
+        }
+    }, {
+        'url': 'http://www.seeker.com/changes-expected-at-zoos-following-recent-gorilla-lion-shootings-1834116536.html',
+        'playlist': [
+            {
+                'md5': '83bcd157cab89ad7318dd7b8c9cf1306',
+                'info_dict': {
+                    'id': '67558',
+                    'ext': 'mp4',
+                    'title': 'The Pros & Cons Of Zoos',
+                    'description': 'Zoos are often depicted as a terrible place for animals to live, but is there any truth to this?',
+                    'uploader': 'DNews',
+                    'uploader_id': 'dnews',
+                },
+            }
+        ],
+        'info_dict': {
+            'id': '1834116536',
+            'title': 'After Gorilla Killing, Changes Ahead for Zoos',
+            'description': 'The largest association of zoos and others are hoping to learn from recent incidents that led to the shooting deaths of a gorilla and two lions.',
+        },
+    }]
+
+    def _real_extract(self, url):
+        display_id, article_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(url, display_id)
+        mobj = re.search(r"player\.loadRevision3Item\('([^']+)'\s*,\s*(\d+)\);", webpage)
+        if mobj:
+            playlist_type, playlist_id = mobj.groups()
+            return self.url_result(
+                'revision3:%s:%s' % (playlist_type, playlist_id), 'Revision3Embed', playlist_id)
+        else:
+            entries = [self.url_result('revision3:video_id:%s' % video_id, 'Revision3Embed', video_id) for video_id in re.findall(
+                r'<iframe[^>]+src=[\'"](?:https?:)?//api\.seekernetwork\.com/player/embed\?videoId=(\d+)', webpage)]
+            return self.playlist_result(
+                entries, article_id, self._og_search_title(webpage), self._og_search_description(webpage))
diff --git a/youtube_dl/extractor/senateisvp.py b/youtube_dl/extractor/senateisvp.py
index c5f474dd1d8a5040a5368de7f2aa050658f7a984..35540c082ef2f7c4d6fa9cf9ce8acf404bc33a8c 100644
--- a/youtube_dl/extractor/senateisvp.py
+++ b/youtube_dl/extractor/senateisvp.py
@@ -48,7 +48,7 @@ class SenateISVPIE(InfoExtractor):
          ['arch', '', 'http://ussenate-f.akamaihd.net/']
      ]
      _IE_NAME = 'senate.gov'
-    _VALID_URL = r'https?://www\.senate\.gov/isvp/?\?(?P<qs>.+)'
+    _VALID_URL = r'https?://(?:www\.)?senate\.gov/isvp/?\?(?P<qs>.+)'
      _TESTS = [{
          'url': 'http://www.senate.gov/isvp/?comm=judiciary&type=live&stt=&filename=judiciary031715&auto_play=false&wmode=transparent&poster=http%3A%2F%2Fwww.judiciary.senate.gov%2Fthemes%2Fjudiciary%2Fimages%2Fvideo-poster-flash-fit.png',
          'info_dict': {
diff --git a/youtube_dl/extractor/sendtonews.py b/youtube_dl/extractor/sendtonews.py
new file mode 100644
index 0000000..2dbe490
--- /dev/null
+++ b/youtube_dl/extractor/sendtonews.py
@@ -0,0 +1,89 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .jwplatform import JWPlatformBaseIE
+from ..utils import (
+    float_or_none,
+    parse_iso8601,
+    update_url_query,
+)
+
+
+class SendtoNewsIE(JWPlatformBaseIE):
+    _VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)'
+
+    _TEST = {
+        # From http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/
+        'url': 'http://embed.sendtonews.com/player2/embedplayer.php?SC=GxfCe0Zo7D-175909-5588&type=single&autoplay=on&sound=YES',
+        'info_dict': {
+            'id': 'GxfCe0Zo7D-175909-5588'
+        },
+        'playlist_count': 9,
+        # test the first video only to prevent lengthy tests
+        'playlist': [{
+            'info_dict': {
+                'id': '198180',
+                'ext': 'mp4',
+                'title': 'Recap: CLE 5, LAA 4',
+                'description': '8/14/16: Naquin, Almonte lead Indians in 5-4 win',
+                'duration': 57.343,
+                'thumbnail': 're:https?://.*\.jpg$',
+                'upload_date': '20160815',
+                'timestamp': 1471221961,
+            },
+        }],
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    _URL_TEMPLATE = '//embed.sendtonews.com/player2/embedplayer.php?SC=%s'
+
+    @classmethod
+    def _extract_url(cls, webpage):
+        mobj = re.search(r'''(?x)<script[^>]+src=([\'"])
+            (?:https?:)?//embed\.sendtonews\.com/player/responsiveembed\.php\?
+                .*\bSC=(?P<SC>[0-9a-zA-Z-]+).*
+            \1>''', webpage)
+        if mobj:
+            sc = mobj.group('SC')
+            return cls._URL_TEMPLATE % sc
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        data_url = update_url_query(
+            url.replace('embedplayer.php', 'data_read.php'),
+            {'cmd': 'loadInitial'})
+        playlist_data = self._download_json(data_url, playlist_id)
+
+        entries = []
+        for video in playlist_data['playlistData'][0]:
+            info_dict = self._parse_jwplayer_data(
+                video['jwconfiguration'],
+                require_title=False, rtmp_params={'no_resume': True})
+
+            thumbnails = []
+            if video.get('thumbnailUrl'):
+                thumbnails.append({
+                    'id': 'normal',
+                    'url': video['thumbnailUrl'],
+                })
+            if video.get('smThumbnailUrl'):
+                thumbnails.append({
+                    'id': 'small',
+                    'url': video['smThumbnailUrl'],
+                })
+            info_dict.update({
+                'title': video['S_headLine'],
+                'description': video.get('S_fullStory'),
+                'thumbnails': thumbnails,
+                'duration': float_or_none(video.get('SM_length')),
+                'timestamp': parse_iso8601(video.get('S_sysDate'), delimiter=' '),
+            })
+            entries.append(info_dict)
+
+        return self.playlist_result(entries, playlist_id)
diff --git a/youtube_dl/extractor/sexykarma.py b/youtube_dl/extractor/sexykarma.py
deleted file mode 100644
index e334836..0000000
--- a/youtube_dl/extractor/sexykarma.py
+++ /dev/null
@@ -1,121 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    unified_strdate,
-    parse_duration,
-    int_or_none,
-)
-
-
-class SexyKarmaIE(InfoExtractor):
-    IE_DESC = 'Sexy Karma and Watch Indian Porn'
-    _VALID_URL = r'https?://(?:www\.)?(?:sexykarma\.com|watchindianporn\.net)/(?:[^/]+/)*video/(?P<display_id>[^/]+)-(?P<id>[a-zA-Z0-9]+)\.html'
-    _TESTS = [{
-        'url': 'http://www.sexykarma.com/gonewild/video/taking-a-quick-pee-yHI70cOyIHt.html',
-        'md5': 'b9798e7d1ef1765116a8f516c8091dbd',
-        'info_dict': {
-            'id': 'yHI70cOyIHt',
-            'display_id': 'taking-a-quick-pee',
-            'ext': 'mp4',
-            'title': 'Taking a quick pee.',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'uploader': 'wildginger7',
-            'upload_date': '20141008',
-            'duration': 22,
-            'view_count': int,
-            'comment_count': int,
-            'categories': list,
-            'age_limit': 18,
-        }
-    }, {
-        'url': 'http://www.sexykarma.com/gonewild/video/pot-pixie-tribute-8Id6EZPbuHf.html',
-        'md5': 'dd216c68d29b49b12842b9babe762a5d',
-        'info_dict': {
-            'id': '8Id6EZPbuHf',
-            'display_id': 'pot-pixie-tribute',
-            'ext': 'mp4',
-            'title': 'pot_pixie tribute',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'uploader': 'banffite',
-            'upload_date': '20141013',
-            'duration': 16,
-            'view_count': int,
-            'comment_count': int,
-            'categories': list,
-            'age_limit': 18,
-        }
-    }, {
-        'url': 'http://www.watchindianporn.net/video/desi-dancer-namrata-stripping-completely-nude-and-dancing-on-a-hot-number-dW2mtctxJfs.html',
-        'md5': '9afb80675550406ed9a63ac2819ef69d',
-        'info_dict': {
-            'id': 'dW2mtctxJfs',
-            'display_id': 'desi-dancer-namrata-stripping-completely-nude-and-dancing-on-a-hot-number',
-            'ext': 'mp4',
-            'title': 'Desi dancer namrata stripping completely nude and dancing on a hot number',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'uploader': 'Don',
-            'upload_date': '20140213',
-            'duration': 83,
-            'view_count': int,
-            'comment_count': int,
-            'categories': list,
-            'age_limit': 18,
-        }
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
-
-        webpage = self._download_webpage(url, display_id)
-
-        video_url = self._html_search_regex(
-            r"url: escape\('([^']+)'\)", webpage, 'url')
-
-        title = self._html_search_regex(
-            r'<h2 class="he2"><span>(.*?)</span>',
-            webpage, 'title')
-        thumbnail = self._html_search_regex(
-            r'<span id="container"><img\s+src="([^"]+)"',
-            webpage, 'thumbnail', fatal=False)
-
-        uploader = self._html_search_regex(
-            r'class="aupa">\s*(.*?)</a>',
-            webpage, 'uploader')
-        upload_date = unified_strdate(self._html_search_regex(
-            r'Added: <strong>(.+?)</strong>', webpage, 'upload date', fatal=False))
-
-        duration = parse_duration(self._search_regex(
-            r'<td>Time:\s*</td>\s*<td align="right"><span>\s*(.+?)\s*</span>',
-            webpage, 'duration', fatal=False))
-
-        view_count = int_or_none(self._search_regex(
-            r'<td>Views:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
-            webpage, 'view count', fatal=False))
-        comment_count = int_or_none(self._search_regex(
-            r'<td>Comments:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
-            webpage, 'comment count', fatal=False))
-
-        categories = re.findall(
-            r'<a href="[^"]+/search/video/desi"><span>([^<]+)</span></a>',
-            webpage)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'url': video_url,
-            'title': title,
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'upload_date': upload_date,
-            'duration': duration,
-            'view_count': view_count,
-            'comment_count': comment_count,
-            'categories': categories,
-            'age_limit': 18,
-        }
diff --git a/youtube_dl/extractor/shahid.py b/youtube_dl/extractor/shahid.py
index d95ea06be56844deb3c34276fb1c4e7e18d64173..62d41e88a1084c58af259176e28a8b2654ccd4ee 100644
--- a/youtube_dl/extractor/shahid.py
+++ b/youtube_dl/extractor/shahid.py
@@ -1,17 +1,24 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+import json
+
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlencode
+from ..compat import compat_HTTPError
  from ..utils import (
      ExtractorError,
      int_or_none,
      parse_iso8601,
+    str_or_none,
+    urlencode_postdata,
+    clean_html,
  )
  
  
  class ShahidIE(InfoExtractor):
-    _VALID_URL = r'https?://shahid\.mbc\.net/ar/episode/(?P<id>\d+)/?'
+    _NETRC_MACHINE = 'shahid'
+    _VALID_URL = r'https?://shahid\.mbc\.net/ar/(?P<type>episode|movie)/(?P<id>\d+)'
      _TESTS = [{
          'url': 'https://shahid.mbc.net/ar/episode/90574/%D8%A7%D9%84%D9%85%D9%84%D9%83-%D8%B9%D8%A8%D8%AF%D8%A7%D9%84%D9%84%D9%87-%D8%A7%D9%84%D8%A5%D9%86%D8%B3%D8%A7%D9%86-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-3.html',
          'info_dict': {
@@ -27,51 +34,69 @@ class ShahidIE(InfoExtractor):
              # m3u8 download
              'skip_download': True,
          }
+    }, {
+        'url': 'https://shahid.mbc.net/ar/movie/151746/%D8%A7%D9%84%D9%82%D9%86%D8%A7%D8%B5%D8%A9.html',
+        'only_matching': True
      }, {
          # shahid plus subscriber only
          'url': 'https://shahid.mbc.net/ar/episode/90511/%D9%85%D8%B1%D8%A7%D9%8A%D8%A7-2011-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1.html',
          'only_matching': True
      }]
  
-    def _handle_error(self, response):
-        if not isinstance(response, dict):
+    def _real_initialize(self):
+        email, password = self._get_login_info()
+        if email is None:
              return
-        error = response.get('error')
+
+        try:
+            user_data = self._download_json(
+                'https://shahid.mbc.net/wd/service/users/login',
+                None, 'Logging in', data=json.dumps({
+                    'email': email,
+                    'password': password,
+                    'basic': 'false',
+                }).encode('utf-8'), headers={
+                    'Content-Type': 'application/json; charset=UTF-8',
+                })['user']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError):
+                fail_data = self._parse_json(
+                    e.cause.read().decode('utf-8'), None, fatal=False)
+                if fail_data:
+                    faults = fail_data.get('faults', [])
+                    faults_message = ', '.join([clean_html(fault['userMessage']) for fault in faults if fault.get('userMessage')])
+                    if faults_message:
+                        raise ExtractorError(faults_message, expected=True)
+            raise
+
+        self._download_webpage(
+            'https://shahid.mbc.net/populateContext',
+            None, 'Populate Context', data=urlencode_postdata({
+                'firstName': user_data['firstName'],
+                'lastName': user_data['lastName'],
+                'userName': user_data['email'],
+                'csg_user_name': user_data['email'],
+                'subscriberId': user_data['id'],
+                'sessionId': user_data['sessionId'],
+            }))
+
+    def _get_api_data(self, response):
+        data = response.get('data', {})
+
+        error = data.get('error')
          if error:
              raise ExtractorError(
                  '%s returned error: %s' % (self.IE_NAME, '\n'.join(error.values())),
                  expected=True)
  
-    def _download_json(self, url, video_id, note='Downloading JSON metadata'):
-        response = super(ShahidIE, self)._download_json(url, video_id, note)['data']
-        self._handle_error(response)
-        return response
+        return data
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
+        page_type, video_id = re.match(self._VALID_URL, url).groups()
  
-        api_vars = {
-            'id': video_id,
-            'type': 'player',
-            'url': 'http://api.shahid.net/api/v1_1',
-            'playerType': 'episode',
-        }
-
-        flashvars = self._search_regex(
-            r'var\s+flashvars\s*=\s*({[^}]+})', webpage, 'flashvars', default=None)
-        if flashvars:
-            for key in api_vars.keys():
-                value = self._search_regex(
-                    r'\b%s\s*:\s*(?P<q>["\'])(?P<value>.+?)(?P=q)' % key,
-                    flashvars, 'type', default=None, group='value')
-                if value:
-                    api_vars[key] = value
-
-        player = self._download_json(
-            'https://shahid.mbc.net/arContent/getPlayerContent-param-.id-%s.type-%s.html'
-            % (video_id, api_vars['type']), video_id, 'Downloading player JSON')
+        player = self._get_api_data(self._download_json(
+            'https://shahid.mbc.net/arContent/getPlayerContent-param-.id-%s.type-player.html' % video_id,
+            video_id, 'Downloading player JSON'))
  
          if player.get('drm'):
              raise ExtractorError('This video is DRM protected.', expected=True)
@@ -79,22 +104,14 @@ class ShahidIE(InfoExtractor):
          formats = self._extract_m3u8_formats(player['url'], video_id, 'mp4')
          self._sort_formats(formats)
  
-        video = self._download_json(
-            '%s/%s/%s?%s' % (
-                api_vars['url'], api_vars['playerType'], api_vars['id'],
-                compat_urllib_parse_urlencode({
-                    'apiKey': 'sh@hid0nlin3',
-                    'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=',
-                })),
-            video_id, 'Downloading video JSON')
-
-        video = video[api_vars['playerType']]
+        video = self._get_api_data(self._download_json(
+            'http://api.shahid.net/api/v1_1/%s/%s' % (page_type, video_id),
+            video_id, 'Downloading video JSON', query={
+                'apiKey': 'sh@hid0nlin3',
+                'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=',
+            }))[page_type]
  
          title = video['title']
-        description = video.get('description')
-        thumbnail = video.get('thumbnailUrl')
-        duration = int_or_none(video.get('duration'))
-        timestamp = parse_iso8601(video.get('referenceDate'))
          categories = [
              category['name']
              for category in video.get('genres', []) if 'name' in category]
@@ -102,10 +119,16 @@ class ShahidIE(InfoExtractor):
          return {
              'id': video_id,
              'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'timestamp': timestamp,
+            'description': video.get('description'),
+            'thumbnail': video.get('thumbnailUrl'),
+            'duration': int_or_none(video.get('duration')),
+            'timestamp': parse_iso8601(video.get('referenceDate')),
              'categories': categories,
+            'series': video.get('showTitle') or video.get('showName'),
+            'season': video.get('seasonTitle'),
+            'season_number': int_or_none(video.get('seasonNumber')),
+            'season_id': str_or_none(video.get('seasonId')),
+            'episode_number': int_or_none(video.get('number')),
+            'episode_id': video_id,
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/shared.py b/youtube_dl/extractor/shared.py
index e7e5f653eb2117936f568195e508f9e7778b1085..89e19e9277f42b69ac6f01b03c78ced00f3ec990 100644
--- a/youtube_dl/extractor/shared.py
+++ b/youtube_dl/extractor/shared.py
@@ -6,59 +6,26 @@ from .common import InfoExtractor
  from ..utils import (
      ExtractorError,
      int_or_none,
-    sanitized_Request,
      urlencode_postdata,
  )
  
  
-class SharedIE(InfoExtractor):
-    IE_DESC = 'shared.sx and vivo.sx'
-    _VALID_URL = r'https?://(?:shared|vivo)\.sx/(?P<id>[\da-z]{10})'
-
-    _TESTS = [{
-        'url': 'http://shared.sx/0060718775',
-        'md5': '106fefed92a8a2adb8c98e6a0652f49b',
-        'info_dict': {
-            'id': '0060718775',
-            'ext': 'mp4',
-            'title': 'Bmp4',
-            'filesize': 1720110,
-        },
-    }, {
-        'url': 'http://vivo.sx/d7ddda0e78',
-        'md5': '15b3af41be0b4fe01f4df075c2678b2c',
-        'info_dict': {
-            'id': 'd7ddda0e78',
-            'ext': 'mp4',
-            'title': 'Chicken',
-            'filesize': 528031,
-        },
-    }]
-
+class SharedBaseIE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
  
-        if '>File does not exist<' in webpage:
+        webpage, urlh = self._download_webpage_handle(url, video_id)
+
+        if self._FILE_NOT_FOUND in webpage:
              raise ExtractorError(
                  'Video %s does not exist' % video_id, expected=True)
  
-        download_form = self._hidden_inputs(webpage)
-        request = sanitized_Request(
-            url, urlencode_postdata(download_form))
-        request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+        video_url = self._extract_video_url(webpage, video_id, url)
  
-        video_page = self._download_webpage(
-            request, video_id, 'Downloading video page')
-
-        video_url = self._html_search_regex(
-            r'data-url="([^"]+)"', video_page, 'video URL')
          title = base64.b64decode(self._html_search_meta(
              'full:title', webpage, 'title').encode('utf-8')).decode('utf-8')
          filesize = int_or_none(self._html_search_meta(
              'full:size', webpage, 'file size', fatal=False))
-        thumbnail = self._html_search_regex(
-            r'data-poster="([^"]+)"', video_page, 'thumbnail', default=None)
  
          return {
              'id': video_id,
@@ -66,5 +33,64 @@ class SharedIE(InfoExtractor):
              'ext': 'mp4',
              'filesize': filesize,
              'title': title,
-            'thumbnail': thumbnail,
          }
+
+
+class SharedIE(SharedBaseIE):
+    IE_DESC = 'shared.sx'
+    _VALID_URL = r'https?://shared\.sx/(?P<id>[\da-z]{10})'
+    _FILE_NOT_FOUND = '>File does not exist<'
+
+    _TEST = {
+        'url': 'http://shared.sx/0060718775',
+        'md5': '106fefed92a8a2adb8c98e6a0652f49b',
+        'info_dict': {
+            'id': '0060718775',
+            'ext': 'mp4',
+            'title': 'Bmp4',
+            'filesize': 1720110,
+        },
+    }
+
+    def _extract_video_url(self, webpage, video_id, url):
+        download_form = self._hidden_inputs(webpage)
+
+        video_page = self._download_webpage(
+            url, video_id, 'Downloading video page',
+            data=urlencode_postdata(download_form),
+            headers={
+                'Content-Type': 'application/x-www-form-urlencoded',
+                'Referer': url,
+            })
+
+        video_url = self._html_search_regex(
+            r'data-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
+            video_page, 'video URL', group='url')
+
+        return video_url
+
+
+class VivoIE(SharedBaseIE):
+    IE_DESC = 'vivo.sx'
+    _VALID_URL = r'https?://vivo\.sx/(?P<id>[\da-z]{10})'
+    _FILE_NOT_FOUND = '>The file you have requested does not exists or has been removed'
+
+    _TEST = {
+        'url': 'http://vivo.sx/d7ddda0e78',
+        'md5': '15b3af41be0b4fe01f4df075c2678b2c',
+        'info_dict': {
+            'id': 'd7ddda0e78',
+            'ext': 'mp4',
+            'title': 'Chicken',
+            'filesize': 528031,
+        },
+    }
+
+    def _extract_video_url(self, webpage, video_id, *args):
+        return self._parse_json(
+            self._search_regex(
+                r'InitializeStream\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
+                webpage, 'stream', group='url'),
+            video_id,
+            transform_source=lambda x: base64.b64decode(
+                x.encode('ascii')).decode('utf-8'))[0]
diff --git a/youtube_dl/extractor/sina.py b/youtube_dl/extractor/sina.py
index d03f1b1d4308d047e5b690a682587ac5655ce338..8fc66732af70f4db5305fdc891c5142afd5c97c7 100644
--- a/youtube_dl/extractor/sina.py
+++ b/youtube_dl/extractor/sina.py
@@ -4,28 +4,35 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlencode
-from ..utils import sanitized_Request
+from ..utils import (
+    HEADRequest,
+    ExtractorError,
+    int_or_none,
+    update_url_query,
+    qualities,
+    get_element_by_attribute,
+    clean_html,
+)
  
  
  class SinaIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://(.*?\.)?video\.sina\.com\.cn/
-                        (
-                            (.+?/(((?P<pseudo_id>\d+).html)|(.*?(\#|(vid=)|b/)(?P<id>\d+?)($|&|\-))))
-                            |
+    _VALID_URL = r'''(?x)https?://(?:.*?\.)?video\.sina\.com\.cn/
+                        (?:
+                            (?:view/|.*\#)(?P<video_id>\d+)|
+                            .+?/(?P<pseudo_id>[^/?#]+)(?:\.s?html)|
                              # This is used by external sites like Weibo
-                            (api/sinawebApi/outplay.php/(?P<token>.+?)\.swf)
+                            api/sinawebApi/outplay.php/(?P<token>.+?)\.swf
                          )
                    '''
  
      _TESTS = [
          {
-            'url': 'http://video.sina.com.cn/news/vlist/zt/chczlj2013/?opsubject_id=top12#110028898',
-            'md5': 'd65dd22ddcf44e38ce2bf58a10c3e71f',
+            'url': 'http://video.sina.com.cn/news/spj/topvideoes20160504/?opsubject_id=top1#250576622',
+            'md5': 'd38433e2fc886007729735650ae4b3e9',
              'info_dict': {
-                'id': '110028898',
-                'ext': 'flv',
-                'title': '《中国新闻》 朝鲜要求巴拿马立即释放被扣船员',
+                'id': '250576622',
+                'ext': 'mp4',
+                'title': '现场:克鲁兹宣布退选 特朗普将稳获提名',
              }
          },
          {
@@ -35,37 +42,74 @@ class SinaIE(InfoExtractor):
                  'ext': 'flv',
                  'title': '军方提高对朝情报监视级别',
              },
+            'skip': 'the page does not exist or has been deleted',
+        },
+        {
+            'url': 'http://video.sina.com.cn/view/250587748.html',
+            'md5': '3d1807a25c775092aab3bc157fff49b4',
+            'info_dict': {
+                'id': '250587748',
+                'ext': 'mp4',
+                'title': '瞬间泪目：8年前汶川地震珍贵视频首曝光',
+            },
          },
      ]
  
-    def _extract_video(self, video_id):
-        data = compat_urllib_parse_urlencode({'vid': video_id})
-        url_doc = self._download_xml('http://v.iask.com/v_play.php?%s' % data,
-                                     video_id, 'Downloading video url')
-        image_page = self._download_webpage(
-            'http://interface.video.sina.com.cn/interface/common/getVideoImage.php?%s' % data,
-            video_id, 'Downloading thumbnail info')
-
-        return {'id': video_id,
-                'url': url_doc.find('./durl/url').text,
-                'ext': 'flv',
-                'title': url_doc.find('./vname').text,
-                'thumbnail': image_page.split('=')[1],
-                }
-
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        if mobj.group('token') is not None:
-            # The video id is in the redirected url
-            self.to_screen('Getting video id')
-            request = sanitized_Request(url)
-            request.get_method = lambda: 'HEAD'
-            (_, urlh) = self._download_webpage_handle(request, 'NA', False)
-            return self._real_extract(urlh.geturl())
-        elif video_id is None:
-            pseudo_id = mobj.group('pseudo_id')
-            webpage = self._download_webpage(url, pseudo_id)
-            video_id = self._search_regex(r'vid:\'(\d+?)\'', webpage, 'video id')
  
-        return self._extract_video(video_id)
+        video_id = mobj.group('video_id')
+        if not video_id:
+            if mobj.group('token') is not None:
+                # The video id is in the redirected url
+                self.to_screen('Getting video id')
+                request = HEADRequest(url)
+                (_, urlh) = self._download_webpage_handle(request, 'NA', False)
+                return self._real_extract(urlh.geturl())
+            else:
+                pseudo_id = mobj.group('pseudo_id')
+                webpage = self._download_webpage(url, pseudo_id)
+                error = get_element_by_attribute('class', 'errtitle', webpage)
+                if error:
+                    raise ExtractorError('%s said: %s' % (
+                        self.IE_NAME, clean_html(error)), expected=True)
+                video_id = self._search_regex(
+                    r"video_id\s*:\s*'(\d+)'", webpage, 'video id')
+
+        video_data = self._download_json(
+            'http://s.video.sina.com.cn/video/h5play',
+            video_id, query={'video_id': video_id})
+        if video_data['code'] != 1:
+            raise ExtractorError('%s said: %s' % (
+                self.IE_NAME, video_data['message']), expected=True)
+        else:
+            video_data = video_data['data']
+            title = video_data['title']
+            description = video_data.get('description')
+            if description:
+                description = description.strip()
+
+            preference = qualities(['cif', 'sd', 'hd', 'fhd', 'ffd'])
+            formats = []
+            for quality_id, quality in video_data.get('videos', {}).get('mp4', {}).items():
+                file_api = quality.get('file_api')
+                file_id = quality.get('file_id')
+                if not file_api or not file_id:
+                    continue
+                formats.append({
+                    'format_id': quality_id,
+                    'url': update_url_query(file_api, {'vid': file_id}),
+                    'preference': preference(quality_id),
+                    'ext': 'mp4',
+                })
+            self._sort_formats(formats)
+
+            return {
+                'id': video_id,
+                'title': title,
+                'description': description,
+                'thumbnail': video_data.get('image'),
+                'duration': int_or_none(video_data.get('length')),
+                'timestamp': int_or_none(video_data.get('create_time')),
+                'formats': formats,
+            }
diff --git a/youtube_dl/extractor/sixplay.py b/youtube_dl/extractor/sixplay.py
new file mode 100644
index 0000000..d3aba58
--- /dev/null
+++ b/youtube_dl/extractor/sixplay.py
@@ -0,0 +1,64 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    qualities,
+    int_or_none,
+    mimetype2ext,
+    determine_ext,
+)
+
+
+class SixPlayIE(InfoExtractor):
+    _VALID_URL = r'(?:6play:|https?://(?:www\.)?6play\.fr/.+?-c_)(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.6play.fr/jamel-et-ses-amis-au-marrakech-du-rire-p_1316/jamel-et-ses-amis-au-marrakech-du-rire-2015-c_11495320',
+        'md5': '42310bffe4ba3982db112b9cd3467328',
+        'info_dict': {
+            'id': '11495320',
+            'ext': 'mp4',
+            'title': 'Jamel et ses amis au Marrakech du rire 2015',
+            'description': 'md5:ba2149d5c321d5201b78070ee839d872',
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        clip_data = self._download_json(
+            'https://player.m6web.fr/v2/video/config/6play-auth/FR/%s.json' % video_id,
+            video_id)
+        video_data = clip_data['videoInfo']
+
+        quality_key = qualities(['lq', 'sd', 'hq', 'hd'])
+        formats = []
+        for source in clip_data['sources']:
+            source_type, source_url = source.get('type'), source.get('src')
+            if not source_url or source_type == 'hls/primetime':
+                continue
+            ext = mimetype2ext(source_type) or determine_ext(source_url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    source_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
+                formats.extend(self._extract_f4m_formats(
+                    source_url.replace('.m3u8', '.f4m'),
+                    video_id, f4m_id='hds', fatal=False))
+            elif ext == 'mp4':
+                quality = source.get('quality')
+                formats.append({
+                    'url': source_url,
+                    'format_id': quality,
+                    'quality': quality_key(quality),
+                    'ext': ext,
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': video_data['title'].strip(),
+            'description': video_data.get('description'),
+            'duration': int_or_none(video_data.get('duration')),
+            'series': video_data.get('titlePgm'),
+            'formats': formats,
+        }
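
Reviewer note on sixplay.py: the MP4 branch ranks each source through `quality_key = qualities(['lq', 'sd', 'hq', 'hd'])`. The sketch below shows how such a ranking helper might behave. The `qualities` function here is a simplified stand-in written for illustration, not the code from `youtube_dl.utils`, and the `sources` list with its `example.invalid` URLs is hypothetical.

```python
# Simplified stand-in for a quality-ranking helper (an assumption, not the
# library source): rank a label by its position in a preference list.
def qualities(quality_ids):
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            # Unknown or missing labels rank below every known one.
            return -1
    return q


quality_key = qualities(['lq', 'sd', 'hq', 'hd'])

# Hypothetical MP4 sources shaped like the entries the extractor loops over.
sources = [
    {'src': 'http://example.invalid/a_hd.mp4', 'quality': 'hd'},
    {'src': 'http://example.invalid/a_lq.mp4', 'quality': 'lq'},
    {'src': 'http://example.invalid/a_unknown.mp4', 'quality': None},
]
formats = [{
    'url': s['src'],
    'format_id': s['quality'],
    'quality': quality_key(s['quality']),
} for s in sources]

# Sort worst first, best last.
formats.sort(key=lambda f: f['quality'])
ranked_ids = [f['format_id'] for f in formats]
```

Under these assumptions a source without a recognised `quality` label still yields a format entry; it simply sorts below the labelled ones.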
diff --git a/youtube_dl/extractor/skynewsarabia.py b/youtube_dl/extractor/skynewsarabia.py

index 05e1b02ada567c8d88c0dbbba41171d0df1ac680..fffc9aa2277e3a0f35d24c585eec5be1e59d5c17 100644
--- a/youtube_dl/extractor/skynewsarabia.py
+++ b/youtube_dl/extractor/skynewsarabia.py
@@ -67,7 +67,7 @@ class SkyNewsArabiaIE(SkyNewsArabiaBaseIE):
  
  
  class SkyNewsArabiaArticleIE(SkyNewsArabiaBaseIE):
-    IE_NAME = 'skynewsarabia:video'
+    IE_NAME = 'skynewsarabia:article'
      _VALID_URL = r'https?://(?:www\.)?skynewsarabia\.com/web/article/(?P<id>[0-9]+)'
      _TESTS = [{
          'url': 'http://www.skynewsarabia.com/web/article/794549/%D8%A7%D9%94%D8%AD%D8%AF%D8%A7%D8%AB-%D8%A7%D9%84%D8%B4%D8%B1%D9%82-%D8%A7%D9%84%D8%A7%D9%94%D9%88%D8%B3%D8%B7-%D8%AE%D8%B1%D9%8A%D8%B7%D8%A9-%D8%A7%D9%84%D8%A7%D9%94%D9%84%D8%B9%D8%A7%D8%A8-%D8%A7%D9%84%D8%B0%D9%83%D9%8A%D8%A9',
diff --git a/youtube_dl/extractor/skysports.py b/youtube_dl/extractor/skysports.py

new file mode 100644

index 0000000..9dc78c7
--- /dev/null
+++ b/youtube_dl/extractor/skysports.py
@@ -0,0 +1,33 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class SkySportsIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?skysports\.com/watch/video/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.skysports.com/watch/video/10328419/bale-its-our-time-to-shine',
+        'md5': 'c44a1db29f27daf9a0003e010af82100',
+        'info_dict': {
+            'id': '10328419',
+            'ext': 'flv',
+            'title': 'Bale: Its our time to shine',
+            'description': 'md5:9fd1de3614d525f5addda32ac3c482c9',
+        },
+        'add_ie': ['Ooyala'],
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': 'ooyala:%s' % self._search_regex(
+                r'data-video-id="([^"]+)"', webpage, 'ooyala id'),
+            'title': self._og_search_title(webpage),
+            'description': self._og_search_description(webpage),
+            'ie_key': 'Ooyala',
+        }
diff --git a/youtube_dl/extractor/slideshare.py b/youtube_dl/extractor/slideshare.py

index 0b717a1e42b8dd2c3d8a88d602f001876cf99e03..74a1dc672e7725f2f3500284a53ade4ca16c380d 100644
--- a/youtube_dl/extractor/slideshare.py
+++ b/youtube_dl/extractor/slideshare.py
@@ -9,11 +9,12 @@ from ..compat import (
  )
  from ..utils import (
      ExtractorError,
+    get_element_by_id,
  )
  
  
  class SlideshareIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.slideshare\.net/[^/]+?/(?P<title>.+?)($|\?)'
+    _VALID_URL = r'https?://(?:www\.)?slideshare\.net/[^/]+?/(?P<title>.+?)($|\?)'
  
      _TEST = {
          'url': 'http://www.slideshare.net/Dataversity/keynote-presentation-managing-scale-and-complexity',
@@ -40,7 +41,7 @@ class SlideshareIE(InfoExtractor):
          bucket = info['jsplayer']['video_bucket']
          ext = info['jsplayer']['video_extension']
          video_url = compat_urlparse.urljoin(bucket, doc + '-SD.' + ext)
-        description = self._html_search_regex(
+        description = get_element_by_id('slideshow-description-paragraph', webpage) or self._html_search_regex(
              r'(?s)<p[^>]+itemprop="description"[^>]*>(.+?)</p>', webpage,
              'description', fatal=False)
  
@@ -51,5 +52,5 @@ class SlideshareIE(InfoExtractor):
              'ext': ext,
              'url': video_url,
              'thumbnail': info['slideshow']['pin_image_url'],
-            'description': description,
+            'description': description.strip() if description else None,
          }
diff --git a/youtube_dl/extractor/slutload.py b/youtube_dl/extractor/slutload.py

index 7efb29f653b76b25c26d91aac16c6985255ee1d0..18cc7721e142c7493bbebdfcb59f621e3fedaf4f 100644
--- a/youtube_dl/extractor/slutload.py
+++ b/youtube_dl/extractor/slutload.py
@@ -1,7 +1,5 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  
  
@@ -9,7 +7,7 @@ class SlutloadIE(InfoExtractor):
      _VALID_URL = r'^https?://(?:\w+\.)?slutload\.com/video/[^/]+/(?P<id>[^/]+)/?$'
      _TEST = {
          'url': 'http://www.slutload.com/video/virginie-baisee-en-cam/TD73btpBqSxc/',
-        'md5': '0cf531ae8006b530bd9df947a6a0df77',
+        'md5': '868309628ba00fd488cf516a113fd717',
          'info_dict': {
              'id': 'TD73btpBqSxc',
              'ext': 'mp4',
@@ -20,9 +18,7 @@ class SlutloadIE(InfoExtractor):
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
+        video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
          video_title = self._html_search_regex(r'<h1><strong>([^<]+)</strong>',
diff --git a/youtube_dl/extractor/smotri.py b/youtube_dl/extractor/smotri.py

index 5c3fd0fece8dc8b32a3d05bea8ed4dedf430f1c5..def46abda45c5d4899f3c3e5a3fb775592efdfa6 100644
--- a/youtube_dl/extractor/smotri.py
+++ b/youtube_dl/extractor/smotri.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -13,20 +13,21 @@ from ..utils import (
      sanitized_Request,
      unified_strdate,
      urlencode_postdata,
+    xpath_text,
  )
  
  
  class SmotriIE(InfoExtractor):
      IE_DESC = 'Smotri.com'
      IE_NAME = 'smotri'
-    _VALID_URL = r'^https?://(?:www\.)?(?:smotri\.com/video/view/\?id=|pics\.smotri\.com/(?:player|scrubber_custom8)\.swf\?file=)(?P<id>v(?P<realvideoid>[0-9]+)[a-z0-9]{4})'
+    _VALID_URL = r'https?://(?:www\.)?(?:smotri\.com/video/view/\?id=|pics\.smotri\.com/(?:player|scrubber_custom8)\.swf\?file=)(?P<id>v(?P<realvideoid>[0-9]+)[a-z0-9]{4})'
      _NETRC_MACHINE = 'smotri'
  
      _TESTS = [
          # real video id 2610366
          {
              'url': 'http://smotri.com/video/view/?id=v261036632ab',
-            'md5': '2a7b08249e6f5636557579c368040eb9',
+            'md5': '02c0dfab2102984e9c5bb585cc7cc321',
              'info_dict': {
                  'id': 'v261036632ab',
                  'ext': 'mp4',
@@ -174,11 +175,11 @@ class SmotriIE(InfoExtractor):
          if video_password:
              video_form['pass'] = hashlib.md5(video_password.encode('utf-8')).hexdigest()
  
-        request = sanitized_Request(
-            'http://smotri.com/video/view/url/bot/', urlencode_postdata(video_form))
-        request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-
-        video = self._download_json(request, video_id, 'Downloading video JSON')
+        video = self._download_json(
+            'http://smotri.com/video/view/url/bot/',
+            video_id, 'Downloading video JSON',
+            data=urlencode_postdata(video_form),
+            headers={'Content-Type': 'application/x-www-form-urlencoded'})
  
          video_url = video.get('_vidURL') or video.get('_vidURL_mp4')
  
@@ -196,11 +197,11 @@ class SmotriIE(InfoExtractor):
                  raise ExtractorError(msg, expected=True)
  
          title = video['title']
-        thumbnail = video['_imgURL']
-        upload_date = unified_strdate(video['added'])
-        uploader = video['userNick']
-        uploader_id = video['userLogin']
-        duration = int_or_none(video['duration'])
+        thumbnail = video.get('_imgURL')
+        upload_date = unified_strdate(video.get('added'))
+        uploader = video.get('userNick')
+        uploader_id = video.get('userLogin')
+        duration = int_or_none(video.get('duration'))
  
          # Video JSON does not provide enough meta data
          # We will extract some from the video web page instead
@@ -209,7 +210,7 @@ class SmotriIE(InfoExtractor):
  
          # Warning if video is unavailable
          warning = self._html_search_regex(
-            r'<div class="videoUnModer">(.*?)</div>', webpage,
+            r'<div[^>]+class="videoUnModer"[^>]*>(.+?)</div>', webpage,
              'warning message', default=None)
          if warning is not None:
              self._downloader.report_warning(
@@ -217,20 +218,22 @@ class SmotriIE(InfoExtractor):
                  (video_id, warning))
  
          # Adult content
-        if re.search('EroConfirmText">', webpage) is not None:
+        if 'EroConfirmText">' in webpage:
              self.report_age_confirmation()
              confirm_string = self._html_search_regex(
-                r'<a href="/video/view/\?id=%s&confirm=([^"]+)" title="[^"]+">' % video_id,
+                r'<a[^>]+href="/video/view/\?id=%s&confirm=([^"]+)"' % video_id,
                  webpage, 'confirm string')
              confirm_url = webpage_url + '&confirm=%s' % confirm_string
-            webpage = self._download_webpage(confirm_url, video_id, 'Downloading video page (age confirmed)')
+            webpage = self._download_webpage(
+                confirm_url, video_id,
+                'Downloading video page (age confirmed)')
              adult_content = True
          else:
              adult_content = False
  
          view_count = self._html_search_regex(
-            'Общее количество просмотров.*?<span class="Number">(\\d+)</span>',
-            webpage, 'view count', fatal=False, flags=re.MULTILINE | re.DOTALL)
+            r'(?s)Общее количество просмотров.*?<span class="Number">(\d+)</span>',
+            webpage, 'view count', fatal=False)
  
          return {
              'id': video_id,
@@ -249,37 +252,33 @@ class SmotriIE(InfoExtractor):
  class SmotriCommunityIE(InfoExtractor):
      IE_DESC = 'Smotri.com community videos'
      IE_NAME = 'smotri:community'
-    _VALID_URL = r'^https?://(?:www\.)?smotri\.com/community/video/(?P<communityid>[0-9A-Za-z_\'-]+)'
+    _VALID_URL = r'https?://(?:www\.)?smotri\.com/community/video/(?P<id>[0-9A-Za-z_\'-]+)'
      _TEST = {
          'url': 'http://smotri.com/community/video/kommuna',
          'info_dict': {
              'id': 'kommuna',
-            'title': 'КПРФ',
          },
          'playlist_mincount': 4,
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        community_id = mobj.group('communityid')
+        community_id = self._match_id(url)
  
-        url = 'http://smotri.com/export/rss/video/by/community/-/%s/video.xml' % community_id
-        rss = self._download_xml(url, community_id, 'Downloading community RSS')
+        rss = self._download_xml(
+            'http://smotri.com/export/rss/video/by/community/-/%s/video.xml' % community_id,
+            community_id, 'Downloading community RSS')
  
-        entries = [self.url_result(video_url.text, 'Smotri')
-                   for video_url in rss.findall('./channel/item/link')]
+        entries = [
+            self.url_result(video_url.text, SmotriIE.ie_key())
+            for video_url in rss.findall('./channel/item/link')]
  
-        description_text = rss.find('./channel/description').text
-        community_title = self._html_search_regex(
-            '^Видео сообщества "([^"]+)"$', description_text, 'community title')
-
-        return self.playlist_result(entries, community_id, community_title)
+        return self.playlist_result(entries, community_id)
  
  
  class SmotriUserIE(InfoExtractor):
      IE_DESC = 'Smotri.com user videos'
      IE_NAME = 'smotri:user'
-    _VALID_URL = r'^https?://(?:www\.)?smotri\.com/user/(?P<userid>[0-9A-Za-z_\'-]+)'
+    _VALID_URL = r'https?://(?:www\.)?smotri\.com/user/(?P<id>[0-9A-Za-z_\'-]+)'
      _TESTS = [{
          'url': 'http://smotri.com/user/inspector',
          'info_dict': {
@@ -290,19 +289,19 @@ class SmotriUserIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        user_id = mobj.group('userid')
+        user_id = self._match_id(url)
  
-        url = 'http://smotri.com/export/rss/user/video/-/%s/video.xml' % user_id
-        rss = self._download_xml(url, user_id, 'Downloading user RSS')
+        rss = self._download_xml(
+            'http://smotri.com/export/rss/user/video/-/%s/video.xml' % user_id,
+            user_id, 'Downloading user RSS')
  
          entries = [self.url_result(video_url.text, 'Smotri')
                     for video_url in rss.findall('./channel/item/link')]
  
-        description_text = rss.find('./channel/description').text
-        user_nickname = self._html_search_regex(
-            '^Видео режиссера (.*)$', description_text,
-            'user nickname')
+        description_text = xpath_text(rss, './channel/description') or ''
+        user_nickname = self._search_regex(
+            '^Видео режиссера (.+)$', description_text,
+            'user nickname', fatal=False)
  
          return self.playlist_result(entries, user_id, user_nickname)
  
@@ -310,11 +309,11 @@ class SmotriUserIE(InfoExtractor):
  class SmotriBroadcastIE(InfoExtractor):
      IE_DESC = 'Smotri.com broadcasts'
      IE_NAME = 'smotri:broadcast'
-    _VALID_URL = r'^https?://(?:www\.)?(?P<url>smotri\.com/live/(?P<broadcastid>[^/]+))/?.*'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>smotri\.com/live/(?P<id>[^/]+))/?.*'
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
-        broadcast_id = mobj.group('broadcastid')
+        broadcast_id = mobj.group('id')
  
          broadcast_url = 'http://' + mobj.group('url')
          broadcast_page = self._download_webpage(broadcast_url, broadcast_id, 'Downloading broadcast page')
@@ -328,7 +327,8 @@ class SmotriBroadcastIE(InfoExtractor):
  
              (username, password) = self._get_login_info()
              if username is None:
-                self.raise_login_required('Erotic broadcasts allowed only for registered users')
+                self.raise_login_required(
+                    'Erotic broadcasts allowed only for registered users')
  
              login_form = {
                  'login-hint53': '1',
@@ -343,8 +343,9 @@ class SmotriBroadcastIE(InfoExtractor):
              broadcast_page = self._download_webpage(
                  request, broadcast_id, 'Logging in and confirming age')
  
-            if re.search('>Неверный логин или пароль<', broadcast_page) is not None:
-                raise ExtractorError('Unable to log in: bad username or password', expected=True)
+            if '>Неверный логин или пароль<' in broadcast_page:
+                raise ExtractorError(
+                    'Unable to log in: bad username or password', expected=True)
  
              adult_content = True
          else:
@@ -383,11 +384,11 @@ class SmotriBroadcastIE(InfoExtractor):
  
              broadcast_playpath = broadcast_json['_streamName']
              broadcast_app = '%s/%s' % (mobj.group('app'), broadcast_json['_vidURL'])
-            broadcast_thumbnail = broadcast_json['_imgURL']
+            broadcast_thumbnail = broadcast_json.get('_imgURL')
              broadcast_title = self._live_title(broadcast_json['title'])
-            broadcast_description = broadcast_json['description']
-            broadcaster_nick = broadcast_json['nick']
-            broadcaster_login = broadcast_json['login']
+            broadcast_description = broadcast_json.get('description')
+            broadcaster_nick = broadcast_json.get('nick')
+            broadcaster_login = broadcast_json.get('login')
              rtmp_conn = 'S:%s' % uuid.uuid4().hex
          except KeyError:
              if protected_broadcast:
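
Reviewer note on smotri.py: several hunks above rename capture groups such as `communityid` and `userid` to `id` and replace the `re.match(...)` / `mobj.group(...)` pair with `self._match_id(url)`. The sketch below illustrates why the group name matters. `FakeExtractor` is a hypothetical stand-in, and its `_match_id` is a simplified assumption about the helper's behaviour rather than the `InfoExtractor` source.

```python
import re


class FakeExtractor(object):
    """Hypothetical stand-in for an extractor, reduced to URL matching."""
    # Pattern copied from the new SmotriUserIE._VALID_URL in the diff.
    _VALID_URL = r'https?://(?:www\.)?smotri\.com/user/(?P<id>[0-9A-Za-z_\'-]+)'

    @classmethod
    def _match_id(cls, url):
        # re.match anchors at the start of the string, so a leading '^'
        # in the pattern is redundant.
        m = re.match(cls._VALID_URL, url)
        if m is None:
            raise ValueError('URL does not match _VALID_URL: %s' % url)
        # The helper relies on the group being called 'id'.
        return m.group('id')


user_id = FakeExtractor._match_id('http://smotri.com/user/inspector')
```

With a group named `userid` instead of `id`, the `m.group('id')` lookup in this sketch would raise `IndexError`, which is presumably why the renames accompany the switch.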
diff --git a/youtube_dl/extractor/snagfilms.py b/youtube_dl/extractor/snagfilms.py

deleted file mode 100644

index 6977afb..0000000
--- a/youtube_dl/extractor/snagfilms.py
+++ /dev/null
@@ -1,181 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    clean_html,
-    determine_ext,
-    int_or_none,
-    js_to_json,
-    parse_duration,
-)
-
-
-class SnagFilmsEmbedIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:(?:www|embed)\.)?snagfilms\.com/embed/player\?.*\bfilmId=(?P<id>[\da-f-]{36})'
-    _TESTS = [{
-        'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500',
-        'md5': '2924e9215c6eff7a55ed35b72276bd93',
-        'info_dict': {
-            'id': '74849a00-85a9-11e1-9660-123139220831',
-            'ext': 'mp4',
-            'title': '#whilewewatch',
-        }
-    }, {
-        # invalid labels, 360p is better that 480p
-        'url': 'http://www.snagfilms.com/embed/player?filmId=17ca0950-a74a-11e0-a92a-0026bb61d036',
-        'md5': '882fca19b9eb27ef865efeeaed376a48',
-        'info_dict': {
-            'id': '17ca0950-a74a-11e0-a92a-0026bb61d036',
-            'ext': 'mp4',
-            'title': 'Life in Limbo',
-        }
-    }, {
-        'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017',
-        'only_matching': True,
-    }]
-
-    @staticmethod
-    def _extract_url(webpage):
-        mobj = re.search(
-            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:embed\.)?snagfilms\.com/embed/player.+?)\1',
-            webpage)
-        if mobj:
-            return mobj.group('url')
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        if '>This film is not playable in your area.<' in webpage:
-            raise ExtractorError(
-                'Film %s is not playable in your area.' % video_id, expected=True)
-
-        formats = []
-        for source in self._parse_json(js_to_json(self._search_regex(
-                r'(?s)sources:\s*(\[.+?\]),', webpage, 'json')), video_id):
-            file_ = source.get('file')
-            if not file_:
-                continue
-            type_ = source.get('type')
-            ext = determine_ext(file_)
-            format_id = source.get('label') or ext
-            if all(v == 'm3u8' for v in (type_, ext)):
-                formats.extend(self._extract_m3u8_formats(
-                    file_, video_id, 'mp4', m3u8_id='hls'))
-            else:
-                bitrate = int_or_none(self._search_regex(
-                    [r'(\d+)kbps', r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % ext],
-                    file_, 'bitrate', default=None))
-                height = int_or_none(self._search_regex(
-                    r'^(\d+)[pP]$', format_id, 'height', default=None))
-                formats.append({
-                    'url': file_,
-                    'format_id': format_id,
-                    'tbr': bitrate,
-                    'height': height,
-                })
-        self._sort_formats(formats)
-
-        title = self._search_regex(
-            [r"title\s*:\s*'([^']+)'", r'<title>([^<]+)</title>'],
-            webpage, 'title')
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-        }
-
-
-class SnagFilmsIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?snagfilms\.com/(?:films/title|show)/(?P<id>[^?#]+)'
-    _TESTS = [{
-        'url': 'http://www.snagfilms.com/films/title/lost_for_life',
-        'md5': '19844f897b35af219773fd63bdec2942',
-        'info_dict': {
-            'id': '0000014c-de2f-d5d6-abcf-ffef58af0017',
-            'display_id': 'lost_for_life',
-            'ext': 'mp4',
-            'title': 'Lost for Life',
-            'description': 'md5:fbdacc8bb6b455e464aaf98bc02e1c82',
-            'thumbnail': 're:^https?://.*\.jpg',
-            'duration': 4489,
-            'categories': ['Documentary', 'Crime', 'Award Winning', 'Festivals']
-        }
-    }, {
-        'url': 'http://www.snagfilms.com/show/the_world_cut_project/india',
-        'md5': 'e6292e5b837642bbda82d7f8bf3fbdfd',
-        'info_dict': {
-            'id': '00000145-d75c-d96e-a9c7-ff5c67b20000',
-            'display_id': 'the_world_cut_project/india',
-            'ext': 'mp4',
-            'title': 'India',
-            'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f',
-            'thumbnail': 're:^https?://.*\.jpg',
-            'duration': 979,
-            'categories': ['Documentary', 'Sports', 'Politics']
-        }
-    }, {
-        # Film is not playable in your area.
-        'url': 'http://www.snagfilms.com/films/title/inside_mecca',
-        'only_matching': True,
-    }, {
-        # Film is not available.
-        'url': 'http://www.snagfilms.com/show/augie_alone/flirting',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
-
-        if ">Sorry, the Film you're looking for is not available.<" in webpage:
-            raise ExtractorError(
-                'Film %s is not available.' % display_id, expected=True)
-
-        film_id = self._search_regex(r'filmId=([\da-f-]{36})"', webpage, 'film id')
-
-        snag = self._parse_json(
-            self._search_regex(
-                'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'),
-            display_id)
-
-        for item in snag:
-            if item.get('data', {}).get('film', {}).get('id') == film_id:
-                data = item['data']['film']
-                title = data['title']
-                description = clean_html(data.get('synopsis'))
-                thumbnail = data.get('image')
-                duration = int_or_none(data.get('duration') or data.get('runtime'))
-                categories = [
-                    category['title'] for category in data.get('categories', [])
-                    if category.get('title')]
-                break
-        else:
-            title = self._search_regex(
-                r'itemprop="title">([^<]+)<', webpage, 'title')
-            description = self._html_search_regex(
-                r'(?s)<div itemprop="description" class="film-synopsis-inner ">(.+?)</div>',
-                webpage, 'description', default=None) or self._og_search_description(webpage)
-            thumbnail = self._og_search_thumbnail(webpage)
-            duration = parse_duration(self._search_regex(
-                r'<span itemprop="duration" class="film-duration strong">([^<]+)<',
-                webpage, 'duration', fatal=False))
-            categories = re.findall(r'<a href="/movies/[^"]+">([^<]+)</a>', webpage)
-
-        return {
-            '_type': 'url_transparent',
-            'url': 'http://embed.snagfilms.com/embed/player?filmId=%s' % film_id,
-            'id': film_id,
-            'display_id': display_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'categories': categories,
-        }
diff --git a/youtube_dl/extractor/snotr.py b/youtube_dl/extractor/snotr.py

index 0d1ab07f86ac4088b4fd1e56e9d1dfaa52514ddd..4819fe5b4b6322cc02e9e1fdd4c128cbe28e55b0 100644
--- a/youtube_dl/extractor/snotr.py
+++ b/youtube_dl/extractor/snotr.py
@@ -5,9 +5,9 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
-    float_or_none,
-    str_to_int,
      parse_duration,
+    parse_filesize,
+    str_to_int,
  )
  
  
@@ -17,21 +17,24 @@ class SnotrIE(InfoExtractor):
          'url': 'http://www.snotr.com/video/13708/Drone_flying_through_fireworks',
          'info_dict': {
              'id': '13708',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Drone flying through fireworks!',
-            'duration': 247,
-            'filesize_approx': 98566144,
+            'duration': 248,
+            'filesize_approx': 40700000,
              'description': 'A drone flying through Fourth of July Fireworks',
-        }
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+        'expected_warnings': ['description'],
      }, {
          'url': 'http://www.snotr.com/video/530/David_Letteman_-_George_W_Bush_Top_10',
          'info_dict': {
              'id': '530',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'David Letteman - George W. Bush Top 10',
              'duration': 126,
-            'filesize_approx': 8912896,
+            'filesize_approx': 8500000,
              'description': 'The top 10 George W. Bush moments, brought to you by David Letterman!',
+            'thumbnail': 're:^https?://.*\.jpg$',
          }
      }]
  
@@ -43,26 +46,28 @@ class SnotrIE(InfoExtractor):
          title = self._og_search_title(webpage)
  
          description = self._og_search_description(webpage)
-        video_url = 'http://cdn.videos.snotr.com/%s.flv' % video_id
+        info_dict = self._parse_html5_media_entries(
+            url, webpage, video_id, m3u8_entry_protocol='m3u8_native')[0]
  
          view_count = str_to_int(self._html_search_regex(
-            r'<p>\n<strong>Views:</strong>\n([\d,\.]+)</p>',
+            r'<p[^>]*>\s*<strong[^>]*>Views:</strong>\s*<span[^>]*>([\d,\.]+)',
              webpage, 'view count', fatal=False))
  
          duration = parse_duration(self._html_search_regex(
-            r'<p>\n<strong>Length:</strong>\n\s*([0-9:]+).*?</p>',
+            r'<p[^>]*>\s*<strong[^>]*>Length:</strong>\s*<span[^>]*>([\d:]+)',
              webpage, 'duration', fatal=False))
  
-        filesize_approx = float_or_none(self._html_search_regex(
-            r'<p>\n<strong>Filesize:</strong>\n\s*([0-9.]+)\s*megabyte</p>',
-            webpage, 'filesize', fatal=False), invscale=1024 * 1024)
+        filesize_approx = parse_filesize(self._html_search_regex(
+            r'<p[^>]*>\s*<strong[^>]*>Filesize:</strong>\s*<span[^>]*>([^<]+)',
+            webpage, 'filesize', fatal=False))
  
-        return {
+        info_dict.update({
              'id': video_id,
              'description': description,
              'title': title,
-            'url': video_url,
              'view_count': view_count,
              'duration': duration,
              'filesize_approx': filesize_approx,
-        }
+        })
+
+        return info_dict
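
Reviewer note on snotr.py: the diff swaps `float_or_none(..., invscale=1024 * 1024)` for `parse_filesize(...)`, and the expected `filesize_approx` values change accordingly (for example from 8912896, which equals 8.5 × 1024², to 8500000). The sketch below shows one way a size string could be parsed with decimal and binary units kept apart. `parse_filesize_sketch` and its unit table are hypothetical and much smaller than whatever the real helper supports, and the input strings are assumed, since the page text is not part of the diff.

```python
import re

# Hypothetical, reduced unit table: decimal units are powers of 1000,
# binary units are powers of 1024.
_UNITS = {
    'B': 1,
    'KB': 1000, 'KIB': 1024,
    'MB': 1000 ** 2, 'MIB': 1024 ** 2,
    'GB': 1000 ** 3, 'GIB': 1024 ** 3,
}


def parse_filesize_sketch(text):
    """Return a size in bytes, or None when the text cannot be parsed."""
    if text is None:
        return None
    m = re.match(
        r'\s*(?P<num>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)\s*$', text)
    if not m:
        return None
    unit = m.group('unit').upper()
    if unit not in _UNITS:
        return None
    # round() guards against floating-point noise before truncating to int.
    return int(round(float(m.group('num')) * _UNITS[unit]))
```

Returning `None` for unparseable input mirrors the non-fatal style of the surrounding extraction calls, where a missing field should not abort the whole extraction.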
diff --git a/youtube_dl/extractor/sohu.py b/youtube_dl/extractor/sohu.py

index 49e5d09ae450d11bb567a2fe95ecba55998c8b42..30760ca06be4b3fc112f3fe0200c74b665d64855 100644
--- a/youtube_dl/extractor/sohu.py
+++ b/youtube_dl/extractor/sohu.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -8,19 +8,16 @@ from ..compat import (
      compat_str,
      compat_urllib_parse_urlencode,
  )
-from ..utils import (
-    ExtractorError,
-    sanitized_Request,
-)
+from ..utils import ExtractorError
  
  
  class SohuIE(InfoExtractor):
      _VALID_URL = r'https?://(?P<mytv>my\.)?tv\.sohu\.com/.+?/(?(mytv)|n)(?P<id>\d+)\.shtml.*?'
  
+    # Sohu videos give different MD5 sums on Travis CI and my machine
      _TESTS = [{
          'note': 'This video is available only in Mainland China',
          'url': 'http://tv.sohu.com/20130724/n382479172.shtml#super',
-        'md5': '29175c8cadd8b5cc4055001e85d6b372',
          'info_dict': {
              'id': '382479172',
              'ext': 'mp4',
@@ -29,7 +26,6 @@ class SohuIE(InfoExtractor):
          'skip': 'On available in China',
      }, {
          'url': 'http://tv.sohu.com/20150305/n409385080.shtml',
-        'md5': '699060e75cf58858dd47fb9c03c42cfb',
          'info_dict': {
              'id': '409385080',
              'ext': 'mp4',
@@ -37,7 +33,6 @@ class SohuIE(InfoExtractor):
          }
      }, {
          'url': 'http://my.tv.sohu.com/us/232799889/78693464.shtml',
-        'md5': '9bf34be48f2f4dadcb226c74127e203c',
          'info_dict': {
              'id': '78693464',
              'ext': 'mp4',
@@ -51,7 +46,6 @@ class SohuIE(InfoExtractor):
              'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
          },
          'playlist': [{
-            'md5': 'bdbfb8f39924725e6589c146bc1883ad',
              'info_dict': {
                  'id': '78910339_part1',
                  'ext': 'mp4',
@@ -59,7 +53,6 @@ class SohuIE(InfoExtractor):
                  'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
              }
          }, {
-            'md5': '3e1f46aaeb95354fd10e7fca9fc1804e',
              'info_dict': {
                  'id': '78910339_part2',
                  'ext': 'mp4',
@@ -67,7 +60,6 @@ class SohuIE(InfoExtractor):
                  'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
              }
          }, {
-            'md5': '8407e634175fdac706766481b9443450',
              'info_dict': {
                  'id': '78910339_part3',
                  'ext': 'mp4',
@@ -96,15 +88,10 @@ class SohuIE(InfoExtractor):
              else:
                  base_data_url = 'http://hot.vrs.sohu.com/vrs_flash.action?vid='
  
-            req = sanitized_Request(base_data_url + vid_id)
-
-            cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
-            if cn_verification_proxy:
-                req.add_header('Ytdl-request-proxy', cn_verification_proxy)
-
              return self._download_json(
-                req, video_id,
-                'Downloading JSON data for %s' % vid_id)
+                base_data_url + vid_id, video_id,
+                'Downloading JSON data for %s' % vid_id,
+                headers=self.geo_verification_headers())
  
          mobj = re.match(self._VALID_URL, url)
          video_id = mobj.group('id')
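
Reviewer note on sohu.py: the removed lines built a request by hand and added a `Ytdl-request-proxy` header when `cn_verification_proxy` was set. The new code passes `headers=self.geo_verification_headers()` instead. The sketch below shows what such a helper might return. It is a standalone function over a plain options dict, and the option name `geo_verification_proxy` is an assumption that does not appear in this diff.

```python
def geo_verification_headers(params):
    """Build extra request headers from a downloader-style options dict.

    Sketch only: 'geo_verification_proxy' is an assumed option name.
    """
    headers = {}
    proxy = params.get('geo_verification_proxy')
    if proxy:
        # Same header name the removed code set by hand.
        headers['Ytdl-request-proxy'] = proxy
    return headers


no_proxy = geo_verification_headers({})
with_proxy = geo_verification_headers(
    {'geo_verification_proxy': 'http://proxy.example.invalid:8080'})
```

Returning an empty dict when no proxy is configured lets the caller pass the result straight through as `headers=` without a conditional.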
diff --git a/youtube_dl/extractor/sonyliv.py b/youtube_dl/extractor/sonyliv.py

new file mode 100644

index 0000000..accd112
--- /dev/null
+++ b/youtube_dl/extractor/sonyliv.py
@@ -0,0 +1,34 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class SonyLIVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?sonyliv\.com/details/[^/]+/(?P<id>\d+)'
+    _TESTS = [{
+        'url': "http://www.sonyliv.com/details/episodes/5024612095001/Ep.-1---Achaari-Cheese-Toast---Bachelor's-Delight",
+        'info_dict': {
+            'title': "Ep. 1 - Achaari Cheese Toast - Bachelor's Delight",
+            'id': '5024612095001',
+            'ext': 'mp4',
+            'upload_date': '20160707',
+            'description': 'md5:7f28509a148d5be9d0782b4d5106410d',
+            'uploader_id': '4338955589001',
+            'timestamp': 1467870968,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': ['BrightcoveNew'],
+    }, {
+        'url': 'http://www.sonyliv.com/details/full%20movie/4951168986001/Sei-Raat-(Bangla)',
+        'only_matching': True,
+    }]
+
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        brightcove_id = self._match_id(url)
+        return self.url_result(
+            self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
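
Reviewer note on sonyliv.py: the extractor takes the numeric ID from the page URL and substitutes it into a Brightcove player URL. The sketch below reproduces that mapping with the pattern and template copied from the diff. `brightcove_url_for` is a hypothetical helper name, and the real extractor hands the result to `url_result` rather than returning a string.

```python
import re

# Both constants are copied from the new SonyLIVIE class in the diff.
_VALID_URL = r'https?://(?:www\.)?sonyliv\.com/details/[^/]+/(?P<id>\d+)'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s'


def brightcove_url_for(url):
    """Map a SonyLIV details URL to a player URL, or None if it does not match."""
    m = re.match(_VALID_URL, url)
    if m is None:
        return None
    return BRIGHTCOVE_URL_TEMPLATE % m.group('id')


player_url = brightcove_url_for(
    'http://www.sonyliv.com/details/full%20movie/4951168986001/Sei-Raat-(Bangla)')
```

The `[^/]+` segment accepts the percent-encoded `full%20movie` category, which matches the `only_matching` test case in the diff.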
diff --git a/youtube_dl/extractor/soundcloud.py b/youtube_dl/extractor/soundcloud.py
index 194dabc71d84072fc64afd50baa3b80467c0808f..3b7ecb3c343291e3fec8af451b4bb2bc3dde9fae 100644
--- a/youtube_dl/extractor/soundcloud.py
+++ b/youtube_dl/extractor/soundcloud.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -32,7 +32,7 @@ class SoundcloudIE(InfoExtractor):
      _VALID_URL = r'''(?x)^(?:https?://)?
                      (?:(?:(?:www\.|m\.)?soundcloud\.com/
                              (?P<uploader>[\w\d-]+)/
-                            (?!(?:tracks|sets(?:/[^/?#]+)?|reposts|likes|spotlight)/?(?:$|[?#]))
+                            (?!(?:tracks|sets(?:/.+?)?|reposts|likes|spotlight)/?(?:$|[?#]))
                              (?P<title>[\w\d-]+)/?
                              (?P<token>[^?]+?)?(?:[?].*)?$)
                         |(?:api\.soundcloud\.com/tracks/(?P<track_id>\d+)
@@ -53,6 +53,7 @@ class SoundcloudIE(InfoExtractor):
                  'uploader': 'E.T. ExTerrestrial Music',
                  'title': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
                  'duration': 143,
+                'license': 'all-rights-reserved',
              }
          },
          # not streamable song
@@ -66,6 +67,7 @@ class SoundcloudIE(InfoExtractor):
                  'uploader': 'The Royal Concept',
                  'upload_date': '20120521',
                  'duration': 227,
+                'license': 'all-rights-reserved',
              },
              'params': {
                  # rtmp
@@ -84,6 +86,7 @@ class SoundcloudIE(InfoExtractor):
                  'description': 'test chars:  \"\'/\\ä↭',
                  'upload_date': '20131209',
                  'duration': 9,
+                'license': 'all-rights-reserved',
              },
          },
          # private link (alt format)
@@ -98,6 +101,7 @@ class SoundcloudIE(InfoExtractor):
                  'description': 'test chars:  \"\'/\\ä↭',
                  'upload_date': '20131209',
                  'duration': 9,
+                'license': 'all-rights-reserved',
              },
          },
          # downloadable song
@@ -112,6 +116,7 @@ class SoundcloudIE(InfoExtractor):
                  'uploader': 'oddsamples',
                  'upload_date': '20140109',
                  'duration': 17,
+                'license': 'cc-by-sa',
              },
          },
      ]
@@ -119,6 +124,12 @@ class SoundcloudIE(InfoExtractor):
      _CLIENT_ID = '02gUJC0hH2ct1EGOcYXQIzRFU91c72Ea'
      _IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'
  
+    @staticmethod
+    def _extract_urls(webpage):
+        return [m.group('url') for m in re.finditer(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:w\.)?soundcloud\.com/player.+?)\1',
+            webpage)]
+
      def report_resolve(self, video_id):
          """Report information extraction."""
          self.to_screen('%s: Resolving id' % video_id)
@@ -132,20 +143,20 @@ class SoundcloudIE(InfoExtractor):
          name = full_title or track_id
          if quiet:
              self.report_extraction(name)
-
-        thumbnail = info['artwork_url']
-        if thumbnail is not None:
+        thumbnail = info.get('artwork_url')
+        if isinstance(thumbnail, compat_str):
              thumbnail = thumbnail.replace('-large', '-t500x500')
          ext = 'mp3'
          result = {
              'id': track_id,
-            'uploader': info['user']['username'],
-            'upload_date': unified_strdate(info['created_at']),
+            'uploader': info.get('user', {}).get('username'),
+            'upload_date': unified_strdate(info.get('created_at')),
              'title': info['title'],
-            'description': info['description'],
+            'description': info.get('description'),
              'thumbnail': thumbnail,
              'duration': int_or_none(info.get('duration'), 1000),
              'webpage_url': info.get('permalink_url'),
+            'license': info.get('license'),
          }
          formats = []
          if info.get('downloadable', False):
@@ -215,7 +226,7 @@ class SoundcloudIE(InfoExtractor):
              raise ExtractorError('Invalid URL: %s' % url)
  
          track_id = mobj.group('track_id')
-        token = None
+
          if track_id is not None:
              info_json_url = 'http://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
              full_title = track_id
@@ -249,7 +260,20 @@ class SoundcloudIE(InfoExtractor):
          return self._extract_info_dict(info, full_title, secret_token=token)
  
  
-class SoundcloudSetIE(SoundcloudIE):
+class SoundcloudPlaylistBaseIE(SoundcloudIE):
+    @staticmethod
+    def _extract_id(e):
+        return compat_str(e['id']) if e.get('id') else None
+
+    def _extract_track_entries(self, tracks):
+        return [
+            self.url_result(
+                track['permalink_url'], SoundcloudIE.ie_key(),
+                video_id=self._extract_id(track))
+            for track in tracks if track.get('permalink_url')]
+
+
+class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
      _VALID_URL = r'https?://(?:(?:www|m)\.)?soundcloud\.com/(?P<uploader>[\w\d-]+)/sets/(?P<slug_title>[\w\d-]+)(?:/(?P<token>[^?/]+))?'
      IE_NAME = 'soundcloud:set'
      _TESTS = [{
@@ -259,6 +283,9 @@ class SoundcloudSetIE(SoundcloudIE):
              'title': 'The Royal Concept EP',
          },
          'playlist_mincount': 6,
+    }, {
+        'url': 'https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep/token',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -285,7 +312,7 @@ class SoundcloudSetIE(SoundcloudIE):
              msgs = (compat_str(err['error_message']) for err in info['errors'])
              raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
  
-        entries = [self.url_result(track['permalink_url'], 'Soundcloud') for track in info['tracks']]
+        entries = self._extract_track_entries(info['tracks'])
  
          return {
              '_type': 'playlist',
@@ -295,7 +322,7 @@ class SoundcloudSetIE(SoundcloudIE):
          }
  
  
-class SoundcloudUserIE(SoundcloudIE):
+class SoundcloudUserIE(SoundcloudPlaylistBaseIE):
      _VALID_URL = r'''(?x)
                          https?://
                              (?:(?:www|m)\.)?soundcloud\.com/
@@ -312,21 +339,21 @@ class SoundcloudUserIE(SoundcloudIE):
              'id': '114582580',
              'title': 'The Akashic Chronicler (All)',
          },
-        'playlist_mincount': 111,
+        'playlist_mincount': 74,
      }, {
          'url': 'https://soundcloud.com/the-akashic-chronicler/tracks',
          'info_dict': {
              'id': '114582580',
              'title': 'The Akashic Chronicler (Tracks)',
          },
-        'playlist_mincount': 50,
+        'playlist_mincount': 37,
      }, {
          'url': 'https://soundcloud.com/the-akashic-chronicler/sets',
          'info_dict': {
              'id': '114582580',
              'title': 'The Akashic Chronicler (Playlists)',
          },
-        'playlist_mincount': 3,
+        'playlist_mincount': 2,
      }, {
          'url': 'https://soundcloud.com/the-akashic-chronicler/reposts',
          'info_dict': {
@@ -345,7 +372,7 @@ class SoundcloudUserIE(SoundcloudIE):
          'url': 'https://soundcloud.com/grynpyret/spotlight',
          'info_dict': {
              'id': '7098329',
-            'title': 'Grynpyret (Spotlight)',
+            'title': 'GRYNPYRET (Spotlight)',
          },
          'playlist_mincount': 1,
      }]
@@ -407,13 +434,14 @@ class SoundcloudUserIE(SoundcloudIE):
                  for cand in candidates:
                      if isinstance(cand, dict):
                          permalink_url = cand.get('permalink_url')
+                        entry_id = self._extract_id(cand)
                          if permalink_url and permalink_url.startswith('http'):
-                            return permalink_url
+                            return permalink_url, entry_id
  
              for e in collection:
-                permalink_url = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
+                permalink_url, entry_id = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
                  if permalink_url:
-                    entries.append(self.url_result(permalink_url))
+                    entries.append(self.url_result(permalink_url, video_id=entry_id))
  
              next_href = response.get('next_href')
              if not next_href:
@@ -433,7 +461,7 @@ class SoundcloudUserIE(SoundcloudIE):
          }
  
  
-class SoundcloudPlaylistIE(SoundcloudIE):
+class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
      _VALID_URL = r'https?://api\.soundcloud\.com/playlists/(?P<id>[0-9]+)(?:/?\?secret_token=(?P<token>[^&]+?))?$'
      IE_NAME = 'soundcloud:playlist'
      _TESTS = [{
@@ -463,7 +491,7 @@ class SoundcloudPlaylistIE(SoundcloudIE):
          data = self._download_json(
              base_url + data, playlist_id, 'Downloading playlist')
  
-        entries = [self.url_result(track['permalink_url'], 'Soundcloud') for track in data['tracks']]
+        entries = self._extract_track_entries(data['tracks'])
  
          return {
              '_type': 'playlist',
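Note on soundcloud.py: `SoundcloudPlaylistBaseIE._extract_track_entries` replaces the two list comprehensions that indexed `track['permalink_url']` directly. It skips tracks without a permalink and attaches the track ID when one is present. A rough standalone sketch of that behaviour follows; the plain dicts stand in for the result of `url_result`, and their exact keys are an assumption made for illustration:

```python
def extract_id(entry):
    # Mirrors _extract_id from the diff: stringified ID, or None if missing/falsy.
    return str(entry['id']) if entry.get('id') else None


def extract_track_entries(tracks):
    # Stand-in for _extract_track_entries: tracks lacking a permalink_url
    # are dropped instead of raising KeyError as the old comprehension did.
    return [
        {
            '_type': 'url',
            'url': track['permalink_url'],
            'ie_key': 'Soundcloud',
            'id': extract_id(track),
        }
        for track in tracks if track.get('permalink_url')]


tracks = [
    {'id': 1, 'permalink_url': 'https://soundcloud.com/a/b'},
    {'id': 2},  # no permalink: skipped
    {'permalink_url': 'https://soundcloud.com/c/d'},  # no id: kept, id is None
]
entries = extract_track_entries(tracks)
print(len(entries))  # 2
```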
diff --git a/youtube_dl/extractor/southpark.py b/youtube_dl/extractor/southpark.py
index 87b6504682a4feb50ce65c3eea43d07041aae31c..08f8c5744a84dffda03904afd30d44cac42f2917 100644
--- a/youtube_dl/extractor/southpark.py
+++ b/youtube_dl/extractor/southpark.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .mtv import MTVServicesInfoExtractor
@@ -17,6 +17,8 @@ class SouthParkIE(MTVServicesInfoExtractor):
              'ext': 'mp4',
              'title': 'South Park|Bat Daded',
              'description': 'Randy disqualifies South Park by getting into a fight with Bat Dad.',
+            'timestamp': 1112760000,
+            'upload_date': '20050406',
          },
      }]
  
@@ -28,7 +30,12 @@ class SouthParkEsIE(SouthParkIE):
  
      _TESTS = [{
          'url': 'http://southpark.cc.com/episodios-en-espanol/s01e01-cartman-consigue-una-sonda-anal#source=351c1323-0b96-402d-a8b9-40d01b2e9bde&position=1&sort=!airdate',
+        'info_dict': {
+            'title': 'Cartman Consigue Una Sonda Anal',
+            'description': 'Cartman Consigue Una Sonda Anal',
+        },
          'playlist_count': 4,
+        'skip': 'Geo-restricted',
      }]
  
  
@@ -42,17 +49,27 @@ class SouthParkDeIE(SouthParkIE):
          'info_dict': {
              'id': '85487c96-b3b9-4e39-9127-ad88583d9bf2',
              'ext': 'mp4',
-            'title': 'The Government Won\'t Respect My Privacy',
+            'title': 'South Park|The Government Won\'t Respect My Privacy',
              'description': 'Cartman explains the benefits of "Shitter" to Stan, Kyle and Craig.',
+            'timestamp': 1380160800,
+            'upload_date': '20130926',
          },
      }, {
          # non-ASCII characters in initial URL
          'url': 'http://www.southpark.de/alle-episoden/s18e09-hashtag-aufwärmen',
-        'playlist_count': 4,
+        'info_dict': {
+            'title': 'Hashtag „Aufwärmen“',
+            'description': 'Kyle will mit seinem kleinen Bruder Ike Videospiele spielen. Als der nicht mehr mit ihm spielen will, hat Kyle Angst, dass er die Kids von heute nicht mehr versteht.',
+        },
+        'playlist_count': 3,
      }, {
          # non-ASCII characters in redirect URL
          'url': 'http://www.southpark.de/alle-episoden/s18e09',
-        'playlist_count': 4,
+        'info_dict': {
+            'title': 'Hashtag „Aufwärmen“',
+            'description': 'Kyle will mit seinem kleinen Bruder Ike Videospiele spielen. Als der nicht mehr mit ihm spielen will, hat Kyle Angst, dass er die Kids von heute nicht mehr versteht.',
+        },
+        'playlist_count': 3,
      }]
  
  
@@ -63,7 +80,11 @@ class SouthParkNlIE(SouthParkIE):
  
      _TESTS = [{
          'url': 'http://www.southpark.nl/full-episodes/s18e06-freemium-isnt-free',
-        'playlist_count': 4,
+        'info_dict': {
+            'title': 'Freemium Isn\'t Free',
+            'description': 'Stan is addicted to the new Terrance and Phillip mobile game.',
+        },
+        'playlist_mincount': 3,
      }]
  
  
@@ -74,5 +95,9 @@ class SouthParkDkIE(SouthParkIE):
  
      _TESTS = [{
          'url': 'http://www.southparkstudios.dk/full-episodes/s18e07-grounded-vindaloop',
-        'playlist_count': 4,
+        'info_dict': {
+            'title': 'Grounded Vindaloop',
+            'description': 'Butters is convinced he\'s living in a virtual reality.',
+        },
+        'playlist_mincount': 3,
      }]
diff --git a/youtube_dl/extractor/spankbang.py b/youtube_dl/extractor/spankbang.py
index 50433d0f678f27c348031dbe0d6fcc3774d021b7..186d22b7d1608b01bb0a3d45082403e6a58bb05e 100644
--- a/youtube_dl/extractor/spankbang.py
+++ b/youtube_dl/extractor/spankbang.py
@@ -14,7 +14,7 @@ class SpankBangIE(InfoExtractor):
              'id': '3vvn',
              'ext': 'mp4',
              'title': 'fantasy solo',
-            'description': 'dillion harper masturbates on a bed',
+            'description': 'Watch fantasy solo free HD porn video - 05 minutes - dillion harper masturbates on a bed free adult movies.',
              'thumbnail': 're:^https?://.*\.jpg$',
              'uploader': 'silly2587',
              'age_limit': 18,
@@ -44,12 +44,10 @@ class SpankBangIE(InfoExtractor):
  
          title = self._html_search_regex(
              r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title')
-        description = self._search_regex(
-            r'class="desc"[^>]*>([^<]+)',
-            webpage, 'description', default=None)
+        description = self._og_search_description(webpage)
          thumbnail = self._og_search_thumbnail(webpage)
          uploader = self._search_regex(
-            r'class="user"[^>]*>([^<]+)',
+            r'class="user"[^>]*><img[^>]+>([^<]+)',
              webpage, 'uploader', fatal=False)
  
          age_limit = self._rta_search(webpage)
diff --git a/youtube_dl/extractor/spankwire.py b/youtube_dl/extractor/spankwire.py
index 692fd78e886c0a6a932adce4659f2564beeab7e6..92a7120a3242e732ceb58f51b4391a5efbc569d8 100644
--- a/youtube_dl/extractor/spankwire.py
+++ b/youtube_dl/extractor/spankwire.py
@@ -96,20 +96,18 @@ class SpankwireIE(InfoExtractor):
          formats = []
          for height, video_url in zip(heights, video_urls):
              path = compat_urllib_parse_urlparse(video_url).path
-            _, quality = path.split('/')[4].split('_')[:2]
-            f = {
+            m = re.search(r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', path)
+            if m:
+                tbr = int(m.group('tbr'))
+                height = int(m.group('height'))
+            else:
+                tbr = None
+            formats.append({
                  'url': video_url,
+                'format_id': '%dp' % height,
                  'height': height,
-            }
-            tbr = self._search_regex(r'^(\d+)[Kk]$', quality, 'tbr', default=None)
-            if tbr:
-                f.update({
-                    'tbr': int(tbr),
-                    'format_id': '%dp' % height,
-                })
-            else:
-                f['format_id'] = quality
-            formats.append(f)
+                'tbr': tbr,
+            })
          self._sort_formats(formats)
  
          age_limit = self._rta_search(webpage)
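Note on spankwire.py: the format loop no longer splits the path on `/` and `_` at fixed positions. It searches the path for a `<height>p_<tbr>k` token and falls back to the height taken from the page when the token is absent. A small sketch of that parsing step; `parse_quality` and the sample paths are hypothetical and chosen only to exercise both branches:

```python
import re


def parse_quality(path, fallback_height):
    """Return (height, tbr) for a video path.

    Uses the regex from the diff; when the path carries no quality token,
    the caller-supplied height is kept and tbr is None.
    """
    m = re.search(r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', path)
    if m:
        return int(m.group('height')), int(m.group('tbr'))
    return fallback_height, None


print(parse_quality('/201606/10/1234/720P_4000K_1234.mp4', 480))  # (720, 4000)
print(parse_quality('/201606/10/1234/video.mp4', 480))            # (480, None)
```

Because the search is anchored on the token rather than on a path index, a change in directory depth no longer breaks extraction.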
diff --git a/youtube_dl/extractor/spiegel.py b/youtube_dl/extractor/spiegel.py
index 39a7aaf9d630203dc1796b3b5621aad3c433f575..ec1b603889754af70d516e70c123be7d2604387a 100644
--- a/youtube_dl/extractor/spiegel.py
+++ b/youtube_dl/extractor/spiegel.py
@@ -1,11 +1,16 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_urlparse
  from .spiegeltv import SpiegeltvIE
+from ..compat import compat_urlparse
+from ..utils import (
+    extract_attributes,
+    unified_strdate,
+    get_element_by_attribute,
+)
  
  
  class SpiegelIE(InfoExtractor):
@@ -19,6 +24,7 @@ class SpiegelIE(InfoExtractor):
              'title': 'Vulkanausbruch in Ecuador: Der "Feuerschlund" ist wieder aktiv',
              'description': 'md5:8029d8310232196eb235d27575a8b9f4',
              'duration': 49,
+            'upload_date': '20130311',
          },
      }, {
          'url': 'http://www.spiegel.de/video/schach-wm-videoanalyse-des-fuenften-spiels-video-1309159.html',
@@ -29,6 +35,7 @@ class SpiegelIE(InfoExtractor):
              'title': 'Schach-WM in der Videoanalyse: Carlsen nutzt die Fehlgriffe des Titelverteidigers',
              'description': 'md5:c2322b65e58f385a820c10fa03b2d088',
              'duration': 983,
+            'upload_date': '20131115',
          },
      }, {
          'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-embed.html',
@@ -38,6 +45,7 @@ class SpiegelIE(InfoExtractor):
              'ext': 'mp4',
              'description': 'SPIEGEL ONLINE-Nutzer durften den deutschen Astronauten Alexander Gerst über sein Leben auf der ISS-Station befragen. Hier kommen seine Antworten auf die besten sechs Fragen.',
              'title': 'Fragen an Astronaut Alexander Gerst: "Bekommen Sie die Tageszeiten mit?"',
+            'upload_date': '20140904',
          }
      }, {
          'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-iframe.html',
@@ -52,10 +60,10 @@ class SpiegelIE(InfoExtractor):
          if SpiegeltvIE.suitable(handle.geturl()):
              return self.url_result(handle.geturl(), 'Spiegeltv')
  
-        title = re.sub(r'\s+', ' ', self._html_search_regex(
-            r'(?s)<(?:h1|div) class="module-title"[^>]*>(.*?)</(?:h1|div)>',
-            webpage, 'title'))
-        description = self._html_search_meta('description', webpage, 'description')
+        video_data = extract_attributes(self._search_regex(r'(<div[^>]+id="spVideoElements"[^>]+>)', webpage, 'video element', default=''))
+
+        title = video_data.get('data-video-title') or get_element_by_attribute('class', 'module-title', webpage)
+        description = video_data.get('data-video-teaser') or self._html_search_meta('description', webpage, 'description')
  
          base_url = self._search_regex(
              [r'server\s*:\s*(["\'])(?P<url>.+?)\1', r'var\s+server\s*=\s*"(?P<url>[^"]+)\"'],
@@ -87,14 +95,15 @@ class SpiegelIE(InfoExtractor):
          return {
              'id': video_id,
              'title': title,
-            'description': description,
+            'description': description.strip() if description else None,
              'duration': duration,
+            'upload_date': unified_strdate(video_data.get('data-video-date')),
              'formats': formats,
          }
  
  
  class SpiegelArticleIE(InfoExtractor):
-    _VALID_URL = 'https?://www\.spiegel\.de/(?!video/)[^?#]*?-(?P<id>[0-9]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?spiegel\.de/(?!video/)[^?#]*?-(?P<id>[0-9]+)\.html'
      IE_NAME = 'Spiegel:Article'
      IE_DESC = 'Articles on spiegel.de'
      _TESTS = [{
@@ -104,6 +113,7 @@ class SpiegelArticleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Faszination Badminton: Nennt es bloß nicht Federball',
              'description': 're:^Patrick Kämnitz gehört.{100,}',
+            'upload_date': '20140825',
          },
      }, {
          'url': 'http://www.spiegel.de/wissenschaft/weltall/astronaut-alexander-gerst-antwortet-spiegel-online-lesern-a-989876.html',
diff --git a/youtube_dl/extractor/spike.py b/youtube_dl/extractor/spike.py
index 182f286dfefc4023483c422fbf6c6a73203b86ff..218785ee4e11045bcbb09416cd3bc6862a757ac0 100644
--- a/youtube_dl/extractor/spike.py
+++ b/youtube_dl/extractor/spike.py
@@ -4,26 +4,31 @@ from .mtv import MTVServicesInfoExtractor
  
  
  class SpikeIE(MTVServicesInfoExtractor):
-    _VALID_URL = r'''(?x)https?://
-        (?:www\.spike\.com/(?:video-(?:clips|playlists)|(?:full-)?episodes)/.+|
-         m\.spike\.com/videos/video\.rbml\?id=(?P<id>[^&]+))
-        '''
-    _TEST = {
+    _VALID_URL = r'https?://(?:[^/]+\.)?spike\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'
+    _TESTS = [{
          'url': 'http://www.spike.com/video-clips/lhtu8m/auction-hunters-can-allen-ride-a-hundred-year-old-motorcycle',
          'md5': '1a9265f32b0c375793d6c4ce45255256',
          'info_dict': {
              'id': 'b9c8221a-4e50-479a-b86d-3333323e38ba',
              'ext': 'mp4',
-            'title': 'Auction Hunters|Can Allen Ride A Hundred Year-Old Motorcycle?',
+            'title': 'Auction Hunters|December 27, 2013|4|414|Can Allen Ride A Hundred Year-Old Motorcycle?',
              'description': 'md5:fbed7e82ed5fad493615b3094a9499cb',
+            'timestamp': 1388120400,
+            'upload_date': '20131227',
          },
-    }
+    }, {
+        'url': 'http://www.spike.com/video-clips/lhtu8m/',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.spike.com/video-clips/lhtu8m',
+        'only_matching': True,
+    }, {
+        'url': 'http://bellator.spike.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg',
+        'only_matching': True,
+    }, {
+        'url': 'http://bellator.spike.com/video-clips/bw6k7n/bellator-158-foundations-michael-venom-page',
+        'only_matching': True,
+    }]
  
      _FEED_URL = 'http://www.spike.com/feeds/mrss/'
      _MOBILE_TEMPLATE = 'http://m.spike.com/videos/video.rbml?id=%s'
-
-    def _real_extract(self, url):
-        mobile_id = self._match_id(url)
-        if mobile_id:
-            url = 'http://www.spike.com/video-clips/%s' % mobile_id
-        return super(SpikeIE, self)._real_extract(url)
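Note on spike.py: the new `_VALID_URL` accepts any `spike.com` subdomain followed by one path segment and a six-character lowercase alphanumeric ID. The check below runs it against the test URLs from the diff with plain `re.match`, which is assumed here to approximate how the extractor applies the pattern. The last URL is a made-up instance of the old mobile form.

```python
import re

# Pattern copied from the diff above.
VALID_URL = r'https?://(?:[^/]+\.)?spike\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'

matching = [
    'http://www.spike.com/video-clips/lhtu8m/auction-hunters-can-allen-ride-a-hundred-year-old-motorcycle',
    'http://www.spike.com/video-clips/lhtu8m/',
    'http://www.spike.com/video-clips/lhtu8m',
    'http://bellator.spike.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg',
]
for url in matching:
    print(bool(re.match(VALID_URL, url)), url)

# Hypothetical URL in the old m.spike.com form: no six-character ID segment.
mobile = 'http://m.spike.com/videos/video.rbml?id=12345'
print(bool(re.match(VALID_URL, mobile)), mobile)
```

The mobile-form URL no longer matches, which is consistent with the removal of the `_real_extract` override that handled it.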
diff --git a/youtube_dl/extractor/sportbox.py b/youtube_dl/extractor/sportbox.py
index 4f0c66213cc1269f44f4c0d9672a83ba3999d441..e5c28ae890ee61536052a5716677d486d0a5b43e 100644
--- a/youtube_dl/extractor/sportbox.py
+++ b/youtube_dl/extractor/sportbox.py
@@ -6,6 +6,7 @@ import re
  from .common import InfoExtractor
  from ..compat import compat_urlparse
  from ..utils import (
+    js_to_json,
      unified_strdate,
  )
  
@@ -94,19 +95,32 @@ class SportBoxEmbedIE(InfoExtractor):
  
          webpage = self._download_webpage(url, video_id)
  
-        hls = self._search_regex(
-            r"sportboxPlayer\.jwplayer_common_params\.file\s*=\s*['\"]([^'\"]+)['\"]",
-            webpage, 'hls file')
+        formats = []
  
-        formats = self._extract_m3u8_formats(hls, video_id, 'mp4')
-        self._sort_formats(formats)
+        def cleanup_js(code):
+            # desktop_advert_config contains complex Javascripts and we don't need it
+            return js_to_json(re.sub(r'desktop_advert_config.*', '', code))
+
+        jwplayer_data = self._parse_json(self._search_regex(
+            r'(?s)player\.setup\(({.+?})\);', webpage, 'jwplayer settings'), video_id,
+            transform_source=cleanup_js)
+
+        hls_url = jwplayer_data.get('hls_url')
+        if hls_url:
+            formats.extend(self._extract_m3u8_formats(
+                hls_url, video_id, ext='mp4', m3u8_id='hls'))
  
-        title = self._search_regex(
-            r'sportboxPlayer\.node_title\s*=\s*"([^"]+)"', webpage, 'title')
+        rtsp_url = jwplayer_data.get('rtsp_url')
+        if rtsp_url:
+            formats.append({
+                'url': rtsp_url,
+                'format_id': 'rtsp',
+            })
+
+        self._sort_formats(formats)
  
-        thumbnail = self._search_regex(
-            r'sportboxPlayer\.jwplayer_common_params\.image\s*=\s*"([^"]+)"',
-            webpage, 'thumbnail', default=None)
+        title = jwplayer_data['node_title']
+        thumbnail = jwplayer_data.get('image_url')
  
          return {
              'id': video_id,
diff --git a/youtube_dl/extractor/sportschau.py b/youtube_dl/extractor/sportschau.py
new file mode 100644
index 0000000..0d7925a
--- /dev/null
+++ b/youtube_dl/extractor/sportschau.py
@@ -0,0 +1,38 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .wdr import WDRBaseIE
+from ..utils import get_element_by_attribute
+
+
+class SportschauIE(WDRBaseIE):
+    IE_NAME = 'Sportschau'
+    _VALID_URL = r'https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video-?(?P<id>[^/#?]+)\.html'
+    _TEST = {
+        'url': 'http://www.sportschau.de/uefaeuro2016/videos/video-dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100.html',
+        'info_dict': {
+            'id': 'mdb-1140188',
+            'display_id': 'dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100',
+            'ext': 'mp4',
+            'title': 'DFB-Team geht gut gelaunt ins Spiel gegen Polen',
+            'description': 'Vor dem zweiten Gruppenspiel gegen Polen herrscht gute Stimmung im deutschen Team. Insbesondere Bastian Schweinsteiger strotzt vor Optimismus nach seinem Tor gegen die Ukraine.',
+            'upload_date': '20160615',
+        },
+        'skip': 'Geo-restricted to Germany',
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+        title = get_element_by_attribute('class', 'headline', webpage)
+        description = self._html_search_meta('description', webpage, 'description')
+
+        info = self._extract_wdr_video(webpage, video_id)
+
+        info.update({
+            'title': title,
+            'description': description,
+        })
+
+        return info
diff --git a/youtube_dl/extractor/srmediathek.py b/youtube_dl/extractor/srmediathek.py
index 74d01183f5f396fb9499a8426775886faed5961d..b03272f7a273e8a3726adb03d805bd2a449849bf 100644
--- a/youtube_dl/extractor/srmediathek.py
+++ b/youtube_dl/extractor/srmediathek.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .ard import ARDMediathekIE
@@ -9,8 +9,9 @@ from ..utils import (
  
  
  class SRMediathekIE(ARDMediathekIE):
+    IE_NAME = 'sr:mediathek'
      IE_DESC = 'Saarländischer Rundfunk'
-    _VALID_URL = r'https?://sr-mediathek\.sr-online\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
  
      _TESTS = [{
          'url': 'http://sr-mediathek.sr-online.de/index.php?seite=7&id=28455',
@@ -34,7 +35,9 @@ class SRMediathekIE(ARDMediathekIE):
              # m3u8 download
              'skip_download': True,
          },
-        'expected_warnings': ['Unable to download f4m manifest']
+    }, {
+        'url': 'http://sr-mediathek.de/index.php?seite=7&id=7480',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/ssa.py b/youtube_dl/extractor/ssa.py
deleted file mode 100644
index 54d1843..0000000
--- a/youtube_dl/extractor/ssa.py
+++ /dev/null
@@ -1,58 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    unescapeHTML,
-    parse_duration,
-)
-
-
-class SSAIE(InfoExtractor):
-    _VALID_URL = r'https?://ssa\.nls\.uk/film/(?P<id>\d+)'
-    _TEST = {
-        'url': 'http://ssa.nls.uk/film/3561',
-        'info_dict': {
-            'id': '3561',
-            'ext': 'flv',
-            'title': 'SHETLAND WOOL',
-            'description': 'md5:c5afca6871ad59b4271e7704fe50ab04',
-            'duration': 900,
-            'thumbnail': 're:^https?://.*\.jpg$',
-        },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        streamer = self._search_regex(
-            r"'streamer'\s*,\S*'(rtmp[^']+)'", webpage, 'streamer')
-        play_path = self._search_regex(
-            r"'file'\s*,\s*'([^']+)'", webpage, 'file').rpartition('.')[0]
-
-        def search_field(field_name, fatal=False):
-            return self._search_regex(
-                r'<span\s+class="field_title">%s:</span>\s*<span\s+class="field_content">([^<]+)</span>' % field_name,
-                webpage, 'title', fatal=fatal)
-
-        title = unescapeHTML(search_field('Title', fatal=True)).strip('()[]')
-        description = unescapeHTML(search_field('Description'))
-        duration = parse_duration(search_field('Running time'))
-        thumbnail = self._search_regex(
-            r"'image'\s*,\s*'([^']+)'", webpage, 'thumbnails', fatal=False)
-
-        return {
-            'id': video_id,
-            'url': streamer,
-            'play_path': play_path,
-            'ext': 'flv',
-            'title': title,
-            'description': description,
-            'duration': duration,
-            'thumbnail': thumbnail,
-        }
diff --git a/youtube_dl/extractor/stitcher.py b/youtube_dl/extractor/stitcher.py
index d5c852f5207bdad9510a720d69b0cd70527f9f3f..0f8782d038c9fdadf903b05479ff468a039c6aa4 100644
--- a/youtube_dl/extractor/stitcher.py
+++ b/youtube_dl/extractor/stitcher.py
@@ -56,7 +56,7 @@ class StitcherIE(InfoExtractor):
  
          episode = self._parse_json(
              js_to_json(self._search_regex(
-                r'(?s)var\s+stitcher\s*=\s*({.+?});\n', webpage, 'episode config')),
+                r'(?s)var\s+stitcher(?:Config)?\s*=\s*({.+?});\n', webpage, 'episode config')),
              display_id)['config']['episode']
  
          title = unescapeHTML(episode['title'])
diff --git a/youtube_dl/extractor/streamable.py b/youtube_dl/extractor/streamable.py
new file mode 100644
index 0000000..2c26fa6
--- /dev/null
+++ b/youtube_dl/extractor/streamable.py
@@ -0,0 +1,108 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    float_or_none,
+    int_or_none,
+)
+
+
+class StreamableIE(InfoExtractor):
+    _VALID_URL = r'https?://streamable\.com/(?:e/)?(?P<id>\w+)'
+    _TESTS = [
+        {
+            'url': 'https://streamable.com/dnd1',
+            'md5': '3e3bc5ca088b48c2d436529b64397fef',
+            'info_dict': {
+                'id': 'dnd1',
+                'ext': 'mp4',
+                'title': 'Mikel Oiarzabal scores to make it 0-3 for La Real against Espanyol',
+                'thumbnail': 're:https?://.*\.jpg$',
+                'uploader': 'teabaker',
+                'timestamp': 1454964157.35115,
+                'upload_date': '20160208',
+                'duration': 61.516,
+                'view_count': int,
+            }
+        },
+        # older video without bitrate, width/height, etc. info
+        {
+            'url': 'https://streamable.com/moo',
+            'md5': '2cf6923639b87fba3279ad0df3a64e73',
+            'info_dict': {
+                'id': 'moo',
+                'ext': 'mp4',
+                'title': '"Please don\'t eat me!"',
+                'thumbnail': 're:https?://.*\.jpg$',
+                'timestamp': 1426115495,
+                'upload_date': '20150311',
+                'duration': 12,
+                'view_count': int,
+            }
+        },
+        {
+            'url': 'https://streamable.com/e/dnd1',
+            'only_matching': True,
+        }
+    ]
+
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src=(?P<q1>[\'"])(?P<src>(?:https?:)?//streamable\.com/(?:(?!\1).+))(?P=q1)',
+            webpage)
+        if mobj:
+            return mobj.group('src')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        # Note: Using the ajax API, as the public Streamable API doesn't seem
+        # to return video info like the title properly sometimes, and doesn't
+        # include info like the video duration
+        video = self._download_json(
+            'https://streamable.com/ajax/videos/%s' % video_id, video_id)
+
+        # Status codes:
+        # 0 The video is being uploaded
+        # 1 The video is being processed
+        # 2 The video has at least one file ready
+        # 3 The video is unavailable due to an error
+        status = video.get('status')
+        if status != 2:
+            raise ExtractorError(
+                'This video is currently unavailable. It may still be uploading or processing.',
+                expected=True)
+
+        title = video.get('reddit_title') or video['title']
+
+        formats = []
+        for key, info in video['files'].items():
+            if not info.get('url'):
+                continue
+            formats.append({
+                'format_id': key,
+                'url': self._proto_relative_url(info['url']),
+                'width': int_or_none(info.get('width')),
+                'height': int_or_none(info.get('height')),
+                'filesize': int_or_none(info.get('size')),
+                'fps': int_or_none(info.get('framerate')),
+                'vbr': float_or_none(info.get('bitrate'), 1000)
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video.get('description'),
+            'thumbnail': self._proto_relative_url(video.get('thumbnail_url')),
+            'uploader': video.get('owner', {}).get('user_name'),
+            'timestamp': float_or_none(video.get('date_added')),
+            'duration': float_or_none(video.get('duration')),
+            'view_count': int_or_none(video.get('plays')),
+            'formats': formats
+        }
diff --git a/youtube_dl/extractor/streamcloud.py b/youtube_dl/extractor/streamcloud.py

index 712359885fde90fa3032aeff1b2cb74afb761f35..6a6bb90c493a92fc2e644e2a550547460d899ca4 100644
--- a/youtube_dl/extractor/streamcloud.py
+++ b/youtube_dl/extractor/streamcloud.py
@@ -5,7 +5,7 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
-    sanitized_Request,
+    ExtractorError,
      urlencode_postdata,
  )
  
@@ -14,7 +14,7 @@ class StreamcloudIE(InfoExtractor):
      IE_NAME = 'streamcloud.eu'
      _VALID_URL = r'https?://streamcloud\.eu/(?P<id>[a-zA-Z0-9_-]+)(?:/(?P<fname>[^#?]*)\.html)?'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://streamcloud.eu/skp9j99s4bpz/youtube-dl_test_video_____________-BaW_jenozKc.mp4.html',
          'md5': '6bea4c7fa5daaacc2a946b7146286686',
          'info_dict': {
@@ -23,7 +23,10 @@ class StreamcloudIE(InfoExtractor):
              'title': 'youtube-dl test video  \'/\\ ä ↭',
          },
          'skip': 'Only available from the EU'
-    }
+    }, {
+        'url': 'http://streamcloud.eu/ua8cmfh1nbe6/NSHIP-148--KUC-NG--H264-.mp4.html',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -31,26 +34,36 @@ class StreamcloudIE(InfoExtractor):
  
          orig_webpage = self._download_webpage(url, video_id)
  
+        if '>File Not Found<' in orig_webpage:
+            raise ExtractorError(
+                'Video %s does not exist' % video_id, expected=True)
+
          fields = re.findall(r'''(?x)<input\s+
              type="(?:hidden|submit)"\s+
              name="([^"]+)"\s+
              (?:id="[^"]+"\s+)?
              value="([^"]*)"
              ''', orig_webpage)
-        post = urlencode_postdata(fields)
  
          self._sleep(12, video_id)
-        headers = {
-            b'Content-Type': b'application/x-www-form-urlencoded',
-        }
-        req = sanitized_Request(url, post, headers)
  
          webpage = self._download_webpage(
-            req, video_id, note='Downloading video page ...')
-        title = self._html_search_regex(
-            r'<h1[^>]*>([^<]+)<', webpage, 'title')
-        video_url = self._search_regex(
-            r'file:\s*"([^"]+)"', webpage, 'video URL')
+            url, video_id, data=urlencode_postdata(fields), headers={
+                b'Content-Type': b'application/x-www-form-urlencoded',
+            })
+
+        try:
+            title = self._html_search_regex(
+                r'<h1[^>]*>([^<]+)<', webpage, 'title')
+            video_url = self._search_regex(
+                r'file:\s*"([^"]+)"', webpage, 'video URL')
+        except ExtractorError:
+            message = self._html_search_regex(
+                r'(?s)<div[^>]+class=(["\']).*?msgboxinfo.*?\1[^>]*>(?P<message>.+?)</div>',
+                webpage, 'message', default=None, group='message')
+            if message:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
+            raise
          thumbnail = self._search_regex(
              r'image:\s*"([^"]+)"', webpage, 'thumbnail URL', fatal=False)
  
diff --git a/youtube_dl/extractor/streamcz.py b/youtube_dl/extractor/streamcz.py

index d3d2b7eb7a6fa9db4008365e62e046b83490b064..9e533103c88b93157efdd28d7765a7e9ae961603 100644
--- a/youtube_dl/extractor/streamcz.py
+++ b/youtube_dl/extractor/streamcz.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import hashlib
diff --git a/youtube_dl/extractor/streetvoice.py b/youtube_dl/extractor/streetvoice.py

index 6a57fa60a5a2ea877f65a1af045f14d05377c1a9..e529051d100b8024007229200648ea259b3d1677 100644
--- a/youtube_dl/extractor/streetvoice.py
+++ b/youtube_dl/extractor/streetvoice.py
@@ -14,7 +14,6 @@ class StreetVoiceIE(InfoExtractor):
          'info_dict': {
              'id': '94440',
              'ext': 'mp3',
-            'filesize': 4167053,
              'title': '輸',
              'description': 'Crispy脆樂團 - 輸',
              'thumbnail': 're:^https?://.*\.jpg$',
@@ -32,20 +31,19 @@ class StreetVoiceIE(InfoExtractor):
          song_id = self._match_id(url)
  
          song = self._download_json(
-            'http://streetvoice.com/music/api/song/%s' % song_id, song_id)
+            'https://streetvoice.com/api/v1/public/song/%s/' % song_id, song_id, data=b'')
  
          title = song['name']
-        author = song['musician']['name']
+        author = song['user']['nickname']
  
          return {
              'id': song_id,
              'url': song['file'],
-            'filesize': song.get('size'),
              'title': title,
              'description': '%s - %s' % (author, title),
              'thumbnail': self._proto_relative_url(song.get('image'), 'http:'),
              'duration': song.get('length'),
              'upload_date': unified_strdate(song.get('created_at')),
              'uploader': author,
-            'uploader_id': compat_str(song['musician']['id']),
+            'uploader_id': compat_str(song['user']['id']),
          }
diff --git a/youtube_dl/extractor/sunporno.py b/youtube_dl/extractor/sunporno.py

index e527aa97188b1860e054f8af7c7bd7a33301729e..ef9be7926866f6420d802f14cfdf83b3a9e4f69b 100644
--- a/youtube_dl/extractor/sunporno.py
+++ b/youtube_dl/extractor/sunporno.py
@@ -12,25 +12,29 @@ from ..utils import (
  
  
  class SunPornoIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?sunporno\.com/videos/(?P<id>\d+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:(?:www\.)?sunporno\.com/videos|embeds\.sunporno\.com/embed)/(?P<id>\d+)'
+    _TESTS = [{
          'url': 'http://www.sunporno.com/videos/807778/',
-        'md5': '6457d3c165fd6de062b99ef6c2ff4c86',
+        'md5': '507887e29033502f29dba69affeebfc9',
          'info_dict': {
              'id': '807778',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'md5:0a400058e8105d39e35c35e7c5184164',
              'description': 'md5:a31241990e1bd3a64e72ae99afb325fb',
              'thumbnail': 're:^https?://.*\.jpg$',
              'duration': 302,
              'age_limit': 18,
          }
-    }
+    }, {
+        'url': 'http://embeds.sunporno.com/embed/807778',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(
+            'http://www.sunporno.com/videos/%s' % video_id, video_id)
  
          title = self._html_search_regex(
              r'<title>([^<]+)</title>', webpage, 'title')
@@ -40,7 +44,8 @@ class SunPornoIE(InfoExtractor):
              r'poster="([^"]+)"', webpage, 'thumbnail', fatal=False)
  
          duration = parse_duration(self._search_regex(
-            r'itemprop="duration">\s*(\d+:\d+)\s*<',
+            (r'itemprop="duration"[^>]*>\s*(\d+:\d+)\s*<',
+             r'>Duration:\s*<span[^>]+>\s*(\d+:\d+)\s*<'),
              webpage, 'duration', fatal=False))
  
          view_count = int_or_none(self._html_search_regex(
@@ -48,7 +53,7 @@ class SunPornoIE(InfoExtractor):
              webpage, 'view count', fatal=False))
          comment_count = int_or_none(self._html_search_regex(
              r'(\d+)</b> Comments?',
-            webpage, 'comment count', fatal=False))
+            webpage, 'comment count', fatal=False, default=None))
  
          formats = []
          quality = qualities(['mp4', 'flv'])
diff --git a/youtube_dl/extractor/svt.py b/youtube_dl/extractor/svt.py

index 2ab30e45ff7c65ab7dd1d6cff7a1952764799cc0..fb0a4b24ef5bf65ff13ca2288395f09540e71d48 100644
--- a/youtube_dl/extractor/svt.py
+++ b/youtube_dl/extractor/svt.py
@@ -6,20 +6,17 @@ import re
  from .common import InfoExtractor
  from ..utils import (
      determine_ext,
+    dict_get,
+    int_or_none,
+    try_get,
  )
  
  
  class SVTBaseIE(InfoExtractor):
-    def _extract_video(self, url, video_id):
-        info = self._download_json(url, video_id)
-
-        title = info['context']['title']
-        thumbnail = info['context'].get('thumbnailImage')
-
-        video_info = info['video']
+    def _extract_video(self, video_info, video_id):
          formats = []
          for vr in video_info['videoReferences']:
-            player_type = vr.get('playerType')
+            player_type = vr.get('playerType') or vr.get('format')
              vurl = vr['url']
              ext = determine_ext(vurl)
              if ext == 'm3u8':
@@ -40,27 +37,49 @@ class SVTBaseIE(InfoExtractor):
                      'format_id': player_type,
                      'url': vurl,
                  })
+        if not formats and video_info.get('rights', {}).get('geoBlockedSweden'):
+            self.raise_geo_restricted('This video is only available in Sweden')
          self._sort_formats(formats)
  
          subtitles = {}
-        subtitle_references = video_info.get('subtitleReferences')
+        subtitle_references = dict_get(video_info, ('subtitles', 'subtitleReferences'))
          if isinstance(subtitle_references, list):
              for sr in subtitle_references:
                  subtitle_url = sr.get('url')
+                subtitle_lang = sr.get('language', 'sv')
                  if subtitle_url:
-                    subtitles.setdefault('sv', []).append({'url': subtitle_url})
+                    if determine_ext(subtitle_url) == 'm3u8':
+                        # TODO(yan12125): handle WebVTT in m3u8 manifests
+                        continue
+
+                    subtitles.setdefault(subtitle_lang, []).append({'url': subtitle_url})
  
-        duration = video_info.get('materialLength')
-        age_limit = 18 if video_info.get('inappropriateForChildren') else 0
+        title = video_info.get('title')
+
+        series = video_info.get('programTitle')
+        season_number = int_or_none(video_info.get('season'))
+        episode = video_info.get('episodeTitle')
+        episode_number = int_or_none(video_info.get('episodeNumber'))
+
+        duration = int_or_none(dict_get(video_info, ('materialLength', 'contentDuration')))
+        age_limit = None
+        adult = dict_get(
+            video_info, ('inappropriateForChildren', 'blockedForChildren'),
+            skip_false_values=False)
+        if adult is not None:
+            age_limit = 18 if adult else 0
  
          return {
              'id': video_id,
              'title': title,
              'formats': formats,
              'subtitles': subtitles,
-            'thumbnail': thumbnail,
              'duration': duration,
              'age_limit': age_limit,
+            'series': series,
+            'season_number': season_number,
+            'episode': episode,
+            'episode_number': episode_number,
          }
  
  
@@ -68,11 +87,11 @@ class SVTIE(SVTBaseIE):
      _VALID_URL = r'https?://(?:www\.)?svt\.se/wd\?(?:.*?&)?widgetId=(?P<widget_id>\d+)&.*?\barticleId=(?P<id>\d+)'
      _TEST = {
          'url': 'http://www.svt.se/wd?widgetId=23991&sectionId=541&articleId=2900353&type=embed&contextSectionId=123&autostart=false',
-        'md5': '9648197555fc1b49e3dc22db4af51d46',
+        'md5': '33e9a5d8f646523ce0868ecfb0eed77d',
          'info_dict': {
              'id': '2900353',
-            'ext': 'flv',
-            'title': 'Här trycker Jagr till Giroux (under SVT-intervjun)',
+            'ext': 'mp4',
+            'title': 'Stjärnorna skojar till det - under SVT-intervjun',
              'duration': 27,
              'age_limit': 0,
          },
@@ -89,15 +108,20 @@ class SVTIE(SVTBaseIE):
          mobj = re.match(self._VALID_URL, url)
          widget_id = mobj.group('widget_id')
          article_id = mobj.group('id')
-        return self._extract_video(
+
+        info = self._download_json(
              'http://www.svt.se/wd?widgetId=%s&articleId=%s&format=json&type=embed&output=json' % (widget_id, article_id),
              article_id)
  
+        info_dict = self._extract_video(info['video'], article_id)
+        info_dict['title'] = info['context']['title']
+        return info_dict
+
  
  class SVTPlayIE(SVTBaseIE):
      IE_DESC = 'SVT Play and Öppet arkiv'
-    _VALID_URL = r'https?://(?:www\.)?(?P<host>svtplay|oppetarkiv)\.se/video/(?P<id>[0-9]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp)/(?P<id>[0-9]+)'
+    _TESTS = [{
          'url': 'http://www.svtplay.se/video/5996901/flygplan-till-haile-selassie/flygplan-till-haile-selassie-2',
          'md5': '2b6704fe4a28801e1a098bbf3c5ac611',
          'info_dict': {
@@ -113,12 +137,50 @@ class SVTPlayIE(SVTBaseIE):
                  }]
              },
          },
-    }
+    }, {
+        # geo restricted to Sweden
+        'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        host = mobj.group('host')
-        return self._extract_video(
-            'http://www.%s.se/video/%s?output=json' % (host, video_id),
-            video_id)
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        data = self._parse_json(
+            self._search_regex(
+                r'root\["__svtplay"\]\s*=\s*([^;]+);',
+                webpage, 'embedded data', default='{}'),
+            video_id, fatal=False)
+
+        thumbnail = self._og_search_thumbnail(webpage)
+
+        if data:
+            video_info = try_get(
+                data, lambda x: x['context']['dispatcher']['stores']['VideoTitlePageStore']['data']['video'],
+                dict)
+            if video_info:
+                info_dict = self._extract_video(video_info, video_id)
+                info_dict.update({
+                    'title': data['context']['dispatcher']['stores']['MetaStore']['title'],
+                    'thumbnail': thumbnail,
+                })
+                return info_dict
+
+        video_id = self._search_regex(
+            r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
+            webpage, 'video id', default=None)
+
+        if video_id:
+            data = self._download_json(
+                'http://www.svt.se/videoplayer-api/video/%s' % video_id, video_id)
+            info_dict = self._extract_video(data, video_id)
+            if not info_dict.get('title'):
+                info_dict['title'] = re.sub(
+                    r'\s*\|\s*.+?$', '',
+                    info_dict.get('episode') or self._og_search_title(webpage))
+            return info_dict
diff --git a/youtube_dl/extractor/swrmediathek.py b/youtube_dl/extractor/swrmediathek.py

index 58073eefeffc0f3ebc244a6087cad36662940228..6d69f7686b37bd2b39b6362373eadefedef0b932 100644
--- a/youtube_dl/extractor/swrmediathek.py
+++ b/youtube_dl/extractor/swrmediathek.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/syfy.py b/youtube_dl/extractor/syfy.py

index 5ca079f880717933a4216de6399046a44970d29b..def7e5a2c719e38fab0ec27d33d5abd920031670 100644
--- a/youtube_dl/extractor/syfy.py
+++ b/youtube_dl/extractor/syfy.py
@@ -1,46 +1,58 @@
  from __future__ import unicode_literals
  
-import re
+from .adobepass import AdobePassIE
+from ..utils import (
+    update_url_query,
+    smuggle_url,
+)
  
-from .common import InfoExtractor
-
-
-class SyfyIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.syfy\.com/(?:videos/.+?vid:(?P<id>[0-9]+)|(?!videos)(?P<video_name>[^/]+)(?:$|[?#]))'
  
+class SyfyIE(AdobePassIE):
+    _VALID_URL = r'https?://(?:www\.)?syfy\.com/(?:[^/]+/)?videos/(?P<id>[^/?#]+)'
      _TESTS = [{
-        'url': 'http://www.syfy.com/videos/Robot%20Combat%20League/Behind%20the%20Scenes/vid:2631458',
+        'url': 'http://www.syfy.com/theinternetruinedmylife/videos/the-internet-ruined-my-life-season-1-trailer',
          'info_dict': {
-            'id': 'NmqMrGnXvmO1',
-            'ext': 'flv',
-            'title': 'George Lucas has Advice for his Daughter',
-            'description': 'Listen to what insights George Lucas give his daughter Amanda.',
+            'id': '2968097',
+            'ext': 'mp4',
+            'title': 'The Internet Ruined My Life: Season 1 Trailer',
+            'description': 'One tweet, one post, one click, can destroy everything.',
+            'uploader': 'NBCU-MPAT',
+            'upload_date': '20170113',
+            'timestamp': 1484345640,
          },
-        'add_ie': ['ThePlatform'],
-    }, {
-        'url': 'http://www.syfy.com/wilwheaton',
-        'md5': '94dfa54ee3ccb63295b276da08c415f6',
-        'info_dict': {
-            'id': '4yoffOOXC767',
-            'ext': 'flv',
-            'title': 'The Wil Wheaton Project - Premiering May 27th at 10/9c.',
-            'description': 'The Wil Wheaton Project premieres May 27th at 10/9c. Don\'t miss it.',
+        'params': {
+            # m3u8 download
+            'skip_download': True,
          },
          'add_ie': ['ThePlatform'],
-        'skip': 'Blocked outside the US',
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_name = mobj.group('video_name')
-        if video_name:
-            generic_webpage = self._download_webpage(url, video_name)
-            video_id = self._search_regex(
-                r'<iframe.*?class="video_iframe_page"\s+src="/_utils/video/thP_video_controller.php.*?_vid([0-9]+)">',
-                generic_webpage, 'video ID')
-            url = 'http://www.syfy.com/videos/%s/%s/vid:%s' % (
-                video_name, video_name, video_id)
-        else:
-            video_id = mobj.group('id')
-        webpage = self._download_webpage(url, video_id)
-        return self.url_result(self._og_search_video_url(webpage))
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        syfy_mpx = list(self._parse_json(self._search_regex(
+            r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
+            display_id)['syfy']['syfy_mpx'].values())[0]
+        video_id = syfy_mpx['mpxGUID']
+        title = syfy_mpx['episodeTitle']
+        query = {
+            'mbr': 'true',
+            'manifest': 'm3u',
+        }
+        if syfy_mpx.get('entitlement') == 'auth':
+            resource = self._get_mvpd_resource(
+                'syfy', title, video_id,
+                syfy_mpx.get('mpxRating', 'TV-14'))
+            query['auth'] = self._extract_mvpd_auth(
+                url, video_id, 'syfy', resource)
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(update_url_query(
+                self._proto_relative_url(syfy_mpx['releaseURL']), query),
+                {'force_smil_url': True}),
+            'title': title,
+            'id': video_id,
+            'display_id': display_id,
+        }
diff --git a/youtube_dl/extractor/sztvhu.py b/youtube_dl/extractor/sztvhu.py

index f562aa6d386ee891f4ab3a724bef53e20a6cec92..cfad3314642b0412f7fd31995828ee6ba8a6a5b9 100644
--- a/youtube_dl/extractor/sztvhu.py
+++ b/youtube_dl/extractor/sztvhu.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/tagesschau.py b/youtube_dl/extractor/tagesschau.py

index 73e7657d4bec7b1bc37753923744d92b769d8843..8670cee28d381de6011e3187db3024bcc40519de 100644
--- a/youtube_dl/extractor/tagesschau.py
+++ b/youtube_dl/extractor/tagesschau.py
@@ -1,45 +1,181 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
  
  from .common import InfoExtractor
-from ..utils import parse_filesize
+from ..utils import (
+    determine_ext,
+    js_to_json,
+    parse_iso8601,
+    parse_filesize,
+)
+
+
+class TagesschauPlayerIE(InfoExtractor):
+    IE_NAME = 'tagesschau:player'
+    _VALID_URL = r'https?://(?:www\.)?tagesschau\.de/multimedia/(?P<kind>audio|video)/(?P=kind)-(?P<id>\d+)~player(?:_[^/?#&]+)?\.html'
+
+    _TESTS = [{
+        'url': 'http://www.tagesschau.de/multimedia/video/video-179517~player.html',
+        'md5': '8d09548d5c15debad38bee3a4d15ca21',
+        'info_dict': {
+            'id': '179517',
+            'ext': 'mp4',
+            'title': 'Marie Kristin Boese, ARD Berlin, über den zukünftigen Kurs der AfD',
+            'thumbnail': 're:^https?:.*\.jpg$',
+            'formats': 'mincount:6',
+        },
+    }, {
+        'url': 'https://www.tagesschau.de/multimedia/audio/audio-29417~player.html',
+        'md5': '76e6eec6ebd40740671cf0a2c88617e5',
+        'info_dict': {
+            'id': '29417',
+            'ext': 'mp3',
+            'title': 'Trabi - Bye, bye Rennpappe',
+            'thumbnail': 're:^https?:.*\.jpg$',
+            'formats': 'mincount:2',
+        },
+    }, {
+        'url': 'http://www.tagesschau.de/multimedia/audio/audio-29417~player_autoplay-true.html',
+        'only_matching': True,
+    }]
+
+    _FORMATS = {
+        'xs': {'quality': 0},
+        's': {'width': 320, 'height': 180, 'quality': 1},
+        'm': {'width': 512, 'height': 288, 'quality': 2},
+        'l': {'width': 960, 'height': 540, 'quality': 3},
+        'xl': {'width': 1280, 'height': 720, 'quality': 4},
+        'xxl': {'quality': 5},
+    }
+
+    def _extract_via_api(self, kind, video_id):
+        info = self._download_json(
+            'https://www.tagesschau.de/api/multimedia/{0}/{0}-{1}.json'.format(kind, video_id),
+            video_id)
+        title = info['headline']
+        formats = []
+        for media in info['mediadata']:
+            for format_id, format_url in media.items():
+                if determine_ext(format_url) == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, video_id, 'mp4',
+                        entry_protocol='m3u8_native', m3u8_id='hls'))
+                else:
+                    formats.append({
+                        'url': format_url,
+                        'format_id': format_id,
+                        'vcodec': 'none' if kind == 'audio' else None,
+                    })
+        self._sort_formats(formats)
+        timestamp = parse_iso8601(info.get('date'))
+        return {
+            'id': video_id,
+            'title': title,
+            'timestamp': timestamp,
+            'formats': formats,
+        }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+
+        # kind = mobj.group('kind').lower()
+        # if kind == 'video':
+        #     return self._extract_via_api(kind, video_id)
+
+        # The JSON API does not provide some audio formats (e.g. ogg), thus
+        # extracting audio via the webpage
+
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._og_search_title(webpage).strip()
+        formats = []
+
+        for media_json in re.findall(r'({src\s*:\s*["\']http[^}]+type\s*:[^}]+})', webpage):
+            media = self._parse_json(js_to_json(media_json), video_id, fatal=False)
+            if not media:
+                continue
+            src = media.get('src')
+            if not src:
+                continue
+            quality = media.get('quality')
+            kind = media.get('type', '').split('/')[0]
+            ext = determine_ext(src)
+            f = {
+                'url': src,
+                'format_id': '%s_%s' % (quality, ext) if quality else ext,
+                'ext': ext,
+                'vcodec': 'none' if kind == 'audio' else None,
+            }
+            f.update(self._FORMATS.get(quality, {}))
+            formats.append(f)
+
+        self._sort_formats(formats)
+
+        thumbnail = self._og_search_thumbnail(webpage)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'formats': formats,
+        }
  
  
  class TagesschauIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tagesschau\.de/multimedia/(?:[^/]+/)*?[^/#?]+?(?P<id>-?[0-9]+)(?:~_[^/#?]+?)?\.html'
+    _VALID_URL = r'https?://(?:www\.)?tagesschau\.de/(?P<path>[^/]+/(?:[^/]+/)*?(?P<id>[^/#?]+?(?:-?[0-9]+)?))(?:~_?[^/#?]+?)?\.html'
  
      _TESTS = [{
          'url': 'http://www.tagesschau.de/multimedia/video/video-102143.html',
-        'md5': '917a228bc7df7850783bc47979673a09',
+        'md5': 'f7c27a0eff3bfe8c7727e65f8fe1b1e6',
          'info_dict': {
-            'id': '102143',
+            'id': 'video-102143',
              'ext': 'mp4',
              'title': 'Regierungsumbildung in Athen: Neue Minister in Griechenland vereidigt',
-            'description': 'md5:171feccd9d9b3dd54d05d501568f6359',
+            'description': '18.07.2015 20:10 Uhr',
              'thumbnail': 're:^https?:.*\.jpg$',
          },
      }, {
          'url': 'http://www.tagesschau.de/multimedia/sendung/ts-5727.html',
          'md5': '3c54c1f6243d279b706bde660ceec633',
          'info_dict': {
-            'id': '5727',
+            'id': 'ts-5727',
              'ext': 'mp4',
-            'description': 'md5:695c01bfd98b7e313c501386327aea59',
              'title': 'Sendung: tagesschau \t04.12.2014 20:00 Uhr',
+            'description': 'md5:695c01bfd98b7e313c501386327aea59',
+            'thumbnail': 're:^https?:.*\.jpg$',
+        },
+    }, {
+        # exclusive audio
+        'url': 'http://www.tagesschau.de/multimedia/audio/audio-29417.html',
+        'md5': '76e6eec6ebd40740671cf0a2c88617e5',
+        'info_dict': {
+            'id': 'audio-29417',
+            'ext': 'mp3',
+            'title': 'Trabi - Bye, bye Rennpappe',
+            'description': 'md5:8687dda862cbbe2cfb2df09b56341317',
              'thumbnail': 're:^https?:.*\.jpg$',
          },
      }, {
-        'url': 'http://www.tagesschau.de/multimedia/politikimradio/audio-18407.html',
-        'md5': 'aef45de271c4bf0a5db834aa40bf774c',
+        # audio in article
+        'url': 'http://www.tagesschau.de/inland/bnd-303.html',
+        'md5': 'e0916c623e85fc1d2b26b78f299d3958',
          'info_dict': {
-            'id': '18407',
+            'id': 'bnd-303',
              'ext': 'mp3',
-            'title': 'Flüchtlingsdebatte: Hitzig, aber wenig hilfreich',
-            'description': 'Flüchtlingsdebatte: Hitzig, aber wenig hilfreich',
+            'title': 'Viele Baustellen für neuen BND-Chef',
+            'description': 'md5:1e69a54be3e1255b2b07cdbce5bcd8b4',
              'thumbnail': 're:^https?:.*\.jpg$',
          },
+    }, {
+        'url': 'http://www.tagesschau.de/inland/afd-parteitag-135.html',
+        'info_dict': {
+            'id': 'afd-parteitag-135',
+            'title': 'Möchtegern-Underdog mit Machtanspruch',
+        },
+        'playlist_count': 2,
      }, {
          'url': 'http://www.tagesschau.de/multimedia/sendung/tsg-3771.html',
          'only_matching': True,
@@ -61,88 +197,108 @@ class TagesschauIE(InfoExtractor):
      }, {
          'url': 'http://www.tagesschau.de/multimedia/video/video-102303~_bab-sendung-211.html',
          'only_matching': True,
+    }, {
+        'url': 'http://www.tagesschau.de/100sekunden/index.html',
+        'only_matching': True,
+    }, {
+        # playlist article with collapsing sections
+        'url': 'http://www.tagesschau.de/wirtschaft/faq-freihandelszone-eu-usa-101.html',
+        'only_matching': True,
      }]
  
-    _FORMATS = {
-        's': {'width': 256, 'height': 144, 'quality': 1},
-        'm': {'width': 512, 'height': 288, 'quality': 2},
-        'l': {'width': 960, 'height': 544, 'quality': 3},
-    }
+    @classmethod
+    def suitable(cls, url):
+        return False if TagesschauPlayerIE.suitable(url) else super(TagesschauIE, cls).suitable(url)
+
+    def _extract_formats(self, download_text, media_kind):
+        links = re.finditer(
+            r'<div class="button" title="(?P<title>[^"]*)"><a href="(?P<url>[^"]+)">(?P<name>.+?)</a></div>',
+            download_text)
+        formats = []
+        for l in links:
+            link_url = l.group('url')
+            if not link_url:
+                continue
+            format_id = self._search_regex(
+                r'.*/[^/.]+\.([^/]+)\.[^/.]+$', link_url, 'format ID',
+                default=determine_ext(link_url))
+            format = {
+                'format_id': format_id,
+                'url': l.group('url'),
+                'format_name': l.group('name'),
+            }
+            title = l.group('title')
+            if title:
+                if media_kind.lower() == 'video':
+                    m = re.match(
+                        r'''(?x)
+                            Video:\s*(?P<vcodec>[a-zA-Z0-9/._-]+)\s*&\#10;
+                            (?P<width>[0-9]+)x(?P<height>[0-9]+)px&\#10;
+                            (?P<vbr>[0-9]+)kbps&\#10;
+                            Audio:\s*(?P<abr>[0-9]+)kbps,\s*(?P<audio_desc>[A-Za-z\.0-9]+)&\#10;
+                            Gr&ouml;&szlig;e:\s*(?P<filesize_approx>[0-9.,]+\s+[a-zA-Z]*B)''',
+                        title)
+                    if m:
+                        format.update({
+                            'format_note': m.group('audio_desc'),
+                            'vcodec': m.group('vcodec'),
+                            'width': int(m.group('width')),
+                            'height': int(m.group('height')),
+                            'abr': int(m.group('abr')),
+                            'vbr': int(m.group('vbr')),
+                            'filesize_approx': parse_filesize(m.group('filesize_approx')),
+                        })
+                else:
+                    m = re.match(
+                        r'(?P<format>.+?)-Format\s*:\s*(?P<abr>\d+)kbps\s*,\s*(?P<note>.+)',
+                        title)
+                    if m:
+                        format.update({
+                            'format_note': '%s, %s' % (m.group('format'), m.group('note')),
+                            'vcodec': 'none',
+                            'abr': int(m.group('abr')),
+                        })
+            formats.append(format)
+        self._sort_formats(formats)
+        return formats
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id') or mobj.group('path')
          display_id = video_id.lstrip('-')
+
          webpage = self._download_webpage(url, display_id)
  
-        player_url = self._html_search_meta(
-            'twitter:player', webpage, 'player URL', default=None)
-        if player_url:
-            playerpage = self._download_webpage(
-                player_url, display_id, 'Downloading player page')
-
-            formats = []
-            for media in re.finditer(
-                    r'''(?x)
-                        (?P<q_url>["\'])(?P<url>http://media.+?)(?P=q_url)
-                        ,\s*type:(?P<q_type>["\'])(?P<type>video|audio)/(?P<ext>.+?)(?P=q_type)
-                        (?:,\s*quality:(?P<q_quality>["\'])(?P<quality>.+?)(?P=q_quality))?
-                    ''', playerpage):
-                url = media.group('url')
-                type_ = media.group('type')
-                ext = media.group('ext')
-                res = media.group('quality')
-                f = {
-                    'format_id': '%s_%s' % (res, ext) if res else ext,
-                    'url': url,
-                    'ext': ext,
-                    'vcodec': 'none' if type_ == 'audio' else None,
-                }
-                f.update(self._FORMATS.get(res, {}))
-                formats.append(f)
-            thumbnail = self._og_search_thumbnail(playerpage)
-            title = self._og_search_title(webpage).strip()
-            description = self._og_search_description(webpage).strip()
-        else:
+        title = self._html_search_regex(
+            r'<span[^>]*class="headline"[^>]*>(.+?)</span>',
+            webpage, 'title', default=None) or self._og_search_title(webpage)
+
+        DOWNLOAD_REGEX = r'(?s)<p>Wir bieten dieses (?P<kind>Video|Audio) in folgenden Formaten zum Download an:</p>\s*<div class="controls">(?P<links>.*?)</div>\s*<p>'
+
+        webpage_type = self._og_search_property('type', webpage, default=None)
+        if webpage_type == 'website':  # Article
+            entries = []
+            for num, (entry_title, media_kind, download_text) in enumerate(re.findall(
+                    r'(?s)<p[^>]+class="infotext"[^>]*>\s*(?:<a[^>]+>)?\s*<strong>(.+?)</strong>.*?</p>.*?%s' % DOWNLOAD_REGEX,
+                    webpage), 1):
+                entries.append({
+                    'id': '%s-%d' % (display_id, num),
+                    'title': '%s' % entry_title,
+                    'formats': self._extract_formats(download_text, media_kind),
+                })
+            if len(entries) > 1:
+                return self.playlist_result(entries, display_id, title)
+            formats = entries[0]['formats']
+        else:  # Assume single video
              download_text = self._search_regex(
-                r'(?s)<p>Wir bieten dieses Video in folgenden Formaten zum Download an:</p>\s*<div class="controls">(.*?)</div>\s*<p>',
-                webpage, 'download links')
-            links = re.finditer(
-                r'<div class="button" title="(?P<title>[^"]*)"><a href="(?P<url>[^"]+)">(?P<name>.+?)</a></div>',
-                download_text)
-            formats = []
-            for l in links:
-                format_id = self._search_regex(
-                    r'.*/[^/.]+\.([^/]+)\.[^/.]+', l.group('url'), 'format ID')
-                format = {
-                    'format_id': format_id,
-                    'url': l.group('url'),
-                    'format_name': l.group('name'),
-                }
-                m = re.match(
-                    r'''(?x)
-                        Video:\s*(?P<vcodec>[a-zA-Z0-9/._-]+)\s*&\#10;
-                        (?P<width>[0-9]+)x(?P<height>[0-9]+)px&\#10;
-                        (?P<vbr>[0-9]+)kbps&\#10;
-                        Audio:\s*(?P<abr>[0-9]+)kbps,\s*(?P<audio_desc>[A-Za-z\.0-9]+)&\#10;
-                        Gr&ouml;&szlig;e:\s*(?P<filesize_approx>[0-9.,]+\s+[a-zA-Z]*B)''',
-                    l.group('title'))
-                if m:
-                    format.update({
-                        'format_note': m.group('audio_desc'),
-                        'vcodec': m.group('vcodec'),
-                        'width': int(m.group('width')),
-                        'height': int(m.group('height')),
-                        'abr': int(m.group('abr')),
-                        'vbr': int(m.group('vbr')),
-                        'filesize_approx': parse_filesize(m.group('filesize_approx')),
-                    })
-                formats.append(format)
-            thumbnail = self._og_search_thumbnail(webpage)
-            description = self._html_search_regex(
-                r'(?s)<p class="teasertext">(.*?)</p>',
-                webpage, 'description', default=None)
-            title = self._html_search_regex(
-                r'<span class="headline".*?>(.*?)</span>', webpage, 'title')
+                DOWNLOAD_REGEX, webpage, 'download links', group='links')
+            media_kind = self._search_regex(
+                DOWNLOAD_REGEX, webpage, 'media kind', default='Video', group='kind')
+            formats = self._extract_formats(download_text, media_kind)
+        thumbnail = self._og_search_thumbnail(webpage)
+        description = self._html_search_regex(
+            r'(?s)<p class="teasertext">(.*?)</p>',
+            webpage, 'description', default=None)
  
          self._sort_formats(formats)
  
diff --git a/youtube_dl/extractor/tapely.py b/youtube_dl/extractor/tapely.py
deleted file mode 100644
index ed560bd..0000000
--- a/youtube_dl/extractor/tapely.py
+++ /dev/null
@@ -1,109 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    clean_html,
-    ExtractorError,
-    float_or_none,
-    parse_iso8601,
-    sanitized_Request,
-)
-
-
-class TapelyIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:tape\.ly|tapely\.com)/(?P<id>[A-Za-z0-9\-_]+)(?:/(?P<songnr>\d+))?'
-    _API_URL = 'http://tape.ly/showtape?id={0:}'
-    _S3_SONG_URL = 'http://mytape.s3.amazonaws.com/{0:}'
-    _SOUNDCLOUD_SONG_URL = 'http://api.soundcloud.com{0:}'
-    _TESTS = [
-        {
-            'url': 'http://tape.ly/my-grief-as-told-by-water',
-            'info_dict': {
-                'id': 23952,
-                'title': 'my grief as told by water',
-                'thumbnail': 're:^https?://.*\.png$',
-                'uploader_id': 16484,
-                'timestamp': 1411848286,
-                'description': 'For Robin and Ponkers, whom the tides of life have taken out to sea.',
-            },
-            'playlist_count': 13,
-        },
-        {
-            'url': 'http://tape.ly/my-grief-as-told-by-water/1',
-            'md5': '79031f459fdec6530663b854cbc5715c',
-            'info_dict': {
-                'id': 258464,
-                'title': 'Dreaming Awake  (My Brightest Diamond)',
-                'ext': 'm4a',
-            },
-        },
-        {
-            'url': 'https://tapely.com/my-grief-as-told-by-water',
-            'only_matching': True,
-        },
-    ]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('id')
-
-        playlist_url = self._API_URL.format(display_id)
-        request = sanitized_Request(playlist_url)
-        request.add_header('X-Requested-With', 'XMLHttpRequest')
-        request.add_header('Accept', 'application/json')
-        request.add_header('Referer', url)
-
-        playlist = self._download_json(request, display_id)
-
-        tape = playlist['tape']
-
-        entries = []
-        for s in tape['songs']:
-            song = s['song']
-            entry = {
-                'id': song['id'],
-                'duration': float_or_none(song.get('songduration'), 1000),
-                'title': song['title'],
-            }
-            if song['source'] == 'S3':
-                entry.update({
-                    'url': self._S3_SONG_URL.format(song['filename']),
-                })
-                entries.append(entry)
-            elif song['source'] == 'YT':
-                self.to_screen('YouTube video detected')
-                yt_id = song['filename'].replace('/youtube/', '')
-                entry.update(self.url_result(yt_id, 'Youtube', video_id=yt_id))
-                entries.append(entry)
-            elif song['source'] == 'SC':
-                self.to_screen('SoundCloud song detected')
-                sc_url = self._SOUNDCLOUD_SONG_URL.format(song['filename'])
-                entry.update(self.url_result(sc_url, 'Soundcloud'))
-                entries.append(entry)
-            else:
-                self.report_warning('Unknown song source: %s' % song['source'])
-
-        if mobj.group('songnr'):
-            songnr = int(mobj.group('songnr')) - 1
-            try:
-                return entries[songnr]
-            except IndexError:
-                raise ExtractorError(
-                    'No song with index: %s' % mobj.group('songnr'),
-                    expected=True)
-
-        return {
-            '_type': 'playlist',
-            'id': tape['id'],
-            'display_id': display_id,
-            'title': tape['name'],
-            'entries': entries,
-            'thumbnail': tape.get('image_url'),
-            'description': clean_html(tape.get('subtext')),
-            'like_count': tape.get('likescount'),
-            'uploader_id': tape.get('user_id'),
-            'timestamp': parse_iso8601(tape.get('published_at')),
-        }
diff --git a/youtube_dl/extractor/tass.py b/youtube_dl/extractor/tass.py
index c4ef70778b8ac8d2289bbbf5da3bbae5f65c263b..5293393efc219526b61fe04ff12ff25f1d49b33c 100644
--- a/youtube_dl/extractor/tass.py
+++ b/youtube_dl/extractor/tass.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import json
diff --git a/youtube_dl/extractor/tbs.py b/youtube_dl/extractor/tbs.py
new file mode 100644
index 0000000..bf93eb8
--- /dev/null
+++ b/youtube_dl/extractor/tbs.py
@@ -0,0 +1,56 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .turner import TurnerBaseIE
+from ..utils import extract_attributes
+
+
+class TBSIE(TurnerBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com/videos/(?:[^/]+/)+(?P<id>[^/?#]+)\.html'
+    _TESTS = [{
+        'url': 'http://www.tbs.com/videos/people-of-earth/season-1/extras/2007318/theatrical-trailer.html',
+        'md5': '9e61d680e2285066ade7199e6408b2ee',
+        'info_dict': {
+            'id': '2007318',
+            'ext': 'mp4',
+            'title': 'Theatrical Trailer',
+            'description': 'Catch the latest comedy from TBS, People of Earth, premiering Halloween night--Monday, October 31, at 9/8c.',
+        }
+    }, {
+        'url': 'http://www.tntdrama.com/videos/good-behavior/season-1/extras/1538823/you-better-run.html',
+        'md5': 'ce53c6ead5e9f3280b4ad2031a6fab56',
+        'info_dict': {
+            'id': '1538823',
+            'ext': 'mp4',
+            'title': 'You Better Run',
+            'description': 'Letty Raines must figure out what she\'s running toward while running away from her past. Good Behavior premieres November 15 at 9/8c.',
+        }
+    }]
+
+    def _real_extract(self, url):
+        domain, display_id = re.match(self._VALID_URL, url).groups()
+        site = domain[:3]
+        webpage = self._download_webpage(url, display_id)
+        video_params = extract_attributes(self._search_regex(r'(<[^>]+id="page-video"[^>]*>)', webpage, 'video params'))
+        query = None
+        clip_id = video_params.get('clipid')
+        if clip_id:
+            query = 'id=' + clip_id
+        else:
+            query = 'titleId=' + video_params['titleid']
+        return self._extract_cvp_info(
+            'http://www.%s.com/service/cvpXml?%s' % (domain, query), display_id, {
+                'default': {
+                    'media_src': 'http://ht.cdn.turner.com/%s/big' % site,
+                },
+                'secure': {
+                    'media_src': 'http://androidhls-secure.cdn.turner.com/%s/big' % site,
+                    'tokenizer_src': 'http://www.%s.com/video/processors/services/token_ipadAdobe.do' % domain,
+                },
+            }, {
+                'url': url,
+                'site_name': site.upper(),
+                'auth_required': video_params.get('isAuthRequired') != 'false',
+            })
diff --git a/youtube_dl/extractor/tdslifeway.py b/youtube_dl/extractor/tdslifeway.py
new file mode 100644
index 0000000..4d1f5c8
--- /dev/null
+++ b/youtube_dl/extractor/tdslifeway.py
@@ -0,0 +1,33 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class TDSLifewayIE(InfoExtractor):
+    _VALID_URL = r'https?://tds\.lifeway\.com/v1/trainingdeliverysystem/courses/(?P<id>\d+)/index\.html'
+
+    _TEST = {
+        # From http://www.ministrygrid.com/training-viewer/-/training/t4g-2014-conference/the-gospel-by-numbers-4/the-gospel-by-numbers
+        'url': 'http://tds.lifeway.com/v1/trainingdeliverysystem/courses/3453494717001/index.html?externalRegistration=AssetId%7C34F466F1-78F3-4619-B2AB-A8EFFA55E9E9%21InstanceId%7C0%21UserId%7Caaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa&grouping=http%3A%2F%2Flifeway.com%2Fvideo%2F3453494717001&activity_id=http%3A%2F%2Flifeway.com%2Fvideo%2F3453494717001&content_endpoint=http%3A%2F%2Ftds.lifeway.com%2Fv1%2Ftrainingdeliverysystem%2FScormEngineInterface%2FTCAPI%2Fcontent%2F&actor=%7B%22name%22%3A%5B%22Guest%20Guest%22%5D%2C%22account%22%3A%5B%7B%22accountServiceHomePage%22%3A%22http%3A%2F%2Fscorm.lifeway.com%2F%22%2C%22accountName%22%3A%22aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa%22%7D%5D%2C%22objectType%22%3A%22Agent%22%7D&content_token=462a50b2-b6f9-4970-99b1-930882c499fb&registration=93d6ec8e-7f7b-4ed3-bbc8-a857913c0b2a&externalConfiguration=access%7CFREE%21adLength%7C-1%21assignOrgId%7C4AE36F78-299A-425D-91EF-E14A899B725F%21assignOrgParentId%7C%21courseId%7C%21isAnonymous%7Cfalse%21previewAsset%7Cfalse%21previewLength%7C-1%21previewMode%7Cfalse%21royalty%7CFREE%21sessionId%7C671422F9-8E79-48D4-9C2C-4EE6111EA1CD%21trackId%7C&auth=Basic%20OjhmZjk5MDBmLTBlYTMtNDJhYS04YjFlLWE4MWQ3NGNkOGRjYw%3D%3D&endpoint=http%3A%2F%2Ftds.lifeway.com%2Fv1%2Ftrainingdeliverysystem%2FScormEngineInterface%2FTCAPI%2F',
+        'info_dict': {
+            'id': '3453494717001',
+            'ext': 'mp4',
+            'title': 'The Gospel by Numbers',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'upload_date': '20140410',
+            'description': 'Coming soon from T4G 2014!',
+            'uploader_id': '2034960640001',
+            'timestamp': 1397145591,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'add_ie': ['BrightcoveNew'],
+    }
+
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/2034960640001/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        brightcove_id = self._match_id(url)
+        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
diff --git a/youtube_dl/extractor/teachertube.py b/youtube_dl/extractor/teachertube.py
index 82675431f863fded8768241e2ad21c4874f8525d..df5d5556fadf82c8dc680643389fdeccf989793f 100644
--- a/youtube_dl/extractor/teachertube.py
+++ b/youtube_dl/extractor/teachertube.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/teachingchannel.py b/youtube_dl/extractor/teachingchannel.py
index e0477382ceabea0769bd0575ceb1f350ce8c0911..e89759714e6e3cea3da8a7007df838618f6f1cc1 100644
--- a/youtube_dl/extractor/teachingchannel.py
+++ b/youtube_dl/extractor/teachingchannel.py
@@ -7,10 +7,11 @@ from .ooyala import OoyalaIE
  
  
  class TeachingChannelIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.teachingchannel\.org/videos/(?P<title>.+)'
+    _VALID_URL = r'https?://(?:www\.)?teachingchannel\.org/videos/(?P<title>.+)'
  
      _TEST = {
          'url': 'https://www.teachingchannel.org/videos/teacher-teaming-evolution',
+        'md5': '3d6361864d7cac20b57c8784da17166f',
          'info_dict': {
              'id': 'F3bnlzbToeI6pLEfRyrlfooIILUjz4nM',
              'ext': 'mp4',
@@ -19,9 +20,9 @@ class TeachingChannelIE(InfoExtractor):
              'duration': 422.255,
          },
          'params': {
-            # m3u8 download
              'skip_download': True,
          },
+        'add_ie': ['Ooyala'],
      }
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/teamcoco.py b/youtube_dl/extractor/teamcoco.py
index b49ab5f5b98c2d6219d1d17a1c0aea02eb534f61..75346393b017995098d08136df2cbffad1e1c6bb 100644
--- a/youtube_dl/extractor/teamcoco.py
+++ b/youtube_dl/extractor/teamcoco.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import base64
@@ -88,7 +88,7 @@ class TeamcocoIE(InfoExtractor):
          preload_codes = self._html_search_regex(
              r'(function.+)setTimeout\(function\(\)\{playlist',
              webpage, 'preload codes')
-        base64_fragments = re.findall(r'"([a-zA-z0-9+/=]+)"', preload_codes)
+        base64_fragments = re.findall(r'"([a-zA-Z0-9+/=]+)"', preload_codes)
          base64_fragments.remove('init')
  
          def _check_sequence(cur_fragments):
diff --git a/youtube_dl/extractor/techtalks.py b/youtube_dl/extractor/techtalks.py
index 16e945d8e624adc51e6a68eab786bdece0a29960..a5b62c717160380c873117e878017f5c3573939a 100644
--- a/youtube_dl/extractor/techtalks.py
+++ b/youtube_dl/extractor/techtalks.py
@@ -10,9 +10,9 @@ from ..utils import (
  
  
  class TechTalksIE(InfoExtractor):
-    _VALID_URL = r'https?://techtalks\.tv/talks/[^/]*/(?P<id>\d+)/'
+    _VALID_URL = r'https?://techtalks\.tv/talks/(?:[^/]+/)?(?P<id>\d+)'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://techtalks.tv/talks/learning-topic-models-going-beyond-svd/57758/',
          'info_dict': {
              'id': '57758',
@@ -38,7 +38,10 @@ class TechTalksIE(InfoExtractor):
              # rtmp download
              'skip_download': True,
          },
-    }
+    }, {
+        'url': 'http://techtalks.tv/talks/57758',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
diff --git a/youtube_dl/extractor/ted.py b/youtube_dl/extractor/ted.py
index cf8851438bb74000abb2692c34607f3137505f1d..451cde76d2e757fcdfb30ad96847b16aa4d156ff 100644
--- a/youtube_dl/extractor/ted.py
+++ b/youtube_dl/extractor/ted.py
@@ -27,7 +27,7 @@ class TEDIE(InfoExtractor):
          '''
      _TESTS = [{
          'url': 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html',
-        'md5': 'fc94ac279feebbce69f21c0c6ee82810',
+        'md5': '0de43ac406aa3e4ea74b66c9c7789b13',
          'info_dict': {
              'id': '102',
              'ext': 'mp4',
@@ -37,21 +37,26 @@ class TEDIE(InfoExtractor):
                              'consciousness, but that half the time our brains are '
                              'actively fooling us.'),
              'uploader': 'Dan Dennett',
-            'width': 854,
+            'width': 853,
              'duration': 1308,
          }
      }, {
          'url': 'http://www.ted.com/watch/ted-institute/ted-bcg/vishal-sikka-the-beauty-and-power-of-algorithms',
-        'md5': '226f4fb9c62380d11b7995efa4c87994',
+        'md5': 'b899ac15e345fb39534d913f7606082b',
          'info_dict': {
-            'id': 'vishal-sikka-the-beauty-and-power-of-algorithms',
+            'id': 'tSVI8ta_P4w',
              'ext': 'mp4',
              'title': 'Vishal Sikka: The beauty and power of algorithms',
              'thumbnail': 're:^https?://.+\.jpg',
-            'description': 'Adaptive, intelligent, and consistent, algorithms are emerging as the ultimate app for everything from matching consumers to products to assessing medical diagnoses. Vishal Sikka shares his appreciation for the algorithm, charting both its inherent beauty and its growing power.',
-        }
+            'description': 'md5:6261fdfe3e02f4f579cbbfc00aff73f4',
+            'upload_date': '20140122',
+            'uploader_id': 'TEDInstitute',
+            'uploader': 'TED Institute',
+        },
+        'add_ie': ['Youtube'],
      }, {
          'url': 'http://www.ted.com/talks/gabby_giffords_and_mark_kelly_be_passionate_be_courageous_be_your_best',
+        'md5': '71b3ab2f4233012dce09d515c9c39ce2',
          'info_dict': {
              'id': '1972',
              'ext': 'mp4',
@@ -102,9 +107,9 @@ class TEDIE(InfoExtractor):
      }]
  
      _NATIVE_FORMATS = {
-        'low': {'preference': 1, 'width': 320, 'height': 180},
-        'medium': {'preference': 2, 'width': 512, 'height': 288},
-        'high': {'preference': 3, 'width': 854, 'height': 480},
+        'low': {'width': 320, 'height': 180},
+        'medium': {'width': 512, 'height': 288},
+        'high': {'width': 854, 'height': 480},
      }
  
      def _extract_info(self, webpage):
@@ -171,15 +176,21 @@ class TEDIE(InfoExtractor):
                  if finfo:
                      f.update(finfo)
  
+        http_url = None
          for format_id, resources in talk_info['resources'].items():
              if format_id == 'h264':
                  for resource in resources:
+                    h264_url = resource.get('file')
+                    if not h264_url:
+                        continue
                      bitrate = int_or_none(resource.get('bitrate'))
                      formats.append({
-                        'url': resource['file'],
+                        'url': h264_url,
                          'format_id': '%s-%sk' % (format_id, bitrate),
                          'tbr': bitrate,
                      })
+                    if re.search(r'\d+k', h264_url):
+                        http_url = h264_url
              elif format_id == 'rtmp':
                  streamer = talk_info.get('streamer')
                  if not streamer:
@@ -195,16 +206,24 @@ class TEDIE(InfoExtractor):
                          'tbr': int_or_none(resource.get('bitrate')),
                      })
              elif format_id == 'hls':
-                hls_formats = self._extract_m3u8_formats(
-                    resources.get('stream'), video_name, 'mp4', m3u8_id=format_id)
-                for f in hls_formats:
-                    if f.get('format_id') == 'hls-meta':
-                        continue
-                    if not f.get('height'):
-                        f['vcodec'] = 'none'
-                    else:
-                        f['acodec'] = 'none'
-                formats.extend(hls_formats)
+                formats.extend(self._extract_m3u8_formats(
+                    resources.get('stream'), video_name, 'mp4', m3u8_id=format_id, fatal=False))
+
+        m3u8_formats = list(filter(
+            lambda f: f.get('protocol') == 'm3u8' and f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+            formats))
+        if http_url:
+            for m3u8_format in m3u8_formats:
+                bitrate = self._search_regex(r'(\d+k)', m3u8_format['url'], 'bitrate', default=None)
+                if not bitrate:
+                    continue
+                f = m3u8_format.copy()
+                f.update({
+                    'url': re.sub(r'\d+k', bitrate, http_url),
+                    'format_id': m3u8_format['format_id'].replace('hls', 'http'),
+                    'protocol': 'http',
+                })
+                formats.append(f)
  
          audio_download = talk_info.get('audioDownload')
          if audio_download:
@@ -212,7 +231,6 @@ class TEDIE(InfoExtractor):
                  'url': audio_download,
                  'format_id': 'audio',
                  'vcodec': 'none',
-                'preference': -0.5,
              })
  
          self._sort_formats(formats)
@@ -254,7 +272,11 @@ class TEDIE(InfoExtractor):
  
          config_json = self._html_search_regex(
              r'"pages\.jwplayer"\s*,\s*({.+?})\s*\)\s*</script>',
-            webpage, 'config')
+            webpage, 'config', default=None)
+        if not config_json:
+            embed_url = self._search_regex(
+                r"<iframe[^>]+class='pages-video-embed__video__object'[^>]+src='([^']+)'", webpage, 'embed url')
+            return self.url_result(self._proto_relative_url(embed_url))
          config = json.loads(config_json)['config']
          video_url = config['video']['url']
          thumbnail = config.get('image', {}).get('url')
diff --git a/youtube_dl/extractor/telebruxelles.py b/youtube_dl/extractor/telebruxelles.py
index a3d05f97d681b6cb4da6adf179a4f0a5744e5123..eefecc490c5d13476259497e79f7a3ebe68caee7 100644
--- a/youtube_dl/extractor/telebruxelles.py
+++ b/youtube_dl/extractor/telebruxelles.py
@@ -1,11 +1,13 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  
  
  class TeleBruxellesIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?telebruxelles\.be/(news|sport|dernier-jt)/?(?P<id>[^/#?]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:telebruxelles|bx1)\.be/(news|sport|dernier-jt)/?(?P<id>[^/#?]+)'
      _TESTS = [{
          'url': 'http://www.telebruxelles.be/news/auditions-devant-parlement-francken-galant-tres-attendus/',
          'md5': '59439e568c9ee42fb77588b2096b214f',
@@ -39,18 +41,18 @@ class TeleBruxellesIE(InfoExtractor):
          webpage = self._download_webpage(url, display_id)
  
          article_id = self._html_search_regex(
-            r"<article id=\"post-(\d+)\"", webpage, 'article ID')
+            r"<article id=\"post-(\d+)\"", webpage, 'article ID', default=None)
          title = self._html_search_regex(
              r'<h1 class=\"entry-title\">(.*?)</h1>', webpage, 'title')
-        description = self._og_search_description(webpage)
+        description = self._og_search_description(webpage, default=None)
  
          rtmp_url = self._html_search_regex(
-            r"file: \"(rtmp://\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}/vod/mp4:\" \+ \"\w+\" \+ \".mp4)\"",
+            r'file\s*:\s*"(rtmp://[^/]+/vod/mp4:"\s*\+\s*"[^"]+"\s*\+\s*".mp4)"',
              webpage, 'RTMP url')
-        rtmp_url = rtmp_url.replace("\" + \"", "")
+        rtmp_url = re.sub(r'"\s*\+\s*"', '', rtmp_url)
  
          return {
-            'id': article_id,
+            'id': article_id or display_id,
              'display_id': display_id,
              'title': title,
              'description': description,
diff --git a/youtube_dl/extractor/telecinco.py b/youtube_dl/extractor/telecinco.py
index 4b4b740b44d325ffb8a8a5c6cba848b0c99ced13..d5abfc9e44ec82b492fcd98d9e4429b40c5c05b9 100644
--- a/youtube_dl/extractor/telecinco.py
+++ b/youtube_dl/extractor/telecinco.py
@@ -1,50 +1,41 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import json
+from .mitele import MiTeleBaseIE
  
-from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_unquote,
-    compat_urllib_parse_urlencode,
-    compat_urlparse,
-)
-from ..utils import (
-    get_element_by_attribute,
-    parse_duration,
-    strip_jsonp,
-)
  
-
-class TelecincoIE(InfoExtractor):
+class TelecincoIE(MiTeleBaseIE):
      IE_DESC = 'telecinco.es, cuatro.com and mediaset.es'
-    _VALID_URL = r'https?://www\.(?:telecinco\.es|cuatro\.com|mediaset\.es)/(?:[^/]+/)+(?P<id>.+?)\.html'
+    _VALID_URL = r'https?://(?:www\.)?(?:telecinco\.es|cuatro\.com|mediaset\.es)/(?:[^/]+/)+(?P<id>.+?)\.html'
  
      _TESTS = [{
          'url': 'http://www.telecinco.es/robinfood/temporada-01/t01xp14/Bacalao-cocochas-pil-pil_0_1876350223.html',
-        'md5': '5cbef3ad5ef17bf0d21570332d140729',
+        'md5': '8d7b2d5f699ee2709d992a63d5cd1712',
          'info_dict': {
-            'id': 'MDSVID20141015_0058',
+            'id': 'JEA5ijCnF6p5W08A1rNKn7',
              'ext': 'mp4',
-            'title': 'Con Martín Berasategui, hacer un bacalao al ...',
+            'title': 'Bacalao con kokotxas al pil-pil',
+            'description': 'md5:1382dacd32dd4592d478cbdca458e5bb',
              'duration': 662,
          },
      }, {
          'url': 'http://www.cuatro.com/deportes/futbol/barcelona/Leo_Messi-Champions-Roma_2_2052780128.html',
-        'md5': '0a5b9f3cc8b074f50a0578f823a12694',
+        'md5': '284393e5387b3b947b77c613ef04749a',
          'info_dict': {
-            'id': 'MDSVID20150916_0128',
+            'id': 'jn24Od1zGLG4XUZcnUnZB6',
              'ext': 'mp4',
-            'title': '¿Quién es este ex futbolista con el que hablan ...',
+            'title': '¿Quién es este ex futbolista con el que hablan Leo Messi y Luis Suárez?',
+            'description': 'md5:a62ecb5f1934fc787107d7b9a2262805',
              'duration': 79,
          },
      }, {
          'url': 'http://www.mediaset.es/12meses/campanas/doylacara/conlatratanohaytrato/Ayudame-dar-cara-trata-trato_2_1986630220.html',
-        'md5': 'ad1bfaaba922dd4a295724b05b68f86a',
+        'md5': '749afab6ea5a136a8806855166ae46a2',
          'info_dict': {
-            'id': 'MDSVID20150513_0220',
+            'id': 'aywerkD2Sv1vGNqq9b85Q2',
              'ext': 'mp4',
              'title': '#DOYLACARA. Con la trata no hay trato',
+            'description': 'md5:2771356ff7bfad9179c5f5cd954f1477',
              'duration': 50,
          },
      }, {
@@ -56,40 +47,16 @@ class TelecincoIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        episode = self._match_id(url)
-        webpage = self._download_webpage(url, episode)
-        embed_data_json = self._search_regex(
-            r'(?s)MSV\.embedData\[.*?\]\s*=\s*({.*?});', webpage, 'embed data',
-        ).replace('\'', '"')
-        embed_data = json.loads(embed_data_json)
-
-        domain = embed_data['mediaUrl']
-        if not domain.startswith('http'):
-            # only happens in telecinco.es videos
-            domain = 'http://' + domain
-        info_url = compat_urlparse.urljoin(
-            domain,
-            compat_urllib_parse_unquote(embed_data['flashvars']['host'])
-        )
-        info_el = self._download_xml(info_url, episode).find('./video/info')
-
-        video_link = info_el.find('videoUrl/link').text
-        token_query = compat_urllib_parse_urlencode({'id': video_link})
-        token_info = self._download_json(
-            embed_data['flashvars']['ov_tk'] + '?' + token_query,
-            episode,
-            transform_source=strip_jsonp
-        )
-        formats = self._extract_m3u8_formats(
-            token_info['tokenizedUrl'], episode, ext='mp4', entry_protocol='m3u8_native')
-        self._sort_formats(formats)
-
-        return {
-            'id': embed_data['videoId'],
-            'display_id': episode,
-            'title': info_el.find('title').text,
-            'formats': formats,
-            'description': get_element_by_attribute('class', 'text', webpage),
-            'thumbnail': info_el.find('thumb').text,
-            'duration': parse_duration(info_el.find('duration').text),
-        }
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        title = self._html_search_meta(
+            ['og:title', 'twitter:title'], webpage, 'title')
+        info = self._get_player_info(url, webpage)
+        info.update({
+            'display_id': display_id,
+            'title': title,
+            'description': self._html_search_meta(
+                ['og:description', 'twitter:description'],
+                webpage, 'description', fatal=False),
+        })
+        return info
diff --git a/youtube_dl/extractor/telegraaf.py b/youtube_dl/extractor/telegraaf.py

index 6f8333cfc0d40aee4d3637ed1e58867da1277e9f..58078c531d151e319fb7e707d8116a730507962b 100644
--- a/youtube_dl/extractor/telegraaf.py
+++ b/youtube_dl/extractor/telegraaf.py
@@ -2,14 +2,16 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import remove_end
+from ..utils import (
+    determine_ext,
+    remove_end,
+)
  
  
  class TelegraafIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?telegraaf\.nl/tv/(?:[^/]+/)+(?P<id>\d+)/[^/]+\.html'
      _TEST = {
          'url': 'http://www.telegraaf.nl/tv/nieuws/binnenland/24353229/__Tikibad_ontruimd_wegens_brand__.html',
-        'md5': '83245a9779bcc4a24454bfd53c65b6dc',
          'info_dict': {
              'id': '24353229',
              'ext': 'mp4',
@@ -18,18 +20,59 @@ class TelegraafIE(InfoExtractor):
              'thumbnail': 're:^https?://.*\.jpg$',
              'duration': 33,
          },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
      }
  
      def _real_extract(self, url):
-        playlist_id = self._match_id(url)
+        video_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, playlist_id)
+        webpage = self._download_webpage(url, video_id)
  
+        player_url = self._html_search_regex(
+            r'<iframe[^>]+src="([^"]+)"', webpage, 'player URL')
+        player_page = self._download_webpage(
+            player_url, video_id, note='Downloading player webpage')
          playlist_url = self._search_regex(
-            r"iframe\.loadPlayer\('([^']+)'", webpage, 'player')
+            r'playlist\s*:\s*"([^"]+)"', player_page, 'playlist URL')
+        playlist_data = self._download_json(playlist_url, video_id)
+
+        item = playlist_data['items'][0]
+        formats = []
+        locations = item['locations']
+        for location in locations.get('adaptive', []):
+            manifest_url = location['src']
+            ext = determine_ext(manifest_url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    manifest_url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
+            elif ext == 'mpd':
+                formats.extend(self._extract_mpd_formats(
+                    manifest_url, video_id, mpd_id='dash', fatal=False))
+            else:
+                self.report_warning('Unknown adaptive format %s' % ext)
+        for location in locations.get('progressive', []):
+            formats.append({
+                'url': location['sources'][0]['src'],
+                'width': location.get('width'),
+                'height': location.get('height'),
+                'format_id': 'http-%s' % location['label'],
+            })
+
+        self._sort_formats(formats)
  
-        entries = self._extract_xspf_playlist(playlist_url, playlist_id)
          title = remove_end(self._og_search_title(webpage), ' - VIDEO')
          description = self._og_search_description(webpage)
+        duration = item.get('duration')
+        thumbnail = item.get('poster')
  
-        return self.playlist_result(entries, playlist_id, title, description)
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'formats': formats,
+            'duration': duration,
+            'thumbnail': thumbnail,
+        }
diff --git a/youtube_dl/extractor/telequebec.py b/youtube_dl/extractor/telequebec.py

new file mode 100644

index 0000000..4043fcb
--- /dev/null
+++ b/youtube_dl/extractor/telequebec.py
@@ -0,0 +1,36 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class TeleQuebecIE(InfoExtractor):
+    _VALID_URL = r'https?://zonevideo\.telequebec\.tv/media/(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://zonevideo.telequebec.tv/media/20984/le-couronnement-de-new-york/couronnement-de-new-york',
+        'md5': 'fe95a0957e5707b1b01f5013e725c90f',
+        'info_dict': {
+            'id': '20984',
+            'ext': 'mp4',
+            'title': 'Le couronnement de New York',
+            'description': 'md5:f5b3d27a689ec6c1486132b2d687d432',
+            'upload_date': '20160220',
+            'timestamp': 1455965438,
+        }
+    }
+
+    def _real_extract(self, url):
+        media_id = self._match_id(url)
+        media_data = self._download_json(
+            'https://mnmedias.api.telequebec.tv/api/v2/media/' + media_id,
+            media_id)['media']
+        return {
+            '_type': 'url_transparent',
+            'id': media_id,
+            'url': 'limelight:media:' + media_data['streamInfo']['sourceId'],
+            'title': media_data['title'],
+            'description': media_data.get('descriptions', [{'text': None}])[0].get('text'),
+            'duration': int_or_none(media_data.get('durationInMilliseconds'), 1000),
+            'ie_key': 'LimelightMedia',
+        }
diff --git a/youtube_dl/extractor/telewebion.py b/youtube_dl/extractor/telewebion.py

new file mode 100644

index 0000000..7786b28
--- /dev/null
+++ b/youtube_dl/extractor/telewebion.py
@@ -0,0 +1,55 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class TelewebionIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?telewebion\.com/#!/episode/(?P<id>\d+)'
+
+    _TEST = {
+        'url': 'http://www.telewebion.com/#!/episode/1263668/',
+        'info_dict': {
+            'id': '1263668',
+            'ext': 'mp4',
+            'title': 'قرعه\u200cکشی لیگ قهرمانان اروپا',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'view_count': int,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        secure_token = self._download_webpage(
+            'http://m.s2.telewebion.com/op/op?action=getSecurityToken', video_id)
+        episode_details = self._download_json(
+            'http://m.s2.telewebion.com/op/op', video_id,
+            query={'action': 'getEpisodeDetails', 'episode_id': video_id})
+
+        m3u8_url = 'http://m.s1.telewebion.com/smil/%s.m3u8?filepath=%s&m3u8=1&secure_token=%s' % (
+            video_id, episode_details['file_path'], secure_token)
+        formats = self._extract_m3u8_formats(
+            m3u8_url, video_id, ext='mp4', m3u8_id='hls')
+
+        picture_paths = [
+            episode_details.get('picture_path'),
+            episode_details.get('large_picture_path'),
+        ]
+
+        thumbnails = [{
+            'url': picture_path,
+            'preference': idx,
+        } for idx, picture_path in enumerate(picture_paths) if picture_path is not None]
+
+        return {
+            'id': video_id,
+            'title': episode_details['title'],
+            'formats': formats,
+            'thumbnails': thumbnails,
+            'view_count': episode_details.get('view_count'),
+        }
diff --git a/youtube_dl/extractor/tenplay.py b/youtube_dl/extractor/tenplay.py

deleted file mode 100644

index 02a31a6..0000000
--- a/youtube_dl/extractor/tenplay.py
+++ /dev/null
@@ -1,90 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    float_or_none,
-)
-
-
-class TenPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?ten(play)?\.com\.au/.+'
-    _TEST = {
-        'url': 'http://tenplay.com.au/ten-insider/extra/season-2013/tenplay-tv-your-way',
-        'info_dict': {
-            'id': '2695695426001',
-            'ext': 'flv',
-            'title': 'TENplay: TV your way',
-            'description': 'Welcome to a new TV experience. Enjoy a taste of the TENplay benefits.',
-            'timestamp': 1380150606.889,
-            'upload_date': '20130925',
-            'uploader': 'TENplay',
-        },
-        'params': {
-            'skip_download': True,  # Requires rtmpdump
-        }
-    }
-
-    _video_fields = [
-        'id', 'name', 'shortDescription', 'longDescription', 'creationDate',
-        'publishedDate', 'lastModifiedDate', 'customFields', 'videoStillURL',
-        'thumbnailURL', 'referenceId', 'length', 'playsTotal',
-        'playsTrailingWeek', 'renditions', 'captioning', 'startDate', 'endDate']
-
-    def _real_extract(self, url):
-        webpage = self._download_webpage(url, url)
-        video_id = self._html_search_regex(
-            r'videoID: "(\d+?)"', webpage, 'video_id')
-        api_token = self._html_search_regex(
-            r'apiToken: "([a-zA-Z0-9-_\.]+?)"', webpage, 'api_token')
-        title = self._html_search_regex(
-            r'<meta property="og:title" content="\s*(.*?)\s*"\s*/?\s*>',
-            webpage, 'title')
-
-        json = self._download_json('https://api.brightcove.com/services/library?command=find_video_by_id&video_id=%s&token=%s&video_fields=%s' % (video_id, api_token, ','.join(self._video_fields)), title)
-
-        formats = []
-        for rendition in json['renditions']:
-            url = rendition['remoteUrl'] or rendition['url']
-            protocol = 'rtmp' if url.startswith('rtmp') else 'http'
-            ext = 'flv' if protocol == 'rtmp' else rendition['videoContainer'].lower()
-
-            if protocol == 'rtmp':
-                url = url.replace('&mp4:', '')
-
-                tbr = int_or_none(rendition.get('encodingRate'), 1000)
-
-            formats.append({
-                'format_id': '_'.join(
-                    ['rtmp', rendition['videoContainer'].lower(),
-                     rendition['videoCodec'].lower(), '%sk' % tbr]),
-                'width': int_or_none(rendition['frameWidth']),
-                'height': int_or_none(rendition['frameHeight']),
-                'tbr': tbr,
-                'filesize': int_or_none(rendition['size']),
-                'protocol': protocol,
-                'ext': ext,
-                'vcodec': rendition['videoCodec'].lower(),
-                'container': rendition['videoContainer'].lower(),
-                'url': url,
-            })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': json['referenceId'],
-            'title': json['name'],
-            'description': json['shortDescription'] or json['longDescription'],
-            'formats': formats,
-            'thumbnails': [{
-                'url': json['videoStillURL']
-            }, {
-                'url': json['thumbnailURL']
-            }],
-            'thumbnail': json['videoStillURL'],
-            'duration': float_or_none(json.get('length'), 1000),
-            'timestamp': float_or_none(json.get('creationDate'), 1000),
-            'uploader': json.get('customFields', {}).get('production_company_distributor') or 'TENplay',
-            'view_count': int_or_none(json.get('playsTotal')),
-        }
diff --git a/youtube_dl/extractor/tf1.py b/youtube_dl/extractor/tf1.py

index 3f54b2744cb16cd6385e5cb06919cbaf9628167a..e595c4a69b3f03361abc05f6bca61adecb61cf36 100644
--- a/youtube_dl/extractor/tf1.py
+++ b/youtube_dl/extractor/tf1.py
@@ -6,7 +6,7 @@ from .common import InfoExtractor
  
  class TF1IE(InfoExtractor):
      """TF1 uses the wat.tv player."""
-    _VALID_URL = r'https?://(?:(?:videos|www|lci)\.tf1|www\.tfou)\.fr/(?:[^/]+/)*(?P<id>.+?)\.html'
+    _VALID_URL = r'https?://(?:(?:videos|www|lci)\.tf1|(?:www\.)?(?:tfou|ushuaiatv|histoire|tvbreizh))\.fr/(?:[^/]+/)*(?P<id>[^/?#.]+)'
      _TESTS = [{
          'url': 'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
          'info_dict': {
@@ -48,6 +48,6 @@ class TF1IE(InfoExtractor):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
          wat_id = self._html_search_regex(
-            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})(?:#.*?)?\1',
+            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
              webpage, 'wat id', group='id')
          return self.url_result('wat:%s' % wat_id, 'Wat')
diff --git a/youtube_dl/extractor/tfo.py b/youtube_dl/extractor/tfo.py

new file mode 100644

index 0000000..6f1eeac
--- /dev/null
+++ b/youtube_dl/extractor/tfo.py
@@ -0,0 +1,53 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+
+from .common import InfoExtractor
+from ..utils import (
+    HEADRequest,
+    ExtractorError,
+    int_or_none,
+)
+
+
+class TFOIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
+        'md5': '47c987d0515561114cf03d1226a9d4c7',
+        'info_dict': {
+            'id': '100463871',
+            'ext': 'mp4',
+            'title': 'Video Game Hackathon',
+            'description': 'md5:558afeba217c6c8d96c60e5421795c07',
+            'upload_date': '20160212',
+            'timestamp': 1455310233,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        self._request_webpage(HEADRequest('http://www.tfo.org/'), video_id)
+        infos = self._download_json(
+            'http://www.tfo.org/api/web/video/get_infos', video_id, data=json.dumps({
+                'product_id': video_id,
+            }).encode(), headers={
+                'X-tfo-session': self._get_cookies('http://www.tfo.org/')['tfo-session'].value,
+            })
+        if infos.get('success') == 0:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, infos['msg']), expected=True)
+        video_data = infos['data']
+
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': 'limelight:media:' + video_data['llid'],
+            'title': video_data['title'],
+            'description': video_data.get('description'),
+            'series': video_data.get('collection'),
+            'season_number': int_or_none(video_data.get('season')),
+            'episode_number': int_or_none(video_data.get('episode')),
+            'duration': int_or_none(video_data.get('duration')),
+            'ie_key': 'LimelightMedia',
+        }
diff --git a/youtube_dl/extractor/theintercept.py b/youtube_dl/extractor/theintercept.py

index 8cb3c3669f2af9929e702ce0c2a57606177f18e2..f23b587137a0e471ada57c8a08d2fbaf8ecc9722 100644
--- a/youtube_dl/extractor/theintercept.py
+++ b/youtube_dl/extractor/theintercept.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -11,7 +11,7 @@ from ..utils import (
  
  
  class TheInterceptIE(InfoExtractor):
-    _VALID_URL = r'https://theintercept.com/fieldofvision/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://theintercept\.com/fieldofvision/(?P<id>[^/?#]+)'
      _TESTS = [{
          'url': 'https://theintercept.com/fieldofvision/thisisacoup-episode-four-surrender-or-die/',
          'md5': '145f28b41d44aab2f87c0a4ac8ec95bd',
diff --git a/youtube_dl/extractor/theonion.py b/youtube_dl/extractor/theonion.py

deleted file mode 100644

index 10239c9..0000000
--- a/youtube_dl/extractor/theonion.py
+++ /dev/null
@@ -1,63 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class TheOnionIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?theonion\.com/video/[^,]+,(?P<id>[0-9]+)/?'
-    _TEST = {
-        'url': 'http://www.theonion.com/video/man-wearing-mm-jacket-gods-image,36918/',
-        'md5': '19eaa9a39cf9b9804d982e654dc791ee',
-        'info_dict': {
-            'id': '2133',
-            'ext': 'mp4',
-            'title': 'Man Wearing M&M Jacket Apparently Made In God\'s Image',
-            'description': 'md5:cc12448686b5600baae9261d3e180910',
-            'thumbnail': 're:^https?://.*\.jpg\?\d+$',
-        }
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-
-        video_id = self._search_regex(
-            r'"videoId":\s(\d+),', webpage, 'video ID')
-        title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
-
-        sources = re.findall(r'<source src="([^"]+)" type="([^"]+)"', webpage)
-        formats = []
-        for src, type_ in sources:
-            if type_ == 'video/mp4':
-                formats.append({
-                    'format_id': 'mp4_sd',
-                    'preference': 1,
-                    'url': src,
-                })
-            elif type_ == 'video/webm':
-                formats.append({
-                    'format_id': 'webm_sd',
-                    'preference': 0,
-                    'url': src,
-                })
-            elif type_ == 'application/x-mpegURL':
-                formats.extend(
-                    self._extract_m3u8_formats(src, display_id, preference=-1))
-            else:
-                self.report_warning(
-                    'Encountered unexpected format: %s' % type_)
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'formats': formats,
-            'thumbnail': thumbnail,
-            'description': description,
-        }
diff --git a/youtube_dl/extractor/theplatform.py b/youtube_dl/extractor/theplatform.py

index 863914299234fb0d7dd279da51c0f0d8c18d6534..cfbf7f4e1562c78ea1d5ae44437694a5325eb70b 100644
--- a/youtube_dl/extractor/theplatform.py
+++ b/youtube_dl/extractor/theplatform.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -9,16 +9,19 @@ import hashlib
  
  
  from .once import OnceIE
+from .adobepass import AdobePassIE
  from ..compat import (
      compat_parse_qs,
      compat_urllib_parse_urlparse,
  )
  from ..utils import (
+    determine_ext,
      ExtractorError,
      float_or_none,
      int_or_none,
      sanitized_Request,
      unsmuggle_url,
+    update_url_query,
      xpath_with_ns,
      mimetype2ext,
      find_xpath_attr,
@@ -48,27 +51,32 @@ class ThePlatformBaseIE(OnceIE):
              if OnceIE.suitable(_format['url']):
                  formats.extend(self._extract_once_formats(_format['url']))
              else:
-                formats.append(_format)
+                media_url = _format['url']
+                if determine_ext(media_url) == 'm3u8':
+                    hdnea2 = self._get_cookies(media_url).get('hdnea2')
+                    if hdnea2:
+                        _format['url'] = update_url_query(media_url, {'hdnea3': hdnea2.value})
  
-        self._sort_formats(formats)
+                formats.append(_format)
  
          subtitles = self._parse_smil_subtitles(meta, default_ns)
  
          return formats, subtitles
  
-    def get_metadata(self, path, video_id):
+    def _download_theplatform_metadata(self, path, video_id):
          info_url = 'http://link.theplatform.com/s/%s?format=preview' % path
-        info = self._download_json(info_url, video_id)
+        return self._download_json(info_url, video_id)
  
+    def _parse_theplatform_metadata(self, info):
          subtitles = {}
          captions = info.get('captions')
          if isinstance(captions, list):
              for caption in captions:
                  lang, src, mime = caption.get('lang', 'en'), caption.get('src'), caption.get('type')
-                subtitles[lang] = [{
+                subtitles.setdefault(lang, []).append({
                      'ext': mimetype2ext(mime),
                      'url': src,
-                }]
+                })
  
          return {
              'title': info['title'],
@@ -76,13 +84,19 @@ class ThePlatformBaseIE(OnceIE):
              'description': info['description'],
              'thumbnail': info['defaultThumbnailUrl'],
              'duration': int_or_none(info.get('duration'), 1000),
+            'timestamp': int_or_none(info.get('pubDate'), 1000) or None,
+            'uploader': info.get('billingCode'),
          }
  
+    def _extract_theplatform_metadata(self, path, video_id):
+        info = self._download_theplatform_metadata(path, video_id)
+        return self._parse_theplatform_metadata(info)
  
-class ThePlatformIE(ThePlatformBaseIE):
+
+class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
      _VALID_URL = r'''(?x)
          (?:https?://(?:link|player)\.theplatform\.com/[sp]/(?P<provider_id>[^/]+)/
-           (?:(?P<media>(?:(?:[^/]+/)+select/)?media/)|(?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/))?
+           (?:(?:(?:[^/]+/)+select/)?(?P<media>media/(?:guid/\d+/)?)?|(?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/))?
           |theplatform:)(?P<id>[^/\?&]+)'''
  
      _TESTS = [{
@@ -94,11 +108,15 @@ class ThePlatformIE(ThePlatformBaseIE):
              'title': 'Blackberry\'s big, bold Z30',
              'description': 'The Z30 is Blackberry\'s biggest, baddest mobile messaging device yet.',
              'duration': 247,
+            'timestamp': 1383239700,
+            'upload_date': '20131031',
+            'uploader': 'CBSI-NEW',
          },
          'params': {
              # rtmp download
              'skip_download': True,
          },
+        'skip': '404 Not Found',
      }, {
          # from http://www.cnet.com/videos/tesla-model-s-a-second-step-towards-a-cleaner-motoring-future/
          'url': 'http://link.theplatform.com/s/kYEXFC/22d_qsQ6MIRT',
@@ -107,6 +125,9 @@ class ThePlatformIE(ThePlatformBaseIE):
              'ext': 'flv',
              'description': 'md5:ac330c9258c04f9d7512cf26b9595409',
              'title': 'Tesla Model S: A second step towards a cleaner motoring future',
+            'timestamp': 1426176191,
+            'upload_date': '20150312',
+            'uploader': 'CBSI-NEW',
          },
          'params': {
              # rtmp download
@@ -119,6 +140,7 @@ class ThePlatformIE(ThePlatformBaseIE):
              'ext': 'mp4',
              'description': 'md5:644ad9188d655b742f942bf2e06b002d',
              'title': 'HIGHLIGHTS: USA bag first ever series Cup win',
+            'uploader': 'EGSM',
          }
      }, {
          'url': 'http://player.theplatform.com/p/NnzsPC/widget/select/media/4Y0TlYUr_ZT7',
@@ -135,6 +157,7 @@ class ThePlatformIE(ThePlatformBaseIE):
              'thumbnail': 're:^https?://.*\.jpg$',
              'timestamp': 1435752600,
              'upload_date': '20150701',
+            'uploader': 'NBCU-NEWS',
          },
      }, {
          # From http://www.nbc.com/the-blacklist/video/sir-crispin-crandall/2928790?onid=137781#vc137781=1
@@ -143,6 +166,22 @@ class ThePlatformIE(ThePlatformBaseIE):
          'only_matching': True,
      }]
  
+    @classmethod
+    def _extract_urls(cls, webpage):
+        m = re.search(
+            r'''(?x)
+                    <meta\s+
+                        property=(["'])(?:og:video(?::(?:secure_)?url)?|twitter:player)\1\s+
+                        content=(["'])(?P<url>https?://player\.theplatform\.com/p/.+?)\2
+            ''', webpage)
+        if m:
+            return [m.group('url')]
+
+        matches = re.findall(
+            r'<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage)
+        if matches:
+            return list(zip(*matches))[1]
+
      @staticmethod
      def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False):
          flags = '10' if include_qs else '00'
@@ -151,11 +190,11 @@ class ThePlatformIE(ThePlatformBaseIE):
          def str_to_hex(str):
              return binascii.b2a_hex(str.encode('ascii')).decode('ascii')
  
-        def hex_to_str(hex):
-            return binascii.a2b_hex(hex)
+        def hex_to_bytes(hex):
+            return binascii.a2b_hex(hex.encode('ascii'))
  
-        relative_path = url.split('http://link.theplatform.com/s/')[1].split('?')[0]
-        clear_text = hex_to_str(flags + expiration_date + str_to_hex(relative_path))
+        relative_path = re.match(r'https?://link\.theplatform\.com/s/([^?]+)', url).group(1)
+        clear_text = hex_to_bytes(flags + expiration_date + str_to_hex(relative_path))
          checksum = hmac.new(sig_key.encode('ascii'), clear_text, hashlib.sha1).hexdigest()
          sig = flags + expiration_date + checksum + str_to_hex(sig_secret)
          return '%s&sig=%s' % (url, sig)
@@ -170,10 +209,10 @@ class ThePlatformIE(ThePlatformBaseIE):
          if not provider_id:
              provider_id = 'dJ5BDC'
  
-        path = provider_id
+        path = provider_id + '/'
          if mobj.group('media'):
-            path += '/media'
-        path += '/' + video_id
+            path += mobj.group('media')
+        path += video_id
  
          qs_dict = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
          if 'guid' in qs_dict:
@@ -231,8 +270,9 @@ class ThePlatformIE(ThePlatformBaseIE):
              smil_url = self._sign_url(smil_url, sig['key'], sig['secret'])
  
          formats, subtitles = self._extract_theplatform_smil(smil_url, video_id)
+        self._sort_formats(formats)
  
-        ret = self.get_metadata(path, video_id)
+        ret = self._extract_theplatform_metadata(path, video_id)
          combined_subtitles = self._merge_subtitles(ret.get('subtitles', {}), subtitles)
          ret.update({
              'id': video_id,
@@ -244,9 +284,9 @@ class ThePlatformIE(ThePlatformBaseIE):
  
  
  class ThePlatformFeedIE(ThePlatformBaseIE):
-    _URL_TEMPLATE = '%s//feed.theplatform.com/f/%s/%s?form=json&byGuid=%s'
-    _VALID_URL = r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)\?(?:[^&]+&)*byGuid=(?P<id>[a-zA-Z0-9_]+)'
-    _TEST = {
+    _URL_TEMPLATE = '%s//feed.theplatform.com/f/%s/%s?form=json&%s'
+    _VALID_URL = r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)\?(?:[^&]+&)*(?P<filter>by(?:Gui|I)d=(?P<id>[\w-]+))'
+    _TESTS = [{
          # From http://player.theplatform.com/p/7wvmTC/MSNBCEmbeddedOffSite?guid=n_hardball_5biden_140207
          'url': 'http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207',
          'md5': '6e32495b5073ab414471b615c5ded394',
@@ -260,33 +300,40 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
              'timestamp': 1391824260,
              'duration': 467.0,
              'categories': ['MSNBC/Issues/Democrats', 'MSNBC/Issues/Elections/Election 2016'],
+            'uploader': 'NBCU-NEWS',
          },
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-
-        video_id = mobj.group('id')
-        provider_id = mobj.group('provider_id')
-        feed_id = mobj.group('feed_id')
+    }]
  
-        real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, video_id)
-        feed = self._download_json(real_url, video_id)
-        entry = feed['entries'][0]
+    def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}):
+        real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, filter_query)
+        entry = self._download_json(real_url, video_id)['entries'][0]
  
          formats = []
          subtitles = {}
          first_video_id = None
          duration = None
+        asset_types = []
          for item in entry['media$content']:
-            smil_url = item['plfile$url'] + '&mbr=true'
+            smil_url = item['plfile$url']
              cur_video_id = ThePlatformIE._match_id(smil_url)
              if first_video_id is None:
                  first_video_id = cur_video_id
                  duration = float_or_none(item.get('plfile$duration'))
-            cur_formats, cur_subtitles = self._extract_theplatform_smil(smil_url, video_id, 'Downloading SMIL data for %s' % cur_video_id)
-            formats.extend(cur_formats)
-            subtitles = self._merge_subtitles(subtitles, cur_subtitles)
+            for asset_type in item['plfile$assetTypes']:
+                if asset_type in asset_types:
+                    continue
+                asset_types.append(asset_type)
+                query = {
+                    'mbr': 'true',
+                    'formats': item['plfile$format'],
+                    'assetTypes': asset_type,
+                }
+                if asset_type in asset_types_query:
+                    query.update(asset_types_query[asset_type])
+                cur_formats, cur_subtitles = self._extract_theplatform_smil(update_url_query(
+                    smil_url, query), video_id, 'Downloading SMIL data for %s' % asset_type)
+                formats.extend(cur_formats)
+                subtitles = self._merge_subtitles(subtitles, cur_subtitles)
  
          self._sort_formats(formats)
  
@@ -299,7 +346,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
          timestamp = int_or_none(entry.get('media$availableDate'), scale=1000)
          categories = [item['media$name'] for item in entry.get('media$categories', [])]
  
-        ret = self.get_metadata('%s/%s' % (provider_id, first_video_id), video_id)
+        ret = self._extract_theplatform_metadata('%s/%s' % (provider_id, first_video_id), video_id)
          subtitles = self._merge_subtitles(subtitles, ret['subtitles'])
          ret.update({
              'id': video_id,
@@ -310,5 +357,17 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
              'timestamp': timestamp,
              'categories': categories,
          })
+        if custom_fields:
+            ret.update(custom_fields(entry))
  
          return ret
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+
+        video_id = mobj.group('id')
+        provider_id = mobj.group('provider_id')
+        feed_id = mobj.group('feed_id')
+        filter_query = mobj.group('filter')
+
+        return self._extract_feed_info(provider_id, feed_id, filter_query, video_id)
diff --git a/youtube_dl/extractor/thescene.py b/youtube_dl/extractor/thescene.py

index 3e4e14031a975d8176ffce6706a139318c12d25d..ce1326c03643186b4e1eb58905ef8f9c868588f6 100644
--- a/youtube_dl/extractor/thescene.py
+++ b/youtube_dl/extractor/thescene.py
@@ -7,7 +7,7 @@ from ..utils import qualities
  
  
  class TheSceneIE(InfoExtractor):
-    _VALID_URL = r'https://thescene\.com/watch/[^/]+/(?P<id>[^/#?]+)'
+    _VALID_URL = r'https?://thescene\.com/watch/[^/]+/(?P<id>[^/#?]+)'
  
      _TEST = {
          'url': 'https://thescene.com/watch/vogue/narciso-rodriguez-spring-2013-ready-to-wear',
diff --git a/youtube_dl/extractor/thesixtyone.py b/youtube_dl/extractor/thesixtyone.py

index d8b1fd2813eadc3d17a17a6d46766b3c9c4ea37a..d63aef5dea9a8543f2a919b19321582f20e8df86 100644 (file)
--- a/youtube_dl/extractor/thesixtyone.py
+++ b/youtube_dl/extractor/thesixtyone.py
@@ -12,7 +12,7 @@ class TheSixtyOneIE(InfoExtractor):
              s|
              song/comments/list|
              song
-        )/(?P<id>[A-Za-z0-9]+)/?$'''
+        )/(?:[^/]+/)?(?P<id>[A-Za-z0-9]+)/?$'''
      _SONG_URL_TEMPLATE = 'http://thesixtyone.com/s/{0:}'
      _SONG_FILE_URL_TEMPLATE = 'http://{audio_server:}/thesixtyone_production/audio/{0:}_stream'
      _THUMBNAIL_URL_TEMPLATE = '{photo_base_url:}_desktop'
@@ -45,6 +45,10 @@ class TheSixtyOneIE(InfoExtractor):
              'url': 'http://www.thesixtyone.com/song/SrE3zD7s1jt/',
              'only_matching': True,
          },
+        {
+            'url': 'http://www.thesixtyone.com/maryatmidnight/song/StrawberriesandCream/yvWtLp0c4GQ/',
+            'only_matching': True,
+        },
      ]
  
      _DECODE_MAP = {
diff --git a/youtube_dl/extractor/thestar.py b/youtube_dl/extractor/thestar.py

index b7e9af2afc87460fb63cf4becaa87d0ce5760f8e..c3f11889471b978f9bdbb29014fb7e9e08aca636 100644 (file)
--- a/youtube_dl/extractor/thestar.py
+++ b/youtube_dl/extractor/thestar.py
@@ -2,8 +2,6 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from .brightcove import BrightcoveLegacyIE
-from ..compat import compat_parse_qs
  
  
  class TheStarIE(InfoExtractor):
@@ -19,6 +17,10 @@ class TheStarIE(InfoExtractor):
              'uploader_id': '794267642001',
              'timestamp': 1454353482,
              'upload_date': '20160201',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
          }
      }
      BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/794267642001/default_default/index.html?videoId=%s'
@@ -26,6 +28,9 @@ class TheStarIE(InfoExtractor):
      def _real_extract(self, url):
          display_id = self._match_id(url)
          webpage = self._download_webpage(url, display_id)
-        brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
-        brightcove_id = compat_parse_qs(brightcove_legacy_url)['@videoPlayer'][0]
-        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
+        brightcove_id = self._search_regex(
+            r'mainartBrightcoveVideoId["\']?\s*:\s*["\']?(\d+)',
+            webpage, 'brightcove id')
+        return self.url_result(
+            self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
+            'BrightcoveNew', brightcove_id)
diff --git a/youtube_dl/extractor/theweatherchannel.py b/youtube_dl/extractor/theweatherchannel.py

new file mode 100644 (file)

index 0000000..c34a49d
--- /dev/null
+++ b/youtube_dl/extractor/theweatherchannel.py
@@ -0,0 +1,79 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .theplatform import ThePlatformIE
+from ..utils import (
+    determine_ext,
+    parse_duration,
+)
+
+
+class TheWeatherChannelIE(ThePlatformIE):
+    _VALID_URL = r'https?://(?:www\.)?weather\.com/(?:[^/]+/)*video/(?P<id>[^/?#]+)'
+    _TESTS = [{
+        'url': 'https://weather.com/series/great-outdoors/video/ice-climber-is-in-for-a-shock',
+        'md5': 'ab924ac9574e79689c24c6b95e957def',
+        'info_dict': {
+            'id': 'cc82397e-cc3f-4d11-9390-a785add090e8',
+            'ext': 'mp4',
+            'title': 'Ice Climber Is In For A Shock',
+            'description': 'md5:55606ce1378d4c72e6545e160c9d9695',
+            'uploader': 'TWC - Digital (No Distro)',
+            'uploader_id': '6ccd5455-16bb-46f2-9c57-ff858bb9f62c',
+        }
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        drupal_settings = self._parse_json(self._search_regex(
+            r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
+            webpage, 'drupal settings'), display_id)
+        video_id = drupal_settings['twc']['contexts']['node']['uuid']
+        video_data = self._download_json(
+            'https://dsx.weather.com/cms/v4/asset-collection/en_US/' + video_id, video_id)
+        seo_meta = video_data.get('seometa', {})
+        title = video_data.get('title') or seo_meta['title']
+
+        urls = []
+        thumbnails = []
+        formats = []
+        for variant_id, variant_url in video_data.get('variants', {}).items():
+            variant_url = variant_url.strip()
+            if not variant_url or variant_url in urls:
+                continue
+            urls.append(variant_url)
+            ext = determine_ext(variant_url)
+            if ext == 'jpg':
+                thumbnails.append({
+                    'url': variant_url,
+                    'id': variant_id,
+                })
+            elif ThePlatformIE.suitable(variant_url):
+                tp_formats, _ = self._extract_theplatform_smil(variant_url, video_id)
+                formats.extend(tp_formats)
+            elif ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    variant_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id=variant_id, fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    variant_url, video_id, f4m_id=variant_id, fatal=False))
+            else:
+                formats.append({
+                    'url': variant_url,
+                    'format_id': variant_id,
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': video_data.get('description') or seo_meta.get('description') or seo_meta.get('og:description'),
+            'duration': parse_duration(video_data.get('duration')),
+            'uploader': video_data.get('providername'),
+            'uploader_id': video_data.get('providerid'),
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/thisav.py b/youtube_dl/extractor/thisav.py

index 7f323c938762f6ec1337b6dbf6b9c64ec2993dd8..4473a3c773c3d9c4c26361e907769e8bb1ac9fad 100644 (file)
--- a/youtube_dl/extractor/thisav.py
+++ b/youtube_dl/extractor/thisav.py
@@ -3,13 +3,13 @@ from __future__ import unicode_literals
  
  import re
  
-from .common import InfoExtractor
-from ..utils import determine_ext
+from .jwplatform import JWPlatformBaseIE
+from ..utils import remove_end
  
  
-class ThisAVIE(InfoExtractor):
+class ThisAVIE(JWPlatformBaseIE):
      _VALID_URL = r'https?://(?:www\.)?thisav\.com/video/(?P<id>[0-9]+)/.*'
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.thisav.com/video/47734/%98%26sup1%3B%83%9E%83%82---just-fit.html',
          'md5': '0480f1ef3932d901f0e0e719f188f19b',
          'info_dict': {
@@ -19,29 +19,49 @@ class ThisAVIE(InfoExtractor):
              'uploader': 'dj7970',
              'uploader_id': 'dj7970'
          }
-    }
+    }, {
+        'url': 'http://www.thisav.com/video/242352/nerdy-18yo-big-ass-tattoos-and-glasses.html',
+        'md5': 'ba90c076bd0f80203679e5b60bf523ee',
+        'info_dict': {
+            'id': '242352',
+            'ext': 'mp4',
+            'title': 'Nerdy 18yo Big Ass Tattoos and Glasses',
+            'uploader': 'cybersluts',
+            'uploader_id': 'cybersluts',
+        },
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
  
          video_id = mobj.group('id')
          webpage = self._download_webpage(url, video_id)
-        title = self._html_search_regex(r'<h1>([^<]*)</h1>', webpage, 'title')
+        title = remove_end(self._html_search_regex(
+            r'<title>([^<]+)</title>', webpage, 'title'),
+            ' - 視頻 - ThisAV.com-世界第一中文成人娛樂網站')
          video_url = self._html_search_regex(
-            r"addVariable\('file','([^']+)'\);", webpage, 'video url')
+            r"addVariable\('file','([^']+)'\);", webpage, 'video url', default=None)
+        if video_url:
+            info_dict = {
+                'formats': [{
+                    'url': video_url,
+                }],
+            }
+        else:
+            info_dict = self._extract_jwplayer_data(
+                webpage, video_id, require_title=False)
          uploader = self._html_search_regex(
              r': <a href="http://www.thisav.com/user/[0-9]+/(?:[^"]+)">([^<]+)</a>',
              webpage, 'uploader name', fatal=False)
          uploader_id = self._html_search_regex(
              r': <a href="http://www.thisav.com/user/[0-9]+/([^"]+)">(?:[^<]+)</a>',
              webpage, 'uploader id', fatal=False)
-        ext = determine_ext(video_url)
  
-        return {
+        info_dict.update({
              'id': video_id,
-            'url': video_url,
              'uploader': uploader,
              'uploader_id': uploader_id,
              'title': title,
-            'ext': ext,
-        }
+        })
+
+        return info_dict
diff --git a/youtube_dl/extractor/thisoldhouse.py b/youtube_dl/extractor/thisoldhouse.py

new file mode 100644 (file)

index 0000000..7629f0d
--- /dev/null
+++ b/youtube_dl/extractor/thisoldhouse.py
@@ -0,0 +1,32 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class ThisOldHouseIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to)/(?P<id>[^/?#]+)'
+    _TESTS = [{
+        'url': 'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
+        'md5': '568acf9ca25a639f0c4ff905826b662f',
+        'info_dict': {
+            'id': '2REGtUDQ',
+            'ext': 'mp4',
+            'title': 'How to Build a Storage Bench',
+            'description': 'In the workshop, Tom Silva and Kevin O\'Connor build a storage bench for an entryway.',
+            'timestamp': 1442548800,
+            'upload_date': '20150918',
+        }
+    }, {
+        'url': 'https://www.thisoldhouse.com/watch/arlington-arts-crafts-arts-and-crafts-class-begins',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        drupal_settings = self._parse_json(self._search_regex(
+            r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
+            webpage, 'drupal settings'), display_id)
+        video_id = drupal_settings['jwplatform']['video_id']
+        return self.url_result('jwplatform:' + video_id, 'JWPlatform', video_id)
diff --git a/youtube_dl/extractor/threeqsdn.py b/youtube_dl/extractor/threeqsdn.py

new file mode 100644 (file)

index 0000000..f26937d
--- /dev/null
+++ b/youtube_dl/extractor/threeqsdn.py
@@ -0,0 +1,142 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    js_to_json,
+    mimetype2ext,
+)
+
+
+class ThreeQSDNIE(InfoExtractor):
+    IE_NAME = '3qsdn'
+    IE_DESC = '3Q SDN'
+    _VALID_URL = r'https?://playout\.3qsdn\.com/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+    _TESTS = [{
+        # ondemand from http://www.philharmonie.tv/veranstaltung/26/
+        'url': 'http://playout.3qsdn.com/0280d6b9-1215-11e6-b427-0cc47a188158?protocol=http',
+        'md5': 'ab040e37bcfa2e0c079f92cb1dd7f6cd',
+        'info_dict': {
+            'id': '0280d6b9-1215-11e6-b427-0cc47a188158',
+            'ext': 'mp4',
+            'title': '0280d6b9-1215-11e6-b427-0cc47a188158',
+            'is_live': False,
+        },
+        'expected_warnings': ['Failed to download MPD manifest', 'Failed to parse JSON'],
+    }, {
+        # live video stream
+        'url': 'https://playout.3qsdn.com/d755d94b-4ab9-11e3-9162-0025907ad44f?js=true',
+        'info_dict': {
+            'id': 'd755d94b-4ab9-11e3-9162-0025907ad44f',
+            'ext': 'mp4',
+            'title': 're:^d755d94b-4ab9-11e3-9162-0025907ad44f [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+            'is_live': True,
+        },
+        'params': {
+            'skip_download': True,  # m3u8 downloads
+        },
+        'expected_warnings': ['Failed to download MPD manifest'],
+    }, {
+        # live audio stream
+        'url': 'http://playout.3qsdn.com/9edf36e0-6bf2-11e2-a16a-9acf09e2db48',
+        'only_matching': True,
+    }, {
+        # live audio stream with some 404 URLs
+        'url': 'http://playout.3qsdn.com/ac5c3186-777a-11e2-9c30-9acf09e2db48',
+        'only_matching': True,
+    }, {
+        # geo restricted with 'This content is not available in your country'
+        'url': 'http://playout.3qsdn.com/d63a3ffe-75e8-11e2-9c30-9acf09e2db48',
+        'only_matching': True,
+    }, {
+        # geo restricted with 'playout.3qsdn.com/forbidden'
+        'url': 'http://playout.3qsdn.com/8e330f26-6ae2-11e2-a16a-9acf09e2db48',
+        'only_matching': True,
+    }, {
+        # live video with rtmp link
+        'url': 'https://playout.3qsdn.com/6092bb9e-8f72-11e4-a173-002590c750be',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+\b(?:data-)?src=(["\'])(?P<url>%s.*?)\1' % ThreeQSDNIE._VALID_URL, webpage)
+        if mobj:
+            return mobj.group('url')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        js = self._download_webpage(
+            'http://playout.3qsdn.com/%s' % video_id, video_id,
+            query={'js': 'true'})
+
+        if any(p in js for p in (
+                '>This content is not available in your country',
+                'playout.3qsdn.com/forbidden')):
+            self.raise_geo_restricted()
+
+        stream_content = self._search_regex(
+            r'streamContent\s*:\s*(["\'])(?P<content>.+?)\1', js,
+            'stream content', default='demand', group='content')
+
+        live = stream_content == 'live'
+
+        stream_type = self._search_regex(
+            r'streamType\s*:\s*(["\'])(?P<type>audio|video)\1', js,
+            'stream type', default='video', group='type')
+
+        formats = []
+        urls = set()
+
+        def extract_formats(item_url, item={}):
+            if not item_url or item_url in urls:
+                return
+            urls.add(item_url)
+            ext = mimetype2ext(item.get('type')) or determine_ext(item_url, default_ext=None)
+            if ext == 'mpd':
+                formats.extend(self._extract_mpd_formats(
+                    item_url, video_id, mpd_id='mpd', fatal=False))
+            elif ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    item_url, video_id, 'mp4',
+                    entry_protocol='m3u8' if live else 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    item_url, video_id, f4m_id='hds', fatal=False))
+            else:
+                if not self._is_valid_url(item_url, video_id):
+                    return
+                formats.append({
+                    'url': item_url,
+                    'format_id': item.get('quality'),
+                    'ext': 'mp4' if item_url.startswith('rtsp') else ext,
+                    'vcodec': 'none' if stream_type == 'audio' else None,
+                })
+
+        for item_js in re.findall(r'({[^{]*?\b(?:src|source)\s*:\s*["\'].+?})', js):
+            f = self._parse_json(
+                item_js, video_id, transform_source=js_to_json, fatal=False)
+            if not f:
+                continue
+            extract_formats(f.get('src'), f)
+
+        # More relaxed version that collects additional URLs and acts
+        # as a future-proof fallback
+        for _, src in re.findall(r'\b(?:src|source)\s*:\s*(["\'])((?:https?|rtsp)://.+?)\1', js):
+            extract_formats(src)
+
+        self._sort_formats(formats)
+
+        title = self._live_title(video_id) if live else video_id
+
+        return {
+            'id': video_id,
+            'title': title,
+            'is_live': live,
+            'formats': formats,
+        }
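Reviewer note: the relaxed fallback pass added to threeqsdn.py can be tried in isolation. The sketch below reuses the regular expression from the second `re.findall` loop above; the `collect_urls` helper and the sample `sample_js` string are hypothetical and only illustrate how a `set` of seen URLs suppresses duplicates, much as `urls` does in `extract_formats`.

```python
import re

# Regex copied from the fallback loop in the threeqsdn.py diff above.
SRC_RE = r'\b(?:src|source)\s*:\s*(["\'])((?:https?|rtsp)://.+?)\1'


def collect_urls(js):
    """Return unique source URLs in order of first appearance (hypothetical helper)."""
    seen = set()
    ordered = []
    for _quote, src in re.findall(SRC_RE, js):
        if src in seen:
            continue  # same URL already handled, mirrors the `urls` check
        seen.add(src)
        ordered.append(src)
    return ordered


# Made-up player config for illustration, not real 3Q SDN output.
sample_js = (
    "sources: [{src: 'http://a.example/v.m3u8', type: 'application/x-mpegURL'}, "
    "{src: \"rtsp://a.example/v\"}], src: 'http://a.example/v.m3u8'"
)
print(collect_urls(sample_js))
```

The `\b` anchor keeps the plural key `sources:` from matching, so only the individual `src:` entries are picked up.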
diff --git a/youtube_dl/extractor/thvideo.py b/youtube_dl/extractor/thvideo.py

deleted file mode 100644 (file)

index 406f4a8..0000000
--- a/youtube_dl/extractor/thvideo.py
+++ /dev/null
@@ -1,84 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    unified_strdate
-)
-
-
-class THVideoIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?thvideo\.tv/(?:v/th|mobile\.php\?cid=)(?P<id>[0-9]+)'
-    _TEST = {
-        'url': 'http://thvideo.tv/v/th1987/',
-        'md5': 'fa107b1f73817e325e9433505a70db50',
-        'info_dict': {
-            'id': '1987',
-            'ext': 'mp4',
-            'title': '【动画】秘封活动记录 ～ The Sealed Esoteric History.分镜稿预览',
-            'display_id': 'th1987',
-            'thumbnail': 'http://thvideo.tv/uploadfile/2014/0722/20140722013459856.jpg',
-            'description': '社团京都幻想剧团的第一个东方二次同人动画作品「秘封活动记录 ～ The Sealed Esoteric History.」 本视频是该动画第一期的分镜草稿...',
-            'upload_date': '20140722'
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        # extract download link from mobile player page
-        webpage_player = self._download_webpage(
-            'http://thvideo.tv/mobile.php?cid=%s-0' % (video_id),
-            video_id, note='Downloading video source page')
-        video_url = self._html_search_regex(
-            r'<source src="(.*?)" type', webpage_player, 'video url')
-
-        # extract video info from main page
-        webpage = self._download_webpage(
-            'http://thvideo.tv/v/th%s' % (video_id), video_id)
-        title = self._og_search_title(webpage)
-        display_id = 'th%s' % video_id
-        thumbnail = self._og_search_thumbnail(webpage)
-        description = self._og_search_description(webpage)
-        upload_date = unified_strdate(self._html_search_regex(
-            r'span itemprop="datePublished" content="(.*?)">', webpage,
-            'upload date', fatal=False))
-
-        return {
-            'id': video_id,
-            'ext': 'mp4',
-            'url': video_url,
-            'title': title,
-            'display_id': display_id,
-            'thumbnail': thumbnail,
-            'description': description,
-            'upload_date': upload_date
-        }
-
-
-class THVideoPlaylistIE(InfoExtractor):
-    _VALID_URL = r'http?://(?:www\.)?thvideo\.tv/mylist(?P<id>[0-9]+)'
-    _TEST = {
-        'url': 'http://thvideo.tv/mylist2',
-        'info_dict': {
-            'id': '2',
-            'title': '幻想万華鏡',
-        },
-        'playlist_mincount': 23,
-    }
-
-    def _real_extract(self, url):
-        playlist_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, playlist_id)
-        list_title = self._html_search_regex(
-            r'<h1 class="show_title">(.*?)<b id', webpage, 'playlist title',
-            fatal=False)
-
-        entries = [
-            self.url_result('http://thvideo.tv/v/th' + id, 'THVideo')
-            for id in re.findall(r'<dd><a href="http://thvideo.tv/v/th(\d+)/" target=', webpage)]
-
-        return self.playlist_result(entries, playlist_id, list_title)
diff --git a/youtube_dl/extractor/tlc.py b/youtube_dl/extractor/tlc.py

index abad3ff64b5e519414615d3dd3cf8da345e9a2f3..fd145ba429fbc94ec5582b6100660f2897b25f5f 100644 (file)
--- a/youtube_dl/extractor/tlc.py
+++ b/youtube_dl/extractor/tlc.py
@@ -1,15 +1,19 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
+
  import re
  
  from .common import InfoExtractor
  from .brightcove import BrightcoveLegacyIE
-from ..compat import compat_parse_qs
+from ..compat import (
+    compat_parse_qs,
+    compat_urlparse,
+)
  
  
  class TlcDeIE(InfoExtractor):
      IE_NAME = 'tlc.de'
-    _VALID_URL = r'https?://www\.tlc\.de/(?:[^/]+/)*videos/(?P<title>[^/?#]+)?(?:.*#(?P<id>\d+))?'
+    _VALID_URL = r'https?://(?:www\.)?tlc\.de/(?:[^/]+/)*videos/(?P<title>[^/?#]+)?(?:.*#(?P<id>\d+))?'
  
      _TEST = {
          'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001',
@@ -35,5 +39,5 @@ class TlcDeIE(InfoExtractor):
              title = mobj.group('title')
              webpage = self._download_webpage(url, title)
              brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
-            brightcove_id = compat_parse_qs(brightcove_legacy_url)['@videoPlayer'][0]
+            brightcove_id = compat_parse_qs(compat_urlparse.urlparse(brightcove_legacy_url).query)['@videoPlayer'][0]
          return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
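Reviewer note: the tlc.py change parses only the query component of the Brightcove URL instead of the whole URL. A minimal sketch of the difference, assuming Python 3's `urllib.parse` functions behave like the `compat_parse_qs` and `compat_urlparse` wrappers used in the diff; the `legacy_url` value is made up for illustration.

```python
from urllib.parse import parse_qs, urlparse

# Made-up legacy player URL with '@videoPlayer' as the first query parameter.
legacy_url = 'http://player.example/viewer?%40videoPlayer=3235167922001&playerID=1'

# Parsing the full URL glues scheme, host and path onto the first key.
naive = parse_qs(legacy_url)

# Parsing only the query component yields the expected keys.
fixed = parse_qs(urlparse(legacy_url).query)

print(sorted(naive))
print(fixed['@videoPlayer'][0])
```

When `@videoPlayer` happens not to be the first parameter, the naive form still finds it, which is probably why the old code worked for some pages.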
diff --git a/youtube_dl/extractor/tmz.py b/youtube_dl/extractor/tmz.py

index 7dbe68b5c228ea6d3fa20d835a60ecf38b820816..419f9d92eea375e4aeabc0c39f736955b818c921 100644 (file)
--- a/youtube_dl/extractor/tmz.py
+++ b/youtube_dl/extractor/tmz.py
@@ -5,43 +5,42 @@ from .common import InfoExtractor
  
  
  class TMZIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tmz\.com/videos/(?P<id>[^/]+)/?'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?tmz\.com/videos/(?P<id>[^/?#]+)'
+    _TESTS = [{
          'url': 'http://www.tmz.com/videos/0_okj015ty/',
-        'md5': '791204e3bf790b1426cb2db0706184c0',
+        'md5': '4d22a51ef205b6c06395d8394f72d560',
          'info_dict': {
              'id': '0_okj015ty',
-            'url': 'http://tmz.vo.llnwd.net/o28/2014-03/13/0_okj015ty_0_rt8ro3si_2.mp4',
              'ext': 'mp4',
              'title': 'Kim Kardashian\'s Boobs Unlock a Mystery!',
              'description': 'Did Kim Kardasain try to one-up Khloe by one-upping Kylie???  Or is she just showing off her amazing boobs?',
-            'thumbnail': r're:http://cdnbakmi\.kaltura\.com/.*thumbnail.*',
+            'timestamp': 1394747163,
+            'uploader_id': 'batchUser',
+            'upload_date': '20140313',
          }
-    }
+    }, {
+        'url': 'http://www.tmz.com/videos/0-cegprt2p/',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        return {
-            'id': video_id,
-            'url': self._html_search_meta('VideoURL', webpage, fatal=True),
-            'title': self._og_search_title(webpage),
-            'description': self._og_search_description(webpage),
-            'thumbnail': self._html_search_meta('ThumbURL', webpage),
-        }
+        video_id = self._match_id(url).replace('-', '_')
+        return self.url_result('kaltura:591531:%s' % video_id, 'Kaltura', video_id)
  
  
  class TMZArticleIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?tmz\.com/\d{4}/\d{2}/\d{2}/(?P<id>[^/]+)/?'
      _TEST = {
          'url': 'http://www.tmz.com/2015/04/19/bobby-brown-bobbi-kristina-awake-video-concert',
-        'md5': 'e482a414a38db73087450e3a6ce69d00',
+        'md5': '3316ff838ae5bb7f642537825e1e90d2',
          'info_dict': {
              'id': '0_6snoelag',
-            'ext': 'mp4',
+            'ext': 'mov',
              'title': 'Bobby Brown Tells Crowd ... Bobbi Kristina is Awake',
              'description': 'Bobby Brown stunned his audience during a concert Saturday night, when he told the crowd, "Bobbi is awake.  She\'s watching me."',
+            'timestamp': 1429467813,
+            'upload_date': '20150419',
+            'uploader_id': 'batchUser',
          }
      }
  
@@ -49,12 +48,9 @@ class TMZArticleIE(InfoExtractor):
          video_id = self._match_id(url)
  
          webpage = self._download_webpage(url, video_id)
-        embedded_video_info_str = self._html_search_regex(
-            r'tmzVideoEmbedV2\("([^)]+)"\);', webpage, 'embedded video info')
-
-        embedded_video_info = self._parse_json(
-            embedded_video_info_str, video_id,
-            transform_source=lambda s: s.replace('\\', ''))
+        embedded_video_info = self._parse_json(self._html_search_regex(
+            r'tmzVideoEmbed\(({.+?})\);', webpage, 'embedded video info'),
+            video_id)
  
          return self.url_result(
              'http://www.tmz.com/videos/%s/' % embedded_video_info['id'])
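Reviewer note: after this change TMZIE no longer scrapes the page. It normalizes the ID from the URL and delegates to the Kaltura extractor. The sketch below reproduces only the string handling, using the `_VALID_URL` pattern and partner ID from the diff; `kaltura_url` is a hypothetical stand-in for `_real_extract` and does not call youtube-dl.

```python
import re

# Pattern and partner ID taken from the tmz.py diff above.
VALID_URL = r'https?://(?:www\.)?tmz\.com/videos/(?P<id>[^/?#]+)'
KALTURA_PARTNER_ID = '591531'


def kaltura_url(url):
    """Build the 'kaltura:<partner>:<entry>' string for a TMZ video URL (hypothetical helper)."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        raise ValueError('not a TMZ video URL: %s' % url)
    # URLs may use '-' where the Kaltura entry ID uses '_'.
    video_id = mobj.group('id').replace('-', '_')
    return 'kaltura:%s:%s' % (KALTURA_PARTNER_ID, video_id)


print(kaltura_url('http://www.tmz.com/videos/0-cegprt2p/'))
print(kaltura_url('http://www.tmz.com/videos/0_okj015ty/'))
```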
diff --git a/youtube_dl/extractor/tnaflix.py b/youtube_dl/extractor/tnaflix.py

index 79f036fe4eca77f57ddd9e1fd912317f9af00ba5..77d56b8ca87306a66c22a7e41c5d01de6bba9cb6 100644 (file)
--- a/youtube_dl/extractor/tnaflix.py
+++ b/youtube_dl/extractor/tnaflix.py
@@ -10,6 +10,7 @@ from ..utils import (
      int_or_none,
      parse_duration,
      str_to_int,
+    unescapeHTML,
      xpath_text,
  )
  
@@ -76,7 +77,12 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
          webpage = self._download_webpage(url, display_id)
  
          cfg_url = self._proto_relative_url(self._html_search_regex(
-            self._CONFIG_REGEX, webpage, 'flashvars.config'), 'http:')
+            self._CONFIG_REGEX, webpage, 'flashvars.config', default=None), 'http:')
+
+        if not cfg_url:
+            inputs = self._hidden_inputs(webpage)
+            cfg_url = ('https://cdn-fck.tnaflix.com/tnaflix/%s.fid?key=%s&VID=%s&premium=1&vip=1&alpha'
+                       % (inputs['vkey'], inputs['nkey'], video_id))
  
          cfg_xml = self._download_xml(
              cfg_url, display_id, 'Downloading metadata',
@@ -85,7 +91,7 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
          formats = []
  
          def extract_video_url(vl):
-            return re.sub('speed=\d+', 'speed=', vl.text)
+            return re.sub(r'speed=\d+', 'speed=', unescapeHTML(vl.text))
  
          video_link = cfg_xml.find('./videoLink')
          if video_link is not None:
@@ -114,8 +120,12 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
              xpath_text(cfg_xml, './startThumb', 'thumbnail'), 'http:')
          thumbnails = self._extract_thumbnails(cfg_xml)
  
-        title = self._html_search_regex(
-            self._TITLE_REGEX, webpage, 'title') if self._TITLE_REGEX else self._og_search_title(webpage)
+        title = None
+        if self._TITLE_REGEX:
+            title = self._html_search_regex(
+                self._TITLE_REGEX, webpage, 'title', default=None)
+        if not title:
+            title = self._og_search_title(webpage)
  
          age_limit = self._rta_search(webpage) or 18
  
@@ -132,7 +142,7 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
          average_rating = float_or_none(extract_field(self._AVERAGE_RATING_REGEX, 'average rating'))
  
          categories_str = extract_field(self._CATEGORIES_REGEX, 'categories')
-        categories = categories_str.split(', ') if categories_str is not None else []
+        categories = [c.strip() for c in categories_str.split(',')] if categories_str is not None else []
  
          return {
              'id': video_id,
@@ -185,9 +195,10 @@ class TNAFlixNetworkEmbedIE(TNAFlixNetworkBaseIE):
  class TNAFlixIE(TNAFlixNetworkBaseIE):
      _VALID_URL = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'
  
-    _TITLE_REGEX = r'<title>(.+?) - TNAFlix Porn Videos</title>'
-    _DESCRIPTION_REGEX = r'<h3 itemprop="description">([^<]+)</h3>'
-    _UPLOADER_REGEX = r'(?s)<span[^>]+class="infoTitle"[^>]*>Uploaded By:</span>(.+?)<div'
+    _TITLE_REGEX = r'<title>(.+?) - (?:TNAFlix Porn Videos|TNAFlix\.com)</title>'
+    _DESCRIPTION_REGEX = r'(?s)>Description:</[^>]+>(.+?)<'
+    _UPLOADER_REGEX = r'<i>\s*Verified Member\s*</i>\s*<h\d+>(.+?)<'
+    _CATEGORIES_REGEX = r'(?s)<span[^>]*>Categories:</span>(.+?)</div>'
  
      _TESTS = [{
          # anonymous uploader, no categories
@@ -201,8 +212,7 @@ class TNAFlixIE(TNAFlixNetworkBaseIE):
              'thumbnail': 're:https?://.*\.jpg$',
              'duration': 91,
              'age_limit': 18,
-            'uploader': 'Anonymous',
-            'categories': [],
+            'categories': ['Porn Stars'],
          }
      }, {
          # non-anonymous uploader, categories
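Reviewer note: the new category splitting in tnaflix.py tolerates irregular whitespace around the commas. A small comparison of the old and new expressions follows; the `split_old` and `split_new` names and the sample string are assumptions for illustration, not site output.

```python
def split_old(categories_str):
    # Expression removed by the diff: splits only on the exact ', ' separator.
    return categories_str.split(', ') if categories_str is not None else []


def split_new(categories_str):
    # Expression added by the diff: split on ',' and trim each item.
    return [c.strip() for c in categories_str.split(',')] if categories_str is not None else []


# Made-up input with a newline after one comma and a space before another.
raw = ' Comedy,\n Sports , News'

print(split_old(raw))
print(split_new(raw))
```

With the old expression the first two categories stay fused into one item because the comma between them is followed by a newline rather than a space.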
diff --git a/youtube_dl/extractor/tonline.py b/youtube_dl/extractor/tonline.py

new file mode 100644 (file)

index 0000000..cc11eae
--- /dev/null
+++ b/youtube_dl/extractor/tonline.py
@@ -0,0 +1,59 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class TOnlineIE(InfoExtractor):
+    IE_NAME = 't-online.de'
+    _VALID_URL = r'https?://(?:www\.)?t-online\.de/tv/(?:[^/]+/)*id_(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.t-online.de/tv/sport/fussball/id_79166266/drittes-remis-zidane-es-muss-etwas-passieren-.html',
+        'md5': '7d94dbdde5f9d77c5accc73c39632c29',
+        'info_dict': {
+            'id': '79166266',
+            'ext': 'mp4',
+            'title': 'Drittes Remis! Zidane: "Es muss etwas passieren"',
+            'description': 'Es läuft nicht rund bei Real Madrid. Das 1:1 gegen den SD Eibar war das dritte Unentschieden in Folge in der Liga.',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_json(
+            'http://www.t-online.de/tv/id_%s/tid_json_video' % video_id, video_id)
+        title = video_data['subtitle']
+
+        formats = []
+        for asset in video_data.get('assets', []):
+            asset_source = asset.get('source') or asset.get('source2')
+            if not asset_source:
+                continue
+            formats_id = []
+            for field_key in ('type', 'profile'):
+                field_value = asset.get(field_key)
+                if field_value:
+                    formats_id.append(field_value)
+            formats.append({
+                'format_id': '-'.join(formats_id),
+                'url': asset_source,
+            })
+
+        thumbnails = []
+        for image in video_data.get('images', []):
+            image_source = image.get('source')
+            if not image_source:
+                continue
+            thumbnails.append({
+                'url': image_source,
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'duration': int_or_none(video_data.get('duration')),
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
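Reviewer note: the `format_id` in tonline.py is assembled from whichever of the `type` and `profile` fields are present on an asset. The sketch below isolates that step; `build_format_id` and the sample asset dictionaries are hypothetical, and the real field values returned by t-online.de are not shown in this diff.

```python
def build_format_id(asset):
    """Join the non-empty 'type' and 'profile' fields with '-' (hypothetical helper)."""
    parts = []
    for field_key in ('type', 'profile'):
        field_value = asset.get(field_key)
        if field_value:
            parts.append(field_value)
    return '-'.join(parts)


# Made-up assets for illustration.
print(build_format_id({'type': 'mp4', 'profile': 'high'}))
print(build_format_id({'type': 'mp4'}))
print(build_format_id({}))
```

An asset with neither field ends up with an empty `format_id`, which may be worth guarding against if several such assets can occur in one response.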
diff --git a/youtube_dl/extractor/toutv.py b/youtube_dl/extractor/toutv.py

index 4797d1310aaeec2664d822c052f26be5ea5210af..26d770992ab1618c95094513371a01fcf99d1a70 100644 (file)
--- a/youtube_dl/extractor/toutv.py
+++ b/youtube_dl/extractor/toutv.py
@@ -1,74 +1,100 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  from ..utils import (
+    int_or_none,
+    js_to_json,
      ExtractorError,
-    unified_strdate,
+    urlencode_postdata,
+    extract_attributes,
+    smuggle_url,
  )
  
  
  class TouTvIE(InfoExtractor):
+    _NETRC_MACHINE = 'toutv'
      IE_NAME = 'tou.tv'
-    _VALID_URL = r'https?://www\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+(?:/(?P<episode>S[0-9]+E[0-9]+)))'
+    _VALID_URL = r'https?://ici\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+(?:/S[0-9]+E[0-9]+)?)'
+    _access_token = None
+    _claims = None
  
-    _TEST = {
-        'url': 'http://www.tou.tv/30-vies/S04E41',
+    _TESTS = [{
+        'url': 'http://ici.tou.tv/garfield-tout-court/S2015E17',
          'info_dict': {
-            'id': '30-vies_S04E41',
+            'id': '122017',
              'ext': 'mp4',
-            'title': '30 vies Saison 4 / Épisode 41',
-            'description': 'md5:da363002db82ccbe4dafeb9cab039b09',
-            'age_limit': 8,
-            'uploader': 'Groupe des Nouveaux Médias',
-            'duration': 1296,
-            'upload_date': '20131118',
-            'thumbnail': 'http://static.tou.tv/medias/images/2013-11-18_19_00_00_30VIES_0341_01_L.jpeg',
+            'title': 'Saison 2015 Épisode 17',
+            'description': 'La photo de famille 2',
+            'upload_date': '20100717',
          },
          'params': {
-            'skip_download': True,  # Requires rtmpdump
+            # m3u8 download
+            'skip_download': True,
          },
-        'skip': 'Only available in Canada'
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        webpage = self._download_webpage(url, video_id)
-
-        mediaId = self._search_regex(
-            r'"idMedia":\s*"([^"]+)"', webpage, 'media ID')
-
-        streams_url = 'http://release.theplatform.com/content.select?pid=' + mediaId
-        streams_doc = self._download_xml(
-            streams_url, video_id, note='Downloading stream list')
+        'skip': '404 Not Found',
+    }, {
+        'url': 'http://ici.tou.tv/hackers',
+        'only_matching': True,
+    }]
  
-        video_url = next(n.text
-                         for n in streams_doc.findall('.//choice/url')
-                         if '//ad.doubleclick' not in n.text)
-        if video_url.endswith('/Unavailable.flv'):
-            raise ExtractorError(
-                'Access to this video is blocked from outside of Canada',
-                expected=True)
+    def _real_initialize(self):
+        email, password = self._get_login_info()
+        if email is None:
+            return
+        state = 'http://ici.tou.tv//'
+        webpage = self._download_webpage(state, None, 'Downloading homepage')
+        toutvlogin = self._parse_json(self._search_regex(
+            r'(?s)toutvlogin\s*=\s*({.+?});', webpage, 'toutvlogin'), None, js_to_json)
+        authorize_url = toutvlogin['host'] + '/auth/oauth/v2/authorize'
+        login_webpage = self._download_webpage(
+            authorize_url, None, 'Downloading login page', query={
+                'client_id': toutvlogin['clientId'],
+                'redirect_uri': 'https://ici.tou.tv/login/loginCallback',
+                'response_type': 'token',
+                'scope': 'media-drmt openid profile email id.write media-validation.read.privileged',
+                'state': state,
+            })
+        login_form = self._search_regex(
+            r'(?s)(<form[^>]+(?:id|name)="Form-login".+?</form>)', login_webpage, 'login form')
+        form_data = self._hidden_inputs(login_form)
+        form_data.update({
+            'login-email': email,
+            'login-password': password,
+        })
+        post_url = extract_attributes(login_form).get('action') or authorize_url
+        _, urlh = self._download_webpage_handle(
+            post_url, None, 'Logging in', data=urlencode_postdata(form_data))
+        self._access_token = self._search_regex(
+            r'access_token=([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
+            urlh.geturl(), 'access token')
+        self._claims = self._download_json(
+            'https://services.radio-canada.ca/media/validation/v2/getClaims',
+            None, 'Extracting Claims', query={
+                'token': self._access_token,
+                'access_token': self._access_token,
+            })['claims']
  
-        duration_str = self._html_search_meta(
-            'video:duration', webpage, 'duration')
-        duration = int(duration_str) if duration_str else None
-        upload_date_str = self._html_search_meta(
-            'video:release_date', webpage, 'upload date')
-        upload_date = unified_strdate(upload_date_str) if upload_date_str else None
+    def _real_extract(self, url):
+        path = self._match_id(url)
+        metadata = self._download_json('http://ici.tou.tv/presentation/%s' % path, path)
+        if metadata.get('IsDrm'):
+            raise ExtractorError('This video is DRM protected.', expected=True)
+        video_id = metadata['IdMedia']
+        details = metadata['Details']
+        title = details['OriginalTitle']
+        video_url = 'radiocanada:%s:%s' % (metadata.get('AppCode', 'toutv'), video_id)
+        if self._access_token and self._claims:
+            video_url = smuggle_url(video_url, {
+                'access_token': self._access_token,
+                'claims': self._claims,
+            })
  
          return {
-            'id': video_id,
-            'title': self._og_search_title(webpage),
+            '_type': 'url_transparent',
              'url': video_url,
-            'description': self._og_search_description(webpage),
-            'uploader': self._dc_search_uploader(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
-            'age_limit': self._media_rating_search(webpage),
-            'duration': duration,
-            'upload_date': upload_date,
-            'ext': 'mp4',
+            'id': video_id,
+            'title': title,
+            'thumbnail': details.get('ImageUrl'),
+            'duration': int_or_none(details.get('LengthInSeconds')),
          }
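
Note on the tou.tv login hunk above: `_real_initialize` reads the OAuth access token out of the URL the login POST redirects to, using a UUID-shaped pattern. A minimal standalone sketch of that step follows, using only the standard `re` module. The helper name `extract_access_token` and the example redirect URL are assumptions made up for illustration; only the pattern itself comes from the diff.

```python
import re

# Same pattern as in the hunk above: a lowercase-hex UUID after "access_token=".
ACCESS_TOKEN_RE = re.compile(
    r'access_token=([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})')


def extract_access_token(redirect_url):
    """Return the token embedded in redirect_url, or None if it is absent.

    Hypothetical helper: the extractor itself uses self._search_regex,
    which raises instead of returning None.
    """
    mobj = ACCESS_TOKEN_RE.search(redirect_url)
    return mobj.group(1) if mobj else None


# Hypothetical redirect URL, invented for this example only.
example_url = (
    'https://ici.tou.tv/login/loginCallback'
    '#access_token=01234567-89ab-cdef-0123-456789abcdef&token_type=Bearer')
token = extract_access_token(example_url)
```

Because the pattern only accepts lowercase hex digits in the 8-4-4-4-12 layout, a differently shaped token would not match and the extractor would fail at this step.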
diff --git a/youtube_dl/extractor/toypics.py b/youtube_dl/extractor/toypics.py

index 2579ba8c67498c91aa117c6853b83f391ccb3ba6..938e05076313cb5b3d3284083d2cc7e699241d21 100644 (file)
--- a/youtube_dl/extractor/toypics.py
+++ b/youtube_dl/extractor/toypics.py
@@ -1,4 +1,4 @@
-# -*- coding:utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
diff --git a/youtube_dl/extractor/trollvids.py b/youtube_dl/extractor/trollvids.py

deleted file mode 100644 (file)

index 6577056..0000000
--- a/youtube_dl/extractor/trollvids.py
+++ /dev/null
@@ -1,36 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .nuevo import NuevoBaseIE
-
-
-class TrollvidsIE(NuevoBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?trollvids\.com/video/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
-    IE_NAME = 'trollvids'
-    _TEST = {
-        'url': 'http://trollvids.com/video/2349002/%E3%80%90MMD-R-18%E3%80%91%E3%82%AC%E3%83%BC%E3%83%AB%E3%83%95%E3%83%AC%E3%83%B3%E3%83%89-carrymeoff',
-        'md5': '1d53866b2c514b23ed69e4352fdc9839',
-        'info_dict': {
-            'id': '2349002',
-            'ext': 'mp4',
-            'title': '【MMD R-18】ガールフレンド carry_me_off',
-            'age_limit': 18,
-            'duration': 216.78,
-        },
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
-
-        info = self._extract_nuevo(
-            'http://trollvids.com/nuevo/player/config.php?v=%s' % video_id,
-            video_id)
-        info.update({
-            'display_id': display_id,
-            'age_limit': 18
-        })
-        return info
diff --git a/youtube_dl/extractor/trutube.py b/youtube_dl/extractor/trutube.py

deleted file mode 100644 (file)

index d55e0c5..0000000
--- a/youtube_dl/extractor/trutube.py
+++ /dev/null
@@ -1,26 +0,0 @@
-from __future__ import unicode_literals
-
-from .nuevo import NuevoBaseIE
-
-
-class TruTubeIE(NuevoBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?trutube\.tv/(?:video/|nuevo/player/embed\.php\?v=)(?P<id>\d+)'
-    _TESTS = [{
-        'url': 'http://trutube.tv/video/14880/Ramses-II-Proven-To-Be-A-Red-Headed-Caucasoid-',
-        'md5': 'c5b6e301b0a2040b074746cbeaa26ca1',
-        'info_dict': {
-            'id': '14880',
-            'ext': 'flv',
-            'title': 'Ramses II - Proven To Be A Red Headed Caucasoid',
-            'thumbnail': 're:^http:.*\.jpg$',
-        }
-    }, {
-        'url': 'https://trutube.tv/nuevo/player/embed.php?v=14880',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        return self._extract_nuevo(
-            'https://trutube.tv/nuevo/player/config.php?v=%s' % video_id,
-            video_id)
diff --git a/youtube_dl/extractor/trutv.py b/youtube_dl/extractor/trutv.py

new file mode 100644 (file)

index 0000000..3a57825
--- /dev/null
+++ b/youtube_dl/extractor/trutv.py
@@ -0,0 +1,47 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .turner import TurnerBaseIE
+
+
+class TruTVIE(TurnerBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?trutv\.com(?:(?P<path>/shows/[^/]+/videos/[^/?#]+?)\.html|/full-episodes/[^/]+/(?P<id>\d+))'
+    _TEST = {
+        'url': 'http://www.trutv.com/shows/10-things/videos/you-wont-believe-these-sports-bets.html',
+        'md5': '2cdc844f317579fed1a7251b087ff417',
+        'info_dict': {
+            'id': '/shows/10-things/videos/you-wont-believe-these-sports-bets',
+            'ext': 'mp4',
+            'title': 'You Won\'t Believe These Sports Bets',
+            'description': 'Jamie Lee sits down with a bookie to discuss the bizarre world of illegal sports betting.',
+            'upload_date': '20130305',
+        }
+    }
+
+    def _real_extract(self, url):
+        path, video_id = re.match(self._VALID_URL, url).groups()
+        auth_required = False
+        if path:
+            data_src = 'http://www.trutv.com/video/cvp/v2/xml/content.xml?id=%s.xml' % path
+        else:
+            webpage = self._download_webpage(url, video_id)
+            video_id = self._search_regex(
+                r"TTV\.TVE\.episodeId\s*=\s*'([^']+)';",
+                webpage, 'video id', default=video_id)
+            auth_required = self._search_regex(
+                r'TTV\.TVE\.authRequired\s*=\s*(true|false);',
+                webpage, 'auth required', default='false') == 'true'
+            data_src = 'http://www.trutv.com/tveverywhere/services/cvpXML.do?titleId=' + video_id
+        return self._extract_cvp_info(
+            data_src, path, {
+                'secure': {
+                    'media_src': 'http://androidhls-secure.cdn.turner.com/trutv/big',
+                    'tokenizer_src': 'http://www.trutv.com/tveverywhere/processors/services/token_ipadAdobe.do',
+                },
+            }, {
+                'url': url,
+                'site_name': 'truTV',
+                'auth_required': auth_required,
+            })
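
Note on the truTV extractor above: `_VALID_URL` has two alternatives, so `re.match(...).groups()` always returns a `(path, id)` pair in which exactly one element is set. The sketch below shows that behaviour with plain `re`. The function name `split_trutv_url` and the full-episodes URL are assumptions for illustration; the clip URL is the one from `_TEST`.

```python
import re

# Pattern copied from TruTVIE._VALID_URL in the hunk above.
VALID_URL = (
    r'https?://(?:www\.)?trutv\.com'
    r'(?:(?P<path>/shows/[^/]+/videos/[^/?#]+?)\.html'
    r'|/full-episodes/[^/]+/(?P<id>\d+))')


def split_trutv_url(url):
    """Return (path, video_id) for a matching URL, or None otherwise."""
    mobj = re.match(VALID_URL, url)
    return mobj.groups() if mobj else None


# Clip URL taken from the _TEST entry in the diff.
clip = split_trutv_url(
    'http://www.trutv.com/shows/10-things/videos/'
    'you-wont-believe-these-sports-bets.html')

# Hypothetical full-episode URL, invented for this example.
episode = split_trutv_url('http://www.trutv.com/full-episodes/some-show/12345')
```

For the clip URL only `path` is set, and for the full-episode URL only `id` is set, which is why `_real_extract` branches on `if path:`.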
diff --git a/youtube_dl/extractor/tube8.py b/youtube_dl/extractor/tube8.py

index 1d9271d1e70c3f2b5ad30a6c69f6602bd58ac89a..1853a1104c2b8957793ede25c6296598eb0babc9 100644 (file)
--- a/youtube_dl/extractor/tube8.py
+++ b/youtube_dl/extractor/tube8.py
@@ -2,17 +2,14 @@ from __future__ import unicode_literals
  
  import re
  
-from .common import InfoExtractor
-from ..compat import compat_str
  from ..utils import (
      int_or_none,
-    sanitized_Request,
      str_to_int,
  )
-from ..aes import aes_decrypt_text
+from .keezmovies import KeezMoviesIE
  
  
-class Tube8IE(InfoExtractor):
+class Tube8IE(KeezMoviesIE):
      _VALID_URL = r'https?://(?:www\.)?tube8\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/(?P<id>\d+)'
      _TESTS = [{
          'url': 'http://www.tube8.com/teen/kasia-music-video/229795/',
@@ -26,54 +23,26 @@ class Tube8IE(InfoExtractor):
              'title': 'Kasia music video',
              'age_limit': 18,
              'duration': 230,
-        }
+            'categories': ['Teen'],
+            'tags': ['dancing'],
+        },
      }, {
          'url': 'http://www.tube8.com/shemale/teen/blonde-cd-gets-kidnapped-by-two-blacks-and-punished-for-being-a-slutty-girl/19569151/',
          'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        webpage, info = self._extract_info(url)
  
-        req = sanitized_Request(url)
-        req.add_header('Cookie', 'age_verified=1')
-        webpage = self._download_webpage(req, display_id)
+        if not info['title']:
+            info['title'] = self._html_search_regex(
+                r'videoTitle\s*=\s*"([^"]+)', webpage, 'title')
  
-        flashvars = self._parse_json(
-            self._search_regex(
-                r'flashvars\s*=\s*({.+?});\r?\n', webpage, 'flashvars'),
-            video_id)
-
-        formats = []
-        for key, video_url in flashvars.items():
-            if not isinstance(video_url, compat_str) or not video_url.startswith('http'):
-                continue
-            height = self._search_regex(
-                r'quality_(\d+)[pP]', key, 'height', default=None)
-            if not height:
-                continue
-            if flashvars.get('encrypted') is True:
-                video_url = aes_decrypt_text(
-                    video_url, flashvars['video_title'], 32).decode('utf-8')
-            formats.append({
-                'url': video_url,
-                'format_id': '%sp' % height,
-                'height': int(height),
-            })
-        self._sort_formats(formats)
-
-        thumbnail = flashvars.get('image_url')
-
-        title = self._html_search_regex(
-            r'videoTitle\s*=\s*"([^"]+)', webpage, 'title')
          description = self._html_search_regex(
              r'>Description:</strong>\s*(.+?)\s*<', webpage, 'description', fatal=False)
          uploader = self._html_search_regex(
              r'<span class="username">\s*(.+?)\s*<',
              webpage, 'uploader', fatal=False)
-        duration = int_or_none(flashvars.get('video_duration'))
  
          like_count = int_or_none(self._search_regex(
              r'rupVar\s*=\s*"(\d+)"', webpage, 'like count', fatal=False))
@@ -86,18 +55,26 @@ class Tube8IE(InfoExtractor):
              r'<span id="allCommentsCount">(\d+)</span>',
              webpage, 'comment count', fatal=False))
  
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
+        category = self._search_regex(
+            r'Category:\s*</strong>\s*<a[^>]+href=[^>]+>([^<]+)',
+            webpage, 'category', fatal=False)
+        categories = [category] if category else None
+
+        tags_str = self._search_regex(
+            r'(?s)Tags:\s*</strong>(.+?)</(?!a)',
+            webpage, 'tags', fatal=False)
+        tags = [t for t in re.findall(
+            r'<a[^>]+href=[^>]+>([^<]+)', tags_str)] if tags_str else None
+
+        info.update({
              'description': description,
-            'thumbnail': thumbnail,
              'uploader': uploader,
-            'duration': duration,
              'view_count': view_count,
              'like_count': like_count,
              'dislike_count': dislike_count,
              'comment_count': comment_count,
-            'age_limit': 18,
-            'formats': formats,
-        }
+            'categories': categories,
+            'tags': tags,
+        })
+
+        return info
diff --git a/youtube_dl/extractor/tubitv.py b/youtube_dl/extractor/tubitv.py

index 7af233cd6bcb5695fa385850e2608f8a5d81dd61..3a37df2e8eb710c68d448b6231850ddc846ec716 100644 (file)
--- a/youtube_dl/extractor/tubitv.py
+++ b/youtube_dl/extractor/tubitv.py
@@ -1,7 +1,6 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import codecs
  import re
  
  from .common import InfoExtractor
@@ -14,21 +13,18 @@ from ..utils import (
  
  
  class TubiTvIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tubitv\.com/video\?id=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?tubitv\.com/video/(?P<id>[0-9]+)'
      _LOGIN_URL = 'http://tubitv.com/login'
      _NETRC_MACHINE = 'tubitv'
      _TEST = {
-        'url': 'http://tubitv.com/video?id=54411&title=The_Kitchen_Musical_-_EP01',
+        'url': 'http://tubitv.com/video/283829/the_comedian_at_the_friday',
+        'md5': '43ac06be9326f41912dc64ccf7a80320',
          'info_dict': {
-            'id': '54411',
+            'id': '283829',
              'ext': 'mp4',
-            'title': 'The Kitchen Musical - EP01',
-            'thumbnail': 're:^https?://.*\.png$',
-            'description': 'md5:37532716166069b353e8866e71fefae7',
-            'duration': 2407,
-        },
-        'params': {
-            'skip_download': 'HLS download',
+            'title': 'The Comedian at The Friday',
+            'description': 'A stand up comedian is forced to look at the decisions in his life while on a one week trip to the west coast.',
+            'uploader_id': 'bc168bee0d18dd1cb3b86c68706ab434',
          },
      }
  
@@ -55,27 +51,39 @@ class TubiTvIE(InfoExtractor):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
+        video_data = self._download_json(
+            'http://tubitv.com/oz/videos/%s/content' % video_id, video_id)
+        title = video_data['title']
  
-        webpage = self._download_webpage(url, video_id)
-        if re.search(r"<(?:DIV|div) class='login-required-screen'>", webpage):
-            self.raise_login_required('This video requires login')
+        formats = self._extract_m3u8_formats(
+            self._proto_relative_url(video_data['url']),
+            video_id, 'mp4', 'm3u8_native')
+        self._sort_formats(formats)
  
-        title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
-        duration = int_or_none(self._html_search_meta(
-            'video:duration', webpage, 'duration'))
+        thumbnails = []
+        for thumbnail_url in video_data.get('thumbnails', []):
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'url': self._proto_relative_url(thumbnail_url),
+            })
  
-        apu = self._search_regex(r"apu='([^']+)'", webpage, 'apu')
-        m3u8_url = codecs.decode(apu, 'rot_13')[::-1]
-        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
-        self._sort_formats(formats)
+        subtitles = {}
+        for sub in video_data.get('subtitles', []):
+            sub_url = sub.get('url')
+            if not sub_url:
+                continue
+            subtitles.setdefault(sub.get('lang', 'English'), []).append({
+                'url': self._proto_relative_url(sub_url),
+            })
  
          return {
              'id': video_id,
              'title': title,
              'formats': formats,
-            'thumbnail': thumbnail,
-            'description': description,
-            'duration': duration,
+            'subtitles': subtitles,
+            'thumbnails': thumbnails,
+            'description': video_data.get('description'),
+            'duration': int_or_none(video_data.get('duration')),
+            'uploader_id': video_data.get('publisher_id'),
          }
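
Note on the Tubi TV hunk above: subtitles are grouped per language with `dict.setdefault`, entries without a URL are skipped, and protocol-relative URLs are given a scheme. The sketch below reproduces that loop outside the extractor. `proto_relative_url` is a simplified stand-in written for this example (it is not the library's `_proto_relative_url`), and the sample data is invented.

```python
def proto_relative_url(url, scheme='http:'):
    """Simplified stand-in: prefix '//host/path' style URLs with a scheme."""
    if url and url.startswith('//'):
        return scheme + url
    return url


def build_subtitles(video_data):
    """Group subtitle URLs by language, defaulting to 'English' as in the diff."""
    subtitles = {}
    for sub in video_data.get('subtitles', []):
        sub_url = sub.get('url')
        if not sub_url:
            # Entries without a usable URL are dropped.
            continue
        subtitles.setdefault(sub.get('lang', 'English'), []).append({
            'url': proto_relative_url(sub_url),
        })
    return subtitles


# Invented sample payload, shaped like the fields the hunk reads.
sample = {'subtitles': [
    {'lang': 'English', 'url': '//example.com/en.srt'},
    {'url': 'http://example.com/default.srt'},
    {'lang': 'French', 'url': ''},
]}
result = build_subtitles(sample)
```

Using `setdefault` keeps several tracks of the same language in one list rather than overwriting earlier ones.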
diff --git a/youtube_dl/extractor/tudou.py b/youtube_dl/extractor/tudou.py

index 9892e8a62bf9731b8d3e5768333f71baaecd6288..bb8b8e23424e7943f2133028aca187d4fcffeab9 100644 (file)
--- a/youtube_dl/extractor/tudou.py
+++ b/youtube_dl/extractor/tudou.py
@@ -5,6 +5,7 @@ from __future__ import unicode_literals
  from .common import InfoExtractor
  from ..compat import compat_str
  from ..utils import (
+    ExtractorError,
      int_or_none,
      InAdvancePagedList,
      float_or_none,
@@ -46,11 +47,27 @@ class TudouIE(InfoExtractor):
  
      _PLAYER_URL = 'http://js.tudouui.com/bin/lingtong/PortalPlayer_177.swf'
  
+    # Translated from tudou/tools/TVCHelper.as in PortalPlayer_193.swf
+    # 0001, 0002 and 4001 are not included as they indicate temporary issues
+    TVC_ERRORS = {
+        '0003': 'The video is deleted or does not exist',
+        '1001': 'This video is unavailable due to licensing issues',
+        '1002': 'This video is unavailable as it\'s under review',
+        '1003': 'This video is unavailable as it\'s under review',
+        '3001': 'Password required',
+        '5001': 'This video is available in Mainland China only due to licensing issues',
+        '7001': 'This video is unavailable',
+        '8001': 'This video is unavailable due to licensing issues',
+    }
+
      def _url_for_id(self, video_id, quality=None):
          info_url = 'http://v2.tudou.com/f?id=' + compat_str(video_id)
          if quality:
              info_url += '&hd' + quality
          xml_data = self._download_xml(info_url, video_id, 'Opening the info XML page')
+        error = xml_data.attrib.get('error')
+        if error is not None:
+            raise ExtractorError('Tudou said: %s' % error, expected=True)
          final_url = xml_data.text
          return final_url
  
@@ -63,6 +80,15 @@ class TudouIE(InfoExtractor):
          if youku_vcode:
              return self.url_result('youku:' + youku_vcode, ie='Youku')
  
+        if not item_data.get('itemSegs'):
+            tvc_code = item_data.get('tvcCode')
+            if tvc_code:
+                err_msg = self.TVC_ERRORS.get(tvc_code)
+                if err_msg:
+                    raise ExtractorError('Tudou said: %s' % err_msg, expected=True)
+                raise ExtractorError('Unexpected error %s returned from Tudou' % tvc_code)
+            raise ExtractorError('Unexpected error returned from Tudou')
+
          title = unescapeHTML(item_data['kw'])
          description = item_data.get('desc')
          thumbnail_url = item_data.get('pic')
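
Note on the Tudou hunks above: when `itemSegs` is missing, the extractor maps `tvcCode` to a readable message and marks only the known codes as expected errors. The sketch below isolates that decision. `ExtractorError` here is a small stand-in class defined for the example, `check_item` is a hypothetical name, and the table is abbreviated to two of the codes from the diff.

```python
class ExtractorError(Exception):
    """Stand-in for the library exception; only tracks the 'expected' flag."""

    def __init__(self, msg, expected=False):
        super(ExtractorError, self).__init__(msg)
        self.expected = expected


# Two entries copied from TVC_ERRORS in the diff; the full table is longer.
TVC_ERRORS = {
    '0003': 'The video is deleted or does not exist',
    '5001': 'This video is available in Mainland China only due to licensing issues',
}


def check_item(item_data):
    """Raise if the item has no segments, mirroring the hunk's three cases."""
    if item_data.get('itemSegs'):
        return
    tvc_code = item_data.get('tvcCode')
    if tvc_code:
        err_msg = TVC_ERRORS.get(tvc_code)
        if err_msg:
            # Known code: a user-facing, expected failure.
            raise ExtractorError('Tudou said: %s' % err_msg, expected=True)
        # Unknown code: reported as an unexpected failure.
        raise ExtractorError(
            'Unexpected error %s returned from Tudou' % tvc_code)
    raise ExtractorError('Unexpected error returned from Tudou')
```

Only the known codes set `expected=True`, so unknown codes still surface as bugs to report rather than as ordinary user-facing failures.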
diff --git a/youtube_dl/extractor/tumblr.py b/youtube_dl/extractor/tumblr.py

index e5bcf7798d69f55e8725fee2dc7ada9c7bf6ef85..ebe411e12aa5fa44e201dcaefc52e839e5b2d212 100644 (file)
--- a/youtube_dl/extractor/tumblr.py
+++ b/youtube_dl/extractor/tumblr.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -8,7 +8,7 @@ from ..utils import int_or_none
  
  
  class TumblrIE(InfoExtractor):
-    _VALID_URL = r'https?://(?P<blog_name>.*?)\.tumblr\.com/(?:post|video)/(?P<id>[0-9]+)(?:$|[/?#])'
+    _VALID_URL = r'https?://(?P<blog_name>[^/?#&]+)\.tumblr\.com/(?:post|video)/(?P<id>[0-9]+)(?:$|[/?#])'
      _TESTS = [{
          'url': 'http://tatianamaslanydaily.tumblr.com/post/54196191430/orphan-black-dvd-extra-behind-the-scenes',
          'md5': '479bb068e5b16462f5176a6828829767',
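
Note on the Tumblr `_VALID_URL` change above: replacing `.*?` with `[^/?#&]+` stops the `blog_name` group from running across slashes and query strings. The sketch below compares the two patterns on one invented URL in which a Tumblr link is embedded in another site's query string. The URL is an assumption for illustration and is not taken from the commit.

```python
import re

# Old and new patterns copied from the hunk above.
OLD_VALID_URL = (
    r'https?://(?P<blog_name>.*?)\.tumblr\.com/'
    r'(?:post|video)/(?P<id>[0-9]+)(?:$|[/?#])')
NEW_VALID_URL = (
    r'https?://(?P<blog_name>[^/?#&]+)\.tumblr\.com/'
    r'(?:post|video)/(?P<id>[0-9]+)(?:$|[/?#])')

# Invented URL: a Tumblr link passed as a query parameter of another site.
embedded = 'http://example.com/redirect?u=http://foo.tumblr.com/post/123'

old_match = re.match(OLD_VALID_URL, embedded)
new_match = re.match(NEW_VALID_URL, embedded)

# A plain Tumblr post URL is matched by both patterns.
direct = 'http://foo.tumblr.com/post/123'
direct_match = re.match(NEW_VALID_URL, direct)
```

With the old pattern the unrelated host ends up inside `blog_name`, while the new pattern rejects the URL outright.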
diff --git a/youtube_dl/extractor/turner.py b/youtube_dl/extractor/turner.py

new file mode 100644 (file)

index 0000000..57ffedb
--- /dev/null
+++ b/youtube_dl/extractor/turner.py
@@ -0,0 +1,175 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .adobepass import AdobePassIE
+from ..compat import compat_str
+from ..utils import (
+    xpath_text,
+    int_or_none,
+    determine_ext,
+    parse_duration,
+    xpath_attr,
+    update_url_query,
+    ExtractorError,
+)
+
+
+class TurnerBaseIE(AdobePassIE):
+    def _extract_timestamp(self, video_data):
+        return int_or_none(xpath_attr(video_data, 'dateCreated', 'uts'))
+
+    def _extract_cvp_info(self, data_src, video_id, path_data={}, ap_data={}):
+        video_data = self._download_xml(data_src, video_id)
+        video_id = video_data.attrib['id']
+        title = xpath_text(video_data, 'headline', fatal=True)
+        content_id = xpath_text(video_data, 'contentId') or video_id
+        # rtmp_src = xpath_text(video_data, 'akamai/src')
+        # if rtmp_src:
+        #     splited_rtmp_src = rtmp_src.split(',')
+        #     if len(splited_rtmp_src) == 2:
+        #         rtmp_src = splited_rtmp_src[1]
+        # aifp = xpath_text(video_data, 'akamai/aifp', default='')
+
+        tokens = {}
+        urls = []
+        formats = []
+        rex = re.compile(
+            r'(?P<width>[0-9]+)x(?P<height>[0-9]+)(?:_(?P<bitrate>[0-9]+))?')
+        # Possible formats locations: files/file, files/groupFiles/files
+        # and maybe others
+        for video_file in video_data.findall('.//file'):
+            video_url = video_file.text.strip()
+            if not video_url:
+                continue
+            ext = determine_ext(video_url)
+            if video_url.startswith('/mp4:protected/'):
+                continue
+                # TODO Correct extraction for these files
+                # protected_path_data = path_data.get('protected')
+                # if not protected_path_data or not rtmp_src:
+                #     continue
+                # protected_path = self._search_regex(
+                #     r'/mp4:(.+)\.[a-z0-9]', video_url, 'secure path')
+                # auth = self._download_webpage(
+                #     protected_path_data['tokenizer_src'], query={
+                #         'path': protected_path,
+                #         'videoId': content_id,
+                #         'aifp': aifp,
+                #     })
+                # token = xpath_text(auth, 'token')
+                # if not token:
+                #     continue
+                # video_url = rtmp_src + video_url + '?' + token
+            elif video_url.startswith('/secure/'):
+                secure_path_data = path_data.get('secure')
+                if not secure_path_data:
+                    continue
+                video_url = secure_path_data['media_src'] + video_url
+                secure_path = self._search_regex(r'https?://[^/]+(.+/)', video_url, 'secure path') + '*'
+                token = tokens.get(secure_path)
+                if not token:
+                    query = {
+                        'path': secure_path,
+                        'videoId': content_id,
+                    }
+                    if ap_data.get('auth_required'):
+                        query['accessToken'] = self._extract_mvpd_auth(ap_data['url'], video_id, ap_data['site_name'], ap_data['site_name'])
+                    auth = self._download_xml(
+                        secure_path_data['tokenizer_src'], video_id, query=query)
+                    error_msg = xpath_text(auth, 'error/msg')
+                    if error_msg:
+                        raise ExtractorError(error_msg, expected=True)
+                    token = xpath_text(auth, 'token')
+                    if not token:
+                        continue
+                    tokens[secure_path] = token
+                video_url = video_url + '?hdnea=' + token
+            elif not re.match('https?://', video_url):
+                base_path_data = path_data.get(ext, path_data.get('default', {}))
+                media_src = base_path_data.get('media_src')
+                if not media_src:
+                    continue
+                video_url = media_src + video_url
+            if video_url in urls:
+                continue
+            urls.append(video_url)
+            format_id = video_file.get('bitrate')
+            if ext == 'smil':
+                formats.extend(self._extract_smil_formats(
+                    video_url, video_id, fatal=False))
+            elif ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4',
+                    m3u8_id=format_id or 'hls', fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    update_url_query(video_url, {'hdcore': '3.7.0'}),
+                    video_id, f4m_id=format_id or 'hds', fatal=False))
+            else:
+                f = {
+                    'format_id': format_id,
+                    'url': video_url,
+                    'ext': ext,
+                }
+                # format_id is None when the <file> element has no bitrate attribute
+                mobj = rex.search((format_id or '') + video_url)
+                if mobj:
+                    f.update({
+                        'width': int(mobj.group('width')),
+                        'height': int(mobj.group('height')),
+                        'tbr': int_or_none(mobj.group('bitrate')),
+                    })
+                elif isinstance(format_id, compat_str):
+                    if format_id.isdigit():
+                        f['tbr'] = int(format_id)
+                    else:
+                        mobj = re.match(r'ios_(audio|[0-9]+)$', format_id)
+                        if mobj:
+                            if mobj.group(1) == 'audio':
+                                f.update({
+                                    'vcodec': 'none',
+                                    'ext': 'm4a',
+                                })
+                            else:
+                                f['tbr'] = int(mobj.group(1))
+                formats.append(f)
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for source in video_data.findall('closedCaptions/source'):
+            for track in source.findall('track'):
+                track_url = track.get('url')
+                if not isinstance(track_url, compat_str) or track_url.endswith('/big'):
+                    continue
+                lang = track.get('lang') or track.get('label') or 'en'
+                subtitles.setdefault(lang, []).append({
+                    'url': track_url,
+                    'ext': {
+                        'scc': 'scc',
+                        'webvtt': 'vtt',
+                        'smptett': 'tt',
+                    }.get(source.get('format'))
+                })
+
+        thumbnails = [{
+            'id': image.get('cut'),
+            'url': image.text,
+            'width': int_or_none(image.get('width')),
+            'height': int_or_none(image.get('height')),
+        } for image in video_data.findall('images/image')]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+            'subtitles': subtitles,
+            'thumbnails': thumbnails,
+            'description': xpath_text(video_data, 'description'),
+            'duration': parse_duration(xpath_text(video_data, 'length') or xpath_text(video_data, 'trt')),
+            'timestamp': self._extract_timestamp(video_data),
+            'upload_date': xpath_attr(video_data, 'metas', 'version'),
+            'series': xpath_text(video_data, 'showTitle'),
+            'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
+            'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
+        }
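
Note on `TurnerBaseIE._extract_cvp_info` above: for plain file URLs, width, height and an optional bitrate are recovered by searching the concatenated format id and URL for a `WIDTHxHEIGHT[_BITRATE]` token. The sketch below isolates that step. `parse_dimensions` is a hypothetical helper, the URLs are invented, and guarding against a missing format id with `or ''` is an assumption of this sketch.

```python
import re

# Pattern copied from the hunk above.
REX = re.compile(
    r'(?P<width>[0-9]+)x(?P<height>[0-9]+)(?:_(?P<bitrate>[0-9]+))?')


def parse_dimensions(format_id, video_url):
    """Return width/height/tbr parsed from format_id + video_url, or {}."""
    # Treat a missing format id as an empty string (assumption of this sketch).
    mobj = REX.search((format_id or '') + video_url)
    if not mobj:
        return {}
    bitrate = mobj.group('bitrate')
    return {
        'width': int(mobj.group('width')),
        'height': int(mobj.group('height')),
        # The bitrate suffix is optional, so tbr may be None.
        'tbr': int(bitrate) if bitrate is not None else None,
    }


# Invented URLs for illustration.
with_bitrate = parse_dimensions('', 'http://example.com/video_640x360_1200.mp4')
without_bitrate = parse_dimensions(None, 'http://example.com/clip_1280x720.mp4')
no_dimensions = parse_dimensions('hls', 'http://example.com/master.mp4')
```

When nothing matches, the extractor falls back to interpreting the format id itself, for example a purely numeric id as the bitrate.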
diff --git a/youtube_dl/extractor/tv2.py b/youtube_dl/extractor/tv2.py

index 86bb7915db170ecf4c75fdc2b960160a65daa1c0..bd28267b0cb6a0154133c98f567c24f054b5459a 100644 (file)
--- a/youtube_dl/extractor/tv2.py
+++ b/youtube_dl/extractor/tv2.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -8,6 +8,7 @@ from ..utils import (
      determine_ext,
      int_or_none,
      float_or_none,
+    js_to_json,
      parse_iso8601,
      remove_end,
  )
@@ -54,10 +55,11 @@ class TV2IE(InfoExtractor):
                  ext = determine_ext(video_url)
                  if ext == 'f4m':
                      formats.extend(self._extract_f4m_formats(
-                        video_url, video_id, f4m_id=format_id))
+                        video_url, video_id, f4m_id=format_id, fatal=False))
                  elif ext == 'm3u8':
                      formats.extend(self._extract_m3u8_formats(
-                        video_url, video_id, 'mp4', m3u8_id=format_id))
+                        video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                        m3u8_id=format_id, fatal=False))
                  elif ext == 'ism' or video_url.endswith('.ism/Manifest'):
                      pass
                  else:
@@ -105,7 +107,7 @@ class TV2ArticleIE(InfoExtractor):
          'url': 'http://www.tv2.no/2015/05/16/nyheter/alesund/krim/pingvin/6930542',
          'info_dict': {
              'id': '6930542',
-            'title': 'Russen hetses etter pingvintyveri – innrømmer å ha åpnet luken på buret',
+            'title': 'Russen hetses etter pingvintyveri - innrømmer å ha åpnet luken på buret',
              'description': 'md5:339573779d3eea3542ffe12006190954',
          },
          'playlist_count': 2,
@@ -119,9 +121,23 @@ class TV2ArticleIE(InfoExtractor):
  
          webpage = self._download_webpage(url, playlist_id)
  
+        # Old embed pattern (looks unused nowadays)
+        assets = re.findall(r'data-assetid=["\'](\d+)', webpage)
+
+        if not assets:
+            # New embed pattern
+            for v in re.findall(r'TV2ContentboxVideo\(({.+?})\)', webpage):
+                video = self._parse_json(
+                    v, playlist_id, transform_source=js_to_json, fatal=False)
+                if not video:
+                    continue
+                asset = video.get('assetId')
+                if asset:
+                    assets.append(asset)
+
          entries = [
-            self.url_result('http://www.tv2.no/v/%s' % video_id, 'TV2')
-            for video_id in re.findall(r'data-assetid="(\d+)"', webpage)]
+            self.url_result('http://www.tv2.no/v/%s' % asset_id, 'TV2')
+            for asset_id in assets]
  
          title = remove_end(self._og_search_title(webpage), ' - TV2.no')
          description = remove_end(self._og_search_description(webpage), ' - TV2.no')
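As a rough illustration of the fallback added to `TV2ArticleIE` above, the following standalone sketch collects asset IDs from the old `data-assetid` attribute first and only then from `TV2ContentboxVideo(...)` calls. It is not the extractor's code: `find_asset_ids` is a hypothetical helper, and `json.loads` stands in for youtube-dl's `js_to_json` plus `_parse_json`, so it assumes the embedded object is already strict JSON.

```python
import json
import re


def find_asset_ids(webpage):
    """Return asset IDs found in a page, preferring the old embed pattern."""
    # Old embed pattern: data-assetid="123" or data-assetid='123'
    assets = re.findall(r'data-assetid=["\'](\d+)', webpage)
    if assets:
        return assets

    # New embed pattern: TV2ContentboxVideo({...})
    for raw in re.findall(r'TV2ContentboxVideo\((\{.+?\})\)', webpage):
        try:
            # Assumption: the payload is strict JSON; the real extractor
            # converts JavaScript object syntax with js_to_json first.
            video = json.loads(raw)
        except ValueError:
            # Skip payloads that cannot be parsed instead of failing.
            continue
        asset = video.get('assetId')
        if asset:
            # Normalising to str is a choice made for this sketch only.
            assets.append(str(asset))
    return assets
```

Unparseable payloads are skipped rather than raised, which mirrors the `fatal=False` behaviour in the hunk.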
diff --git a/youtube_dl/extractor/tv3.py b/youtube_dl/extractor/tv3.py

index d3f690dc713aa7f7c78238db1c9ea1fe47e6b651..3867ec90d016e84dcc717c9009cd12f190369364 100644 (file)
--- a/youtube_dl/extractor/tv3.py
+++ b/youtube_dl/extractor/tv3.py
@@ -21,6 +21,7 @@ class TV3IE(InfoExtractor):
              'Failed to download MPD manifest'
          ],
          'params': {
+            # m3u8 download
              'skip_download': True,
          },
      }
diff --git a/youtube_dl/extractor/tv4.py b/youtube_dl/extractor/tv4.py

index 343edf20663d172a4e071a77f228fc4d9962003d..5d2d8f13239e6ac5b10f5506143216301e5d4ecf 100644 (file)
--- a/youtube_dl/extractor/tv4.py
+++ b/youtube_dl/extractor/tv4.py
@@ -2,9 +2,13 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
      ExtractorError,
+    int_or_none,
      parse_iso8601,
+    try_get,
+    update_url_query,
  )
  
  
@@ -65,36 +69,47 @@ class TV4IE(InfoExtractor):
          video_id = self._match_id(url)
  
          info = self._download_json(
-            'http://www.tv4play.se/player/assets/%s.json' % video_id, video_id, 'Downloading video info JSON')
+            'http://www.tv4play.se/player/assets/%s.json' % video_id,
+            video_id, 'Downloading video info JSON')
  
          # If is_geo_restricted is true, it doesn't necessarily mean we can't download it
-        if info['is_geo_restricted']:
+        if info.get('is_geo_restricted'):
              self.report_warning('This content might not be available in your country due to licensing restrictions.')
-        if info['requires_subscription']:
+        if info.get('requires_subscription'):
              raise ExtractorError('This content requires subscription.', expected=True)
  
-        sources_data = self._download_json(
-            'https://prima.tv4play.se/api/web/asset/%s/play.json?protocol=http&videoFormat=MP4' % video_id, video_id, 'Downloading sources JSON')
-        sources = sources_data['playback']
+        title = info['title']
  
          formats = []
-        for item in sources.get('items', {}).get('item', []):
-            ext, bitrate = item['mediaFormat'], item['bitrate']
-            formats.append({
-                'format_id': '%s_%s' % (ext, bitrate),
-                'tbr': bitrate,
-                'ext': ext,
-                'url': item['url'],
-            })
+        # HTTP formats point to an unresolvable host, so only HLS and HDS are tried
+        for kind in ('hls', ''):
+            data = self._download_json(
+                'https://prima.tv4play.se/api/web/asset/%s/play.json' % video_id,
+                video_id, 'Downloading sources JSON', query={
+                    'protocol': kind,
+                    'videoFormat': 'MP4+WEBVTTS+WEBVTT',
+                })
+            item = try_get(data, lambda x: x['playback']['items']['item'], dict)
+            manifest_url = item.get('url') if item else None
+            if not isinstance(manifest_url, compat_str):
+                continue
+            if kind == 'hls':
+                formats.extend(self._extract_m3u8_formats(
+                    manifest_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id=kind, fatal=False))
+            else:
+                formats.extend(self._extract_f4m_formats(
+                    update_url_query(manifest_url, {'hdcore': '3.8.0'}),
+                    video_id, f4m_id='hds', fatal=False))
          self._sort_formats(formats)
  
          return {
              'id': video_id,
-            'title': info['title'],
+            'title': title,
              'formats': formats,
              'description': info.get('description'),
              'timestamp': parse_iso8601(info.get('broadcast_date_time')),
-            'duration': info.get('duration'),
+            'duration': int_or_none(info.get('duration')),
              'thumbnail': info.get('image'),
-            'is_live': sources.get('live'),
+            'is_live': info.get('is_live') is True,
          }
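The HDS branch above appends `hdcore=3.8.0` to the manifest URL through youtube-dl's `update_url_query`. The sketch below approximates that step with the Python 3 standard library only. `add_query` is a hypothetical name, the `example.com` URLs are placeholders, and the real helper may differ in detail.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def add_query(url, query):
    """Return url with the given query parameters added or overwritten."""
    parts = urlparse(url)
    # parse_qs maps each existing key to a list of values.
    params = parse_qs(parts.query)
    params.update(query)
    # doseq=True expands list values and passes plain strings through.
    return urlunparse(parts._replace(query=urlencode(params, doseq=True)))


print(add_query('http://example.com/manifest.f4m?a=1', {'hdcore': '3.8.0'}))
```

Existing parameters are preserved, so the manifest URL keeps whatever tokens the API returned.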
diff --git a/youtube_dl/extractor/tvanouvelles.py b/youtube_dl/extractor/tvanouvelles.py

new file mode 100644 (file)

index 0000000..1086176
--- /dev/null
+++ b/youtube_dl/extractor/tvanouvelles.py
@@ -0,0 +1,65 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from .brightcove import BrightcoveNewIE
+
+
+class TVANouvellesIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?tvanouvelles\.ca/videos/(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.tvanouvelles.ca/videos/5117035533001',
+        'info_dict': {
+            'id': '5117035533001',
+            'ext': 'mp4',
+            'title': 'L’industrie du taxi dénonce l’entente entre Québec et Uber: explications',
+            'description': 'md5:479653b7c8cf115747bf5118066bd8b3',
+            'uploader_id': '1741764581',
+            'timestamp': 1473352030,
+            'upload_date': '20160908',
+        },
+        'add_ie': ['BrightcoveNew'],
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1741764581/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        brightcove_id = self._match_id(url)
+        return self.url_result(
+            self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
+            BrightcoveNewIE.ie_key(), brightcove_id)
+
+
+class TVANouvellesArticleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?tvanouvelles\.ca/(?:[^/]+/)+(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'http://www.tvanouvelles.ca/2016/11/17/des-policiers-qui-ont-la-meche-un-peu-courte',
+        'info_dict': {
+            'id': 'des-policiers-qui-ont-la-meche-un-peu-courte',
+            'title': 'Des policiers qui ont «la mèche un peu courte»?',
+            'description': 'md5:92d363c8eb0f0f030de9a4a84a90a3a0',
+        },
+        'playlist_mincount': 4,
+    }
+
+    @classmethod
+    def suitable(cls, url):
+        return False if TVANouvellesIE.suitable(url) else super(TVANouvellesArticleIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        entries = [
+            self.url_result(
+                'http://www.tvanouvelles.ca/videos/%s' % mobj.group('id'),
+                ie=TVANouvellesIE.ie_key(), video_id=mobj.group('id'))
+            for mobj in re.finditer(
+                r'data-video-id=(["\'])?(?P<id>\d+)', webpage)]
+
+        title = self._og_search_title(webpage, fatal=False)
+        description = self._og_search_description(webpage)
+
+        return self.playlist_result(entries, display_id, title, description)
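The `suitable()` override in `TVANouvellesArticleIE` exists because the article pattern is broad enough to match `/videos/<id>` URLs as well. The sketch below reproduces that precedence with the two `_VALID_URL` patterns copied from the diff. `classify` is a hypothetical helper, not part of youtube-dl.

```python
import re

# Patterns copied from the two extractors in this diff.
VIDEO_RE = r'https?://(?:www\.)?tvanouvelles\.ca/videos/(?P<id>\d+)'
ARTICLE_RE = r'https?://(?:www\.)?tvanouvelles\.ca/(?:[^/]+/)+(?P<id>[^/?#&]+)'


def classify(url):
    """Return ('video' | 'article', id), or None if neither pattern matches."""
    # The video pattern is checked first, as suitable() does.
    mobj = re.match(VIDEO_RE, url)
    if mobj:
        return 'video', mobj.group('id')
    mobj = re.match(ARTICLE_RE, url)
    if mobj:
        return 'article', mobj.group('id')
    return None


print(classify('http://www.tvanouvelles.ca/videos/5117035533001'))
print(classify('http://www.tvanouvelles.ca/2016/11/17/des-policiers-qui-ont-la-meche-un-peu-courte'))
```

Without the precedence check, a single video URL would be claimed by the article extractor and treated as a playlist page.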
diff --git a/youtube_dl/extractor/tvigle.py b/youtube_dl/extractor/tvigle.py

index dc3a8334a6b335143dff417d805a26df412d8783..f3817ab288473a01e899f335821c241fe43d0e91 100644 (file)
--- a/youtube_dl/extractor/tvigle.py
+++ b/youtube_dl/extractor/tvigle.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -58,7 +58,9 @@ class TvigleIE(InfoExtractor):
          if not video_id:
              webpage = self._download_webpage(url, display_id)
              video_id = self._html_search_regex(
-                r'class="video-preview current_playing" id="(\d+)">',
+                (r'<div[^>]+class=["\']player["\'][^>]+id=["\'](\d+)',
+                 r'var\s+cloudId\s*=\s*["\'](\d+)',
+                 r'class="video-preview current_playing" id="(\d+)"'),
                  webpage, 'video id')
  
          video_data = self._download_json(
@@ -81,10 +83,10 @@ class TvigleIE(InfoExtractor):
  
          formats = []
          for vcodec, fmts in item['videos'].items():
+            if vcodec == 'hls':
+                continue
              for format_id, video_url in fmts.items():
                  if format_id == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        video_url, video_id, 'mp4', m3u8_id=vcodec))
                      continue
                  height = self._search_regex(
                      r'^(\d+)[pP]$', format_id, 'height', default=None)
diff --git a/youtube_dl/extractor/tvland.py b/youtube_dl/extractor/tvland.py

index b73279dec8f433163e4d5d272e25e794c2fa74a8..957cf1ea2666ace07087ffd7d9e94810e87fe1e8 100644 (file)
--- a/youtube_dl/extractor/tvland.py
+++ b/youtube_dl/extractor/tvland.py
@@ -6,59 +6,29 @@ from .mtv import MTVServicesInfoExtractor
  
  class TVLandIE(MTVServicesInfoExtractor):
      IE_NAME = 'tvland.com'
-    _VALID_URL = r'https?://(?:www\.)?tvland\.com/(?:video-clips|episodes)/(?P<id>[^/?#.]+)'
+    _VALID_URL = r'https?://(?:www\.)?tvland\.com/(?:video-clips|(?:full-)?episodes)/(?P<id>[^/?#.]+)'
      _FEED_URL = 'http://www.tvland.com/feeds/mrss/'
      _TESTS = [{
+        # Geo-restricted. Without a proxy the metadata is still available; with a
+        # proxy it redirects to http://m.tvland.com/app/
          'url': 'http://www.tvland.com/episodes/hqhps2/everybody-loves-raymond-the-invasion-ep-048',
-        'playlist': [
-            {
-                'md5': '227e9723b9669c05bf51098b10287aa7',
-                'info_dict': {
-                    'id': 'bcbd3a83-3aca-4dca-809b-f78a87dcccdd',
-                    'ext': 'mp4',
-                    'title': 'Everybody Loves Raymond|Everybody Loves Raymond 048 HD, Part 1 of 5',
-                }
-            },
-            {
-                'md5': '9fa2b764ec0e8194fb3ebb01a83df88b',
-                'info_dict': {
-                    'id': 'f4279548-6e13-40dd-92e8-860d27289197',
-                    'ext': 'mp4',
-                    'title': 'Everybody Loves Raymond|Everybody Loves Raymond 048 HD, Part 2 of 5',
-                }
-            },
-            {
-                'md5': 'fde4c3bccd7cc7e3576b338734153cec',
-                'info_dict': {
-                    'id': '664e4a38-53ef-4115-9bc9-d0f789ec6334',
-                    'ext': 'mp4',
-                    'title': 'Everybody Loves Raymond|Everybody Loves Raymond 048 HD, Part 3 of 5',
-                }
-            },
-            {
-                'md5': '247f6780cda6891f2e49b8ae2b10e017',
-                'info_dict': {
-                    'id': '9146ecf5-b15a-4d78-879c-6679b77f4960',
-                    'ext': 'mp4',
-                    'title': 'Everybody Loves Raymond|Everybody Loves Raymond 048 HD, Part 4 of 5',
-                }
-            },
-            {
-                'md5': 'fd269f33256e47bad5eb6c40de089ff6',
-                'info_dict': {
-                    'id': '04334a2e-9a47-4214-a8c2-ae5792e2fab7',
-                    'ext': 'mp4',
-                    'title': 'Everybody Loves Raymond|Everybody Loves Raymond 048 HD, Part 5 of 5',
-                }
-            }
-        ],
+        'info_dict': {
+            'description': 'md5:80973e81b916a324e05c14a3fb506d29',
+            'title': 'The Invasion',
+        },
+        'playlist': [],
      }, {
          'url': 'http://www.tvland.com/video-clips/zea2ev/younger-younger--hilary-duff---little-lies',
          'md5': 'e2c6389401cf485df26c79c247b08713',
          'info_dict': {
              'id': 'b8697515-4bbe-4e01-83d5-fa705ce5fa88',
              'ext': 'mp4',
-            'title': 'Younger|Younger: Hilary Duff - Little Lies',
-            'description': 'md5:7d192f56ca8d958645c83f0de8ef0269'
+            'title': 'Younger|December 28, 2015|2|NO-EPISODE#|Younger: Hilary Duff - Little Lies',
+            'description': 'md5:7d192f56ca8d958645c83f0de8ef0269',
+            'upload_date': '20151228',
+            'timestamp': 1451289600,
          },
+    }, {
+        'url': 'http://www.tvland.com/full-episodes/iu0hz6/younger-a-kiss-is-just-a-kiss-season-3-ep-301',
+        'only_matching': True,
      }]
diff --git a/youtube_dl/extractor/tvnoe.py b/youtube_dl/extractor/tvnoe.py

new file mode 100644 (file)

index 0000000..6d5c748
--- /dev/null
+++ b/youtube_dl/extractor/tvnoe.py
@@ -0,0 +1,49 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .jwplatform import JWPlatformBaseIE
+from ..utils import (
+    clean_html,
+    get_element_by_class,
+    js_to_json,
+)
+
+
+class TVNoeIE(JWPlatformBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?tvnoe\.cz/video/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.tvnoe.cz/video/10362',
+        'md5': 'aee983f279aab96ec45ab6e2abb3c2ca',
+        'info_dict': {
+            'id': '10362',
+            'ext': 'mp4',
+            'series': 'Noční univerzita',
+            'title': 'prof. Tomáš Halík, Th.D. - Návrat náboženství a střet civilizací',
+            'description': 'md5:f337bae384e1a531a52c55ebc50fff41',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        iframe_url = self._search_regex(
+            r'<iframe[^>]+src="([^"]+)"', webpage, 'iframe URL')
+
+        ifs_page = self._download_webpage(iframe_url, video_id)
+        jwplayer_data = self._parse_json(
+            self._find_jwplayer_data(ifs_page),
+            video_id, transform_source=js_to_json)
+        info_dict = self._parse_jwplayer_data(
+            jwplayer_data, video_id, require_title=False, base_url=iframe_url)
+
+        info_dict.update({
+            'id': video_id,
+            'title': clean_html(get_element_by_class(
+                'field-name-field-podnazev', webpage)),
+            'description': clean_html(get_element_by_class(
+                'field-name-body', webpage)),
+            'series': clean_html(get_element_by_class('title', webpage))
+        })
+
+        return info_dict
diff --git a/youtube_dl/extractor/tvp.py b/youtube_dl/extractor/tvp.py

index f57d609d43eecb13f3bb43ecc042107b5cad50bd..06ea2b40a759158baa2c561498e5155011f418ec 100644 (file)
--- a/youtube_dl/extractor/tvp.py
+++ b/youtube_dl/extractor/tvp.py
@@ -1,47 +1,101 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
  
  from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    clean_html,
+    get_element_by_attribute,
+    ExtractorError,
+)
  
  
-class TvpIE(InfoExtractor):
-    IE_NAME = 'tvp.pl'
-    _VALID_URL = r'https?://(?:vod|www)\.tvp\.pl/.*/(?P<id>\d+)$'
+class TVPIE(InfoExtractor):
+    IE_NAME = 'tvp'
+    IE_DESC = 'Telewizja Polska'
+    _VALID_URL = r'https?://[^/]+\.tvp\.(?:pl|info)/(?:(?!\d+/)[^/]+/)*(?P<id>\d+)'
  
      _TESTS = [{
-        'url': 'http://vod.tvp.pl/filmy-fabularne/filmy-za-darmo/ogniem-i-mieczem/wideo/odc-2/4278035',
-        'md5': 'cdd98303338b8a7f7abab5cd14092bf2',
-        'info_dict': {
-            'id': '4278035',
-            'ext': 'wmv',
-            'title': 'Ogniem i mieczem, odc. 2',
-        },
-    }, {
-        'url': 'http://vod.tvp.pl/seriale/obyczajowe/czas-honoru/sezon-1-1-13/i-seria-odc-13/194536',
+        'url': 'http://vod.tvp.pl/194536/i-seria-odc-13',
          'md5': '8aa518c15e5cc32dfe8db400dc921fbb',
          'info_dict': {
              'id': '194536',
              'ext': 'mp4',
              'title': 'Czas honoru, I seria – odc. 13',
+            'description': 'md5:76649d2014f65c99477be17f23a4dead',
          },
      }, {
          'url': 'http://www.tvp.pl/there-can-be-anything-so-i-shortened-it/17916176',
-        'md5': 'c3b15ed1af288131115ff17a17c19dda',
+        'md5': 'b0005b542e5b4de643a9690326ab1257',
          'info_dict': {
              'id': '17916176',
              'ext': 'mp4',
              'title': 'TVP Gorzów pokaże filmy studentów z podroży dookoła świata',
+            'description': 'TVP Gorzów pokaże filmy studentów z podroży dookoła świata',
+        },
+    }, {
+        # page id is not the same as video id (#7799)
+        'url': 'http://vod.tvp.pl/22704887/08122015-1500',
+        'md5': 'cf6a4705dfd1489aef8deb168d6ba742',
+        'info_dict': {
+            'id': '22680786',
+            'ext': 'mp4',
+            'title': 'Wiadomości, 08.12.2015, 15:00',
          },
      }, {
          'url': 'http://vod.tvp.pl/seriale/obyczajowe/na-sygnale/sezon-2-27-/odc-39/17834272',
-        'md5': 'c3b15ed1af288131115ff17a17c19dda',
+        'only_matching': True,
+    }, {
+        'url': 'http://wiadomosci.tvp.pl/25169746/24052016-1200',
+        'only_matching': True,
+    }, {
+        'url': 'http://krakow.tvp.pl/25511623/25lecie-mck-wyjatkowe-miejsce-na-mapie-krakowa',
+        'only_matching': True,
+    }, {
+        'url': 'http://teleexpress.tvp.pl/25522307/wierni-wzieli-udzial-w-procesjach',
+        'only_matching': True,
+    }, {
+        'url': 'http://sport.tvp.pl/25522165/krychowiak-uspokaja-w-sprawie-kontuzji-dwa-tygodnie-to-maksimum',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.tvp.info/25511919/trwa-rewolucja-wladza-zdecydowala-sie-na-pogwalcenie-konstytucji',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        page_id = self._match_id(url)
+        webpage = self._download_webpage(url, page_id)
+        video_id = self._search_regex([
+            r'<iframe[^>]+src="[^"]*?object_id=(\d+)',
+            r"object_id\s*:\s*'(\d+)'",
+            r'data-video-id="(\d+)"'], webpage, 'video id', default=page_id)
+        return {
+            '_type': 'url_transparent',
+            'url': 'tvp:' + video_id,
+            'description': self._og_search_description(webpage, default=None),
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'ie_key': 'TVPEmbed',
+        }
+
+
+class TVPEmbedIE(InfoExtractor):
+    IE_NAME = 'tvp:embed'
+    IE_DESC = 'Telewizja Polska'
+    _VALID_URL = r'(?:tvp:|https?://[^/]+\.tvp\.(?:pl|info)/sess/tvplayer\.php\?.*?object_id=)(?P<id>\d+)'
+
+    _TESTS = [{
+        'url': 'http://www.tvp.pl/sess/tvplayer.php?object_id=22670268',
+        'md5': '8c9cd59d16edabf39331f93bf8a766c7',
          'info_dict': {
-            'id': '17834272',
+            'id': '22670268',
              'ext': 'mp4',
-            'title': 'Na sygnale, odc. 39',
+            'title': 'Panorama, 07.12.2015, 15:40',
          },
+    }, {
+        'url': 'tvp:22670268',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -50,6 +104,11 @@ class TvpIE(InfoExtractor):
          webpage = self._download_webpage(
              'http://www.tvp.pl/sess/tvplayer.php?object_id=%s' % video_id, video_id)
  
+        error_message = get_element_by_attribute('class', 'msg error', webpage)
+        if error_message:
+            raise ExtractorError('%s said: %s' % (
+                self.IE_NAME, clean_html(error_message)), expected=True)
+
          title = self._search_regex(
              r'name\s*:\s*([\'"])Title\1\s*,\s*value\s*:\s*\1(?P<title>.+?)\1',
              webpage, 'title', group='title')
@@ -63,24 +122,53 @@ class TvpIE(InfoExtractor):
              r"poster\s*:\s*'([^']+)'", webpage, 'thumbnail', default=None)
  
          video_url = self._search_regex(
-            r'0:{src:([\'"])(?P<url>.*?)\1', webpage, 'formats', group='url', default=None)
-        if not video_url:
+            r'0:{src:([\'"])(?P<url>.*?)\1', webpage,
+            'formats', group='url', default=None)
+        if not video_url or 'material_niedostepny.mp4' in video_url:
              video_url = self._download_json(
                  'http://www.tvp.pl/pub/stat/videofileinfo?video_id=%s' % video_id,
                  video_id)['video_url']
  
-        ext = video_url.rsplit('.', 1)[-1]
-        if ext != 'ism/manifest':
-            if '/' in ext:
-                ext = 'mp4'
+        formats = []
+        video_url_base = self._search_regex(
+            r'(https?://.+?/video)(?:\.(?:ism|f4m|m3u8)|-\d+\.mp4)',
+            video_url, 'video base url', default=None)
+        if video_url_base:
+            # TODO: <Group> found instead of <AdaptationSet> in MPD manifest.
+            # It's not mentioned in MPEG-DASH standard. Figure that out.
+            # formats.extend(self._extract_mpd_formats(
+            #     video_url_base + '.ism/video.mpd',
+            #     video_id, mpd_id='dash', fatal=False))
+            formats.extend(self._extract_ism_formats(
+                video_url_base + '.ism/Manifest',
+                video_id, 'mss', fatal=False))
+            formats.extend(self._extract_f4m_formats(
+                video_url_base + '.ism/video.f4m',
+                video_id, f4m_id='hds', fatal=False))
+            m3u8_formats = self._extract_m3u8_formats(
+                video_url_base + '.ism/video.m3u8', video_id,
+                'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
+            self._sort_formats(m3u8_formats)
+            m3u8_formats = list(filter(
+                lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+                m3u8_formats))
+            formats.extend(m3u8_formats)
+            for i, m3u8_format in enumerate(m3u8_formats, 2):
+                http_url = '%s-%d.mp4' % (video_url_base, i)
+                if self._is_valid_url(http_url, video_id):
+                    f = m3u8_format.copy()
+                    f.update({
+                        'url': http_url,
+                        'format_id': f['format_id'].replace('hls', 'http'),
+                        'protocol': 'http',
+                    })
+                    formats.append(f)
+        else:
              formats = [{
                  'format_id': 'direct',
                  'url': video_url,
-                'ext': ext,
+                'ext': determine_ext(video_url, 'mp4'),
              }]
-        else:
-            m3u8_url = re.sub('([^/]*)\.ism/manifest', r'\1.ism/\1.m3u8', video_url)
-            formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
  
          self._sort_formats(formats)
  
@@ -92,8 +180,8 @@ class TvpIE(InfoExtractor):
          }
  
  
-class TvpSeriesIE(InfoExtractor):
-    IE_NAME = 'tvp.pl:Series'
+class TVPSeriesIE(InfoExtractor):
+    IE_NAME = 'tvp:series'
      _VALID_URL = r'https?://vod\.tvp\.pl/(?:[^/]+/){2}(?P<id>[^/]+)/?$'
  
      _TESTS = [{
@@ -127,7 +215,7 @@ class TvpSeriesIE(InfoExtractor):
          videos_paths = re.findall(
              '(?s)class="shortTitle">.*?href="(/[^"]+)', playlist)
          entries = [
-            self.url_result('http://vod.tvp.pl%s' % v_path, ie=TvpIE.ie_key())
+            self.url_result('http://vod.tvp.pl%s' % v_path, ie=TVPIE.ie_key())
              for v_path in videos_paths]
  
          return {
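The new `TVPIE._VALID_URL` has to accept both the old layout, where the numeric ID is the last path segment, and the new one, where it is the first. The negative lookahead `(?!\d+/)` stops the segment-skipping group from consuming a leading numeric segment. The sketch below checks the pattern, copied from the diff, against test URLs also taken from the diff. `page_id` is a hypothetical helper.

```python
import re

# Pattern copied from TVPIE._VALID_URL in this diff.
TVP_RE = r'https?://[^/]+\.tvp\.(?:pl|info)/(?:(?!\d+/)[^/]+/)*(?P<id>\d+)'


def page_id(url):
    """Return the numeric page ID from a TVP URL, or None if it does not match."""
    mobj = re.match(TVP_RE, url)
    return mobj.group('id') if mobj else None


for url in (
    'http://vod.tvp.pl/194536/i-seria-odc-13',
    'http://www.tvp.pl/there-can-be-anything-so-i-shortened-it/17916176',
    'http://vod.tvp.pl/seriale/obyczajowe/na-sygnale/sezon-2-27-/odc-39/17834272',
):
    print(url, '->', page_id(url))
```

The value extracted here is only the page ID. As the test case referencing #7799 notes, the extractor may still find a different video ID inside the page.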
diff --git a/youtube_dl/extractor/tvplay.py b/youtube_dl/extractor/tvplay.py

index df70a6b230a4217261f3f69a3e1213a88f07afbf..3eda0a399cf602d8e717c75e392a18e9dac82ec9 100644 (file)
--- a/youtube_dl/extractor/tvplay.py
+++ b/youtube_dl/extractor/tvplay.py
@@ -4,47 +4,58 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import (
+    compat_HTTPError,
+    compat_str,
+    compat_urlparse,
+)
  from ..utils import (
+    determine_ext,
+    ExtractorError,
+    int_or_none,
      parse_iso8601,
      qualities,
+    try_get,
+    update_url_query,
  )
  
  
  class TVPlayIE(InfoExtractor):
-    IE_DESC = 'TV3Play and related services'
-    _VALID_URL = r'''(?x)https?://(?:www\.)?
-        (?:tvplay\.lv/parraides|
-           tv3play\.lt/programos|
-           play\.tv3\.lt/programos|
-           tv3play\.ee/sisu|
-           tv3play\.se/program|
-           tv6play\.se/program|
-           tv8play\.se/program|
-           tv10play\.se/program|
-           tv3play\.no/programmer|
-           viasat4play\.no/programmer|
-           tv6play\.no/programmer|
-           tv3play\.dk/programmer|
-           play\.novatv\.bg/programi
-        )/[^/]+/(?P<id>\d+)
-        '''
+    IE_NAME = 'mtg'
+    IE_DESC = 'MTG services'
+    _VALID_URL = r'''(?x)
+                    (?:
+                        mtg:|
+                        https?://
+                            (?:www\.)?
+                            (?:
+                                tvplay(?:\.skaties)?\.lv/parraides|
+                                (?:tv3play|play\.tv3)\.lt/programos|
+                                tv3play(?:\.tv3)?\.ee/sisu|
+                                (?:tv(?:3|6|8|10)play|viafree)\.se/program|
+                                (?:(?:tv3play|viasat4play|tv6play|viafree)\.no|(?:tv3play|viafree)\.dk)/programmer|
+                                play\.novatv\.bg/programi
+                            )
+                            /(?:[^/]+/)+
+                        )
+                        (?P<id>\d+)
+                    '''
      _TESTS = [
          {
              'url': 'http://www.tvplay.lv/parraides/vinas-melo-labak/418113?autostart=true',
+            'md5': 'a1612fe0849455423ad8718fe049be21',
              'info_dict': {
                  'id': '418113',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Kādi ir īri? - Viņas melo labāk',
                  'description': 'Baiba apsmej īrus, kādi tie ir un ko viņi dara.',
+                'series': 'Viņas melo labāk',
+                'season': '2.sezona',
+                'season_number': 2,
                  'duration': 25,
                  'timestamp': 1406097056,
                  'upload_date': '20140723',
              },
-            'params': {
-                # rtmp download
-                'skip_download': True,
-            },
          },
          {
              'url': 'http://play.tv3.lt/programos/moterys-meluoja-geriau/409229?autostart=true',
@@ -53,6 +64,10 @@ class TVPlayIE(InfoExtractor):
                  'ext': 'flv',
                  'title': 'Moterys meluoja geriau',
                  'description': 'md5:9aec0fc68e2cbc992d2a140bd41fa89e',
+                'series': 'Moterys meluoja geriau',
+                'episode_number': 47,
+                'season': '1 sezonas',
+                'season_number': 1,
                  'duration': 1330,
                  'timestamp': 1403769181,
                  'upload_date': '20140626',
@@ -82,7 +97,7 @@ class TVPlayIE(InfoExtractor):
              'url': 'http://www.tv3play.se/program/husraddarna/395385?autostart=true',
              'info_dict': {
                  'id': '395385',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Husräddarna S02E07',
                  'description': 'md5:f210c6c89f42d4fc39faa551be813777',
                  'duration': 2574,
@@ -90,7 +105,6 @@ class TVPlayIE(InfoExtractor):
                  'upload_date': '20140520',
              },
              'params': {
-                # rtmp download
                  'skip_download': True,
              },
          },
@@ -98,7 +112,7 @@ class TVPlayIE(InfoExtractor):
              'url': 'http://www.tv6play.se/program/den-sista-dokusapan/266636?autostart=true',
              'info_dict': {
                  'id': '266636',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Den sista dokusåpan S01E08',
                  'description': 'md5:295be39c872520221b933830f660b110',
                  'duration': 1492,
@@ -107,7 +121,6 @@ class TVPlayIE(InfoExtractor):
                  'age_limit': 18,
              },
              'params': {
-                # rtmp download
                  'skip_download': True,
              },
          },
@@ -115,7 +128,7 @@ class TVPlayIE(InfoExtractor):
              'url': 'http://www.tv8play.se/program/antikjakten/282756?autostart=true',
              'info_dict': {
                  'id': '282756',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Antikjakten S01E10',
                  'description': 'md5:1b201169beabd97e20c5ad0ad67b13b8',
                  'duration': 2646,
@@ -123,7 +136,6 @@ class TVPlayIE(InfoExtractor):
                  'upload_date': '20120925',
              },
              'params': {
-                # rtmp download
                  'skip_download': True,
              },
          },
@@ -131,7 +143,7 @@ class TVPlayIE(InfoExtractor):
              'url': 'http://www.tv3play.no/programmer/anna-anka-soker-assistent/230898?autostart=true',
              'info_dict': {
                  'id': '230898',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Anna Anka søker assistent - Ep. 8',
                  'description': 'md5:f80916bf5bbe1c5f760d127f8dd71474',
                  'duration': 2656,
@@ -139,7 +151,6 @@ class TVPlayIE(InfoExtractor):
                  'upload_date': '20100628',
              },
              'params': {
-                # rtmp download
                  'skip_download': True,
              },
          },
@@ -147,7 +158,7 @@ class TVPlayIE(InfoExtractor):
              'url': 'http://www.viasat4play.no/programmer/budbringerne/21873?autostart=true',
              'info_dict': {
                  'id': '21873',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Budbringerne program 10',
                  'description': 'md5:4db78dc4ec8a85bb04fd322a3ee5092d',
                  'duration': 1297,
@@ -155,7 +166,6 @@ class TVPlayIE(InfoExtractor):
                  'upload_date': '20090929',
              },
              'params': {
-                # rtmp download
                  'skip_download': True,
              },
          },
@@ -163,7 +173,7 @@ class TVPlayIE(InfoExtractor):
              'url': 'http://www.tv6play.no/programmer/hotelinspektor-alex-polizzi/361883?autostart=true',
              'info_dict': {
                  'id': '361883',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Hotelinspektør Alex Polizzi - Ep. 10',
                  'description': 'md5:3ecf808db9ec96c862c8ecb3a7fdaf81',
                  'duration': 2594,
@@ -171,7 +181,6 @@ class TVPlayIE(InfoExtractor):
                  'upload_date': '20140224',
              },
              'params': {
-                # rtmp download
                  'skip_download': True,
              },
          },
@@ -191,59 +200,226 @@ class TVPlayIE(InfoExtractor):
                  'skip_download': True,
              },
          },
+        {
+            'url': 'http://tvplay.skaties.lv/parraides/vinas-melo-labak/418113?autostart=true',
+            'only_matching': True,
+        },
+        {
+            # views is null
+            'url': 'http://tvplay.skaties.lv/parraides/tv3-zinas/760183',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://tv3play.tv3.ee/sisu/kodu-keset-linna/238551?autostart=true',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://www.viafree.se/program/underhallning/i-like-radio-live/sasong-1/676869',
+            'only_matching': True,
+        },
+        {
+            'url': 'mtg:418113',
+            'only_matching': True,
+        }
      ]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
          video = self._download_json(
-            'http://playapi.mtgx.tv/v1/videos/%s' % video_id, video_id, 'Downloading video JSON')
+            'http://playapi.mtgx.tv/v3/videos/%s' % video_id, video_id, 'Downloading video JSON')
  
-        if video['is_geo_blocked']:
-            self.report_warning(
-                'This content might not be available in your country due to copyright reasons')
+        title = video['title']
  
-        streams = self._download_json(
-            'http://playapi.mtgx.tv/v1/videos/stream/%s' % video_id, video_id, 'Downloading streams JSON')
+        try:
+            streams = self._download_json(
+                'http://playapi.mtgx.tv/v3/videos/stream/%s' % video_id,
+                video_id, 'Downloading streams JSON')
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                msg = self._parse_json(e.cause.read().decode('utf-8'), video_id)
+                raise ExtractorError(msg['msg'], expected=True)
+            raise
  
          quality = qualities(['hls', 'medium', 'high'])
          formats = []
-        for format_id, video_url in streams['streams'].items():
+        for format_id, video_url in streams.get('streams', {}).items():
              if not video_url or not isinstance(video_url, compat_str):
                  continue
-            fmt = {
-                'format_id': format_id,
-                'preference': quality(format_id),
-            }
-            if video_url.startswith('rtmp'):
-                m = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
-                if not m:
-                    continue
-                fmt.update({
-                    'ext': 'flv',
-                    'url': m.group('url'),
-                    'app': m.group('app'),
-                    'play_path': m.group('playpath'),
-                })
-            elif video_url.endswith('.f4m'):
+            ext = determine_ext(video_url)
+            if ext == 'f4m':
                  formats.extend(self._extract_f4m_formats(
-                    video_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81', video_id))
-                continue
+                    update_url_query(video_url, {
+                        'hdcore': '3.5.0',
+                        'plugin': 'aasp-3.5.0.151.81'
+                    }), video_id, f4m_id='hds', fatal=False))
+            elif ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
              else:
-                fmt.update({
-                    'url': video_url,
-                })
-            formats.append(fmt)
+                fmt = {
+                    'format_id': format_id,
+                    'quality': quality(format_id),
+                    'ext': ext,
+                }
+                if video_url.startswith('rtmp'):
+                    m = re.search(
+                        r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
+                    if not m:
+                        continue
+                    fmt.update({
+                        'ext': 'flv',
+                        'url': m.group('url'),
+                        'app': m.group('app'),
+                        'play_path': m.group('playpath'),
+                    })
+                else:
+                    fmt.update({
+                        'url': video_url,
+                    })
+                formats.append(fmt)
+
+        if not formats and video.get('is_geo_blocked'):
+            self.raise_geo_restricted(
+                'This content might not be available in your country due to copyright reasons')
  
          self._sort_formats(formats)
  
+        # TODO: webvtt in m3u8
+        subtitles = {}
+        sami_path = video.get('sami_path')
+        if sami_path:
+            lang = self._search_regex(
+                r'_([a-z]{2})\.xml', sami_path, 'lang',
+                default=compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1])
+            subtitles[lang] = [{
+                'url': sami_path,
+            }]
+
+        series = video.get('format_title')
+        episode_number = int_or_none(video.get('format_position', {}).get('episode'))
+        season = video.get('_embedded', {}).get('season', {}).get('title')
+        season_number = int_or_none(video.get('format_position', {}).get('season'))
+
          return {
              'id': video_id,
-            'title': video['title'],
-            'description': video['description'],
-            'duration': video['duration'],
-            'timestamp': parse_iso8601(video['created_at']),
-            'view_count': video['views']['total'],
-            'age_limit': video.get('age_limit', 0),
+            'title': title,
+            'description': video.get('description'),
+            'series': series,
+            'episode_number': episode_number,
+            'season': season,
+            'season_number': season_number,
+            'duration': int_or_none(video.get('duration')),
+            'timestamp': parse_iso8601(video.get('created_at')),
+            'view_count': try_get(video, lambda x: x['views']['total'], int),
+            'age_limit': int_or_none(video.get('age_limit', 0)),
              'formats': formats,
+            'subtitles': subtitles,
          }
+
+
+class ViafreeIE(InfoExtractor):
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        viafree\.
+                        (?:
+                            (?:dk|no)/programmer|
+                            se/program
+                        )
+                        /(?:[^/]+/)+(?P<id>[^/?#&]+)
+                    '''
+    _TESTS = [{
+        'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
+        'info_dict': {
+            'id': '395375',
+            'ext': 'mp4',
+            'title': 'Husräddarna S02E02',
+            'description': 'md5:4db5c933e37db629b5a2f75dfb34829e',
+            'series': 'Husräddarna',
+            'season': 'Säsong 2',
+            'season_number': 2,
+            'duration': 2576,
+            'timestamp': 1400596321,
+            'upload_date': '20140520',
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': [TVPlayIE.ie_key()],
+    }, {
+        # with relatedClips
+        'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-1',
+        'info_dict': {
+            'id': '758770',
+            'ext': 'mp4',
+            'title': 'Sommaren med YouTube-stjärnorna S01E01',
+            'description': 'md5:2bc69dce2c4bb48391e858539bbb0e3f',
+            'series': 'Sommaren med YouTube-stjärnorna',
+            'season': 'Säsong 1',
+            'season_number': 1,
+            'duration': 1326,
+            'timestamp': 1470905572,
+            'upload_date': '20160811',
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': [TVPlayIE.ie_key()],
+    }, {
+        # Different og:image URL schema
+        'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-2',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.viafree.dk/programmer/reality/paradise-hotel/saeson-7/episode-5',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if TVPlayIE.suitable(url) else super(ViafreeIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        data = self._parse_json(
+            self._search_regex(
+                r'(?s)window\.App\s*=\s*({.+?})\s*;\s*</script',
+                webpage, 'data', default='{}'),
+            video_id, transform_source=lambda x: re.sub(
+                r'(?s)function\s+[a-zA-Z_][\da-zA-Z_]*\s*\([^)]*\)\s*{[^}]*}\s*',
+                'null', x), fatal=False)
+
+        video_id = None
+
+        if data:
+            video_id = try_get(
+                data, lambda x: x['context']['dispatcher']['stores'][
+                    'ContentPageProgramStore']['currentVideo']['id'],
+                compat_str)
+
+        # Fallback #1 (extract from og:image URL schema)
+        if not video_id:
+            thumbnail = self._og_search_thumbnail(webpage, default=None)
+            if thumbnail:
+                video_id = self._search_regex(
+                    # Patterns seen:
+                    #  http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/inbox/765166/a2e95e5f1d735bab9f309fa345cc3f25.jpg
+                    #  http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/seasons/15204/758770/4a5ba509ca8bc043e1ebd1a76131cdf2.jpg
+                    r'https?://[^/]+/imagecache/(?:[^/]+/)+(\d{6,})/',
+                    thumbnail, 'video id', default=None)
+
+        # Fallback #2. Extract from raw JSON string.
+        # May extract wrong video id if relatedClips is present.
+        if not video_id:
+            video_id = self._search_regex(
+                r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](\d{6,})',
+                webpage, 'video id')
+
+        return self.url_result('mtg:%s' % video_id, TVPlayIE.ie_key())
diff --git a/youtube_dl/extractor/tweakers.py b/youtube_dl/extractor/tweakers.py

index f3198fb85adb29b8081b9735899dd574cb504c67..7a9386cde3d9e0e5d78bfd368d47819430c53e85 100644
--- a/youtube_dl/extractor/tweakers.py
+++ b/youtube_dl/extractor/tweakers.py
@@ -1,25 +1,62 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    determine_ext,
+    mimetype2ext,
+)
  
  
  class TweakersIE(InfoExtractor):
      _VALID_URL = r'https?://tweakers\.net/video/(?P<id>\d+)'
      _TEST = {
          'url': 'https://tweakers.net/video/9926/new-nintendo-3ds-xl-op-alle-fronten-beter.html',
-        'md5': '3147e4ddad366f97476a93863e4557c8',
+        'md5': 'fe73e417c093a788e0160c4025f88b15',
          'info_dict': {
              'id': '9926',
              'ext': 'mp4',
              'title': 'New Nintendo 3DS XL - Op alle fronten beter',
-            'description': 'md5:f97324cc71e86e11c853f0763820e3ba',
+            'description': 'md5:3789b21fed9c0219e9bcaacd43fab280',
              'thumbnail': 're:^https?://.*\.jpe?g$',
              'duration': 386,
+            'uploader_id': 's7JeEm',
          }
      }
  
      def _real_extract(self, url):
-        playlist_id = self._match_id(url)
-        entries = self._extract_xspf_playlist(
-            'https://tweakers.net/video/s1playlist/%s/playlist.xspf' % playlist_id, playlist_id)
-        return self.playlist_result(entries, playlist_id)
+        video_id = self._match_id(url)
+        video_data = self._download_json(
+            'https://tweakers.net/video/s1playlist/%s/1920/1080/playlist.json' % video_id,
+            video_id)['items'][0]
+
+        title = video_data['title']
+
+        formats = []
+        for location in video_data.get('locations', {}).get('progressive', []):
+            format_id = location.get('label')
+            width = int_or_none(location.get('width'))
+            height = int_or_none(location.get('height'))
+            for source in location.get('sources', []):
+                source_url = source.get('src')
+                if not source_url:
+                    continue
+                ext = mimetype2ext(source.get('type')) or determine_ext(source_url)
+                formats.append({
+                    'format_id': format_id,
+                    'url': source_url,
+                    'width': width,
+                    'height': height,
+                    'ext': ext,
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'thumbnail': video_data.get('poster'),
+            'duration': int_or_none(video_data.get('duration')),
+            'uploader_id': video_data.get('account'),
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/twentyfourvideo.py b/youtube_dl/extractor/twentyfourvideo.py

index e03e2dbaa42f23a5107a50c67e7c12d9f378600b..af92b713b08e22343f84a282d3db59b355623f04 100644
--- a/youtube_dl/extractor/twentyfourvideo.py
+++ b/youtube_dl/extractor/twentyfourvideo.py
@@ -12,32 +12,32 @@ from ..utils import (
  
  class TwentyFourVideoIE(InfoExtractor):
      IE_NAME = '24video'
-    _VALID_URL = r'https?://(?:www\.)?24video\.net/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?24video\.(?:net|me|xxx)/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
  
-    _TESTS = [
-        {
-            'url': 'http://www.24video.net/video/view/1044982',
-            'md5': 'e09fc0901d9eaeedac872f154931deeb',
-            'info_dict': {
-                'id': '1044982',
-                'ext': 'mp4',
-                'title': 'Эротика каменного века',
-                'description': 'Как смотрели порно в каменном веке.',
-                'thumbnail': 're:^https?://.*\.jpg$',
-                'uploader': 'SUPERTELO',
-                'duration': 31,
-                'timestamp': 1275937857,
-                'upload_date': '20100607',
-                'age_limit': 18,
-                'like_count': int,
-                'dislike_count': int,
-            },
+    _TESTS = [{
+        'url': 'http://www.24video.net/video/view/1044982',
+        'md5': 'e09fc0901d9eaeedac872f154931deeb',
+        'info_dict': {
+            'id': '1044982',
+            'ext': 'mp4',
+            'title': 'Эротика каменного века',
+            'description': 'Как смотрели порно в каменном веке.',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'SUPERTELO',
+            'duration': 31,
+            'timestamp': 1275937857,
+            'upload_date': '20100607',
+            'age_limit': 18,
+            'like_count': int,
+            'dislike_count': int,
          },
-        {
-            'url': 'http://www.24video.net/player/new24_play.swf?id=1044982',
-            'only_matching': True,
-        }
-    ]
+    }, {
+        'url': 'http://www.24video.net/player/new24_play.swf?id=1044982',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.24video.me/video/view/1044982',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -47,7 +47,8 @@ class TwentyFourVideoIE(InfoExtractor):
  
          title = self._og_search_title(webpage)
          description = self._html_search_regex(
-            r'<span itemprop="description">([^<]+)</span>', webpage, 'description', fatal=False)
+            r'<(p|span)[^>]+itemprop="description"[^>]*>(?P<description>[^<]+)</\1>',
+            webpage, 'description', fatal=False, group='description')
          thumbnail = self._og_search_thumbnail(webpage)
          duration = int_or_none(self._og_search_property(
              'duration', webpage, 'duration', fatal=False))
@@ -63,7 +64,7 @@ class TwentyFourVideoIE(InfoExtractor):
              r'<span class="video-views">(\d+) просмотр',
              webpage, 'view count', fatal=False))
          comment_count = int_or_none(self._html_search_regex(
-            r'<div class="comments-title" id="comments-count">(\d+) комментари',
+            r'<a[^>]+href="#tab-comments"[^>]*>(\d+) комментари',
              webpage, 'comment count', fatal=False))
  
          # Sets some cookies
diff --git a/youtube_dl/extractor/twentymin.py b/youtube_dl/extractor/twentymin.py

index ca7d953b8e2733d404ebe9f7cd90e7e28d083abe..b721ecb0a106a710b6d140d7d21309307196a684 100644
--- a/youtube_dl/extractor/twentymin.py
+++ b/youtube_dl/extractor/twentymin.py
@@ -32,7 +32,22 @@ class TwentyMinutenIE(InfoExtractor):
              'title': '«Wir müssen mutig nach vorne schauen»',
              'description': 'Kein Land sei innovativer als die Schweiz, sagte Johann Schneider-Ammann in seiner Neujahrsansprache. Das Land müsse aber seine Hausaufgaben machen.',
              'thumbnail': 'http://www.20min.ch/images/content/2/2/0/22050469/10/teaserbreit.jpg'
-        }
+        },
+        'skip': '"This video is no longer available" is shown both on the web page and in the downloaded file.',
+    }, {
+        # YouTube embed
+        'url': 'http://www.20min.ch/ro/sports/football/story/Il-marque-une-bicyclette-de-plus-de-30-metres--21115184',
+        'md5': 'cec64d59aa01c0ed9dbba9cf639dd82f',
+        'info_dict': {
+            'id': 'ivM7A7SpDOs',
+            'ext': 'mp4',
+            'title': 'GOLAZO DE CHILENA DE JAVI GÓMEZ, FINALISTA AL BALÓN DE CLM 2016',
+            'description': 'md5:903c92fbf2b2f66c09de514bc25e9f5a',
+            'upload_date': '20160424',
+            'uploader': 'RTVCM Castilla-La Mancha',
+            'uploader_id': 'RTVCM',
+        },
+        'add_ie': ['Youtube'],
      }, {
          'url': 'http://www.20min.ch/videotv/?cid=44&vid=468738',
          'only_matching': True,
@@ -48,6 +63,12 @@ class TwentyMinutenIE(InfoExtractor):
  
          webpage = self._download_webpage(url, display_id)
  
+        youtube_url = self._html_search_regex(
+            r'<iframe[^>]+src="((?:https?:)?//www\.youtube\.com/embed/[^"]+)"',
+            webpage, 'YouTube embed URL', default=None)
+        if youtube_url is not None:
+            return self.url_result(youtube_url, 'Youtube')
+
          title = self._html_search_regex(
              r'<h1>.*?<span>(.+?)</span></h1>',
              webpage, 'title', default=None)
diff --git a/youtube_dl/extractor/twitch.py b/youtube_dl/extractor/twitch.py

index 36ee1adff2288570fc39936640bacd3abafe9ed2..77414a242d68f2309985235f0418d47c77194417 100644
--- a/youtube_dl/extractor/twitch.py
+++ b/youtube_dl/extractor/twitch.py
@@ -7,6 +7,7 @@ import random
  
  from .common import InfoExtractor
  from ..compat import (
+    compat_HTTPError,
      compat_parse_qs,
      compat_str,
      compat_urllib_parse_urlencode,
@@ -14,12 +15,13 @@ from ..compat import (
      compat_urlparse,
  )
  from ..utils import (
+    clean_html,
      ExtractorError,
      int_or_none,
+    js_to_json,
      orderedSet,
      parse_duration,
      parse_iso8601,
-    sanitized_Request,
      urlencode_postdata,
  )
  
@@ -28,8 +30,9 @@ class TwitchBaseIE(InfoExtractor):
      _VALID_URL_BASE = r'https?://(?:www\.)?twitch\.tv'
  
      _API_BASE = 'https://api.twitch.tv'
-    _USHER_BASE = 'http://usher.twitch.tv'
+    _USHER_BASE = 'https://usher.ttvnw.net'
      _LOGIN_URL = 'http://www.twitch.tv/login'
+    _CLIENT_ID = 'jzkbprff40iqj646a697cyrvl0zt2m6'
      _NETRC_MACHINE = 'twitch'
  
      def _handle_error(self, response):
@@ -41,16 +44,10 @@ class TwitchBaseIE(InfoExtractor):
                  '%s returned error: %s - %s' % (self.IE_NAME, error, response.get('message')),
                  expected=True)
  
-    def _download_json(self, url, video_id, note='Downloading JSON metadata'):
-        headers = {
-            'Referer': 'http://api.twitch.tv/crossdomain/receiver.html?v=2',
-            'X-Requested-With': 'XMLHttpRequest',
-        }
-        for cookie in self._downloader.cookiejar:
-            if cookie.name == 'api_token':
-                headers['Twitch-Api-Token'] = cookie.value
-        request = sanitized_Request(url, headers=headers)
-        response = super(TwitchBaseIE, self)._download_json(request, video_id, note)
+    def _call_api(self, path, item_id, note):
+        response = self._download_json(
+            '%s/%s' % (self._API_BASE, path), item_id, note,
+            headers={'Client-ID': self._CLIENT_ID})
          self._handle_error(response)
          return response
  
@@ -62,9 +59,17 @@ class TwitchBaseIE(InfoExtractor):
          if username is None:
              return
  
+        def fail(message):
+            raise ExtractorError(
+                'Unable to login. Twitch said: %s' % message, expected=True)
+
          login_page, handle = self._download_webpage_handle(
              self._LOGIN_URL, None, 'Downloading login page')
  
+        # Some TOR nodes and public proxies are blocked completely
+        if 'blacklist_message' in login_page:
+            fail(clean_html(login_page))
+
          login_form = self._hidden_inputs(login_page)
  
          login_form.update({
@@ -81,21 +86,24 @@ class TwitchBaseIE(InfoExtractor):
          if not post_url.startswith('http'):
              post_url = compat_urlparse.urljoin(redirect_url, post_url)
  
-        request = sanitized_Request(
-            post_url, urlencode_postdata(login_form))
-        request.add_header('Referer', redirect_url)
-        response = self._download_webpage(
-            request, None, 'Logging in as %s' % username)
-
-        error_message = self._search_regex(
-            r'<div[^>]+class="subwindow_notice"[^>]*>([^<]+)</div>',
-            response, 'error message', default=None)
-        if error_message:
-            raise ExtractorError(
-                'Unable to login. Twitch said: %s' % error_message, expected=True)
+        headers = {'Referer': redirect_url}
  
-        if '>Reset your password<' in response:
-            self.report_warning('Twitch asks you to reset your password, go to https://secure.twitch.tv/reset/submit')
+        try:
+            response = self._download_json(
+                post_url, None, 'Logging in as %s' % username,
+                data=urlencode_postdata(login_form),
+                headers=headers)
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
+                response = self._parse_json(
+                    e.cause.read().decode('utf-8'), None)
+                fail(response['message'])
+            raise
+
+        if response.get('redirect'):
+            self._download_webpage(
+                response['redirect'], None, 'Downloading login redirect page',
+                headers=headers)
  
      def _prefer_source(self, formats):
          try:
@@ -108,14 +116,14 @@ class TwitchBaseIE(InfoExtractor):
  
  class TwitchItemBaseIE(TwitchBaseIE):
      def _download_info(self, item, item_id):
-        return self._extract_info(self._download_json(
-            '%s/kraken/videos/%s%s' % (self._API_BASE, item, item_id), item_id,
+        return self._extract_info(self._call_api(
+            'kraken/videos/%s%s' % (item, item_id), item_id,
              'Downloading %s info JSON' % self._ITEM_TYPE))
  
      def _extract_media(self, item_id):
          info = self._download_info(self._ITEM_SHORTCUT, item_id)
-        response = self._download_json(
-            '%s/api/videos/%s%s' % (self._API_BASE, self._ITEM_SHORTCUT, item_id), item_id,
+        response = self._call_api(
+            'api/videos/%s%s' % (self._ITEM_SHORTCUT, item_id), item_id,
              'Downloading %s playlist JSON' % self._ITEM_TYPE)
          entries = []
          chunks = response['chunks']
@@ -171,6 +179,7 @@ class TwitchVideoIE(TwitchItemBaseIE):
              'title': 'Worlds Semifinals - Star Horn Royal Club vs. OMG',
          },
          'playlist_mincount': 12,
+        'skip': 'HTTP Error 404: Not Found',
      }
  
  
@@ -187,6 +196,7 @@ class TwitchChapterIE(TwitchItemBaseIE):
              'title': 'ACRL Off Season - Sports Cars @ Nordschleife',
          },
          'playlist_mincount': 3,
+        'skip': 'HTTP Error 404: Not Found',
      }, {
          'url': 'http://www.twitch.tv/tsm_theoddone/c/2349361',
          'only_matching': True,
@@ -237,14 +247,15 @@ class TwitchVodIE(TwitchItemBaseIE):
              # m3u8 download
              'skip_download': True,
          },
+        'skip': 'HTTP Error 404: Not Found',
      }]
  
      def _real_extract(self, url):
          item_id = self._match_id(url)
  
          info = self._download_info(self._ITEM_SHORTCUT, item_id)
-        access_token = self._download_json(
-            '%s/api/vods/%s/access_token' % (self._API_BASE, item_id), item_id,
+        access_token = self._call_api(
+            'api/vods/%s/access_token' % item_id, item_id,
              'Downloading %s access token' % self._ITEM_TYPE)
  
          formats = self._extract_m3u8_formats(
@@ -258,7 +269,7 @@ class TwitchVodIE(TwitchItemBaseIE):
                      'nauth': access_token['token'],
                      'nauthsig': access_token['sig'],
                  })),
-            item_id, 'mp4')
+            item_id, 'mp4', entry_protocol='m3u8_native')
  
          self._prefer_source(formats)
          info['formats'] = formats
@@ -272,12 +283,12 @@ class TwitchVodIE(TwitchItemBaseIE):
  
  
  class TwitchPlaylistBaseIE(TwitchBaseIE):
-    _PLAYLIST_URL = '%s/kraken/channels/%%s/videos/?offset=%%d&limit=%%d' % TwitchBaseIE._API_BASE
+    _PLAYLIST_PATH = 'kraken/channels/%s/videos/?offset=%d&limit=%d'
      _PAGE_LIMIT = 100
  
      def _extract_playlist(self, channel_id):
-        info = self._download_json(
-            '%s/kraken/channels/%s' % (self._API_BASE, channel_id),
+        info = self._call_api(
+            'kraken/channels/%s' % channel_id,
              channel_id, 'Downloading channel info JSON')
          channel_name = info.get('display_name') or info.get('name')
          entries = []
@@ -286,8 +297,8 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
          broken_paging_detected = False
          counter_override = None
          for counter in itertools.count(1):
-            response = self._download_json(
-                self._PLAYLIST_URL % (channel_id, offset, limit),
+            response = self._call_api(
+                self._PLAYLIST_PATH % (channel_id, offset, limit),
                  channel_id,
                  'Downloading %s videos JSON page %s'
                  % (self._PLAYLIST_TYPE, counter_override or counter))
@@ -342,7 +353,7 @@ class TwitchProfileIE(TwitchPlaylistBaseIE):
  class TwitchPastBroadcastsIE(TwitchPlaylistBaseIE):
      IE_NAME = 'twitch:past_broadcasts'
      _VALID_URL = r'%s/(?P<id>[^/]+)/profile/past_broadcasts/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
-    _PLAYLIST_URL = TwitchPlaylistBaseIE._PLAYLIST_URL + '&broadcasts=true'
+    _PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcasts=true'
      _PLAYLIST_TYPE = 'past broadcasts'
  
      _TEST = {
@@ -355,31 +366,6 @@ class TwitchPastBroadcastsIE(TwitchPlaylistBaseIE):
      }
  
  
-class TwitchBookmarksIE(TwitchPlaylistBaseIE):
-    IE_NAME = 'twitch:bookmarks'
-    _VALID_URL = r'%s/(?P<id>[^/]+)/profile/bookmarks/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
-    _PLAYLIST_URL = '%s/api/bookmark/?user=%%s&offset=%%d&limit=%%d' % TwitchBaseIE._API_BASE
-    _PLAYLIST_TYPE = 'bookmarks'
-
-    _TEST = {
-        'url': 'http://www.twitch.tv/ognos/profile/bookmarks',
-        'info_dict': {
-            'id': 'ognos',
-            'title': 'Ognos',
-        },
-        'playlist_mincount': 3,
-    }
-
-    def _extract_playlist_page(self, response):
-        entries = []
-        for bookmark in response.get('bookmarks', []):
-            video = bookmark.get('video')
-            if not video:
-                continue
-            entries.append(video['url'])
-        return entries
-
-
  class TwitchStreamIE(TwitchBaseIE):
      IE_NAME = 'twitch:stream'
      _VALID_URL = r'%s/(?P<id>[^/#?]+)/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
@@ -411,15 +397,12 @@ class TwitchStreamIE(TwitchBaseIE):
      def _real_extract(self, url):
          channel_id = self._match_id(url)
  
-        stream = self._download_json(
-            '%s/kraken/streams/%s' % (self._API_BASE, channel_id), channel_id,
+        stream = self._call_api(
+            'kraken/streams/%s?stream_type=all' % channel_id, channel_id,
              'Downloading stream JSON').get('stream')
  
-        # Fallback on profile extraction if stream is offline
          if not stream:
-            return self.url_result(
-                'http://www.twitch.tv/%s/profile' % channel_id,
-                'TwitchProfile', channel_id)
+            raise ExtractorError('%s is offline' % channel_id, expected=True)
  
          # Channel name may be typed in a different case than the original channel name
          # (e.g. http://www.twitch.tv/TWITCHPLAYSPOKEMON) that will lead to constructing
@@ -427,13 +410,14 @@ class TwitchStreamIE(TwitchBaseIE):
          # JSON and fallback to lowercase if it's not available.
          channel_id = stream.get('channel', {}).get('name') or channel_id.lower()
  
-        access_token = self._download_json(
-            '%s/api/channels/%s/access_token' % (self._API_BASE, channel_id), channel_id,
+        access_token = self._call_api(
+            'api/channels/%s/access_token' % channel_id, channel_id,
              'Downloading channel access token')
  
          query = {
              'allow_source': 'true',
              'allow_audio_only': 'true',
+            'allow_spectre': 'true',
              'p': random.randint(1000000, 10000000),
              'player': 'twitchweb',
              'segment_preference': '4',
@@ -477,3 +461,61 @@ class TwitchStreamIE(TwitchBaseIE):
              'formats': formats,
              'is_live': True,
          }
+
+
+class TwitchClipsIE(InfoExtractor):
+    IE_NAME = 'twitch:clips'
+    _VALID_URL = r'https?://clips\.twitch\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'https://clips.twitch.tv/ea/AggressiveCobraPoooound',
+        'md5': '761769e1eafce0ffebfb4089cb3847cd',
+        'info_dict': {
+            'id': 'AggressiveCobraPoooound',
+            'ext': 'mp4',
+            'title': 'EA Play 2016 Live from the Novo Theatre',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'creator': 'EA',
+            'uploader': 'stereotype_',
+            'uploader_id': 'stereotype_',
+        },
+    }, {
+        # multiple formats
+        'url': 'https://clips.twitch.tv/rflegendary/UninterestedBeeDAESuppy',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        clip = self._parse_json(
+            self._search_regex(
+                r'(?s)clipInfo\s*=\s*({.+?});', webpage, 'clip info'),
+            video_id, transform_source=js_to_json)
+
+        title = clip.get('channel_title') or self._og_search_title(webpage)
+
+        formats = [{
+            'url': option['source'],
+            'format_id': option.get('quality'),
+            'height': int_or_none(option.get('quality')),
+        } for option in clip.get('quality_options', []) if option.get('source')]
+
+        if not formats:
+            formats = [{
+                'url': clip['clip_video_url'],
+            }]
+
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'creator': clip.get('broadcaster_display_name') or clip.get('broadcaster_login'),
+            'uploader': clip.get('curator_login'),
+            'uploader_id': clip.get('curator_display_name'),
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/twitter.py b/youtube_dl/extractor/twitter.py
index 1f32ea2ebe2c917af0e645799c88b474542249bb..ac0b221b4f5ab02c33f1776389cba849bf00ae2b 100644
--- a/youtube_dl/extractor/twitter.py
+++ b/youtube_dl/extractor/twitter.py
@@ -4,7 +4,9 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_urlparse
  from ..utils import (
+    determine_ext,
      float_or_none,
      xpath_text,
      remove_end,
@@ -12,6 +14,8 @@ from ..utils import (
      ExtractorError,
  )
  
+from .periscope import PeriscopeIE
+
  
  class TwitterBaseIE(InfoExtractor):
      def _get_vmap_video_url(self, vmap_url, video_id):
@@ -21,7 +25,7 @@ class TwitterBaseIE(InfoExtractor):
  
  class TwitterCardIE(TwitterBaseIE):
      IE_NAME = 'twitter:card'
-    _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?:cards/tfw/v1|videos/tweet)/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?:cards/tfw/v1|videos(?:/tweet)?)/(?P<id>\d+)'
      _TESTS = [
          {
              'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
@@ -47,12 +51,12 @@ class TwitterCardIE(TwitterBaseIE):
          },
          {
              'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
-            'md5': 'd4724ffe6d2437886d004fa5de1043b3',
+            'md5': 'b6d9683dd3f48e340ded81c0e917ad46',
              'info_dict': {
                  'id': 'dq4Oj5quskI',
                  'ext': 'mp4',
                  'title': 'Ubuntu 11.10 Overview',
-                'description': 'Take a quick peek at what\'s new and improved in Ubuntu 11.10.\n\nOnce installed take a look at 10 Things to Do After Installing: http://www.omgubuntu.co.uk/2011/10/10-things-to-do-after-installing-ubuntu-11-10/',
+                'description': 'md5:a831e97fa384863d6e26ce48d1c43376',
                  'upload_date': '20111013',
                  'uploader': 'OMG! Ubuntu!',
                  'uploader_id': 'omgubuntu',
@@ -80,6 +84,9 @@ class TwitterCardIE(TwitterBaseIE):
                  'title': 'Twitter web player',
                  'thumbnail': 're:^https?://.*\.jpg',
              },
+        }, {
+            'url': 'https://twitter.com/i/videos/752274308186120192',
+            'only_matching': True,
          },
      ]
  
@@ -99,12 +106,17 @@ class TwitterCardIE(TwitterBaseIE):
              return self.url_result(iframe_url)
  
          config = self._parse_json(self._html_search_regex(
-            r'data-(?:player-)?config="([^"]+)"', webpage, 'data player config'),
+            r'data-(?:player-)?config="([^"]+)"', webpage,
+            'data player config', default='{}'),
              video_id)
  
          if config.get('source_type') == 'vine':
              return self.url_result(config['player_url'], 'Vine')
  
+        periscope_url = PeriscopeIE._extract_url(webpage)
+        if periscope_url:
+            return self.url_result(periscope_url, PeriscopeIE.ie_key())
+
          def _search_dimensions_in_video_url(a_format, video_url):
              m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', video_url)
              if m:
@@ -116,13 +128,16 @@ class TwitterCardIE(TwitterBaseIE):
          video_url = config.get('video_url') or config.get('playlist', [{}])[0].get('source')
  
          if video_url:
-            f = {
-                'url': video_url,
-            }
+            if determine_ext(video_url) == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(video_url, video_id, ext='mp4', m3u8_id='hls'))
+            else:
+                f = {
+                    'url': video_url,
+                }
  
-            _search_dimensions_in_video_url(f, video_url)
+                _search_dimensions_in_video_url(f, video_url)
  
-            formats.append(f)
+                formats.append(f)
  
          vmap_url = config.get('vmapUrl') or config.get('vmap_url')
          if vmap_url:
@@ -207,6 +222,7 @@ class TwitterIE(InfoExtractor):
              'uploader_id': 'giphz',
          },
          'expected_warnings': ['height', 'width'],
+        'skip': 'Account suspended',
      }, {
          'url': 'https://twitter.com/starwars/status/665052190608723968',
          'md5': '39b7199856dee6cd4432e72c74bc69d4',
@@ -239,10 +255,10 @@ class TwitterIE(InfoExtractor):
          'info_dict': {
              'id': '700207533655363584',
              'ext': 'mp4',
-            'title': 'jay - BEAT PROD: @suhmeduh #Damndaniel',
-            'description': 'jay on Twitter: "BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
+            'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel',
+            'description': 'JG on Twitter: "BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
              'thumbnail': 're:^https?://.*\.jpg',
-            'uploader': 'jay',
+            'uploader': 'JG',
              'uploader_id': 'jaydingeer',
          },
          'params': {
@@ -260,6 +276,31 @@ class TwitterIE(InfoExtractor):
              'upload_date': '20140615',
          },
          'add_ie': ['Vine'],
+    }, {
+        'url': 'https://twitter.com/captainamerica/status/719944021058060289',
+        'info_dict': {
+            'id': '719944021058060289',
+            'ext': 'mp4',
+            'title': 'Captain America - @King0fNerd Are you sure you made the right choice? Find out in theaters.',
+            'description': 'Captain America on Twitter: "@King0fNerd Are you sure you made the right choice? Find out in theaters. https://t.co/GpgYi9xMJI"',
+            'uploader_id': 'captainamerica',
+            'uploader': 'Captain America',
+        },
+        'params': {
+            'skip_download': True,  # requires ffmpeg
+        },
+    }, {
+        'url': 'https://twitter.com/OPP_HSD/status/779210622571536384',
+        'info_dict': {
+            'id': '1zqKVVlkqLaKB',
+            'ext': 'mp4',
+            'title': 'Sgt Kerry Schmidt - Ontario Provincial Police - Road rage, mischief, assault, rollover and fire in one occurrence',
+            'upload_date': '20160923',
+            'uploader_id': 'OPP_HSD',
+            'uploader': 'Sgt Kerry Schmidt - Ontario Provincial Police',
+            'timestamp': 1474613214,
+        },
+        'add_ie': ['Periscope'],
      }]
  
      def _real_extract(self, url):
@@ -267,7 +308,11 @@ class TwitterIE(InfoExtractor):
          user_id = mobj.group('user_id')
          twid = mobj.group('id')
  
-        webpage = self._download_webpage(self._TEMPLATE_URL % (user_id, twid), twid)
+        webpage, urlh = self._download_webpage_handle(
+            self._TEMPLATE_URL % (user_id, twid), twid)
+
+        if 'twitter.com/account/suspended' in urlh.geturl():
+            raise ExtractorError('Account suspended by Twitter.', expected=True)
  
          username = remove_end(self._og_search_title(webpage), ' on Twitter')
  
@@ -284,17 +329,6 @@ class TwitterIE(InfoExtractor):
              'title': username + ' - ' + title,
          }
  
-        card_id = self._search_regex(
-            r'["\']/i/cards/tfw/v1/(\d+)', webpage, 'twitter card url', default=None)
-        if card_id:
-            card_url = 'https://twitter.com/i/cards/tfw/v1/' + card_id
-            info.update({
-                '_type': 'url_transparent',
-                'ie_key': 'TwitterCard',
-                'url': card_url,
-            })
-            return info
-
          mobj = re.search(r'''(?x)
              <video[^>]+class="animated-gif"(?P<more_info>[^>]+)>\s*
                  <source[^>]+video-src="(?P<url>[^"]+)"
@@ -317,13 +351,22 @@ class TwitterIE(InfoExtractor):
              })
              return info
  
+        twitter_card_url = None
          if 'class="PlayableMedia' in webpage:
+            twitter_card_url = '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid)
+        else:
+            twitter_card_iframe_url = self._search_regex(
+                r'data-full-card-iframe-url=([\'"])(?P<url>(?:(?!\1).)+)\1',
+                webpage, 'Twitter card iframe URL', default=None, group='url')
+            if twitter_card_iframe_url:
+                twitter_card_url = compat_urlparse.urljoin(url, twitter_card_iframe_url)
+
+        if twitter_card_url:
              info.update({
                  '_type': 'url_transparent',
                  'ie_key': 'TwitterCard',
-                'url': '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid),
+                'url': twitter_card_url,
              })
-
              return info
  
          raise ExtractorError('There\'s no video in this tweet.')
@@ -331,7 +374,7 @@ class TwitterIE(InfoExtractor):
  
  class TwitterAmplifyIE(TwitterBaseIE):
      IE_NAME = 'twitter:amplify'
-    _VALID_URL = 'https?://amp\.twimg\.com/v/(?P<id>[0-9a-f\-]{36})'
+    _VALID_URL = r'https?://amp\.twimg\.com/v/(?P<id>[0-9a-f\-]{36})'
  
      _TEST = {
          'url': 'https://amp.twimg.com/v/0ba0c3c7-0af3-4c0a-bed5-7efd1ffa2951',
diff --git a/youtube_dl/extractor/ubu.py b/youtube_dl/extractor/ubu.py
deleted file mode 100644
index 1d52cbc..0000000
--- a/youtube_dl/extractor/ubu.py
+++ /dev/null
@@ -1,57 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    qualities,
-)
-
-
-class UbuIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?ubu\.com/film/(?P<id>[\da-z_-]+)\.html'
-    _TEST = {
-        'url': 'http://ubu.com/film/her_noise.html',
-        'md5': '138d5652618bf0f03878978db9bef1ee',
-        'info_dict': {
-            'id': 'her_noise',
-            'ext': 'm4v',
-            'title': 'Her Noise - The Making Of (2007)',
-            'duration': 3600,
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._html_search_regex(
-            r'<title>.+?Film &amp; Video: ([^<]+)</title>', webpage, 'title')
-
-        duration = int_or_none(self._html_search_regex(
-            r'Duration: (\d+) minutes', webpage, 'duration', fatal=False),
-            invscale=60)
-
-        formats = []
-        FORMAT_REGEXES = [
-            ('sq', r"'flashvars'\s*,\s*'file=([^']+)'"),
-            ('hq', r'href="(http://ubumexico\.centro\.org\.mx/video/[^"]+)"'),
-        ]
-        preference = qualities([fid for fid, _ in FORMAT_REGEXES])
-        for format_id, format_regex in FORMAT_REGEXES:
-            m = re.search(format_regex, webpage)
-            if m:
-                formats.append({
-                    'url': m.group(1),
-                    'format_id': format_id,
-                    'preference': preference(format_id),
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'duration': duration,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/udemy.py b/youtube_dl/extractor/udemy.py
index 71bea5363ed77ddbf476bb92050e4d675c6f13a9..cce29c6e07134c58425f5486b15ce6d9e017d0d7 100644
--- a/youtube_dl/extractor/udemy.py
+++ b/youtube_dl/extractor/udemy.py
@@ -5,7 +5,7 @@ import re
  from .common import InfoExtractor
  from ..compat import (
      compat_HTTPError,
-    compat_urllib_parse_urlencode,
+    compat_str,
      compat_urllib_request,
      compat_urlparse,
  )
@@ -54,6 +54,16 @@ class UdemyIE(InfoExtractor):
          'only_matching': True,
      }]
  
+    def _extract_course_info(self, webpage, video_id):
+        course = self._parse_json(
+            unescapeHTML(self._search_regex(
+                r'ng-init=["\'].*\bcourse=({.+?});', webpage, 'course', default='{}')),
+            video_id, fatal=False) or {}
+        course_id = course.get('id') or self._search_regex(
+            (r'&quot;id&quot;\s*:\s*(\d+)', r'data-course-id=["\'](\d+)'),
+            webpage, 'course id')
+        return course_id, course.get('title')
+
      def _enroll_course(self, base_url, webpage, course_id):
          def combine_url(base_url, url):
              return compat_urlparse.urljoin(base_url, url) if not url.startswith('http') else url
@@ -74,18 +84,19 @@ class UdemyIE(InfoExtractor):
          if enroll_url:
              webpage = self._download_webpage(
                  combine_url(base_url, enroll_url),
-                course_id, 'Enrolling in the course')
+                course_id, 'Enrolling in the course',
+                headers={'Referer': base_url})
              if '>You have enrolled in' in webpage:
                  self.to_screen('%s: Successfully enrolled in the course' % course_id)
  
      def _download_lecture(self, course_id, lecture_id):
          return self._download_json(
-            'https://www.udemy.com/api-2.0/users/me/subscribed-courses/%s/lectures/%s?%s' % (
-                course_id, lecture_id, compat_urllib_parse_urlencode({
-                    'fields[lecture]': 'title,description,view_html,asset',
-                    'fields[asset]': 'asset_type,stream_url,thumbnail_url,download_urls,data',
-                })),
-            lecture_id, 'Downloading lecture JSON')
+            'https://www.udemy.com/api-2.0/users/me/subscribed-courses/%s/lectures/%s?'
+            % (course_id, lecture_id),
+            lecture_id, 'Downloading lecture JSON', query={
+                'fields[lecture]': 'title,description,view_html,asset',
+                'fields[asset]': 'asset_type,stream_url,thumbnail_url,download_urls,data',
+            })
  
      def _handle_error(self, response):
          if not isinstance(response, dict):
@@ -98,7 +109,7 @@ class UdemyIE(InfoExtractor):
                  error_str += ' - %s' % error_data.get('formErrors')
              raise ExtractorError(error_str, expected=True)
  
-    def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'):
+    def _download_json(self, url_or_request, *args, **kwargs):
          headers = {
              'X-Udemy-Snail-Case': 'true',
              'X-Requested-With': 'XMLHttpRequest',
@@ -116,7 +127,7 @@ class UdemyIE(InfoExtractor):
          else:
              url_or_request = sanitized_Request(url_or_request, headers=headers)
  
-        response = super(UdemyIE, self)._download_json(url_or_request, video_id, note)
+        response = super(UdemyIE, self)._download_json(url_or_request, *args, **kwargs)
          self._handle_error(response)
          return response
  
@@ -132,7 +143,9 @@ class UdemyIE(InfoExtractor):
              self._LOGIN_URL, None, 'Downloading login popup')
  
          def is_logged(webpage):
-            return any(p in webpage for p in ['href="https://www.udemy.com/user/logout/', '>Logout<'])
+            return any(re.search(p, webpage) for p in (
+                r'href=["\'](?:https://www\.udemy\.com)?/user/logout/',
+                r'>Logout<'))
  
          # already logged in
          if is_logged(login_popup):
@@ -141,17 +154,17 @@ class UdemyIE(InfoExtractor):
          login_form = self._form_hidden_inputs('login-form', login_popup)
  
          login_form.update({
-            'email': username.encode('utf-8'),
-            'password': password.encode('utf-8'),
+            'email': username,
+            'password': password,
          })
  
-        request = sanitized_Request(
-            self._LOGIN_URL, urlencode_postdata(login_form))
-        request.add_header('Referer', self._ORIGIN_URL)
-        request.add_header('Origin', self._ORIGIN_URL)
-
          response = self._download_webpage(
-            request, None, 'Logging in as %s' % username)
+            self._LOGIN_URL, None, 'Logging in as %s' % username,
+            data=urlencode_postdata(login_form),
+            headers={
+                'Referer': self._ORIGIN_URL,
+                'Origin': self._ORIGIN_URL,
+            })
  
          if not is_logged(response):
              error = self._html_search_regex(
@@ -166,9 +179,7 @@ class UdemyIE(InfoExtractor):
  
          webpage = self._download_webpage(url, lecture_id)
  
-        course_id = self._search_regex(
-            (r'data-course-id=["\'](\d+)', r'&quot;id&quot;\s*:\s*(\d+)'),
-            webpage, 'course id')
+        course_id, _ = self._extract_course_info(webpage, lecture_id)
  
          try:
              lecture = self._download_lecture(course_id, lecture_id)
@@ -185,20 +196,20 @@ class UdemyIE(InfoExtractor):
  
          asset = lecture['asset']
  
-        asset_type = asset.get('assetType') or asset.get('asset_type')
+        asset_type = asset.get('asset_type') or asset.get('assetType')
          if asset_type != 'Video':
              raise ExtractorError(
                  'Lecture %s is not a video' % lecture_id, expected=True)
  
-        stream_url = asset.get('streamUrl') or asset.get('stream_url')
+        stream_url = asset.get('stream_url') or asset.get('streamUrl')
          if stream_url:
              youtube_url = self._search_regex(
                  r'(https?://www\.youtube\.com/watch\?v=.*)', stream_url, 'youtube URL', default=None)
              if youtube_url:
                  return self.url_result(youtube_url, 'Youtube')
  
-        video_id = asset['id']
-        thumbnail = asset.get('thumbnailUrl') or asset.get('thumbnail_url')
+        video_id = compat_str(asset['id'])
+        thumbnail = asset.get('thumbnail_url') or asset.get('thumbnailUrl')
          duration = float_or_none(asset.get('data', {}).get('duration'))
  
          formats = []
@@ -297,7 +308,7 @@ class UdemyIE(InfoExtractor):
  
  class UdemyCourseIE(UdemyIE):
      IE_NAME = 'udemy:course'
-    _VALID_URL = r'https?://www\.udemy\.com/(?P<id>[\da-z-]+)'
+    _VALID_URL = r'https?://(?:www\.)?udemy\.com/(?P<id>[^/?#&]+)'
      _TESTS = []
  
      @classmethod
@@ -309,29 +320,34 @@ class UdemyCourseIE(UdemyIE):
  
          webpage = self._download_webpage(url, course_path)
  
-        response = self._download_json(
-            'https://www.udemy.com/api-1.1/courses/%s' % course_path,
-            course_path, 'Downloading course JSON')
-
-        course_id = response['id']
-        course_title = response.get('title')
+        course_id, title = self._extract_course_info(webpage, course_path)
  
          self._enroll_course(url, webpage, course_id)
  
          response = self._download_json(
-            'https://www.udemy.com/api-1.1/courses/%s/curriculum' % course_id,
-            course_id, 'Downloading course curriculum')
+            'https://www.udemy.com/api-2.0/courses/%s/cached-subscriber-curriculum-items' % course_id,
+            course_id, 'Downloading course curriculum', query={
+                'fields[chapter]': 'title,object_index',
+                'fields[lecture]': 'title,asset',
+                'page_size': '1000',
+            })
  
          entries = []
-        chapter, chapter_number = None, None
-        for asset in response:
-            asset_type = asset.get('assetType') or asset.get('asset_type')
-            if asset_type == 'Video':
-                asset_id = asset.get('id')
-                if asset_id:
+        chapter, chapter_number = [None] * 2
+        for entry in response['results']:
+            clazz = entry.get('_class')
+            if clazz == 'lecture':
+                asset = entry.get('asset')
+                if isinstance(asset, dict):
+                    asset_type = asset.get('asset_type') or asset.get('assetType')
+                    if asset_type != 'Video':
+                        continue
+                lecture_id = entry.get('id')
+                if lecture_id:
                      entry = {
                          '_type': 'url_transparent',
-                        'url': 'https://www.udemy.com/%s/#/lecture/%s' % (course_path, asset['id']),
+                        'url': 'https://www.udemy.com/%s/learn/v4/t/lecture/%s' % (course_path, entry['id']),
+                        'title': entry.get('title'),
                          'ie_key': UdemyIE.ie_key(),
                      }
                      if chapter_number:
@@ -339,8 +355,8 @@ class UdemyCourseIE(UdemyIE):
                      if chapter:
                          entry['chapter'] = chapter
                      entries.append(entry)
-            elif asset.get('type') == 'chapter':
-                chapter_number = asset.get('index') or asset.get('object_index')
-                chapter = asset.get('title')
+            elif clazz == 'chapter':
+                chapter_number = entry.get('object_index')
+                chapter = entry.get('title')
  
-        return self.playlist_result(entries, course_id, course_title)
+        return self.playlist_result(entries, course_id, title)
diff --git a/youtube_dl/extractor/udn.py b/youtube_dl/extractor/udn.py
index ee35b7227372c0ddc128dfc694577578f9fc6009..57dd73aef6f6254f22cdcd814e2d76b20c75b847 100644
--- a/youtube_dl/extractor/udn.py
+++ b/youtube_dl/extractor/udn.py
@@ -2,10 +2,13 @@
  from __future__ import unicode_literals
  
  import json
+import re
+
  from .common import InfoExtractor
  from ..utils import (
+    determine_ext,
+    int_or_none,
      js_to_json,
-    ExtractorError,
  )
  from ..compat import compat_urlparse
  
@@ -16,13 +19,16 @@ class UDNEmbedIE(InfoExtractor):
      _VALID_URL = r'https?:' + _PROTOCOL_RELATIVE_VALID_URL
      _TESTS = [{
          'url': 'http://video.udn.com/embed/news/300040',
-        'md5': 'de06b4c90b042c128395a88f0384817e',
          'info_dict': {
              'id': '300040',
              'ext': 'mp4',
              'title': '生物老師男變女 全校挺"做自己"',
              'thumbnail': 're:^https?://.*\.jpg$',
-        }
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
      }, {
          'url': 'https://video.udn.com/embed/news/300040',
          'only_matching': True,
@@ -38,39 +44,53 @@ class UDNEmbedIE(InfoExtractor):
          page = self._download_webpage(url, video_id)
  
          options = json.loads(js_to_json(self._html_search_regex(
-            r'var options\s*=\s*([^;]+);', page, 'video urls dictionary')))
+            r'var\s+options\s*=\s*([^;]+);', page, 'video urls dictionary')))
  
          video_urls = options['video']
  
          if video_urls.get('youtube'):
              return self.url_result(video_urls.get('youtube'), 'Youtube')
  
-        try:
-            del video_urls['youtube']
-        except KeyError:
-            pass
+        formats = []
+        for video_type, api_url in video_urls.items():
+            if not api_url:
+                continue
  
-        formats = [{
-            'url': self._download_webpage(
+            video_url = self._download_webpage(
                  compat_urlparse.urljoin(url, api_url), video_id,
-                'retrieve url for %s video' % video_type),
-            'format_id': video_type,
-            'preference': 0 if video_type == 'mp4' else -1,
-        } for video_type, api_url in video_urls.items() if api_url]
+                note='retrieve url for %s video' % video_type)
  
-        if not formats:
-            raise ExtractorError('No videos found', expected=True)
+            ext = determine_ext(video_url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, ext='mp4', m3u8_id='hls'))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    video_url, video_id, f4m_id='hds'))
+            else:
+                mobj = re.search(r'_(?P<height>\d+)p_(?P<tbr>\d+)\.mp4', video_url)
+                a_format = {
+                    'url': video_url,
+                    # video_type may be 'mp4', which confuses YoutubeDL
+                    'format_id': 'http-' + video_type,
+                }
+                if mobj:
+                    a_format.update({
+                        'height': int_or_none(mobj.group('height')),
+                        'tbr': int_or_none(mobj.group('tbr')),
+                    })
+                formats.append(a_format)
  
          self._sort_formats(formats)
  
-        thumbnail = None
-
-        if options.get('gallery') and len(options['gallery']):
-            thumbnail = options['gallery'][0].get('original')
+        thumbnails = [{
+            'url': img_url,
+            'id': img_type,
+        } for img_type, img_url in options.get('gallery', [{}])[0].items() if img_url]
  
          return {
              'id': video_id,
              'formats': formats,
              'title': options['title'],
-            'thumbnail': thumbnail
+            'thumbnails': thumbnails,
          }
diff --git a/youtube_dl/extractor/unistra.py b/youtube_dl/extractor/unistra.py
index 66d9f1bf3fc9ff8481fb55aa8045078244b11635..a724cdbef8620821e65cb22324fcfcb33c1a3b4c 100644
--- a/youtube_dl/extractor/unistra.py
+++ b/youtube_dl/extractor/unistra.py
@@ -49,6 +49,7 @@ class UnistraIE(InfoExtractor):
                  'format_id': format_id,
                  'quality': quality(format_id)
              })
+        self._sort_formats(formats)
  
          title = self._html_search_regex(
              r'<title>UTV - (.*?)</', webpage, 'title')
diff --git a/youtube_dl/extractor/uol.py b/youtube_dl/extractor/uol.py
new file mode 100644
index 0000000..c27c643
--- /dev/null
+++ b/youtube_dl/extractor/uol.py
@@ -0,0 +1,128 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    clean_html,
+    int_or_none,
+    parse_duration,
+    update_url_query,
+    str_or_none,
+)
+
+
+class UOLIE(InfoExtractor):
+    IE_NAME = 'uol.com.br'
+    _VALID_URL = r'https?://(?:.+?\.)?uol\.com\.br/.*?(?:(?:mediaId|v)=|view/(?:[a-z0-9]+/)?|video(?:=|/(?:\d{4}/\d{2}/\d{2}/)?))(?P<id>\d+|[\w-]+-[A-Z0-9]+)'
+    _TESTS = [{
+        'url': 'http://player.mais.uol.com.br/player_video_v3.swf?mediaId=15951931',
+        'md5': '25291da27dc45e0afb5718a8603d3816',
+        'info_dict': {
+            'id': '15951931',
+            'ext': 'mp4',
+            'title': 'Miss simpatia é encontrada morta',
+            'description': 'md5:3f8c11a0c0556d66daf7e5b45ef823b2',
+        }
+    }, {
+        'url': 'http://tvuol.uol.com.br/video/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
+        'md5': 'e41a2fb7b7398a3a46b6af37b15c00c9',
+        'info_dict': {
+            'id': '15954259',
+            'ext': 'mp4',
+            'title': 'Incêndio destrói uma das maiores casas noturnas de Londres',
+            'description': 'Em Londres, um incêndio destruiu uma das maiores boates da cidade. Não há informações sobre vítimas.',
+        }
+    }, {
+        'url': 'http://mais.uol.com.br/static/uolplayer/index.html?mediaId=15951931',
+        'only_matching': True,
+    }, {
+        'url': 'http://mais.uol.com.br/view/15954259',
+        'only_matching': True,
+    }, {
+        'url': 'http://noticias.band.uol.com.br/brasilurgente/video/2016/08/05/15951931/miss-simpatia-e-encontrada-morta.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://videos.band.uol.com.br/programa.asp?e=noticias&pr=brasil-urgente&v=15951931&t=Policia-desmonte-base-do-PCC-na-Cracolandia',
+        'only_matching': True,
+    }, {
+        'url': 'http://mais.uol.com.br/view/cphaa0gl2x8r/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
+        'only_matching': True,
+    }, {
+        'url': 'http://noticias.uol.com.br//videos/assistir.htm?video=rafaela-silva-inspira-criancas-no-judo-04024D983968D4C95326',
+        'only_matching': True,
+    }, {
+        'url': 'http://mais.uol.com.br/view/e0qbgxid79uv/15275470',
+        'only_matching': True,
+    }]
+
+    _FORMATS = {
+        '2': {
+            'width': 640,
+            'height': 360,
+        },
+        '5': {
+            'width': 1280,
+            'height': 720,
+        },
+        '6': {
+            'width': 426,
+            'height': 240,
+        },
+        '7': {
+            'width': 1920,
+            'height': 1080,
+        },
+        '8': {
+            'width': 192,
+            'height': 144,
+        },
+        '9': {
+            'width': 568,
+            'height': 320,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        if not video_id.isdigit():
+            embed_page = self._download_webpage('https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id, video_id)
+            video_id = self._search_regex(r'mediaId=(\d+)', embed_page, 'media id')
+        video_data = self._download_json(
+            'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % video_id,
+            video_id)['item']
+        title = video_data['title']
+
+        query = {
+            'ver': video_data.get('numRevision', 2),
+            'r': 'http://mais.uol.com.br',
+        }
+        formats = []
+        for f in video_data.get('formats', []):
+            f_url = f.get('url') or f.get('secureUrl')
+            if not f_url:
+                continue
+            format_id = str_or_none(f.get('id'))
+            fmt = {
+                'format_id': format_id,
+                'url': update_url_query(f_url, query),
+            }
+            fmt.update(self._FORMATS.get(format_id, {}))
+            formats.append(fmt)
+        self._sort_formats(formats)
+
+        tags = []
+        for tag in video_data.get('tags', []):
+            tag_description = tag.get('description')
+            if not tag_description:
+                continue
+            tags.append(tag_description)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': clean_html(video_data.get('desMedia')),
+            'thumbnail': video_data.get('thumbnail'),
+            'duration': int_or_none(video_data.get('durationSeconds')) or parse_duration(video_data.get('duration')),
+            'tags': tags,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/uplynk.py b/youtube_dl/extractor/uplynk.py
new file mode 100644
index 0000000..2cd22cf
--- /dev/null
+++ b/youtube_dl/extractor/uplynk.py
@@ -0,0 +1,68 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    float_or_none,
+    ExtractorError,
+)
+
+
+class UplynkIE(InfoExtractor):
+    IE_NAME = 'uplynk'
+    _VALID_URL = r'https?://.*?\.uplynk\.com/(?P<path>ext/[0-9a-f]{32}/(?P<external_id>[^/?&]+)|(?P<id>[0-9a-f]{32}))\.(?:m3u8|json)(?:.*?\bpbs=(?P<session_id>[^&]+))?'
+    _TEST = {
+        'url': 'http://content.uplynk.com/e89eaf2ce9054aa89d92ddb2d817a52e.m3u8',
+        'info_dict': {
+            'id': 'e89eaf2ce9054aa89d92ddb2d817a52e',
+            'ext': 'mp4',
+            'title': '030816-kgo-530pm-solar-eclipse-vid_web.mp4',
+            'uploader_id': '4413701bf5a1488db55b767f8ae9d4fa',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _extract_uplynk_info(self, uplynk_content_url):
+        path, external_id, video_id, session_id = re.match(UplynkIE._VALID_URL, uplynk_content_url).groups()
+        display_id = video_id or external_id
+        formats = self._extract_m3u8_formats('http://content.uplynk.com/%s.m3u8' % path, display_id, 'mp4')
+        if session_id:
+            for f in formats:
+                f['extra_param_to_segment_url'] = 'pbs=' + session_id
+        self._sort_formats(formats)
+        asset = self._download_json('http://content.uplynk.com/player/assetinfo/%s.json' % path, display_id)
+        if asset.get('error') == 1:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, asset['msg']), expected=True)
+
+        return {
+            'id': asset['asset'],
+            'title': asset['desc'],
+            'thumbnail': asset.get('default_poster_url'),
+            'duration': float_or_none(asset.get('duration')),
+            'uploader_id': asset.get('owner'),
+            'formats': formats,
+        }
+
+    def _real_extract(self, url):
+        return self._extract_uplynk_info(url)
+
+
+class UplynkPreplayIE(UplynkIE):
+    IE_NAME = 'uplynk:preplay'
+    _VALID_URL = r'https?://.*?\.uplynk\.com/preplay2?/(?P<path>ext/[0-9a-f]{32}/(?P<external_id>[^/?&]+)|(?P<id>[0-9a-f]{32}))\.json'
+    _TEST = None
+
+    def _real_extract(self, url):
+        path, external_id, video_id = re.match(self._VALID_URL, url).groups()
+        display_id = video_id or external_id
+        preplay = self._download_json(url, display_id)
+        content_url = 'http://content.uplynk.com/%s.m3u8' % path
+        session_id = preplay.get('sid')
+        if session_id:
+            content_url += '?pbs=' + session_id
+        return self._extract_uplynk_info(content_url)
diff --git a/youtube_dl/extractor/urplay.py b/youtube_dl/extractor/urplay.py
new file mode 100644
index 0000000..8e6fd47
--- /dev/null
+++ b/youtube_dl/extractor/urplay.py
@@ -0,0 +1,57 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class URPlayIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?ur(?:play|skola)\.se/(?:program|Produkter)/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://urplay.se/program/190031-tripp-trapp-trad-sovkudde',
+        'md5': 'ad5f0de86f16ca4c8062cd103959a9eb',
+        'info_dict': {
+            'id': '190031',
+            'ext': 'mp4',
+            'title': 'Tripp, Trapp, Träd : Sovkudde',
+            'description': 'md5:b86bffdae04a7e9379d1d7e5947df1d1',
+        },
+    }, {
+        'url': 'http://urskola.se/Produkter/155794-Smasagor-meankieli-Grodan-i-vida-varlden',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+        urplayer_data = self._parse_json(self._search_regex(
+            r'urPlayer\.init\(({.+?})\);', webpage, 'urplayer data'), video_id)
+        host = self._download_json('http://streaming-loadbalancer.ur.se/loadbalancer.json', video_id)['redirect']
+
+        formats = []
+        for quality_attr, quality, preference in (('', 'sd', 0), ('_hd', 'hd', 1)):
+            file_http = urplayer_data.get('file_http' + quality_attr) or urplayer_data.get('file_http_sub' + quality_attr)
+            if file_http:
+                formats.extend(self._extract_wowza_formats(
+                    'http://%s/%splaylist.m3u8' % (host, file_http), video_id, skip_protocols=['rtmp', 'rtsp']))
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for subtitle in urplayer_data.get('subtitles', []):
+            subtitle_url = subtitle.get('file')
+            kind = subtitle.get('kind')
+            if not subtitle_url or (kind and kind != 'captions'):
+                continue
+            subtitles.setdefault(subtitle.get('label', 'Svenska'), []).append({
+                'url': subtitle_url,
+            })
+
+        return {
+            'id': video_id,
+            'title': urplayer_data['title'],
+            'description': self._og_search_description(webpage),
+            'thumbnail': urplayer_data.get('image'),
+            'series': urplayer_data.get('series_title'),
+            'subtitles': subtitles,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/usanetwork.py b/youtube_dl/extractor/usanetwork.py
new file mode 100644
index 0000000..8233407
--- /dev/null
+++ b/youtube_dl/extractor/usanetwork.py
@@ -0,0 +1,76 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .adobepass import AdobePassIE
+from ..utils import (
+    extract_attributes,
+    smuggle_url,
+    update_url_query,
+)
+
+
+class USANetworkIE(AdobePassIE):
+    _VALID_URL = r'https?://(?:www\.)?usanetwork\.com/(?:[^/]+/videos|movies)/(?P<id>[^/?#]+)'
+    _TEST = {
+        'url': 'http://www.usanetwork.com/mrrobot/videos/hpe-cybersecurity',
+        'md5': '33c0d2ba381571b414024440d08d57fd',
+        'info_dict': {
+            'id': '3086229',
+            'ext': 'mp4',
+            'title': 'HPE Cybersecurity',
+            'description': 'The more we digitize our world, the more vulnerable we are.',
+            'upload_date': '20160818',
+            'timestamp': 1471535460,
+            'uploader': 'NBCU-USA',
+        },
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        player_params = extract_attributes(self._search_regex(
+            r'(<div[^>]+data-usa-tve-player-container[^>]*>)', webpage, 'player params'))
+        video_id = player_params['data-mpx-guid']
+        title = player_params['data-episode-title']
+
+        account_pid, path = re.search(
+            r'data-src="(?:https?:)?//player\.theplatform\.com/p/([^/]+)/.*?/(media/guid/\d+/\d+)',
+            webpage).groups()
+
+        query = {
+            'mbr': 'true',
+        }
+        if player_params.get('data-is-full-episode') == '1':
+            query['manifest'] = 'm3u'
+
+        if player_params.get('data-entitlement') == 'auth':
+            adobe_pass = {}
+            drupal_settings = self._search_regex(
+                r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
+                webpage, 'drupal settings', fatal=False)
+            if drupal_settings:
+                drupal_settings = self._parse_json(drupal_settings, video_id, fatal=False)
+                if drupal_settings:
+                    adobe_pass = drupal_settings.get('adobePass', {})
+            resource = self._get_mvpd_resource(
+                adobe_pass.get('adobePassResourceId', 'usa'),
+                title, video_id, player_params.get('data-episode-rating', 'TV-14'))
+            query['auth'] = self._extract_mvpd_auth(
+                url, video_id, adobe_pass.get('adobePassRequestorId', 'usa'), resource)
+
+        info = self._search_json_ld(webpage, video_id, default={})
+        info.update({
+            '_type': 'url_transparent',
+            'url': smuggle_url(update_url_query(
+                'http://link.theplatform.com/s/%s/%s' % (account_pid, path),
+                query), {'force_smil_url': True}),
+            'id': video_id,
+            'title': title,
+            'series': player_params.get('data-show-title'),
+            'episode': title,
+            'ie_key': 'ThePlatform',
+        })
+        return info
diff --git a/youtube_dl/extractor/ustream.py b/youtube_dl/extractor/ustream.py
index b5fe753d7115923d16ed6cf7de34c8723368f82f..0c06bf36bd5f76cabecc47e699ad56a45ba63a4a 100644
--- a/youtube_dl/extractor/ustream.py
+++ b/youtube_dl/extractor/ustream.py
@@ -1,20 +1,25 @@
  from __future__ import unicode_literals
  
+import random
  import re
  
  from .common import InfoExtractor
  from ..compat import (
+    compat_str,
      compat_urlparse,
  )
  from ..utils import (
+    encode_data_uri,
      ExtractorError,
      int_or_none,
      float_or_none,
+    mimetype2ext,
+    str_or_none,
  )
  
  
  class UstreamIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.ustream\.tv/(?P<type>recorded|embed|embed/recorded)/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?ustream\.tv/(?P<type>recorded|embed|embed/recorded)/(?P<id>\d+)'
      IE_NAME = 'ustream'
      _TESTS = [{
          'url': 'http://www.ustream.tv/recorded/20274954',
@@ -41,8 +46,114 @@ class UstreamIE(InfoExtractor):
              'uploader': 'sportscanadatv',
          },
          'skip': 'This Pro Broadcaster has chosen to remove this video from the ustream.tv site.',
+    }, {
+        'url': 'http://www.ustream.tv/embed/10299409',
+        'info_dict': {
+            'id': '10299409',
+        },
+        'playlist_count': 3,
+    }, {
+        'url': 'http://www.ustream.tv/recorded/91343263',
+        'info_dict': {
+            'id': '91343263',
+            'ext': 'mp4',
+            'title': 'GitHub Universe - General Session - Day 1',
+            'upload_date': '20160914',
+            'description': 'GitHub Universe - General Session - Day 1',
+            'timestamp': 1473872730,
+            'uploader': 'wa0dnskeqkr',
+            'uploader_id': '38977840',
+        },
+        'params': {
+            'skip_download': True,  # m3u8 download
+        },
      }]
  
+    def _get_stream_info(self, url, video_id, app_id_ver, extra_note=None):
+        def num_to_hex(n):
+            return hex(n)[2:]
+
+        rnd = random.randrange
+
+        if not extra_note:
+            extra_note = ''
+
+        conn_info = self._download_json(
+            'http://r%d-1-%s-recorded-lp-live.ums.ustream.tv/1/ustream' % (rnd(int(1e8)), video_id),
+            video_id, note='Downloading connection info' + extra_note,
+            query={
+                'type': 'viewer',
+                'appId': app_id_ver[0],
+                'appVersion': app_id_ver[1],
+                'rsid': '%s:%s' % (num_to_hex(rnd(int(1e8))), num_to_hex(rnd(int(1e8)))),
+                'rpin': '_rpin.%d' % rnd(int(1e15)),
+                'referrer': url,
+                'media': video_id,
+                'application': 'recorded',
+            })
+        host = conn_info[0]['args'][0]['host']
+        connection_id = conn_info[0]['args'][0]['connectionId']
+
+        return self._download_json(
+            'http://%s/1/ustream?connectionId=%s' % (host, connection_id),
+            video_id, note='Downloading stream info' + extra_note)
+
+    def _get_streams(self, url, video_id, app_id_ver):
+        # Sometimes the return dict does not have 'stream'
+        for trial_count in range(3):
+            stream_info = self._get_stream_info(
+                url, video_id, app_id_ver,
+                extra_note=' (try %d)' % (trial_count + 1) if trial_count > 0 else '')
+            if 'stream' in stream_info[0]['args'][0]:
+                return stream_info[0]['args'][0]['stream']
+        return []
+
+    def _parse_segmented_mp4(self, dash_stream_info):
+        def resolve_dash_template(template, idx, chunk_hash):
+            return template.replace('%', compat_str(idx), 1).replace('%', chunk_hash)
+
+        formats = []
+        for stream in dash_stream_info['streams']:
+            # Use only one provider to avoid too many formats
+            provider = dash_stream_info['providers'][0]
+            fragments = [{
+                'url': resolve_dash_template(
+                    provider['url'] + stream['initUrl'], 0, dash_stream_info['hashes']['0'])
+            }]
+            for idx in range(dash_stream_info['videoLength'] // dash_stream_info['chunkTime']):
+                fragments.append({
+                    'url': resolve_dash_template(
+                        provider['url'] + stream['segmentUrl'], idx,
+                        dash_stream_info['hashes'][compat_str(idx // 10 * 10)])
+                })
+            content_type = stream['contentType']
+            kind = content_type.split('/')[0]
+            f = {
+                'format_id': '-'.join(filter(None, [
+                    'dash', kind, str_or_none(stream.get('bitrate'))])),
+                'protocol': 'http_dash_segments',
+                # TODO: generate a MPD doc for external players?
+                'url': encode_data_uri(b'<MPD/>', 'text/xml'),
+                'ext': mimetype2ext(content_type),
+                'height': stream.get('height'),
+                'width': stream.get('width'),
+                'fragments': fragments,
+            }
+            if kind == 'video':
+                f.update({
+                    'vcodec': stream.get('codec'),
+                    'acodec': 'none',
+                    'vbr': stream.get('bitrate'),
+                })
+            else:
+                f.update({
+                    'vcodec': 'none',
+                    'acodec': stream.get('codec'),
+                    'abr': stream.get('bitrate'),
+                })
+            formats.append(f)
+        return formats
+
      def _real_extract(self, url):
          m = re.match(self._VALID_URL, url)
          video_id = m.group('id')
@@ -55,10 +166,12 @@ class UstreamIE(InfoExtractor):
          if m.group('type') == 'embed':
              video_id = m.group('id')
              webpage = self._download_webpage(url, video_id)
-            desktop_video_id = self._html_search_regex(
-                r'ContentVideoIds=\["([^"]*?)"\]', webpage, 'desktop_video_id')
-            desktop_url = 'http://www.ustream.tv/recorded/' + desktop_video_id
-            return self.url_result(desktop_url, 'Ustream')
+            content_video_ids = self._parse_json(self._search_regex(
+                r'ustream\.vars\.offAirContentVideoIds=([^;]+);', webpage,
+                'content video IDs'), video_id)
+            return self.playlist_result(
+                map(lambda u: self.url_result('http://www.ustream.tv/recorded/' + u, 'Ustream'), content_video_ids),
+                video_id)
  
          params = self._download_json(
              'https://api.ustream.tv/videos/%s.json' % video_id, video_id)
@@ -78,7 +191,22 @@ class UstreamIE(InfoExtractor):
              'url': video_url,
              'ext': format_id,
              'filesize': filesize,
-        } for format_id, video_url in video['media_urls'].items()]
+        } for format_id, video_url in video['media_urls'].items() if video_url]
+
+        if not formats:
+            hls_streams = self._get_streams(url, video_id, app_id_ver=(11, 2))
+            if hls_streams:
+                # m3u8_native leads to intermittent ContentTooShortError
+                formats.extend(self._extract_m3u8_formats(
+                    hls_streams[0]['url'], video_id, ext='mp4', m3u8_id='hls'))
+
+            '''
+            # DASH streams handling is incomplete as 'url' is missing
+            dash_streams = self._get_streams(url, video_id, app_id_ver=(3, 1))
+            if dash_streams:
+                formats.extend(self._parse_segmented_mp4(dash_streams))
+            '''
+
          self._sort_formats(formats)
  
          description = video.get('description')
@@ -109,7 +237,7 @@ class UstreamIE(InfoExtractor):
  
  
  class UstreamChannelIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.ustream\.tv/channel/(?P<slug>.+)'
+    _VALID_URL = r'https?://(?:www\.)?ustream\.tv/channel/(?P<slug>.+)'
      IE_NAME = 'ustream:channel'
      _TEST = {
          'url': 'http://www.ustream.tv/channel/channeljapan',
diff --git a/youtube_dl/extractor/ustudio.py b/youtube_dl/extractor/ustudio.py
index cafc082b6bb8a589edaf02ce9a87f266e44c941d..3484a204658e1f09d472c0b31026ec6621121f1f 100644
--- a/youtube_dl/extractor/ustudio.py
+++ b/youtube_dl/extractor/ustudio.py
@@ -6,10 +6,12 @@ from .common import InfoExtractor
  from ..utils import (
      int_or_none,
      unified_strdate,
+    unescapeHTML,
  )
  
  
  class UstudioIE(InfoExtractor):
+    IE_NAME = 'ustudio'
      _VALID_URL = r'https?://(?:(?:www|v1)\.)?ustudio\.com/video/(?P<id>[^/]+)/(?P<display_id>[^/?#&]+)'
      _TEST = {
          'url': 'http://ustudio.com/video/Uxu2my9bgSph/san_francisco_golden_gate_bridge',
@@ -27,9 +29,7 @@ class UstudioIE(InfoExtractor):
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        video_id, display_id = re.match(self._VALID_URL, url).groups()
  
          config = self._download_xml(
              'http://v1.ustudio.com/embed/%s/ustudio/config.xml' % video_id,
@@ -37,7 +37,7 @@ class UstudioIE(InfoExtractor):
  
          def extract(kind):
              return [{
-                'url': item.attrib['url'],
+                'url': unescapeHTML(item.attrib['url']),
                  'width': int_or_none(item.get('width')),
                  'height': int_or_none(item.get('height')),
              } for item in config.findall('./qualities/quality/%s' % kind) if item.get('url')]
@@ -65,3 +65,61 @@ class UstudioIE(InfoExtractor):
              'uploader': uploader,
              'formats': formats,
          }
+
+
+class UstudioEmbedIE(InfoExtractor):
+    IE_NAME = 'ustudio:embed'
+    _VALID_URL = r'https?://(?:(?:app|embed)\.)?ustudio\.com/embed/(?P<uid>[^/]+)/(?P<id>[^/]+)'
+    _TEST = {
+        'url': 'http://app.ustudio.com/embed/DeN7VdYRDKhP/Uw7G1kMCe65T',
+        'md5': '47c0be52a09b23a7f40de9469cec58f4',
+        'info_dict': {
+            'id': 'Uw7G1kMCe65T',
+            'ext': 'mp4',
+            'title': '5 Things IT Should Know About Video',
+            'description': 'md5:93d32650884b500115e158c5677d25ad',
+            'uploader_id': 'DeN7VdYRDKhP',
+        }
+    }
+
+    def _real_extract(self, url):
+        uploader_id, video_id = re.match(self._VALID_URL, url).groups()
+        video_data = self._download_json(
+            'http://app.ustudio.com/embed/%s/%s/config.json' % (uploader_id, video_id),
+            video_id)['videos'][0]
+        title = video_data['name']
+
+        formats = []
+        for ext, qualities in video_data.get('transcodes', {}).items():
+            for quality in qualities:
+                quality_url = quality.get('url')
+                if not quality_url:
+                    continue
+                height = int_or_none(quality.get('height'))
+                formats.append({
+                    'format_id': '%s-%dp' % (ext, height) if height else ext,
+                    'url': quality_url,
+                    'width': int_or_none(quality.get('width')),
+                    'height': height,
+                })
+        self._sort_formats(formats)
+
+        thumbnails = []
+        for image in video_data.get('images', []):
+            image_url = image.get('url')
+            if not image_url:
+                continue
+            thumbnails.append({
+                'url': image_url,
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'duration': int_or_none(video_data.get('duration')),
+            'uploader_id': uploader_id,
+            'tags': video_data.get('keywords'),
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/varzesh3.py b/youtube_dl/extractor/varzesh3.py
index 9369abaf8f7bdfa2b220c39d02f9460dbab711c2..84698371a8ab2daf77faae1684141eb32425f232 100644
--- a/youtube_dl/extractor/varzesh3.py
+++ b/youtube_dl/extractor/varzesh3.py
@@ -2,11 +2,19 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlparse,
+    compat_parse_qs,
+)
+from ..utils import (
+    clean_html,
+    remove_start,
+)
  
  
  class Varzesh3IE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?video\.varzesh3\.com/(?:[^/]+/)+(?P<id>[^/]+)/?'
-    _TEST = {
+    _TESTS = [{
          'url': 'http://video.varzesh3.com/germany/bundesliga/5-%D9%88%D8%A7%DA%A9%D9%86%D8%B4-%D8%A8%D8%B1%D8%AA%D8%B1-%D8%AF%D8%B1%D9%88%D8%A7%D8%B2%D9%87%E2%80%8C%D8%A8%D8%A7%D9%86%D8%A7%D9%86%D8%9B%D9%87%D9%81%D8%AA%D9%87-26-%D8%A8%D9%88%D9%86%D8%AF%D8%B3/',
          'md5': '2a933874cb7dce4366075281eb49e855',
          'info_dict': {
@@ -15,8 +23,19 @@ class Varzesh3IE(InfoExtractor):
              'title': '۵ واکنش برتر دروازه‌بانان؛هفته ۲۶ بوندسلیگا',
              'description': 'فصل ۲۰۱۵-۲۰۱۴',
              'thumbnail': 're:^https?://.*\.jpg$',
-        }
-    }
+        },
+        'skip': 'HTTP 404 Error',
+    }, {
+        'url': 'http://video.varzesh3.com/video/112785/%D8%AF%D9%84%D9%87-%D8%B9%D9%84%DB%8C%D8%9B-%D8%B3%D8%AA%D8%A7%D8%B1%D9%87-%D9%86%D9%88%D8%B8%D9%87%D9%88%D8%B1-%D9%84%DB%8C%DA%AF-%D8%A8%D8%B1%D8%AA%D8%B1-%D8%AC%D8%B2%DB%8C%D8%B1%D9%87',
+        'md5': '841b7cd3afbc76e61708d94e53a4a4e7',
+        'info_dict': {
+            'id': '112785',
+            'ext': 'mp4',
+            'title': 'دله علی؛ ستاره نوظهور لیگ برتر جزیره',
+            'description': 'فوتبال 120',
+        },
+        'expected_warnings': ['description'],
+    }]
  
      def _real_extract(self, url):
          display_id = self._match_id(url)
@@ -26,15 +45,30 @@ class Varzesh3IE(InfoExtractor):
          video_url = self._search_regex(
              r'<source[^>]+src="([^"]+)"', webpage, 'video url')
  
-        title = self._og_search_title(webpage)
+        title = remove_start(self._html_search_regex(
+            r'<title>([^<]+)</title>', webpage, 'title'), 'ویدیو ورزش 3 | ')
+
          description = self._html_search_regex(
              r'(?s)<div class="matn">(.+?)</div>',
-            webpage, 'description', fatal=False)
-        thumbnail = self._og_search_thumbnail(webpage)
+            webpage, 'description', default=None)
+        if description is None:
+            description = clean_html(self._html_search_meta('description', webpage))
+
+        thumbnail = self._og_search_thumbnail(webpage, default=None)
+        if thumbnail is None:
+            fb_sharer_url = self._search_regex(
+                r'<a[^>]+href="(https?://www\.facebook\.com/sharer/sharer\.php\?[^"]+)"',
+                webpage, 'facebook sharer URL', fatal=False)
+            sharer_params = compat_parse_qs(compat_urllib_parse_urlparse(fb_sharer_url or '').query)
+            thumbnail = sharer_params.get('p[images][0]', [None])[0]
  
          video_id = self._search_regex(
              r"<link[^>]+rel='(?:canonical|shortlink)'[^>]+href='/\?p=([^']+)'",
-            webpage, display_id, default=display_id)
+            webpage, display_id, default=None)
+        if video_id is None:
+            video_id = self._search_regex(
+                r'var\s+VideoId\s*=\s*(\d+);', webpage, 'video id',
+                default=display_id)
  
          return {
              'url': video_url,
diff --git a/youtube_dl/extractor/vbox7.py b/youtube_dl/extractor/vbox7.py
index dff1bb70281b9a1cd69d9507b963e4c622a29044..a1e0851b7424e4c73cd34b72c02f16bc1905b6ce 100644
--- a/youtube_dl/extractor/vbox7.py
+++ b/youtube_dl/extractor/vbox7.py
@@ -1,18 +1,23 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
-from ..compat import compat_urlparse
-from ..utils import (
-    ExtractorError,
-    sanitized_Request,
-    urlencode_postdata,
-)
+from ..utils import urlencode_postdata
  
  
  class Vbox7IE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?vbox7\.com/play:(?P<id>[^/]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?vbox7\.com/(?:play:|emb/external\.php\?.*?\bvid=)(?P<id>[\da-fA-F]+)'
+    _TESTS = [{
+        'url': 'http://vbox7.com/play:0946fff23c',
+        'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf',
+        'info_dict': {
+            'id': '0946fff23c',
+            'ext': 'mp4',
+            'title': 'Борисов: Притеснен съм за бъдещето на България',
+        },
+    }, {
          'url': 'http://vbox7.com/play:249bb972c2',
          'md5': '99f65c0c9ef9b682b97313e052734c3f',
          'info_dict': {
@@ -20,43 +25,50 @@ class Vbox7IE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Смях! Чудо - чист за секунди - Скрита камера',
          },
-    }
+        'skip': 'georestricted',
+    }, {
+        'url': 'http://vbox7.com/emb/external.php?vid=a240d20f9c&autoplay=1',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)',
+            webpage)
+        if mobj:
+            return mobj.group('url')
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        # need to get the page 3 times for the correct jsSecretToken cookie
-        # which is necessary for the correct title
-        def get_session_id():
-            redirect_page = self._download_webpage(url, video_id)
-            session_id_url = self._search_regex(
-                r'var\s*url\s*=\s*\'([^\']+)\';', redirect_page,
-                'session id url')
-            self._download_webpage(
-                compat_urlparse.urljoin(url, session_id_url), video_id,
-                'Getting session id')
-
-        get_session_id()
-        get_session_id()
-
-        webpage = self._download_webpage(url, video_id,
-                                         'Downloading redirect page')
-
-        title = self._html_search_regex(r'<title>(.*)</title>',
-                                        webpage, 'title').split('/')[0].strip()
-
-        info_url = 'http://vbox7.com/play/magare.do'
-        data = urlencode_postdata({'as3': '1', 'vid': video_id})
-        info_request = sanitized_Request(info_url, data)
-        info_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        info_response = self._download_webpage(info_request, video_id, 'Downloading info webpage')
-        if info_response is None:
-            raise ExtractorError('Unable to extract the media url')
-        (final_url, thumbnail_url) = map(lambda x: x.split('=')[1], info_response.split('&'))
+        webpage = self._download_webpage(
+            'http://vbox7.com/play:%s' % video_id, video_id)
+
+        title = self._html_search_regex(
+            r'<title>(.+?)</title>', webpage, 'title').split('/')[0].strip()
+
+        video_url = self._search_regex(
+            r'src\s*:\s*(["\'])(?P<url>.+?\.mp4.*?)\1',
+            webpage, 'video url', default=None, group='url')
+
+        thumbnail_url = self._og_search_thumbnail(webpage)
+
+        if not video_url:
+            info_response = self._download_webpage(
+                'http://vbox7.com/play/magare.do', video_id,
+                'Downloading info webpage',
+                data=urlencode_postdata({'as3': '1', 'vid': video_id}),
+                headers={'Content-Type': 'application/x-www-form-urlencoded'})
+            video_url, thumbnail_url = map(
+                lambda x: x.split('=')[1], info_response.split('&'))
+
+        if '/na.mp4' in video_url:
+            self.raise_geo_restricted()
  
          return {
              'id': video_id,
-            'url': final_url,
+            'url': self._proto_relative_url(video_url, 'http:'),
              'title': title,
              'thumbnail': thumbnail_url,
          }
diff --git a/youtube_dl/extractor/veoh.py b/youtube_dl/extractor/veoh.py
index 23ce0a0d1929febac87f789374d8411d7b7ddd00..0f5d6873808ed2dce5cde2e6239b6973cf809367 100644
--- a/youtube_dl/extractor/veoh.py
+++ b/youtube_dl/extractor/veoh.py
@@ -37,6 +37,7 @@ class VeohIE(InfoExtractor):
                  'uploader': 'afp-news',
                  'duration': 123,
              },
+            'skip': 'This video has been deleted.',
          },
          {
              'url': 'http://www.veoh.com/watch/v69525809F6Nc4frX',
diff --git a/youtube_dl/extractor/vessel.py b/youtube_dl/extractor/vessel.py
index 1a0ff3395598027ebd8de05a609faca987c14e9e..6b9c227db7a8a88e89b2df8efd3e067613bf605a 100644
--- a/youtube_dl/extractor/vessel.py
+++ b/youtube_dl/extractor/vessel.py
@@ -2,6 +2,7 @@
  from __future__ import unicode_literals
  
  import json
+import re
  
  from .common import InfoExtractor
  from ..utils import (
@@ -12,11 +13,11 @@ from ..utils import (
  
  
  class VesselIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?vessel\.com/videos/(?P<id>[0-9a-zA-Z]+)'
+    _VALID_URL = r'https?://(?:www\.)?vessel\.com/(?:videos|embed)/(?P<id>[0-9a-zA-Z_-]+)'
      _API_URL_TEMPLATE = 'https://www.vessel.com/api/view/items/%s'
      _LOGIN_URL = 'https://www.vessel.com/api/account/login'
      _NETRC_MACHINE = 'vessel'
-    _TEST = {
+    _TESTS = [{
          'url': 'https://www.vessel.com/videos/HDN7G5UMs',
          'md5': '455cdf8beb71c6dd797fd2f3818d05c4',
          'info_dict': {
@@ -28,7 +29,22 @@ class VesselIE(InfoExtractor):
              'description': 'Did Nvidia pull out all the stops on the Titan X, or does its performance leave something to be desired?',
              'timestamp': int,
          },
-    }
+    }, {
+        'url': 'https://www.vessel.com/embed/G4U7gUJ6a?w=615&h=346',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.vessel.com/videos/F01_dsLj1',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.vessel.com/videos/RRX-sir-J',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return [url for _, url in re.findall(
+            r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?vessel\.com/embed/[0-9a-zA-Z_-]+.*?)\1',
+            webpage)]
  
      @staticmethod
      def make_json_request(url, data):
@@ -98,16 +114,24 @@ class VesselIE(InfoExtractor):
  
          formats = []
          for f in video_asset.get('sources', []):
-            if f['name'] == 'hls-index':
+            location = f.get('location')
+            if not location:
+                continue
+            name = f.get('name')
+            if name == 'hls-index':
                  formats.extend(self._extract_m3u8_formats(
-                    f['location'], video_id, ext='mp4', m3u8_id='m3u8'))
+                    location, video_id, ext='mp4',
+                    entry_protocol='m3u8_native', m3u8_id='m3u8', fatal=False))
+            elif name == 'dash-index':
+                formats.extend(self._extract_mpd_formats(
+                    location, video_id, mpd_id='dash', fatal=False))
              else:
                  formats.append({
-                    'format_id': f['name'],
+                    'format_id': name,
                      'tbr': f.get('bitrate'),
                      'height': f.get('height'),
                      'width': f.get('width'),
-                    'url': f['location'],
+                    'url': location,
                  })
          self._sort_formats(formats)
  
diff --git a/youtube_dl/extractor/vesti.py b/youtube_dl/extractor/vesti.py
index cb64ae0bd07cdca051eb3aa10550840a296ded85..5ab7168808b10279932ba670165bc8190d5fceb0 100644
--- a/youtube_dl/extractor/vesti.py
+++ b/youtube_dl/extractor/vesti.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
diff --git a/youtube_dl/extractor/vevo.py b/youtube_dl/extractor/vevo.py
index 147480f6465513066db58ce3cf32e194c4ff8490..783efda7d337217fe0ed86a97a5dfa0902a5b7bf 100644
--- a/youtube_dl/extractor/vevo.py
+++ b/youtube_dl/extractor/vevo.py
@@ -3,7 +3,11 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_etree_fromstring
+from ..compat import (
+    compat_etree_fromstring,
+    compat_str,
+    compat_urlparse,
+)
  from ..utils import (
      ExtractorError,
      int_or_none,
@@ -12,13 +16,22 @@ from ..utils import (
  )
  
  
-class VevoIE(InfoExtractor):
+class VevoBaseIE(InfoExtractor):
+    def _extract_json(self, webpage, video_id, item):
+        return self._parse_json(
+            self._search_regex(
+                r'window\.__INITIAL_STORE__\s*=\s*({.+?});\s*</script>',
+                webpage, 'initial store'),
+            video_id)['default'][item]
+
+
+class VevoIE(VevoBaseIE):
      '''
      Accepts urls from vevo.com or in the format 'vevo:{id}'
      (currently used by MTVIE and MySpaceIE)
      '''
      _VALID_URL = r'''(?x)
-        (?:https?://www\.vevo\.com/watch/(?:[^/]+/(?:[^/]+/)?)?|
+        (?:https?://(?:www\.)?vevo\.com/watch/(?!playlist|genre)(?:[^/]+/(?:[^/]+/)?)?|
             https?://cache\.vevo\.com/m/html/embed\.html\?video=|
             https?://videoplayer\.vevo\.com/embed/embedded\?videoId=|
             vevo:)
@@ -30,11 +43,15 @@ class VevoIE(InfoExtractor):
          'info_dict': {
              'id': 'GB1101300280',
              'ext': 'mp4',
-            'title': 'Somebody to Die For',
+            'title': 'Hurts - Somebody to Die For',
+            'timestamp': 1372057200,
              'upload_date': '20130624',
              'uploader': 'Hurts',
-            'timestamp': 1372057200,
+            'track': 'Somebody to Die For',
+            'artist': 'Hurts',
+            'genre': 'Pop',
          },
+        'expected_warnings': ['Unable to download SMIL file'],
      }, {
          'note': 'v3 SMIL format',
          'url': 'http://www.vevo.com/watch/cassadee-pope/i-wish-i-could-break-your-heart/USUV71302923',
@@ -42,23 +59,31 @@ class VevoIE(InfoExtractor):
          'info_dict': {
              'id': 'USUV71302923',
              'ext': 'mp4',
-            'title': 'I Wish I Could Break Your Heart',
+            'title': 'Cassadee Pope - I Wish I Could Break Your Heart',
+            'timestamp': 1392796919,
              'upload_date': '20140219',
              'uploader': 'Cassadee Pope',
-            'timestamp': 1392796919,
+            'track': 'I Wish I Could Break Your Heart',
+            'artist': 'Cassadee Pope',
+            'genre': 'Country',
          },
+        'expected_warnings': ['Unable to download SMIL file'],
      }, {
          'note': 'Age-limited video',
          'url': 'https://www.vevo.com/watch/justin-timberlake/tunnel-vision-explicit/USRV81300282',
          'info_dict': {
              'id': 'USRV81300282',
              'ext': 'mp4',
-            'title': 'Tunnel Vision (Explicit)',
-            'upload_date': '20130703',
+            'title': 'Justin Timberlake - Tunnel Vision (Explicit)',
              'age_limit': 18,
-            'uploader': 'Justin Timberlake',
              'timestamp': 1372888800,
+            'upload_date': '20130703',
+            'uploader': 'Justin Timberlake',
+            'track': 'Tunnel Vision (Explicit)',
+            'artist': 'Justin Timberlake',
+            'genre': 'Pop',
          },
+        'expected_warnings': ['Unable to download SMIL file'],
      }, {
          'note': 'No video_info',
          'url': 'http://www.vevo.com/watch/k-camp-1/Till-I-Die/USUV71503000',
@@ -66,12 +91,36 @@ class VevoIE(InfoExtractor):
          'info_dict': {
              'id': 'USUV71503000',
              'ext': 'mp4',
-            'title': 'Till I Die',
-            'upload_date': '20151207',
+            'title': 'K Camp - Till I Die',
              'age_limit': 18,
-            'uploader': 'K Camp',
              'timestamp': 1449468000,
+            'upload_date': '20151207',
+            'uploader': 'K Camp',
+            'track': 'Till I Die',
+            'artist': 'K Camp',
+            'genre': 'Rap/Hip-Hop',
+        },
+    }, {
+        'note': 'Only available via webpage',
+        'url': 'http://www.vevo.com/watch/GBUV71600656',
+        'md5': '67e79210613865b66a47c33baa5e37fe',
+        'info_dict': {
+            'id': 'GBUV71600656',
+            'ext': 'mp4',
+            'title': 'ABC - Viva Love',
+            'age_limit': 0,
+            'timestamp': 1461830400,
+            'upload_date': '20160428',
+            'uploader': 'ABC',
+            'track': 'Viva Love',
+            'artist': 'ABC',
+            'genre': 'Pop',
          },
+        'expected_warnings': ['Failed to download video versions info'],
+    }, {
+        # no genres available
+        'url': 'http://www.vevo.com/watch/INS171400764',
+        'only_matching': True,
      }]
      _SMIL_BASE_URL = 'http://smil.lvl3.vevo.com'
      _SOURCE_TYPES = {
@@ -140,42 +189,41 @@ class VevoIE(InfoExtractor):
              errnote='Unable to retrieve oauth token')
  
          if 'THIS PAGE IS CURRENTLY UNAVAILABLE IN YOUR REGION' in webpage:
-            raise ExtractorError(
-                '%s said: This page is currently unavailable in your region.' % self.IE_NAME, expected=True)
+            self.raise_geo_restricted(
+                '%s said: This page is currently unavailable in your region' % self.IE_NAME)
  
          auth_info = self._parse_json(webpage, video_id)
          self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['access_token']
  
-    def _call_api(self, path, video_id, note, errnote, fatal=True):
-        return self._download_json(self._api_url_template % path, video_id, note, errnote)
+    def _call_api(self, path, *args, **kwargs):
+        return self._download_json(self._api_url_template % path, *args, **kwargs)
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
          json_url = 'http://api.vevo.com/VideoService/AuthenticateVideo?isrc=%s' % video_id
          response = self._download_json(
-            json_url, video_id, 'Downloading video info', 'Unable to download info')
+            json_url, video_id, 'Downloading video info',
+            'Unable to download info', fatal=False) or {}
          video_info = response.get('video') or {}
-        video_versions = video_info.get('videoVersions')
+        artist = None
+        featured_artist = None
          uploader = None
-        timestamp = None
          view_count = None
          formats = []
  
          if not video_info:
-            if response.get('statusCode') != 909:
+            try:
+                self._initialize_api(video_id)
+            except ExtractorError:
                  ytid = response.get('errorInfo', {}).get('ytid')
                  if ytid:
                      self.report_warning(
                          'Video is geoblocked, trying with the YouTube video %s' % ytid)
                      return self.url_result(ytid, 'Youtube', ytid)
  
-                if 'statusMessage' in response:
-                    raise ExtractorError('%s said: %s' % (
-                        self.IE_NAME, response['statusMessage']), expected=True)
-                raise ExtractorError('Unable to extract videos')
+                raise
  
-            self._initialize_api(video_id)
              video_info = self._call_api(
                  'video/%s' % video_id, video_id, 'Downloading api video info',
                  'Failed to download video info')
@@ -183,12 +231,19 @@ class VevoIE(InfoExtractor):
              video_versions = self._call_api(
                  'video/%s/streams' % video_id, video_id,
                  'Downloading video versions info',
-                'Failed to download video versions info')
+                'Failed to download video versions info',
+                fatal=False)
+
+            # Some videos are only available via webpage (e.g.
+            # https://github.com/rg3/youtube-dl/issues/9366)
+            if not video_versions:
+                webpage = self._download_webpage(url, video_id)
+                video_versions = self._extract_json(webpage, video_id, 'streams')[video_id][0]
  
              timestamp = parse_iso8601(video_info.get('releaseDate'))
              artists = video_info.get('artists')
              if artists:
-                uploader = artists[0]['name']
+                artist = uploader = artists[0]['name']
              view_count = int_or_none(video_info.get('views', {}).get('total'))
  
              for video_version in video_versions:
@@ -241,7 +296,11 @@ class VevoIE(InfoExtractor):
                  scale=1000)
              artists = video_info.get('mainArtists')
              if artists:
-                uploader = artists[0]['artistName']
+                artist = uploader = artists[0]['artistName']
+
+            featured_artists = video_info.get('featuredArtists')
+            if featured_artists:
+                featured_artist = featured_artists[0]['artistName']
  
              smil_parsed = False
              for video_version in video_info['videoVersions']:
@@ -278,7 +337,15 @@ class VevoIE(InfoExtractor):
                          smil_parsed = True
          self._sort_formats(formats)
  
-        title = video_info['title']
+        track = video_info['title']
+        if featured_artist:
+            artist = '%s ft. %s' % (artist, featured_artist)
+        title = '%s - %s' % (artist, track) if artist else track
+
+        genres = video_info.get('genres')
+        genre = (
+            genres[0] if genres and isinstance(genres, list) and
+            isinstance(genres[0], compat_str) else None)
  
          is_explicit = video_info.get('isExplicit')
          if is_explicit is True:
@@ -300,4 +367,75 @@ class VevoIE(InfoExtractor):
              'duration': duration,
              'view_count': view_count,
              'age_limit': age_limit,
+            'track': track,
+            'artist': uploader,
+            'genre': genre,
          }
+
+
+class VevoPlaylistIE(VevoBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?vevo\.com/watch/(?P<kind>playlist|genre)/(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'http://www.vevo.com/watch/playlist/dadbf4e7-b99f-4184-9670-6f0e547b6a29',
+        'info_dict': {
+            'id': 'dadbf4e7-b99f-4184-9670-6f0e547b6a29',
+            'title': 'Best-Of: Birdman',
+        },
+        'playlist_count': 10,
+    }, {
+        'url': 'http://www.vevo.com/watch/genre/rock',
+        'info_dict': {
+            'id': 'rock',
+            'title': 'Rock',
+        },
+        'playlist_count': 20,
+    }, {
+        'url': 'http://www.vevo.com/watch/playlist/dadbf4e7-b99f-4184-9670-6f0e547b6a29?index=0',
+        'md5': '32dcdfddddf9ec6917fc88ca26d36282',
+        'info_dict': {
+            'id': 'USCMV1100073',
+            'ext': 'mp4',
+            'title': 'Birdman - Y.U. MAD',
+            'timestamp': 1323417600,
+            'upload_date': '20111209',
+            'uploader': 'Birdman',
+            'track': 'Y.U. MAD',
+            'artist': 'Birdman',
+            'genre': 'Rap/Hip-Hop',
+        },
+        'expected_warnings': ['Unable to download SMIL file'],
+    }, {
+        'url': 'http://www.vevo.com/watch/genre/rock?index=0',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        playlist_id = mobj.group('id')
+        playlist_kind = mobj.group('kind')
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+        index = qs.get('index', [None])[0]
+
+        if index:
+            video_id = self._search_regex(
+                r'<meta[^>]+content=(["\'])vevo://video/(?P<id>.+?)\1[^>]*>',
+                webpage, 'video id', default=None, group='id')
+            if video_id:
+                return self.url_result('vevo:%s' % video_id, VevoIE.ie_key())
+
+        playlists = self._extract_json(webpage, playlist_id, '%ss' % playlist_kind)
+
+        playlist = (list(playlists.values())[0]
+                    if playlist_kind == 'playlist' else playlists[playlist_id])
+
+        entries = [
+            self.url_result('vevo:%s' % src, VevoIE.ie_key())
+            for src in playlist['isrcs']]
+
+        return self.playlist_result(
+            entries, playlist.get('playlistId') or playlist_id,
+            playlist.get('name'), playlist.get('description'))
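Review note: the title and genre handling added to `VevoIE._real_extract` above can be exercised in isolation. The following is a minimal sketch, not the extractor's code: `build_title` and `first_genre` are hypothetical helper names, and plain `str` stands in for `compat_str` on the assumption of Python 3.

```python
def build_title(track, artist=None, featured_artist=None):
    """Compose 'Artist ft. Featured - Track', mirroring the logic in the diff."""
    if featured_artist:
        artist = '%s ft. %s' % (artist, featured_artist)
    # Fall back to the bare track name when no artist is known.
    return '%s - %s' % (artist, track) if artist else track


def first_genre(genres):
    """Return the first genre only if genres is a non-empty list of strings."""
    if genres and isinstance(genres, list) and isinstance(genres[0], str):
        return genres[0]
    return None
```

The type checks in `first_genre` correspond to the `only_matching` test commented "no genres available": a missing or unexpectedly shaped `genres` value yields `None` instead of raising.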
diff --git a/youtube_dl/extractor/vgtv.py b/youtube_dl/extractor/vgtv.py

index b11cd254c7da9c8c780dedd2b2db120f8025c74b..3b38ac700296a2eef8c12f0b45406f54785d7684 100644 (file)
--- a/youtube_dl/extractor/vgtv.py
+++ b/youtube_dl/extractor/vgtv.py
@@ -8,6 +8,7 @@ from .xstream import XstreamIE
  from ..utils import (
      ExtractorError,
      float_or_none,
+    try_get,
  )
  
  
@@ -21,6 +22,7 @@ class VGTVIE(XstreamIE):
          'fvn.no/fvntv': 'fvntv',
          'aftenposten.no/webtv': 'aptv',
          'ap.vgtv.no/webtv': 'aptv',
+        'tv.aftonbladet.se/abtv': 'abtv',
      }
  
      _APP_NAME_TO_VENDOR = {
@@ -29,6 +31,7 @@ class VGTVIE(XstreamIE):
          'satv': 'sa',
          'fvntv': 'fvn',
          'aptv': 'ap',
+        'abtv': 'ab',
      }
  
      _VALID_URL = r'''(?x)
@@ -39,7 +42,8 @@ class VGTVIE(XstreamIE):
                      /?
                      (?:
                          \#!/(?:video|live)/|
-                        embed?.*id=
+                        embed?.*id=|
+                        articles/
                      )|
                      (?P<appname>
                          %s
@@ -129,6 +133,19 @@ class VGTVIE(XstreamIE):
              'url': 'http://ap.vgtv.no/webtv#!/video/111084/de-nye-bysyklene-lettere-bedre-gir-stoerre-hjul-og-feste-til-mobil',
              'only_matching': True,
          },
+        {
+            # geoblocked
+            'url': 'http://www.vgtv.no/#!/video/127205/inside-the-mind-of-favela-funk',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
+            'only_matching': True,
+        },
+        {
+            'url': 'abtv:140026',
+            'only_matching': True,
+        }
      ]
  
      def _real_extract(self, url):
@@ -196,6 +213,12 @@ class VGTVIE(XstreamIE):
  
          info['formats'].extend(formats)
  
+        if not info['formats']:
+            properties = try_get(
+                data, lambda x: x['streamConfiguration']['properties'], list)
+            if properties and 'geoblocked' in properties:
+                self.raise_geo_restricted()
+
          self._sort_formats(info['formats'])
  
          info.update({
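Review note: the geoblock check added above relies on `try_get` to walk a nested structure safely. The sketch below illustrates that pattern with a local stand-in; `safe_get` and `is_geoblocked` are hypothetical names, and `safe_get` is an assumption about the general behaviour, not a copy of the helper in `youtube_dl/utils.py`.

```python
def safe_get(src, getter, expected_type=None):
    """Apply getter to src; return None on lookup errors or a type mismatch."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def is_geoblocked(data):
    """True only if streamConfiguration.properties is a list containing 'geoblocked'."""
    properties = safe_get(
        data, lambda x: x['streamConfiguration']['properties'], list)
    return bool(properties) and 'geoblocked' in properties
```

Requiring a `list` matters here: if `properties` were a plain string, the `in` test would do a substring match instead of a membership check.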
diff --git a/youtube_dl/extractor/vice.py b/youtube_dl/extractor/vice.py

index 46c785ae183d72207ab12500618f3eb7b765373d..8a00c8fee17ee84ae0d8d0e1e7360ca67befc8b0 100644 (file)
--- a/youtube_dl/extractor/vice.py
+++ b/youtube_dl/extractor/vice.py
@@ -1,27 +1,134 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
+import time
+import hashlib
+import json
  
+from .adobepass import AdobePassIE
  from .common import InfoExtractor
-from .ooyala import OoyalaIE
-from ..utils import ExtractorError
+from ..compat import compat_HTTPError
+from ..utils import (
+    int_or_none,
+    parse_age_limit,
+    str_or_none,
+    parse_duration,
+    ExtractorError,
+    extract_attributes,
+)
  
  
-class ViceIE(InfoExtractor):
+class ViceBaseIE(AdobePassIE):
+    def _extract_preplay_video(self, url, webpage):
+        watch_hub_data = extract_attributes(self._search_regex(
+            r'(?s)(<watch-hub\s*.+?</watch-hub>)', webpage, 'watch hub'))
+        video_id = watch_hub_data['vms-id']
+        title = watch_hub_data['video-title']
+
+        query = {}
+        is_locked = watch_hub_data.get('video-locked') == '1'
+        if is_locked:
+            resource = self._get_mvpd_resource(
+                'VICELAND', title, video_id,
+                watch_hub_data.get('video-rating'))
+            query['tvetoken'] = self._extract_mvpd_auth(url, video_id, 'VICELAND', resource)
+
+        # signature generation algorithm is reverse engineered from signatureGenerator in
+        # webpack:///../shared/~/vice-player/dist/js/vice-player.js in
+        # https://www.viceland.com/assets/common/js/web.vendor.bundle.js
+        exp = int(time.time()) + 14400
+        query.update({
+            'exp': exp,
+            'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
+        })
+
+        try:
+            host = 'www.viceland' if is_locked else self._PREPLAY_HOST
+            preplay = self._download_json('https://%s.com/en_us/preplay/%s' % (host, video_id), video_id, query=query)
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
+                error = json.loads(e.cause.read().decode())
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, error['details']), expected=True)
+            raise
+
+        video_data = preplay['video']
+        base = video_data['base']
+        uplynk_preplay_url = preplay['preplayURL']
+        episode = video_data.get('episode', {})
+        channel = video_data.get('channel', {})
+
+        subtitles = {}
+        cc_url = preplay.get('ccURL')
+        if cc_url:
+            subtitles['en'] = [{
+                'url': cc_url,
+            }]
+
+        return {
+            '_type': 'url_transparent',
+            'url': uplynk_preplay_url,
+            'id': video_id,
+            'title': title,
+            'description': base.get('body'),
+            'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
+            'duration': parse_duration(video_data.get('video_duration') or watch_hub_data.get('video-duration')),
+            'timestamp': int_or_none(video_data.get('created_at')),
+            'age_limit': parse_age_limit(video_data.get('video_rating')),
+            'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
+            'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),
+            'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
+            'season_number': int_or_none(watch_hub_data.get('season')),
+            'season_id': str_or_none(episode.get('season_id')),
+            'uploader': channel.get('base', {}).get('title') or watch_hub_data.get('channel-title'),
+            'uploader_id': str_or_none(channel.get('id')),
+            'subtitles': subtitles,
+            'ie_key': 'UplynkPreplay',
+        }
+
+
+class ViceIE(ViceBaseIE):
      _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?videos?/(?P<id>[^/?#&]+)'
  
      _TESTS = [{
          'url': 'http://www.vice.com/video/cowboy-capitalists-part-1',
+        'md5': 'e9d77741f9e42ba583e683cd170660f7',
          'info_dict': {
              'id': '43cW1mYzpia9IlestBjVpd23Yu3afAfp',
-            'ext': 'mp4',
+            'ext': 'flv',
              'title': 'VICE_COWBOYCAPITALISTS_PART01_v1_VICE_WM_1080p.mov',
              'duration': 725.983,
          },
+        'add_ie': ['Ooyala'],
+    }, {
+        'url': 'http://www.vice.com/video/how-to-hack-a-car',
+        'md5': 'a7ecf64ee4fa19b916c16f4b56184ae2',
+        'info_dict': {
+            'id': '3jstaBeXgAs',
+            'ext': 'mp4',
+            'title': 'How to Hack a Car: Phreaked Out (Episode 2)',
+            'description': 'md5:ee95453f7ff495db8efe14ae8bf56f30',
+            'uploader_id': 'MotherboardTV',
+            'uploader': 'Motherboard',
+            'upload_date': '20140529',
+        },
+        'add_ie': ['Youtube'],
+    }, {
+        'url': 'https://video.vice.com/en_us/video/the-signal-from-tolva/5816510690b70e6c5fd39a56',
+        'md5': '',
+        'info_dict': {
+            'id': '5816510690b70e6c5fd39a56',
+            'ext': 'mp4',
+            'uploader': 'Waypoint',
+            'title': 'The Signal From Tölva',
+            'uploader_id': '57f7d621e05ca860fa9ccaf9',
+            'timestamp': 1477941983938,
+        },
          'params': {
-            # Requires ffmpeg (m3u8 manifest)
+            # m3u8 download
              'skip_download': True,
          },
+        'add_ie': ['UplynkPreplay'],
      }, {
          'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
          'only_matching': True,
@@ -32,18 +139,21 @@ class ViceIE(InfoExtractor):
          'url': 'https://munchies.vice.com/en/videos/watch-the-trailer-for-our-new-series-the-pizza-show',
          'only_matching': True,
      }]
+    _PREPLAY_HOST = 'video.vice'
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        try:
-            embed_code = self._search_regex(
-                r'embedCode=([^&\'"]+)', webpage,
-                'ooyala embed code')
-            ooyala_url = OoyalaIE._url_for_embed_code(embed_code)
-        except ExtractorError:
-            raise ExtractorError('The page doesn\'t contain a video', expected=True)
-        return self.url_result(ooyala_url, ie='Ooyala')
+        webpage, urlh = self._download_webpage_handle(url, video_id)
+        embed_code = self._search_regex(
+            r'embedCode=([^&\'"]+)', webpage,
+            'ooyala embed code', default=None)
+        if embed_code:
+            return self.url_result('ooyala:%s' % embed_code, 'Ooyala')
+        youtube_id = self._search_regex(
+            r'data-youtube-id="([^"]+)"', webpage, 'youtube id', default=None)
+        if youtube_id:
+            return self.url_result(youtube_id, 'Youtube')
+        return self._extract_preplay_video(urlh.geturl(), webpage)
  
  
  class ViceShowIE(InfoExtractor):
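Review note: the preplay signature computed in `_extract_preplay_video` above is a SHA-512 digest over `'<video_id>:GET:<exp>'`, with `exp` set four hours (14400 seconds) ahead. A minimal standalone sketch follows; `sign_preplay_query` and its `now` and `ttl` parameters are hypothetical, added only so the function is deterministic under test.

```python
import hashlib
import time


def sign_preplay_query(video_id, now=None, ttl=14400):
    """Build the 'exp' and 'sign' query parameters the way the diff does."""
    exp = int(now if now is not None else time.time()) + ttl
    payload = '%s:GET:%d' % (video_id, exp)
    return {
        'exp': exp,
        # Hex-encoded SHA-512 digest of the payload string.
        'sign': hashlib.sha512(payload.encode('utf-8')).hexdigest(),
    }
```

The sketch passes an explicit `'utf-8'` to `encode()`; the diff calls `.encode()` with no argument, which gives the same bytes for ASCII-only IDs.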
diff --git a/youtube_dl/extractor/viceland.py b/youtube_dl/extractor/viceland.py

new file mode 100644 (file)

index 0000000..0eff055
--- /dev/null
+++ b/youtube_dl/extractor/viceland.py
@@ -0,0 +1,33 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .vice import ViceBaseIE
+
+
+class VicelandIE(ViceBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?viceland\.com/[^/]+/video/[^/]+/(?P<id>[a-f0-9]+)'
+    _TEST = {
+        'url': 'https://www.viceland.com/en_us/video/cyberwar-trailer/57608447973ee7705f6fbd4e',
+        'info_dict': {
+            'id': '57608447973ee7705f6fbd4e',
+            'ext': 'mp4',
+            'title': 'CYBERWAR (Trailer)',
+            'description': 'Tapping into the geopolitics of hacking and surveillance, Ben Makuch travels the world to meet with hackers, government officials, and dissidents to investigate the ecosystem of cyberwarfare.',
+            'age_limit': 14,
+            'timestamp': 1466008539,
+            'upload_date': '20160615',
+            'uploader_id': '11',
+            'uploader': 'Viceland',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'add_ie': ['UplynkPreplay'],
+    }
+    _PREPLAY_HOST = 'www.viceland'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        return self._extract_preplay_video(url, webpage)
diff --git a/youtube_dl/extractor/vidbit.py b/youtube_dl/extractor/vidbit.py

new file mode 100644 (file)

index 0000000..e7ac5a8
--- /dev/null
+++ b/youtube_dl/extractor/vidbit.py
@@ -0,0 +1,84 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    int_or_none,
+    js_to_json,
+    remove_end,
+    unified_strdate,
+)
+
+
+class VidbitIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?vidbit\.co/(?:watch|embed)\?.*?\bv=(?P<id>[\da-zA-Z]+)'
+    _TESTS = [{
+        'url': 'http://www.vidbit.co/watch?v=jkL2yDOEq2',
+        'md5': '1a34b7f14defe3b8fafca9796892924d',
+        'info_dict': {
+            'id': 'jkL2yDOEq2',
+            'ext': 'mp4',
+            'title': 'Intro to VidBit',
+            'description': 'md5:5e0d6142eec00b766cbf114bfd3d16b7',
+            'thumbnail': r're:https?://.*\.jpg$',
+            'upload_date': '20160618',
+            'view_count': int,
+            'comment_count': int,
+        }
+    }, {
+        'url': 'http://www.vidbit.co/embed?v=jkL2yDOEq2&auto=0&water=0',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(
+            compat_urlparse.urljoin(url, '/watch?v=%s' % video_id), video_id)
+
+        video_url, title = [None] * 2
+
+        config = self._parse_json(self._search_regex(
+            r'(?s)\.setup\(({.+?})\);', webpage, 'setup', default='{}'),
+            video_id, transform_source=js_to_json)
+        if config:
+            if config.get('file'):
+                video_url = compat_urlparse.urljoin(url, config['file'])
+            title = config.get('title')
+
+        if not video_url:
+            video_url = compat_urlparse.urljoin(url, self._search_regex(
+                r'file\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
+                webpage, 'video URL', group='url'))
+
+        if not title:
+            title = remove_end(
+                self._html_search_regex(
+                    (r'<h1>(.+?)</h1>', r'<title>(.+?)</title>'),
+                    webpage, 'title', default=None) or self._og_search_title(webpage),
+                ' - VidBit')
+
+        description = self._html_search_meta(
+            ('description', 'og:description', 'twitter:description'),
+            webpage, 'description')
+
+        upload_date = unified_strdate(self._html_search_meta(
+            'datePublished', webpage, 'upload date'))
+
+        view_count = int_or_none(self._search_regex(
+            r'<strong>(\d+)</strong> views',
+            webpage, 'view count', fatal=False))
+        comment_count = int_or_none(self._search_regex(
+            r'id=["\']cmt_num["\'][^>]*>\((\d+)\)',
+            webpage, 'comment count', fatal=False))
+
+        return {
+            'id': video_id,
+            'url': video_url,
+            'title': title,
+            'description': description,
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'upload_date': upload_date,
+            'view_count': view_count,
+            'comment_count': comment_count,
+        }
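Review note: when the `.setup({...})` config cannot be parsed, `VidbitIE` falls back to a regex that captures a quoted `file:` value and resolves it against the page URL. The sketch below reuses that regex with the standard library only; `find_file_url` is a hypothetical name, and `urllib.parse.urljoin` stands in for `compat_urlparse.urljoin` on the assumption of Python 3.

```python
import re
from urllib.parse import urljoin

# Same pattern as in the diff: the opening quote is captured in group 1 and
# the negative lookahead stops the URL at the matching closing quote.
FILE_RE = r'file\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1'


def find_file_url(webpage, base_url):
    """Return the absolute media URL found in webpage, or None if absent."""
    mobj = re.search(FILE_RE, webpage)
    if not mobj:
        return None
    return urljoin(base_url, mobj.group('url'))
```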
diff --git a/youtube_dl/extractor/videodetective.py b/youtube_dl/extractor/videodetective.py

index 0ffc7ff7dc9185a3a3ec5c0fd14d302872662dda..a19411a058784fc61db3b764ec882f0b9986323f 100644 (file)
--- a/youtube_dl/extractor/videodetective.py
+++ b/youtube_dl/extractor/videodetective.py
@@ -6,7 +6,7 @@ from .internetvideoarchive import InternetVideoArchiveIE
  
  
  class VideoDetectiveIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.videodetective\.com/[^/]+/[^/]+/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?videodetective\.com/[^/]+/[^/]+/(?P<id>\d+)'
  
      _TEST = {
          'url': 'http://www.videodetective.com/movies/kick-ass-2/194487',
@@ -14,8 +14,11 @@ class VideoDetectiveIE(InfoExtractor):
              'id': '194487',
              'ext': 'mp4',
              'title': 'KICK-ASS 2',
-            'description': 'md5:65ba37ad619165afac7d432eaded6013',
-            'duration': 138,
+            'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
          },
      }
  
@@ -24,4 +27,4 @@ class VideoDetectiveIE(InfoExtractor):
          webpage = self._download_webpage(url, video_id)
          og_video = self._og_search_video_url(webpage)
          query = compat_urlparse.urlparse(og_video).query
-        return self.url_result(InternetVideoArchiveIE._build_url(query), ie=InternetVideoArchiveIE.ie_key())
+        return self.url_result(InternetVideoArchiveIE._build_json_url(query), ie=InternetVideoArchiveIE.ie_key())
diff --git a/youtube_dl/extractor/videomore.py b/youtube_dl/extractor/videomore.py

index 04e95c66e91eec8368d325d1086ac59dddf7656f..7f25665864c696757903deeb582a64f16eec0d85 100644 (file)
--- a/youtube_dl/extractor/videomore.py
+++ b/youtube_dl/extractor/videomore.py
@@ -6,8 +6,7 @@ import re
  from .common import InfoExtractor
  from ..utils import (
      int_or_none,
-    parse_age_limit,
-    parse_iso8601,
+    xpath_element,
      xpath_text,
  )
  
@@ -17,38 +16,32 @@ class VideomoreIE(InfoExtractor):
      _VALID_URL = r'videomore:(?P<sid>\d+)$|https?://videomore\.ru/(?:(?:embed|[^/]+/[^/]+)/|[^/]+\?.*\btrack_id=)(?P<id>\d+)(?:[/?#&]|\.(?:xml|json)|$)'
      _TESTS = [{
          'url': 'http://videomore.ru/kino_v_detalayah/5_sezon/367617',
-        'md5': '70875fbf57a1cd004709920381587185',
+        'md5': '44455a346edc0d509ac5b5a5b531dc35',
          'info_dict': {
              'id': '367617',
              'ext': 'flv',
-            'title': 'В гостях Алексей Чумаков и Юлия Ковальчук',
-            'description': 'В гостях – лучшие романтические комедии года, «Выживший» Иньярриту и «Стив Джобс» Дэнни Бойла.',
+            'title': 'Кино в деталях 5 сезон В гостях Алексей Чумаков и Юлия Ковальчук',
              'series': 'Кино в деталях',
              'episode': 'В гостях Алексей Чумаков и Юлия Ковальчук',
-            'episode_number': None,
-            'season': 'Сезон 2015',
-            'season_number': 5,
              'thumbnail': 're:^https?://.*\.jpg',
              'duration': 2910,
-            'age_limit': 16,
              'view_count': int,
+            'comment_count': int,
+            'age_limit': 16,
          },
      }, {
          'url': 'http://videomore.ru/embed/259974',
          'info_dict': {
              'id': '259974',
              'ext': 'flv',
-            'title': '80 серия',
-            'description': '«Медведей» ждет решающий матч. Макеев выясняет отношения со Стрельцовым. Парни узнают подробности прошлого Макеева.',
+            'title': 'Молодежка 2 сезон 40 серия',
              'series': 'Молодежка',
-            'episode': '80 серия',
-            'episode_number': 40,
-            'season': '2 сезон',
-            'season_number': 2,
+            'episode': '40 серия',
              'thumbnail': 're:^https?://.*\.jpg',
              'duration': 2809,
-            'age_limit': 16,
              'view_count': int,
+            'comment_count': int,
+            'age_limit': 16,
          },
          'params': {
              'skip_download': True,
@@ -58,13 +51,8 @@ class VideomoreIE(InfoExtractor):
          'info_dict': {
              'id': '341073',
              'ext': 'flv',
-            'title': 'Команда проиграла из-за Бакина?',
-            'description': 'Молодежка 3 сезон скоро',
-            'series': 'Молодежка',
+            'title': 'Промо Команда проиграла из-за Бакина?',
              'episode': 'Команда проиграла из-за Бакина?',
-            'episode_number': None,
-            'season': 'Промо',
-            'season_number': 99,
              'thumbnail': 're:^https?://.*\.jpg',
              'duration': 29,
              'age_limit': 16,
@@ -96,8 +84,13 @@ class VideomoreIE(InfoExtractor):
      @staticmethod
      def _extract_url(webpage):
          mobj = re.search(
-            r'<object[^>]+data=(["\'])https?://videomore.ru/player\.swf\?.*config=(?P<url>https?://videomore\.ru/(?:[^/]+/)+\d+\.xml).*\1',
+            r'<object[^>]+data=(["\'])https?://videomore\.ru/player\.swf\?.*config=(?P<url>https?://videomore\.ru/(?:[^/]+/)+\d+\.xml).*\1',
              webpage)
+        if not mobj:
+            mobj = re.search(
+                r'<iframe[^>]+src=([\'"])(?P<url>https?://videomore\.ru/embed/\d+)',
+                webpage)
+
          if mobj:
              return mobj.group('url')
  
@@ -109,43 +102,33 @@ class VideomoreIE(InfoExtractor):
              'http://videomore.ru/video/tracks/%s.xml' % video_id,
              video_id, 'Downloading video XML')
  
-        video_url = xpath_text(video, './/video_url', 'video url', fatal=True)
+        item = xpath_element(video, './/playlist/item', fatal=True)
+
+        title = xpath_text(
+            item, ('./title', './episode_name'), 'title', fatal=True)
+
+        video_url = xpath_text(item, './video_url', 'video url', fatal=True)
          formats = self._extract_f4m_formats(video_url, video_id, f4m_id='hds')
          self._sort_formats(formats)
  
-        data = self._download_json(
-            'http://videomore.ru/video/tracks/%s.json' % video_id,
-            video_id, 'Downloading video JSON')
-
-        title = data.get('title') or data['project_title']
-        description = data.get('description') or data.get('description_raw')
-        timestamp = parse_iso8601(data.get('published_at'))
-        duration = int_or_none(data.get('duration'))
-        view_count = int_or_none(data.get('views'))
-        age_limit = parse_age_limit(data.get('min_age'))
-        thumbnails = [{
-            'url': thumbnail,
-        } for thumbnail in data.get('big_thumbnail_urls', [])]
-
-        series = data.get('project_title')
-        episode = data.get('title')
-        episode_number = int_or_none(data.get('episode_of_season') or None)
-        season = data.get('season_title')
-        season_number = int_or_none(data.get('season_pos') or None)
+        thumbnail = xpath_text(item, './thumbnail_url')
+        duration = int_or_none(xpath_text(item, './duration'))
+        view_count = int_or_none(xpath_text(item, './views'))
+        comment_count = int_or_none(xpath_text(item, './count_comments'))
+        age_limit = int_or_none(xpath_text(item, './min_age'))
+
+        series = xpath_text(item, './project_name')
+        episode = xpath_text(item, './episode_name')
  
          return {
              'id': video_id,
              'title': title,
-            'description': description,
              'series': series,
              'episode': episode,
-            'episode_number': episode_number,
-            'season': season,
-            'season_number': season_number,
-            'thumbnails': thumbnails,
-            'timestamp': timestamp,
+            'thumbnail': thumbnail,
              'duration': duration,
              'view_count': view_count,
+            'comment_count': comment_count,
              'age_limit': age_limit,
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/vidio.py b/youtube_dl/extractor/vidio.py
new file mode 100644
index 0000000..6898042
--- /dev/null
+++ b/youtube_dl/extractor/vidio.py
@@ -0,0 +1,73 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class VidioIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?vidio\.com/watch/(?P<id>\d+)-(?P<display_id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://www.vidio.com/watch/165683-dj_ambred-booyah-live-2015',
+        'md5': 'cd2801394afc164e9775db6a140b91fe',
+        'info_dict': {
+            'id': '165683',
+            'display_id': 'dj_ambred-booyah-live-2015',
+            'ext': 'mp4',
+            'title': 'DJ_AMBRED - Booyah (Live 2015)',
+            'description': 'md5:27dc15f819b6a78a626490881adbadf8',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 149,
+            'like_count': int,
+        },
+    }, {
+        'url': 'https://www.vidio.com/watch/77949-south-korea-test-fires-missile-that-can-strike-all-of-the-north',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id, display_id = mobj.group('id', 'display_id')
+
+        webpage = self._download_webpage(url, display_id)
+
+        title = self._og_search_title(webpage)
+
+        m3u8_url, duration, thumbnail = [None] * 3
+
+        clips = self._parse_json(
+            self._html_search_regex(
+                r'data-json-clips\s*=\s*(["\'])(?P<data>\[.+?\])\1',
+                webpage, 'video data', default='[]', group='data'),
+            display_id, fatal=False)
+        if clips:
+            clip = clips[0]
+            m3u8_url = clip.get('sources', [{}])[0].get('file')
+            duration = clip.get('clip_duration')
+            thumbnail = clip.get('image')
+
+        m3u8_url = m3u8_url or self._search_regex(
+            r'data(?:-vjs)?-clip-hls-url=(["\'])(?P<url>.+?)\1', webpage, 'hls url', group='url')
+        formats = self._extract_m3u8_formats(m3u8_url, display_id, 'mp4', entry_protocol='m3u8_native')
+
+        duration = int_or_none(duration or self._search_regex(
+            r'data-video-duration=(["\'])(?P<duration>\d+)\1', webpage, 'duration', group='duration'))
+        thumbnail = thumbnail or self._og_search_thumbnail(webpage)
+
+        like_count = int_or_none(self._search_regex(
+            (r'<span[^>]+data-comment-vote-count=["\'](\d+)',
+             r'<span[^>]+class=["\'].*?\blike(?:__|-)count\b.*?["\'][^>]*>\s*(\d+)'),
+            webpage, 'like count', fatal=False))
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': self._og_search_description(webpage),
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'like_count': like_count,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/vidzi.py b/youtube_dl/extractor/vidzi.py
index 3c78fb3d5a071a6f49dec7467e620c0b8a01ded9..9950c62ad636ee4f03389bef627da4318f019c22 100644
--- a/youtube_dl/extractor/vidzi.py
+++ b/youtube_dl/extractor/vidzi.py
@@ -1,16 +1,20 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .jwplatform import JWPlatformBaseIE
  from ..utils import (
      decode_packed_codes,
      js_to_json,
+    NO_DEFAULT,
+    PACKED_CODES_RE,
  )
  
  
  class VidziIE(JWPlatformBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?P<id>\w+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
+    _TESTS = [{
          'url': 'http://vidzi.tv/cghql9yq6emu.html',
          'md5': '4f16c71ca0c8c8635ab6932b5f3f1660',
          'info_dict': {
@@ -22,19 +26,30 @@ class VidziIE(JWPlatformBaseIE):
              # m3u8 download
              'skip_download': True,
          },
-    }
+    }, {
+        'url': 'http://vidzi.tv/embed-4z2yb0rzphe9-600x338.html',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(
+            'http://vidzi.tv/%s' % video_id, video_id)
          title = self._html_search_regex(
              r'(?s)<h2 class="video-title">(.*?)</h2>', webpage, 'title')
  
-        code = decode_packed_codes(webpage).replace('\\\'', '\'')
-        jwplayer_data = self._parse_json(
-            self._search_regex(r'setup\(([^)]+)\)', code, 'jwplayer data'),
-            video_id, transform_source=js_to_json)
+        packed_codes = [mobj.group(0) for mobj in re.finditer(
+            PACKED_CODES_RE, webpage)]
+        for num, pc in enumerate(packed_codes, 1):
+            code = decode_packed_codes(pc).replace('\\\'', '\'')
+            jwplayer_data = self._parse_json(
+                self._search_regex(
+                    r'setup\(([^)]+)\)', code, 'jwplayer data',
+                    default=NO_DEFAULT if num == len(packed_codes) else '{}'),
+                video_id, transform_source=js_to_json)
+            if jwplayer_data:
+                break
  
          info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
          info_dict['title'] = title
diff --git a/youtube_dl/extractor/vier.py b/youtube_dl/extractor/vier.py
index 6645c6186dbff315e850f22ae793677803cbbf9b..d26fb49b3939728e8a962b2ad3131c71fd223366 100644
--- a/youtube_dl/extractor/vier.py
+++ b/youtube_dl/extractor/vier.py
@@ -48,8 +48,8 @@ class VierIE(InfoExtractor):
              [r'data-filename="([^"]+)"', r'"filename"\s*:\s*"([^"]+)"'],
              webpage, 'filename')
  
-        playlist_url = 'http://vod.streamcloud.be/%s/mp4:_definst_/%s.mp4/playlist.m3u8' % (application, filename)
-        formats = self._extract_m3u8_formats(playlist_url, display_id, 'mp4')
+        playlist_url = 'http://vod.streamcloud.be/%s/_definst_/mp4:%s.mp4/playlist.m3u8' % (application, filename)
+        formats = self._extract_wowza_formats(playlist_url, display_id, skip_protocols=['dash'])
          self._sort_formats(formats)
  
          title = self._og_search_title(webpage, default=display_id)
diff --git a/youtube_dl/extractor/viewlift.py b/youtube_dl/extractor/viewlift.py
new file mode 100644
index 0000000..19500eb
--- /dev/null
+++ b/youtube_dl/extractor/viewlift.py
@@ -0,0 +1,200 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    clean_html,
+    determine_ext,
+    int_or_none,
+    js_to_json,
+    parse_duration,
+)
+
+
+class ViewLiftBaseIE(InfoExtractor):
+    _DOMAINS_REGEX = r'(?:snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|monumentalsportsnetwork|vayafilm)\.com|kesari\.tv'
+
+
+class ViewLiftEmbedIE(ViewLiftBaseIE):
+    _VALID_URL = r'https?://(?:(?:www|embed)\.)?(?:%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f-]{36})' % ViewLiftBaseIE._DOMAINS_REGEX
+    _TESTS = [{
+        'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500',
+        'md5': '2924e9215c6eff7a55ed35b72276bd93',
+        'info_dict': {
+            'id': '74849a00-85a9-11e1-9660-123139220831',
+            'ext': 'mp4',
+            'title': '#whilewewatch',
+        }
+    }, {
+        # invalid labels, 360p is better than 480p
+        'url': 'http://www.snagfilms.com/embed/player?filmId=17ca0950-a74a-11e0-a92a-0026bb61d036',
+        'md5': '882fca19b9eb27ef865efeeaed376a48',
+        'info_dict': {
+            'id': '17ca0950-a74a-11e0-a92a-0026bb61d036',
+            'ext': 'mp4',
+            'title': 'Life in Limbo',
+        }
+    }, {
+        'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:embed\.)?(?:%s)/embed/player.+?)\1' % ViewLiftBaseIE._DOMAINS_REGEX,
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        if '>This film is not playable in your area.<' in webpage:
+            raise ExtractorError(
+                'Film %s is not playable in your area.' % video_id, expected=True)
+
+        formats = []
+        has_bitrate = False
+        for source in self._parse_json(js_to_json(self._search_regex(
+                r'(?s)sources:\s*(\[.+?\]),', webpage, 'json')), video_id):
+            file_ = source.get('file')
+            if not file_:
+                continue
+            type_ = source.get('type')
+            ext = determine_ext(file_)
+            format_id = source.get('label') or ext
+            if all(v == 'm3u8' or v == 'hls' for v in (type_, ext)):
+                formats.extend(self._extract_m3u8_formats(
+                    file_, video_id, 'mp4', m3u8_id='hls'))
+            else:
+                bitrate = int_or_none(self._search_regex(
+                    [r'(\d+)kbps', r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % ext],
+                    file_, 'bitrate', default=None))
+                if not has_bitrate and bitrate:
+                    has_bitrate = True
+                height = int_or_none(self._search_regex(
+                    r'^(\d+)[pP]$', format_id, 'height', default=None))
+                formats.append({
+                    'url': file_,
+                    'format_id': 'http-%s%s' % (format_id, ('-%dk' % bitrate if bitrate else '')),
+                    'tbr': bitrate,
+                    'height': height,
+                })
+        field_preference = None if has_bitrate else ('height', 'tbr', 'format_id')
+        self._sort_formats(formats, field_preference)
+
+        title = self._search_regex(
+            [r"title\s*:\s*'([^']+)'", r'<title>([^<]+)</title>'],
+            webpage, 'title')
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+        }
+
+
+class ViewLiftIE(ViewLiftBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)/(?:films/title|show|(?:news/)?videos?)/(?P<id>[^?#]+)' % ViewLiftBaseIE._DOMAINS_REGEX
+    _TESTS = [{
+        'url': 'http://www.snagfilms.com/films/title/lost_for_life',
+        'md5': '19844f897b35af219773fd63bdec2942',
+        'info_dict': {
+            'id': '0000014c-de2f-d5d6-abcf-ffef58af0017',
+            'display_id': 'lost_for_life',
+            'ext': 'mp4',
+            'title': 'Lost for Life',
+            'description': 'md5:fbdacc8bb6b455e464aaf98bc02e1c82',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 4489,
+            'categories': ['Documentary', 'Crime', 'Award Winning', 'Festivals']
+        }
+    }, {
+        'url': 'http://www.snagfilms.com/show/the_world_cut_project/india',
+        'md5': 'e6292e5b837642bbda82d7f8bf3fbdfd',
+        'info_dict': {
+            'id': '00000145-d75c-d96e-a9c7-ff5c67b20000',
+            'display_id': 'the_world_cut_project/india',
+            'ext': 'mp4',
+            'title': 'India',
+            'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 979,
+            'categories': ['Documentary', 'Sports', 'Politics']
+        }
+    }, {
+        # Film is not playable in your area.
+        'url': 'http://www.snagfilms.com/films/title/inside_mecca',
+        'only_matching': True,
+    }, {
+        # Film is not available.
+        'url': 'http://www.snagfilms.com/show/augie_alone/flirting',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.winnersview.com/videos/the-good-son',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.kesari.tv/news/video/1461919076414',
+        'only_matching': True,
+    }, {
+        # Was once Kaltura embed
+        'url': 'https://www.monumentalsportsnetwork.com/videos/john-carlson-postgame-2-25-15',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        domain, display_id = re.match(self._VALID_URL, url).groups()
+
+        webpage = self._download_webpage(url, display_id)
+
+        if ">Sorry, the Film you're looking for is not available.<" in webpage:
+            raise ExtractorError(
+                'Film %s is not available.' % display_id, expected=True)
+
+        film_id = self._search_regex(r'filmId=([\da-f-]{36})"', webpage, 'film id')
+
+        snag = self._parse_json(
+            self._search_regex(
+                r'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'),
+            display_id)
+
+        for item in snag:
+            if item.get('data', {}).get('film', {}).get('id') == film_id:
+                data = item['data']['film']
+                title = data['title']
+                description = clean_html(data.get('synopsis'))
+                thumbnail = data.get('image')
+                duration = int_or_none(data.get('duration') or data.get('runtime'))
+                categories = [
+                    category['title'] for category in data.get('categories', [])
+                    if category.get('title')]
+                break
+        else:
+            title = self._search_regex(
+                r'itemprop="title">([^<]+)<', webpage, 'title')
+            description = self._html_search_regex(
+                r'(?s)<div itemprop="description" class="film-synopsis-inner ">(.+?)</div>',
+                webpage, 'description', default=None) or self._og_search_description(webpage)
+            thumbnail = self._og_search_thumbnail(webpage)
+            duration = parse_duration(self._search_regex(
+                r'<span itemprop="duration" class="film-duration strong">([^<]+)<',
+                webpage, 'duration', fatal=False))
+            categories = re.findall(r'<a href="/movies/[^"]+">([^<]+)</a>', webpage)
+
+        return {
+            '_type': 'url_transparent',
+            'url': 'http://%s/embed/player?filmId=%s' % (domain, film_id),
+            'id': film_id,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'categories': categories,
+            'ie_key': 'ViewLiftEmbed',
+        }
diff --git a/youtube_dl/extractor/viewster.py b/youtube_dl/extractor/viewster.py
index fe94a479339035dfd9a70386e903e5c3770fdfea..a93196a0772fd5588dd2f55c327427d00e814eb4 100644
--- a/youtube_dl/extractor/viewster.py
+++ b/youtube_dl/extractor/viewster.py
@@ -1,10 +1,11 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..compat import (
      compat_HTTPError,
-    compat_urllib_parse,
      compat_urllib_parse_unquote,
  )
  from ..utils import (
@@ -14,6 +15,7 @@ from ..utils import (
      parse_iso8601,
      sanitized_Request,
      HEADRequest,
+    url_basename,
  )
  
  
@@ -75,11 +77,11 @@ class ViewsterIE(InfoExtractor):
  
      _ACCEPT_HEADER = 'application/json, text/javascript, */*; q=0.01'
  
-    def _download_json(self, url, video_id, note='Downloading JSON metadata', fatal=True):
+    def _download_json(self, url, video_id, note='Downloading JSON metadata', fatal=True, query={}):
          request = sanitized_Request(url)
          request.add_header('Accept', self._ACCEPT_HEADER)
          request.add_header('Auth-token', self._AUTH_TOKEN)
-        return super(ViewsterIE, self)._download_json(request, video_id, note, fatal=fatal)
+        return super(ViewsterIE, self)._download_json(request, video_id, note, fatal=fatal, query=query)
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -114,43 +116,85 @@ class ViewsterIE(InfoExtractor):
              return self.playlist_result(entries, video_id, title, description)
  
          formats = []
-        for media_type in ('application/f4m+xml', 'application/x-mpegURL', 'video/mp4'):
-            media = self._download_json(
-                'https://public-api.viewster.com/movies/%s/video?mediaType=%s'
-                % (entry_id, compat_urllib_parse.quote(media_type)),
-                video_id, 'Downloading %s JSON' % media_type, fatal=False)
-            if not media:
-                continue
-            video_url = media.get('Uri')
-            if not video_url:
-                continue
-            ext = determine_ext(video_url)
-            if ext == 'f4m':
-                video_url += '&' if '?' in video_url else '?'
-                video_url += 'hdcore=3.2.0&plugin=flowplayer-3.2.0.1'
-                formats.extend(self._extract_f4m_formats(
-                    video_url, video_id, f4m_id='hds'))
-            elif ext == 'm3u8':
-                m3u8_formats = self._extract_m3u8_formats(
-                    video_url, video_id, 'mp4', m3u8_id='hls',
-                    fatal=False)  # m3u8 sometimes fail
-                if m3u8_formats:
-                    formats.extend(m3u8_formats)
-            else:
-                format_id = media.get('Bitrate')
-                f = {
-                    'url': video_url,
-                    'format_id': 'mp4-%s' % format_id,
-                    'height': int_or_none(media.get('Height')),
-                    'width': int_or_none(media.get('Width')),
-                    'preference': 1,
-                }
-                if format_id and not f['height']:
-                    f['height'] = int_or_none(self._search_regex(
-                        r'^(\d+)[pP]$', format_id, 'height', default=None))
-                formats.append(f)
-
-        if not formats and not info.get('LanguageSets') and not info.get('VODSettings'):
+        for language_set in info.get('LanguageSets', []):
+            manifest_url = None
+            m3u8_formats = []
+            audio = language_set.get('Audio') or ''
+            subtitle = language_set.get('Subtitle') or ''
+            base_format_id = audio
+            if subtitle:
+                base_format_id += '-%s' % subtitle
+
+            def concat(suffix, sep='-'):
+                return (base_format_id + '%s%s' % (sep, suffix)) if base_format_id else suffix
+
+            for media_type in ('application/f4m+xml', 'application/x-mpegURL', 'video/mp4'):
+                media = self._download_json(
+                    'https://public-api.viewster.com/movies/%s/video' % entry_id,
+                    video_id, 'Downloading %s JSON' % concat(media_type, ' '), fatal=False, query={
+                        'mediaType': media_type,
+                        'language': audio,
+                        'subtitle': subtitle,
+                    })
+                if not media:
+                    continue
+                video_url = media.get('Uri')
+                if not video_url:
+                    continue
+                ext = determine_ext(video_url)
+                if ext == 'f4m':
+                    manifest_url = video_url
+                    video_url += '&' if '?' in video_url else '?'
+                    video_url += 'hdcore=3.2.0&plugin=flowplayer-3.2.0.1'
+                    formats.extend(self._extract_f4m_formats(
+                        video_url, video_id, f4m_id=concat('hds')))
+                elif ext == 'm3u8':
+                    manifest_url = video_url
+                    m3u8_formats = self._extract_m3u8_formats(
+                        video_url, video_id, 'mp4', m3u8_id=concat('hls'),
+                        fatal=False)  # m3u8 sometimes fails
+                    if m3u8_formats:
+                        formats.extend(m3u8_formats)
+                else:
+                    qualities_basename = self._search_regex(
+                        r'/([^/]+)\.csmil/',
+                        manifest_url, 'qualities basename', default=None)
+                    if not qualities_basename:
+                        continue
+                    QUALITIES_RE = r'((,\d+k)+,?)'
+                    qualities = self._search_regex(
+                        QUALITIES_RE, qualities_basename,
+                        'qualities', default=None)
+                    if not qualities:
+                        continue
+                    qualities = list(map(lambda q: int(q[:-1]), qualities.strip(',').split(',')))
+                    qualities.sort()
+                    http_template = re.sub(QUALITIES_RE, r'%dk', qualities_basename)
+                    http_url_basename = url_basename(video_url)
+                    if m3u8_formats:
+                        self._sort_formats(m3u8_formats)
+                        m3u8_formats = list(filter(
+                            lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+                            m3u8_formats))
+                    if len(qualities) == len(m3u8_formats):
+                        for q, m3u8_format in zip(qualities, m3u8_formats):
+                            f = m3u8_format.copy()
+                            f.update({
+                                'url': video_url.replace(http_url_basename, http_template % q),
+                                'format_id': f['format_id'].replace('hls', 'http'),
+                                'protocol': 'http',
+                            })
+                            formats.append(f)
+                    else:
+                        for q in qualities:
+                            formats.append({
+                                'url': video_url.replace(http_url_basename, http_template % q),
+                                'ext': 'mp4',
+                                'format_id': 'http-%d' % q,
+                                'tbr': q,
+                            })
+
+        if not formats and not info.get('VODSettings'):
              self.raise_geo_restricted()
  
          self._sort_formats(formats)
diff --git a/youtube_dl/extractor/viki.py b/youtube_dl/extractor/viki.py
index e04b814c8cf27755bfe0a86af3d5bf43262bd0da..4351ac4571935fa3c3ace915c0b97f20e67ec18d 100644
--- a/youtube_dl/extractor/viki.py
+++ b/youtube_dl/extractor/viki.py
@@ -101,10 +101,13 @@ class VikiBaseIE(InfoExtractor):
              self.report_warning('Unable to get session token, login has probably failed')
  
      @staticmethod
-    def dict_selection(dict_obj, preferred_key):
+    def dict_selection(dict_obj, preferred_key, allow_fallback=True):
          if preferred_key in dict_obj:
              return dict_obj.get(preferred_key)
  
+        if not allow_fallback:
+            return
+
          filtered_dict = list(filter(None, [dict_obj.get(k) for k in dict_obj.keys()]))
          return filtered_dict[0] if filtered_dict else None
  
@@ -153,20 +156,17 @@ class VikiIE(VikiBaseIE):
              'like_count': int,
              'age_limit': 13,
          },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        }
+        'skip': 'Blocked in the US',
      }, {
          # episode
          'url': 'http://www.viki.com/videos/44699v-boys-over-flowers-episode-1',
-        'md5': '190f3ef426005ba3a080a63325955bc3',
+        'md5': '5fa476a902e902783ac7a4d615cdbc7a',
          'info_dict': {
              'id': '44699v',
              'ext': 'mp4',
              'title': 'Boys Over Flowers - Episode 1',
-            'description': 'md5:52617e4f729c7d03bfd4bcbbb6e946f2',
-            'duration': 4155,
+            'description': 'md5:b89cf50038b480b88b5b3c93589a9076',
+            'duration': 4204,
              'timestamp': 1270496524,
              'upload_date': '20100405',
              'uploader': 'group8',
@@ -217,7 +217,7 @@ class VikiIE(VikiBaseIE):
  
          self._check_errors(video)
  
-        title = self.dict_selection(video.get('titles', {}), 'en')
+        title = self.dict_selection(video.get('titles', {}), 'en', allow_fallback=False)
          if not title:
              title = 'Episode %d' % video.get('number') if video.get('type') == 'episode' else video.get('id') or video_id
              container_titles = video.get('container', {}).get('titles', {})
@@ -277,9 +277,16 @@ class VikiIE(VikiBaseIE):
                  r'^(\d+)[pP]$', format_id, 'height', default=None))
              for protocol, format_dict in stream_dict.items():
                  if format_id == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        format_dict['url'], video_id, 'mp4', 'm3u8_native',
-                        m3u8_id='m3u8-%s' % protocol, fatal=False))
+                    m3u8_formats = self._extract_m3u8_formats(
+                        format_dict['url'], video_id, 'mp4',
+                        entry_protocol='m3u8_native', preference=-1,
+                        m3u8_id='m3u8-%s' % protocol, fatal=False)
+                    # Despite CODECS metadata in m3u8 all video-only formats
+                    # are actually video+audio
+                    for f in m3u8_formats:
+                        if f.get('acodec') == 'none' and f.get('vcodec') != 'none':
+                            f['acodec'] = None
+                    formats.extend(m3u8_formats)
                  else:
                      formats.append({
                          'url': format_dict['url'],
@@ -302,7 +309,7 @@ class VikiChannelIE(VikiBaseIE):
              'title': 'Boys Over Flowers',
              'description': 'md5:ecd3cff47967fe193cff37c0bec52790',
          },
-        'playlist_count': 70,
+        'playlist_mincount': 71,
      }, {
          'url': 'http://www.viki.com/tv/1354c-poor-nastya-complete',
          'info_dict': {
diff --git a/youtube_dl/extractor/vimeo.py b/youtube_dl/extractor/vimeo.py
index 707a5735ad5463fec1d6996db4fd0b381a9205bf..51c69a80c216889315a4c5fe070572100c13dd36 100644
--- a/youtube_dl/extractor/vimeo.py
+++ b/youtube_dl/extractor/vimeo.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import json
@@ -8,6 +8,7 @@ import itertools
  from .common import InfoExtractor
  from ..compat import (
      compat_HTTPError,
+    compat_str,
      compat_urlparse,
  )
  from ..utils import (
@@ -15,6 +16,7 @@ from ..utils import (
      ExtractorError,
      InAdvancePagedList,
      int_or_none,
+    NO_DEFAULT,
      RegexNotFoundError,
      sanitized_Request,
      smuggle_url,
@@ -24,6 +26,7 @@ from ..utils import (
      urlencode_postdata,
      unescapeHTML,
      parse_filesize,
+    try_get,
  )
  
  
@@ -54,6 +57,26 @@ class VimeoBaseInfoExtractor(InfoExtractor):
          self._set_vimeo_cookie('vuid', vuid)
          self._download_webpage(login_request, None, False, 'Wrong login info')
  
+    def _verify_video_password(self, url, video_id, webpage):
+        password = self._downloader.params.get('videopassword')
+        if password is None:
+            raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
+        token, vuid = self._extract_xsrft_and_vuid(webpage)
+        data = urlencode_postdata({
+            'password': password,
+            'token': token,
+        })
+        if url.startswith('http://'):
+            # vimeo only supports https now, but the user can give an http url
+            url = url.replace('http://', 'https://')
+        password_request = sanitized_Request(url + '/password', data)
+        password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+        password_request.add_header('Referer', url)
+        self._set_vimeo_cookie('vuid', vuid)
+        return self._download_webpage(
+            password_request, video_id,
+            'Verifying the password', 'Wrong password')
+
      def _extract_xsrft_and_vuid(self, webpage):
          xsrft = self._search_regex(
              r'(?:(?P<q1>["\'])xsrft(?P=q1)\s*:|xsrft\s*[=:])\s*(?P<q>["\'])(?P<xsrft>.+?)(?P=q)',
@@ -66,6 +89,69 @@ class VimeoBaseInfoExtractor(InfoExtractor):
      def _set_vimeo_cookie(self, name, value):
          self._set_cookie('vimeo.com', name, value)
  
+    def _vimeo_sort_formats(self, formats):
+        # Bitrates are completely broken. A single m3u8 may contain entries in kbps and bps
+        # at the same time without actual units specified. This leads to wrong sorting.
+        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'format_id'))
+
+    def _parse_config(self, config, video_id):
+        # Extract title
+        video_title = config['video']['title']
+
+        # Extract uploader, uploader_url and uploader_id
+        video_uploader = config['video'].get('owner', {}).get('name')
+        video_uploader_url = config['video'].get('owner', {}).get('url')
+        video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
+
+        # Extract video thumbnail
+        video_thumbnail = config['video'].get('thumbnail')
+        if video_thumbnail is None:
+            video_thumbs = config['video'].get('thumbs')
+            if video_thumbs and isinstance(video_thumbs, dict):
+                _, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
+
+        # Extract video duration
+        video_duration = int_or_none(config['video'].get('duration'))
+
+        formats = []
+        config_files = config['video'].get('files') or config['request'].get('files', {})
+        for f in config_files.get('progressive', []):
+            video_url = f.get('url')
+            if not video_url:
+                continue
+            formats.append({
+                'url': video_url,
+                'format_id': 'http-%s' % f.get('quality'),
+                'width': int_or_none(f.get('width')),
+                'height': int_or_none(f.get('height')),
+                'fps': int_or_none(f.get('fps')),
+                'tbr': int_or_none(f.get('bitrate')),
+            })
+        m3u8_url = config_files.get('hls', {}).get('url')
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+
+        subtitles = {}
+        text_tracks = config['request'].get('text_tracks')
+        if text_tracks:
+            for tt in text_tracks:
+                subtitles[tt['lang']] = [{
+                    'ext': 'vtt',
+                    'url': 'https://vimeo.com' + tt['url'],
+                }]
+
+        return {
+            'title': video_title,
+            'uploader': video_uploader,
+            'uploader_id': video_uploader_id,
+            'uploader_url': video_uploader_url,
+            'thumbnail': video_thumbnail,
+            'duration': video_duration,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
+
  
  class VimeoIE(VimeoBaseInfoExtractor):
      """Information extractor for vimeo.com."""
@@ -81,7 +167,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                              \.
                          )?
                          vimeo(?P<pro>pro)?\.com/
-                        (?!channels/[^/?#]+/?(?:$|[?#])|(?:album|ondemand)/)
+                        (?!(?:channels|album)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
                          (?:.*?/)?
                          (?:
                              (?:
@@ -90,6 +176,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                              )?
                          (?:videos?/)?
                          (?P<id>[0-9]+)
+                        (?:/[\da-f]+)?
                          /?(?:[?&].*)?(?:[#].*)?$
                      '''
      IE_NAME = 'vimeo'
@@ -152,7 +239,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'uploader_id': 'user18948128',
                  'uploader': 'Jaime Marquínez Ferrándiz',
                  'duration': 10,
-                'description': 'This is "youtube-dl password protected test video" by Jaime Marquínez Ferrándiz on Vimeo, the home for high quality videos and the people\u2026',
+                'description': 'This is "youtube-dl password protected test video" by  on Vimeo, the home for high quality videos and the people who love them.',
              },
              'params': {
                  'videopassword': 'youtube-dl',
@@ -161,8 +248,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
          {
              'url': 'http://vimeo.com/channels/keypeele/75629013',
              'md5': '2f86a05afe9d7abc0b9126d229bbe15d',
-            'note': 'Video is freely available via original URL '
-                    'and protected with password when accessed via http://vimeo.com/75629013',
              'info_dict': {
                  'id': '75629013',
                  'ext': 'mp4',
@@ -206,7 +291,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
          {
              # contains original format
              'url': 'https://vimeo.com/33951933',
-            'md5': '53c688fa95a55bf4b7293d37a89c5c53',
+            'md5': '2d9f5475e0537f013d0073e812ab89e6',
              'info_dict': {
                  'id': '33951933',
                  'ext': 'mp4',
@@ -218,6 +303,45 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'description': 'md5:ae23671e82d05415868f7ad1aec21147',
              },
          },
+        {
+            # only available via https://vimeo.com/channels/tributes/6213729 and
+            # not via https://vimeo.com/6213729
+            'url': 'https://vimeo.com/channels/tributes/6213729',
+            'info_dict': {
+                'id': '6213729',
+                'ext': 'mp4',
+                'title': 'Vimeo Tribute: The Shining',
+                'uploader': 'Casey Donahue',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/caseydonahue',
+                'uploader_id': 'caseydonahue',
+                'upload_date': '20090821',
+                'description': 'md5:bdbf314014e58713e6e5b66eb252f4a6',
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'expected_warnings': ['Unable to download JSON metadata'],
+        },
+        {
+            # redirects to the ondemand extractor and should be passed through it
+            # for successful extraction
+            'url': 'https://vimeo.com/73445910',
+            'info_dict': {
+                'id': '73445910',
+                'ext': 'mp4',
+                'title': 'The Reluctant Revolutionary',
+                'uploader': '10Ft Films',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/tenfootfilms',
+                'uploader_id': 'tenfootfilms',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        {
+            'url': 'http://vimeo.com/moogaloop.swf?clip_id=2539741',
+            'only_matching': True,
+        },
          {
              'url': 'https://vimeo.com/109815029',
              'note': 'Video not completely processed, "failed" seed status',
@@ -227,47 +351,48 @@ class VimeoIE(VimeoBaseInfoExtractor):
              'url': 'https://vimeo.com/groups/travelhd/videos/22439234',
              'only_matching': True,
          },
+        {
+            'url': 'https://vimeo.com/album/2632481/video/79010983',
+            'only_matching': True,
+        },
          {
              # source file returns 403: Forbidden
              'url': 'https://vimeo.com/7809605',
              'only_matching': True,
          },
+        {
+            'url': 'https://vimeo.com/160743502/abd0e13fb4',
+            'only_matching': True,
+        }
      ]
  
      @staticmethod
-    def _extract_vimeo_url(url, webpage):
+    def _smuggle_referrer(url, referrer_url):
+        return smuggle_url(url, {'http_headers': {'Referer': referrer_url}})
+
+    @staticmethod
+    def _extract_urls(url, webpage):
+        urls = []
          # Look for embedded (iframe) Vimeo player
-        mobj = re.search(
-            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1', webpage)
-        if mobj:
-            player_url = unescapeHTML(mobj.group('url'))
-            surl = smuggle_url(player_url, {'http_headers': {'Referer': url}})
-            return surl
-        # Look for embedded (swf embed) Vimeo player
-        mobj = re.search(
-            r'<embed[^>]+?src="((?:https?:)?//(?:www\.)?vimeo\.com/moogaloop\.swf.+?)"', webpage)
-        if mobj:
-            return mobj.group(1)
+        for mobj in re.finditer(
+                r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1',
+                webpage):
+            urls.append(VimeoIE._smuggle_referrer(unescapeHTML(mobj.group('url')), url))
+        PLAIN_EMBED_RE = (
+            # Look for embedded (swf embed) Vimeo player
+            r'<embed[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vimeo\.com/moogaloop\.swf.+?)\1',
+            # Also look for non-standard embedded Vimeo players
+            r'<video[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vimeo\.com/[0-9]+)\1',
+        )
+        for embed_re in PLAIN_EMBED_RE:
+            for mobj in re.finditer(embed_re, webpage):
+                urls.append(mobj.group('url'))
+        return urls
  
-    def _verify_video_password(self, url, video_id, webpage):
-        password = self._downloader.params.get('videopassword')
-        if password is None:
-            raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
-        token, vuid = self._extract_xsrft_and_vuid(webpage)
-        data = urlencode_postdata({
-            'password': password,
-            'token': token,
-        })
-        if url.startswith('http://'):
-            # vimeo only supports https now, but the user can give an http url
-            url = url.replace('http://', 'https://')
-        password_request = sanitized_Request(url + '/password', data)
-        password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        password_request.add_header('Referer', url)
-        self._set_vimeo_cookie('vuid', vuid)
-        return self._download_webpage(
-            password_request, video_id,
-            'Verifying the password', 'Wrong password')
+    @staticmethod
+    def _extract_url(url, webpage):
+        urls = VimeoIE._extract_urls(url, webpage)
+        return urls[0] if urls else None
  
      def _verify_player_video_password(self, url, video_id):
          password = self._downloader.params.get('videopassword')
@@ -277,10 +402,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
          pass_url = url + '/check-password'
          password_request = sanitized_Request(pass_url, data)
          password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+        password_request.add_header('Referer', url)
          return self._download_json(
              password_request, video_id,
-            'Verifying the password',
-            'Wrong password')
+            'Verifying the password', 'Wrong password')
  
      def _real_initialize(self):
          self._login()
@@ -299,13 +424,18 @@ class VimeoIE(VimeoBaseInfoExtractor):
          orig_url = url
          if mobj.group('pro') or mobj.group('player'):
              url = 'https://player.vimeo.com/video/' + video_id
-        else:
+        elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
              url = 'https://vimeo.com/' + video_id
  
          # Retrieve video webpage to extract further information
          request = sanitized_Request(url, headers=headers)
          try:
-            webpage = self._download_webpage(request, video_id)
+            webpage, urlh = self._download_webpage_handle(request, video_id)
+            # Some URLs redirect to ondemand pages that can't be extracted
+            # with this extractor right away and thus should be passed through
+            # the ondemand extractor (e.g. https://vimeo.com/73445910)
+            if VimeoOndemandIE.suitable(urlh.geturl()):
+                return self.url_result(urlh.geturl(), VimeoOndemandIE.ie_key())
          except ExtractorError as ee:
              if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
                  errmsg = ee.cause.read()
@@ -377,28 +507,24 @@ class VimeoIE(VimeoBaseInfoExtractor):
              if config.get('view') == 4:
                  config = self._verify_player_video_password(url, video_id)
  
-        if '>You rented this title.<' in webpage:
+        def is_rented():
+            if '>You rented this title.<' in webpage:
+                return True
+            if config.get('user', {}).get('purchased'):
+                return True
+            label = try_get(
+                config, lambda x: x['video']['vod']['purchase_options'][0]['label_string'], compat_str)
+            if label and label.startswith('You rented this'):
+                return True
+            return False
+
+        if is_rented():
              feature_id = config.get('video', {}).get('vod', {}).get('feature_id')
              if feature_id and not data.get('force_feature_id', False):
                  return self.url_result(smuggle_url(
                      'https://player.vimeo.com/player/%s' % feature_id,
                      {'force_feature_id': True}), 'Vimeo')
  
-        # Extract title
-        video_title = config['video']['title']
-
-        # Extract uploader, uploader_url and uploader_id
-        video_uploader = config['video'].get('owner', {}).get('name')
-        video_uploader_url = config['video'].get('owner', {}).get('url')
-        video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
-
-        # Extract video thumbnail
-        video_thumbnail = config['video'].get('thumbnail')
-        if video_thumbnail is None:
-            video_thumbs = config['video'].get('thumbs')
-            if video_thumbs and isinstance(video_thumbs, dict):
-                _, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
-
          # Extract video description
  
          video_description = self._html_search_regex(
@@ -418,9 +544,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
          if not video_description and not mobj.group('player'):
              self._downloader.report_warning('Cannot find video description')
  
-        # Extract video duration
-        video_duration = int_or_none(config['video'].get('duration'))
-
          # Extract upload date
          video_upload_date = None
          mobj = re.search(r'<time[^>]+datetime="([^"]+)"', webpage)
@@ -458,53 +581,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
                              'format_id': source_name,
                              'preference': 1,
                          })
-        config_files = config['video'].get('files') or config['request'].get('files', {})
-        for f in config_files.get('progressive', []):
-            video_url = f.get('url')
-            if not video_url:
-                continue
-            formats.append({
-                'url': video_url,
-                'format_id': 'http-%s' % f.get('quality'),
-                'width': int_or_none(f.get('width')),
-                'height': int_or_none(f.get('height')),
-                'fps': int_or_none(f.get('fps')),
-                'tbr': int_or_none(f.get('bitrate')),
-            })
-        m3u8_url = config_files.get('hls', {}).get('url')
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
-        # Bitrates are completely broken. Single m3u8 may contain entries in kbps and bps
-        # at the same time without actual units specified. This lead to wrong sorting.
-        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'format_id'))
  
-        subtitles = {}
-        text_tracks = config['request'].get('text_tracks')
-        if text_tracks:
-            for tt in text_tracks:
-                subtitles[tt['lang']] = [{
-                    'ext': 'vtt',
-                    'url': 'https://vimeo.com' + tt['url'],
-                }]
-
-        return {
+        info_dict = self._parse_config(config, video_id)
+        formats.extend(info_dict['formats'])
+        self._vimeo_sort_formats(formats)
+        info_dict.update({
              'id': video_id,
-            'uploader': video_uploader,
-            'uploader_url': video_uploader_url,
-            'uploader_id': video_uploader_id,
+            'formats': formats,
              'upload_date': video_upload_date,
-            'title': video_title,
-            'thumbnail': video_thumbnail,
              'description': video_description,
-            'duration': video_duration,
-            'formats': formats,
              'webpage_url': url,
              'view_count': view_count,
              'like_count': like_count,
              'comment_count': comment_count,
-            'subtitles': subtitles,
-        }
+        })
+
+        return info_dict
  
  
  class VimeoOndemandIE(VimeoBaseInfoExtractor):
@@ -522,6 +614,20 @@ class VimeoOndemandIE(VimeoBaseInfoExtractor):
              'uploader_url': 're:https?://(?:www\.)?vimeo\.com/gumfilms',
              'uploader_id': 'gumfilms',
          },
+    }, {
+        # requires Referer to be passed along with og:video:url
+        'url': 'https://vimeo.com/ondemand/36938/126682985',
+        'info_dict': {
+            'id': '126682985',
+            'ext': 'mp4',
+            'title': 'Rävlock, rätt läte på rätt plats',
+            'uploader': 'Lindroth & Norin',
+            'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user14430847',
+            'uploader_id': 'user14430847',
+        },
+        'params': {
+            'skip_download': True,
+        },
      }, {
          'url': 'https://vimeo.com/ondemand/nazmaalik',
          'only_matching': True,
@@ -536,7 +642,12 @@ class VimeoOndemandIE(VimeoBaseInfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
-        return self.url_result(self._og_search_video_url(webpage), VimeoIE.ie_key())
+        return self.url_result(
+            # Some videos require Referer to be passed along with og:video:url
+            # similarly to generic vimeo embeds (e.g.
+            # https://vimeo.com/ondemand/36938/126682985).
+            VimeoIE._smuggle_referrer(self._og_search_video_url(webpage), url),
+            VimeoIE.ie_key())
  
  
  class VimeoChannelIE(VimeoBaseInfoExtractor):
@@ -598,8 +709,21 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
                  webpage = self._login_list_password(page_url, list_id, webpage)
                  yield self._extract_list_title(webpage)
  
-            for video_id in re.findall(r'id="clip_(\d+?)"', webpage):
-                yield self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo')
+            # Try extracting href first since not all videos are available via
+            # short https://vimeo.com/id URL (e.g. https://vimeo.com/channels/tributes/6213729)
+            clips = re.findall(
+                r'id="clip_(\d+)"[^>]*>\s*<a[^>]+href="(/(?:[^/]+/)*\1)', webpage)
+            if clips:
+                for video_id, video_url in clips:
+                    yield self.url_result(
+                        compat_urlparse.urljoin(base_url, video_url),
+                        VimeoIE.ie_key(), video_id=video_id)
+            # More relaxed fallback
+            else:
+                for video_id in re.findall(r'id=["\']clip_(\d+)', webpage):
+                    yield self.url_result(
+                        'https://vimeo.com/%s' % video_id,
+                        VimeoIE.ie_key(), video_id=video_id)
  
              if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
                  break
@@ -636,7 +760,7 @@ class VimeoUserIE(VimeoChannelIE):
  
  class VimeoAlbumIE(VimeoChannelIE):
      IE_NAME = 'vimeo:album'
-    _VALID_URL = r'https://vimeo\.com/album/(?P<id>\d+)'
+    _VALID_URL = r'https://vimeo\.com/album/(?P<id>\d+)(?:$|[?#]|/(?!video))'
      _TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
      _TESTS = [{
          'url': 'https://vimeo.com/album/2632481',
@@ -656,6 +780,13 @@ class VimeoAlbumIE(VimeoChannelIE):
          'params': {
              'videopassword': 'youtube-dl',
          }
+    }, {
+        'url': 'https://vimeo.com/album/2632481/sort:plays/format:thumbnail',
+        'only_matching': True,
+    }, {
+        # TODO: respect page number
+        'url': 'https://vimeo.com/album/2632481/page:2/sort:plays/format:thumbnail',
+        'only_matching': True,
      }]
  
      def _page_url(self, base_url, pagenum):
@@ -687,7 +818,7 @@ class VimeoGroupsIE(VimeoAlbumIE):
          return self._extract_videos(name, 'https://vimeo.com/groups/%s' % name)
  
  
-class VimeoReviewIE(InfoExtractor):
+class VimeoReviewIE(VimeoBaseInfoExtractor):
      IE_NAME = 'vimeo:review'
      IE_DESC = 'Review pages on vimeo'
      _VALID_URL = r'https://vimeo\.com/[^/]+/review/(?P<id>[^/]+)'
@@ -699,6 +830,7 @@ class VimeoReviewIE(InfoExtractor):
              'ext': 'mp4',
              'title': "DICK HARDWICK 'Comedian'",
              'uploader': 'Richard Hardwick',
+            'uploader_id': 'user21297594',
          }
      }, {
          'note': 'video player needs Referer',
@@ -711,14 +843,47 @@ class VimeoReviewIE(InfoExtractor):
              'uploader': 'DevWeek Events',
              'duration': 2773,
              'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader_id': 'user22258446',
          }
+    }, {
+        'note': 'Password protected',
+        'url': 'https://vimeo.com/user37284429/review/138823582/c4d865efde',
+        'info_dict': {
+            'id': '138823582',
+            'ext': 'mp4',
+            'title': 'EFFICIENT PICKUP MASTERCLASS MODULE 1',
+            'uploader': 'TMB',
+            'uploader_id': 'user37284429',
+        },
+        'params': {
+            'videopassword': 'holygrail',
+        },
+        'skip': 'video gone',
      }]
  
+    def _real_initialize(self):
+        self._login()
+
+    def _get_config_url(self, webpage_url, video_id, video_password_verified=False):
+        webpage = self._download_webpage(webpage_url, video_id)
+        data = self._parse_json(self._search_regex(
+            r'window\s*=\s*_extend\(window,\s*({.+?})\);', webpage, 'data',
+            default=NO_DEFAULT if video_password_verified else '{}'), video_id)
+        config_url = data.get('vimeo_esi', {}).get('config', {}).get('configUrl')
+        if config_url is None:
+            self._verify_video_password(webpage_url, video_id, webpage)
+            config_url = self._get_config_url(
+                webpage_url, video_id, video_password_verified=True)
+        return config_url
+
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        player_url = 'https://player.vimeo.com/player/' + video_id
-        return self.url_result(player_url, 'Vimeo', video_id)
+        video_id = self._match_id(url)
+        config_url = self._get_config_url(url, video_id)
+        config = self._download_json(config_url, video_id)
+        info_dict = self._parse_config(config, video_id)
+        self._vimeo_sort_formats(info_dict['formats'])
+        info_dict['id'] = video_id
+        return info_dict
  
  
  class VimeoWatchLaterIE(VimeoChannelIE):
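
Note on the `_extract_urls` / `_extract_url` change in vimeo.py above: a minimal standalone sketch of the same collect-all-then-take-first pattern, using the three regexes from the hunk. The names `extract_vimeo_urls`, `extract_vimeo_url` and `SAMPLE` are hypothetical; `html.unescape` stands in for youtube-dl's `unescapeHTML`, and a plain `(url, referer)` tuple stands in for `smuggle_url`, so this is an approximation rather than the extractor's actual behaviour.

```python
import html
import re

# Regexes copied from the hunk above.
IFRAME_RE = r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1'
PLAIN_EMBED_RES = (
    # swf embed
    r'<embed[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vimeo\.com/moogaloop\.swf.+?)\1',
    # non-standard <video src=...> embed
    r'<video[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vimeo\.com/[0-9]+)\1',
)


def extract_vimeo_urls(page_url, webpage):
    """Return every embedded Vimeo URL as a (url, referer) tuple."""
    urls = []
    # iframe players need the embedding page as Referer
    # (assumption: the tuple replaces smuggle_url here).
    for mobj in re.finditer(IFRAME_RE, webpage):
        urls.append((html.unescape(mobj.group('url')), page_url))
    # Plain embeds are returned without a referer.
    for embed_re in PLAIN_EMBED_RES:
        for mobj in re.finditer(embed_re, webpage):
            urls.append((mobj.group('url'), None))
    return urls


def extract_vimeo_url(page_url, webpage):
    """Backwards-compatible helper: first match or None."""
    urls = extract_vimeo_urls(page_url, webpage)
    return urls[0] if urls else None


SAMPLE = (
    '<iframe src="//player.vimeo.com/video/123?a=1&amp;b=2"></iframe>'
    "<video src='https://vimeo.com/456'></video>"
)
```

Keeping the single-URL helper as a thin wrapper over the list version means callers that only want the first embed do not need to change.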
diff --git a/youtube_dl/extractor/vimple.py b/youtube_dl/extractor/vimple.py

index 92321d66e369626c0adfeda6cb4fae282a6f7abb..7fd9b777b4b6bb88cd08e9e625f74f41e8775092 100644 (file)
--- a/youtube_dl/extractor/vimple.py
+++ b/youtube_dl/extractor/vimple.py
@@ -28,23 +28,24 @@ class SprutoBaseIE(InfoExtractor):
  
  class VimpleIE(SprutoBaseIE):
      IE_DESC = 'Vimple - one-click video hosting'
-    _VALID_URL = r'https?://(?:player\.vimple\.ru/iframe|vimple\.ru)/(?P<id>[\da-f-]{32,36})'
-    _TESTS = [
-        {
-            'url': 'http://vimple.ru/c0f6b1687dcd4000a97ebe70068039cf',
-            'md5': '2e750a330ed211d3fd41821c6ad9a279',
-            'info_dict': {
-                'id': 'c0f6b168-7dcd-4000-a97e-be70068039cf',
-                'ext': 'mp4',
-                'title': 'Sunset',
-                'duration': 20,
-                'thumbnail': 're:https?://.*?\.jpg',
-            },
-        }, {
-            'url': 'http://player.vimple.ru/iframe/52e1beec-1314-4a83-aeac-c61562eadbf9',
-            'only_matching': True,
-        }
-    ]
+    _VALID_URL = r'https?://(?:player\.vimple\.(?:ru|co)/iframe|vimple\.(?:ru|co))/(?P<id>[\da-f-]{32,36})'
+    _TESTS = [{
+        'url': 'http://vimple.ru/c0f6b1687dcd4000a97ebe70068039cf',
+        'md5': '2e750a330ed211d3fd41821c6ad9a279',
+        'info_dict': {
+            'id': 'c0f6b168-7dcd-4000-a97e-be70068039cf',
+            'ext': 'mp4',
+            'title': 'Sunset',
+            'duration': 20,
+            'thumbnail': 're:https?://.*?\.jpg',
+        },
+    }, {
+        'url': 'http://player.vimple.ru/iframe/52e1beec-1314-4a83-aeac-c61562eadbf9',
+        'only_matching': True,
+    }, {
+        'url': 'http://vimple.co/04506a053f124483b8fb05ed73899f19',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
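
Note on the widened `_VALID_URL` in vimple.py above: a small sketch that checks the new pattern against the test URLs from the hunk. The helper name `match_id` is hypothetical and only approximates what `_match_id` does in the extractor.

```python
import re

# Pattern copied from the hunk above; it accepts both .ru and .co hosts.
VALID_URL = (
    r'https?://(?:player\.vimple\.(?:ru|co)/iframe|vimple\.(?:ru|co))/'
    r'(?P<id>[\da-f-]{32,36})'
)


def match_id(url):
    """Return the video id if the URL matches, else None."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None
```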
diff --git a/youtube_dl/extractor/vine.py b/youtube_dl/extractor/vine.py

index a6a6cc47955f6aae482d8022bf52d4d233fb955f..0183f052a599f411a48053360ee41670e758f7af 100644 (file)
--- a/youtube_dl/extractor/vine.py
+++ b/youtube_dl/extractor/vine.py
@@ -24,6 +24,7 @@ class VineIE(InfoExtractor):
              'upload_date': '20130519',
              'uploader': 'Jack Dorsey',
              'uploader_id': '76',
+            'view_count': int,
              'like_count': int,
              'comment_count': int,
              'repost_count': int,
@@ -39,6 +40,7 @@ class VineIE(InfoExtractor):
              'upload_date': '20140815',
              'uploader': 'Mars Ruiz',
              'uploader_id': '1102363502380728320',
+            'view_count': int,
              'like_count': int,
              'comment_count': int,
              'repost_count': int,
@@ -54,6 +56,7 @@ class VineIE(InfoExtractor):
              'upload_date': '20130430',
              'uploader': 'Z3k3',
              'uploader_id': '936470460173008896',
+            'view_count': int,
              'like_count': int,
              'comment_count': int,
              'repost_count': int,
@@ -71,6 +74,7 @@ class VineIE(InfoExtractor):
              'upload_date': '20150705',
              'uploader': 'Pimry_zaa',
              'uploader_id': '1135760698325307392',
+            'view_count': int,
              'like_count': int,
              'comment_count': int,
              'repost_count': int,
@@ -86,10 +90,12 @@ class VineIE(InfoExtractor):
  
          data = self._parse_json(
              self._search_regex(
-                r'window\.POST_DATA\s*=\s*{\s*%s\s*:\s*({.+?})\s*};\s*</script>' % video_id,
+                r'window\.POST_DATA\s*=\s*({.+?});\s*</script>',
                  webpage, 'vine data'),
              video_id)
  
+        data = data[list(data.keys())[0]]
+
          formats = [{
              'format_id': '%(format)s-%(rate)s' % f,
              'vcodec': f.get('format'),
@@ -109,6 +115,7 @@ class VineIE(InfoExtractor):
              'upload_date': unified_strdate(data.get('created')),
              'uploader': username,
              'uploader_id': data.get('userIdStr'),
+            'view_count': int_or_none(data.get('loops', {}).get('count')),
              'like_count': int_or_none(data.get('likes', {}).get('count')),
              'comment_count': int_or_none(data.get('comments', {}).get('count')),
              'repost_count': int_or_none(data.get('reposts', {}).get('count')),
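
Note on the `window.POST_DATA` change in vine.py above: a standalone sketch of the new approach, which captures the whole object and then takes the value under its first key instead of interpolating the video id into the regex. The function name `parse_post_data`, the key `abc123` and the sample page are hypothetical; `json.loads` stands in for `_parse_json`.

```python
import json
import re


def parse_post_data(webpage):
    """Return the post dict stored under the first key of window.POST_DATA."""
    mobj = re.search(
        r'window\.POST_DATA\s*=\s*({.+?});\s*</script>', webpage)
    if mobj is None:
        return None
    data = json.loads(mobj.group(1))
    # The outer object is keyed by post id; take whichever key comes first.
    return data[list(data.keys())[0]]


SAMPLE = (
    '<script>window.POST_DATA = '
    '{"abc123": {"loops": {"count": 42}, "likes": {"count": 7}}};'
    '</script>'
)
```

With the post dict in hand, `data.get('loops', {}).get('count')` yields the value that the hunk maps to `view_count`.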
diff --git a/youtube_dl/extractor/vk.py b/youtube_dl/extractor/vk.py

index 67220f1b7a991e48494adf24c791317e29eda8cd..1990e7093acabb2dce11faebfddd220e8d88392b 100644 (file)
--- a/youtube_dl/extractor/vk.py
+++ b/youtube_dl/extractor/vk.py
@@ -1,41 +1,100 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
+import collections
  import re
-import json
+import sys
  
  from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import (
+    compat_str,
+    compat_urlparse,
+)
  from ..utils import (
+    clean_html,
      ExtractorError,
+    get_element_by_class,
      int_or_none,
      orderedSet,
-    sanitized_Request,
+    remove_start,
      str_to_int,
      unescapeHTML,
-    unified_strdate,
+    unified_timestamp,
      urlencode_postdata,
  )
-from .vimeo import VimeoIE
+from .dailymotion import DailymotionIE
  from .pladform import PladformIE
+from .vimeo import VimeoIE
+
+
+class VKBaseIE(InfoExtractor):
+    _NETRC_MACHINE = 'vk'
+
+    def _login(self):
+        (username, password) = self._get_login_info()
+        if username is None:
+            return
+
+        login_page, url_handle = self._download_webpage_handle(
+            'https://vk.com', None, 'Downloading login page')
+
+        login_form = self._hidden_inputs(login_page)
+
+        login_form.update({
+            'email': username.encode('cp1251'),
+            'pass': password.encode('cp1251'),
+        })
+
+        # https://new.vk.com/ serves two remixlhk cookies in the Set-Cookie header
+        # and expects the first one to be set rather than the second (see
+        # https://github.com/rg3/youtube-dl/issues/9841#issuecomment-227871201).
+        # Per RFC 6265 the newer cookie should be stored in the cookie store,
+        # which is what actually happens.
+        # We work around this VK issue by manually resetting the remixlhk cookie
+        # to the first one.
+        for header, cookies in url_handle.headers.items():
+            if header.lower() != 'set-cookie':
+                continue
+            if sys.version_info[0] >= 3:
+                cookies = cookies.encode('iso-8859-1')
+            cookies = cookies.decode('utf-8')
+            remixlhk = re.search(r'remixlhk=(.+?);.*?\bdomain=(.+?)(?:[,;]|$)', cookies)
+            if remixlhk:
+                value, domain = remixlhk.groups()
+                self._set_cookie(domain, 'remixlhk', value)
+                break
+
+        login_page = self._download_webpage(
+            'https://login.vk.com/?act=login', None,
+            note='Logging in as %s' % username,
+            data=urlencode_postdata(login_form))
+
+        if re.search(r'onLoginFailed', login_page):
+            raise ExtractorError(
+                'Unable to login, incorrect username and/or password', expected=True)
  
+    def _real_initialize(self):
+        self._login()
  
-class VKIE(InfoExtractor):
+
+class VKIE(VKBaseIE):
      IE_NAME = 'vk'
      IE_DESC = 'VK'
      _VALID_URL = r'''(?x)
                      https?://
                          (?:
-                            (?:m\.)?vk\.com/video_ext\.php\?.*?\boid=(?P<oid>-?\d+).*?\bid=(?P<id>\d+)|
                              (?:
-                                (?:m\.)?vk\.com/(?:.+?\?.*?z=)?video|
-                                (?:www\.)?biqle\.ru/watch/
+                                (?:(?:m|new)\.)?vk\.com/video_|
+                                (?:www\.)?daxab\.com/
                              )
-                            (?P<videoid>[^s].*?)(?:\?(?:.*\blist=(?P<list_id>[\da-f]+))?|%2F|$)
+                            ext\.php\?(?P<embed_query>.*?\boid=(?P<oid>-?\d+).*?\bid=(?P<id>\d+).*)|
+                            (?:
+                                (?:(?:m|new)\.)?vk\.com/(?:.+?\?.*?z=)?video|
+                                (?:www\.)?daxab\.com/embed/
+                            )
+                            (?P<videoid>-?\d+_\d+)(?:.*\blist=(?P<list_id>[\da-f]+))?
                          )
                      '''
-    _NETRC_MACHINE = 'vk'
-
      _TESTS = [
          {
              'url': 'http://vk.com/videos-77521?z=video-77521_162222515%2Fclub77521',
@@ -46,6 +105,7 @@ class VKIE(InfoExtractor):
                  'title': 'ProtivoGunz - Хуёвая песня',
                  'uploader': 're:(?:Noize MC|Alexander Ilyashenko).*',
                  'duration': 195,
+                'timestamp': 1329060660,
                  'upload_date': '20120212',
                  'view_count': int,
              },
@@ -59,6 +119,7 @@ class VKIE(InfoExtractor):
                  'uploader': 'Tom Cruise',
                  'title': 'No name',
                  'duration': 9,
+                'timestamp': 1374374880,
                  'upload_date': '20130721',
                  'view_count': int,
              }
@@ -75,7 +136,8 @@ class VKIE(InfoExtractor):
                  'duration': 101,
                  'upload_date': '20120730',
                  'view_count': int,
-            }
+            },
+            'skip': 'This video has been removed from public access.',
          },
          {
              # VIDEO NOW REMOVED
@@ -134,6 +196,7 @@ class VKIE(InfoExtractor):
                  'upload_date': '20150709',
                  'view_count': int,
              },
+            'skip': 'Removed',
          },
          {
              # youtube embed
@@ -142,7 +205,7 @@ class VKIE(InfoExtractor):
                  'id': 'V3K4mi0SYkc',
                  'ext': 'webm',
                  'title': "DSWD Awards 'Children's Joy Foundation, Inc.' Certificate of Registration and License to Operate",
-                'description': 'md5:bf9c26cfa4acdfb146362682edd3827a',
+                'description': 'md5:d9903938abdc74c738af77f527ca0596',
                  'duration': 178,
                  'upload_date': '20130116',
                  'uploader': "Children's Joy Foundation",
@@ -150,6 +213,23 @@ class VKIE(InfoExtractor):
                  'view_count': int,
              },
          },
+        {
+            # dailymotion embed
+            'url': 'https://vk.com/video-37468416_456239855',
+            'info_dict': {
+                'id': 'k3lz2cmXyRuJQSjGHUv',
+                'ext': 'mp4',
+                'title': 'md5:d52606645c20b0ddbb21655adaa4f56f',
+                'description': 'md5:c651358f03c56f1150b555c26d90a0fd',
+                'uploader': 'AniLibria.Tv',
+                'upload_date': '20160914',
+                'uploader_id': 'x1p5vl5',
+                'timestamp': 1473877246,
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
          {
              # video key is extra_data not url\d+
              'url': 'http://vk.com/video-110305615_171782105',
@@ -159,10 +239,30 @@ class VKIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'S-Dance, репетиции к The way show',
                  'uploader': 'THE WAY SHOW | 17 апреля',
+                'timestamp': 1454870100,
                  'upload_date': '20160207',
                  'view_count': int,
              },
          },
+        {
+            # finished live stream, live_mp4
+            'url': 'https://vk.com/videos-387766?z=video-387766_456242764%2Fpl_-387766_-2',
+            'md5': '90d22d051fccbbe9becfccc615be6791',
+            'info_dict': {
+                'id': '456242764',
+                'ext': 'mp4',
+                'title': 'ИгроМир 2016 — день 1',
+                'uploader': 'Игромания',
+                'duration': 5239,
+                'view_count': int,
+            },
+        },
+        {
+            # live stream, hls and rtmp links, most likely already finished live
+            # stream by the time you are reading this comment
+            'url': 'https://vk.com/video-140332_456239111',
+            'only_matching': True,
+        },
          {
              # removed video, just testing that we match the pattern
              'url': 'http://vk.com/feed?z=video-43215063_166094326%2Fbb50cacd3177146d7a',
@@ -174,63 +274,35 @@ class VKIE(InfoExtractor):
              'only_matching': True,
          },
          {
-            # vk wrapper
-            'url': 'http://www.biqle.ru/watch/847655_160197695',
+            # pladform embed
+            'url': 'https://vk.com/video-76116461_171554880',
              'only_matching': True,
          },
          {
-            # pladform embed
-            'url': 'https://vk.com/video-76116461_171554880',
+            'url': 'http://new.vk.com/video205387401_165548505',
              'only_matching': True,
          }
      ]
  
-    def _login(self):
-        (username, password) = self._get_login_info()
-        if username is None:
-            return
-
-        login_page = self._download_webpage(
-            'https://vk.com', None, 'Downloading login page')
-
-        login_form = self._hidden_inputs(login_page)
-
-        login_form.update({
-            'email': username.encode('cp1251'),
-            'pass': password.encode('cp1251'),
-        })
-
-        request = sanitized_Request(
-            'https://login.vk.com/?act=login',
-            urlencode_postdata(login_form))
-        login_page = self._download_webpage(
-            request, None, note='Logging in as %s' % username)
-
-        if re.search(r'onLoginFailed', login_page):
-            raise ExtractorError(
-                'Unable to login, incorrect username and/or password', expected=True)
-
-    def _real_initialize(self):
-        self._login()
-
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          video_id = mobj.group('videoid')
  
-        if not video_id:
+        if video_id:
+            info_url = 'https://vk.com/al_video.php?act=show&al=1&module=video&video=%s' % video_id
+            # Some videos (removed?) can only be downloaded with list id specified
+            list_id = mobj.group('list_id')
+            if list_id:
+                info_url += '&list=%s' % list_id
+        else:
+            info_url = 'http://vk.com/video_ext.php?' + mobj.group('embed_query')
              video_id = '%s_%s' % (mobj.group('oid'), mobj.group('id'))
  
-        info_url = 'https://vk.com/al_video.php?act=show&al=1&module=video&video=%s' % video_id
-
-        # Some videos (removed?) can only be downloaded with list id specified
-        list_id = mobj.group('list_id')
-        if list_id:
-            info_url += '&list=%s' % list_id
-
          info_page = self._download_webpage(info_url, video_id)
  
          error_message = self._html_search_regex(
-            r'(?s)<!><div[^>]+class="video_layer_message"[^>]*>(.+?)</div>',
+            [r'(?s)<!><div[^>]+class="video_layer_message"[^>]*>(.+?)</div>',
+                r'(?s)<div[^>]+id="video_ext_msg"[^>]*>(.+?)</div>'],
              info_page, 'error message', default=None)
          if error_message:
              raise ExtractorError(error_message, expected=True)
@@ -268,7 +340,7 @@ class VKIE(InfoExtractor):
          if youtube_url:
              return self.url_result(youtube_url, 'Youtube')
  
-        vimeo_url = VimeoIE._extract_vimeo_url(url, info_page)
+        vimeo_url = VimeoIE._extract_url(url, info_page)
          if vimeo_url is not None:
              return self.url_result(vimeo_url)
  
@@ -283,6 +355,10 @@ class VKIE(InfoExtractor):
                  m_rutube.group(1).replace('\\', ''))
              return self.url_result(rutube_url)
  
+        dailymotion_urls = DailymotionIE._extract_urls(info_page)
+        if dailymotion_urls:
+            return self.url_result(dailymotion_urls[0], DailymotionIE.ie_key())
+
          m_opts = re.search(r'(?s)var\s+opts\s*=\s*({.+?});', info_page)
          if m_opts:
              m_opts_url = re.search(r"url\s*:\s*'((?!/\b)[^']+)", m_opts.group(1))
@@ -292,53 +368,72 @@ class VKIE(InfoExtractor):
                      opts_url = 'http:' + opts_url
                  return self.url_result(opts_url)
  
-        data_json = self._search_regex(r'var\s+vars\s*=\s*({.+?});', info_page, 'vars')
-        data = json.loads(data_json)
+        # vars does not look to be served anymore since 24.10.2016
+        data = self._parse_json(
+            self._search_regex(
+                r'var\s+vars\s*=\s*({.+?});', info_page, 'vars', default='{}'),
+            video_id, fatal=False)
+
+        # <!json> is served instead
+        if not data:
+            data = self._parse_json(
+                self._search_regex(
+                    r'<!json>\s*({.+?})\s*<!>', info_page, 'json'),
+                video_id)['player']['params'][0]
  
-        # Extract upload date
-        upload_date = None
-        mobj = re.search(r'id="mv_date(?:_views)?_wrap"[^>]*>([a-zA-Z]+ [0-9]+), ([0-9]+) at', info_page)
-        if mobj is not None:
-            mobj.group(1) + ' ' + mobj.group(2)
-            upload_date = unified_strdate(mobj.group(1) + ' ' + mobj.group(2))
+        title = unescapeHTML(data['md_title'])
  
-        view_count = None
-        views = self._html_search_regex(
-            r'"mv_views_count_number"[^>]*>(.+?\bviews?)<',
-            info_page, 'view count', fatal=False)
-        if views:
-            view_count = str_to_int(self._search_regex(
-                r'([\d,.]+)', views, 'view count', fatal=False))
+        if data.get('live') == 2:
+            title = self._live_title(title)
+
+        timestamp = unified_timestamp(self._html_search_regex(
+            r'class=["\']mv_info_date[^>]+>([^<]+)(?:<|from)', info_page,
+            'upload date', fatal=False))
+
+        view_count = str_to_int(self._search_regex(
+            r'class=["\']mv_views_count[^>]+>\s*([\d,.]+)',
+            info_page, 'view count', fatal=False))
  
          formats = []
-        for k, v in data.items():
-            if not k.startswith('url') and k != 'extra_data' or not v:
+        for format_id, format_url in data.items():
+            if not isinstance(format_url, compat_str) or not format_url.startswith(('http', '//', 'rtmp')):
                  continue
-            height = int_or_none(self._search_regex(
-                r'^url(\d+)', k, 'height', default=None))
-            formats.append({
-                'format_id': k,
-                'url': v,
-                'height': height,
-            })
+            if format_id.startswith(('url', 'cache')) or format_id in ('extra_data', 'live_mp4'):
+                height = int_or_none(self._search_regex(
+                    r'^(?:url|cache)(\d+)', format_id, 'height', default=None))
+                formats.append({
+                    'format_id': format_id,
+                    'url': format_url,
+                    'height': height,
+                })
+            elif format_id == 'hls':
+                formats.extend(self._extract_m3u8_formats(
+                    format_url, video_id, 'mp4', m3u8_id=format_id,
+                    fatal=False, live=True))
+            elif format_id == 'rtmp':
+                formats.append({
+                    'format_id': format_id,
+                    'url': format_url,
+                    'ext': 'flv',
+                })
          self._sort_formats(formats)
  
          return {
-            'id': compat_str(data['vid']),
+            'id': compat_str(data.get('vid') or video_id),
              'formats': formats,
-            'title': unescapeHTML(data['md_title']),
+            'title': title,
              'thumbnail': data.get('jpg'),
              'uploader': data.get('md_author'),
              'duration': data.get('duration'),
-            'upload_date': upload_date,
+            'timestamp': timestamp,
              'view_count': view_count,
          }
  
  
-class VKUserVideosIE(InfoExtractor):
+class VKUserVideosIE(VKBaseIE):
      IE_NAME = 'vk:uservideos'
      IE_DESC = "VK - User's Videos"
-    _VALID_URL = r'https?://vk\.com/videos(?P<id>-?[0-9]+)(?!\?.*\bz=video)(?:[/?#&]|$)'
+    _VALID_URL = r'https?://(?:(?:m|new)\.)?vk\.com/videos(?P<id>-?[0-9]+)(?!\?.*\bz=video)(?:[/?#&]|$)'
      _TEMPLATE_URL = 'https://vk.com/videos'
      _TESTS = [{
          'url': 'http://vk.com/videos205387401',
@@ -353,6 +448,12 @@ class VKUserVideosIE(InfoExtractor):
      }, {
          'url': 'http://vk.com/videos-97664626?section=all',
          'only_matching': True,
+    }, {
+        'url': 'http://m.vk.com/videos205387401',
+        'only_matching': True,
+    }, {
+        'url': 'http://new.vk.com/videos205387401',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -370,3 +471,131 @@ class VKUserVideosIE(InfoExtractor):
              webpage, 'title', default=page_id))
  
          return self.playlist_result(entries, page_id, title)
+
+
+class VKWallPostIE(VKBaseIE):
+    IE_NAME = 'vk:wallpost'
+    _VALID_URL = r'https?://(?:(?:(?:(?:m|new)\.)?vk\.com/(?:[^?]+\?.*\bw=)?wall(?P<id>-?\d+_\d+)))'
+    _TESTS = [{
+        # public page URL, audio playlist
+        'url': 'https://vk.com/bs.official?w=wall-23538238_35',
+        'info_dict': {
+            'id': '23538238_35',
+            'title': 'Black Shadow - Wall post 23538238_35',
+            'description': 'md5:3f84b9c4f9ef499731cf1ced9998cc0c',
+        },
+        'playlist': [{
+            'md5': '5ba93864ec5b85f7ce19a9af4af080f6',
+            'info_dict': {
+                'id': '135220665_111806521',
+                'ext': 'mp3',
+                'title': 'Black Shadow - Слепое Верование',
+                'duration': 370,
+                'uploader': 'Black Shadow',
+                'artist': 'Black Shadow',
+                'track': 'Слепое Верование',
+            },
+        }, {
+            'md5': '4cc7e804579122b17ea95af7834c9233',
+            'info_dict': {
+                'id': '135220665_111802303',
+                'ext': 'mp3',
+                'title': 'Black Shadow - Война - Негасимое Бездны Пламя!',
+                'duration': 423,
+                'uploader': 'Black Shadow',
+                'artist': 'Black Shadow',
+                'track': 'Война - Негасимое Бездны Пламя!',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        }],
+        'params': {
+            'usenetrc': True,
+        },
+        'skip': 'Requires vk account credentials',
+    }, {
+        # single YouTube embed, no leading -
+        'url': 'https://vk.com/wall85155021_6319',
+        'info_dict': {
+            'id': '85155021_6319',
+            'title': 'Sergey Gorbunov - Wall post 85155021_6319',
+        },
+        'playlist_count': 1,
+        'params': {
+            'usenetrc': True,
+        },
+        'skip': 'Requires vk account credentials',
+    }, {
+        # wall page URL
+        'url': 'https://vk.com/wall-23538238_35',
+        'only_matching': True,
+    }, {
+        # mobile wall page URL
+        'url': 'https://m.vk.com/wall-23538238_35',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        post_id = self._match_id(url)
+
+        wall_url = 'https://vk.com/wall%s' % post_id
+
+        post_id = remove_start(post_id, '-')
+
+        webpage = self._download_webpage(wall_url, post_id)
+
+        error = self._html_search_regex(
+            r'>Error</div>\s*<div[^>]+class=["\']body["\'][^>]*>([^<]+)',
+            webpage, 'error', default=None)
+        if error:
+            raise ExtractorError('VK said: %s' % error, expected=True)
+
+        description = clean_html(get_element_by_class('wall_post_text', webpage))
+        uploader = clean_html(get_element_by_class('author', webpage))
+        thumbnail = self._og_search_thumbnail(webpage)
+
+        entries = []
+
+        audio_ids = re.findall(r'data-full-id=["\'](\d+_\d+)', webpage)
+        if audio_ids:
+            al_audio = self._download_webpage(
+                'https://vk.com/al_audio.php', post_id,
+                note='Downloading audio info', fatal=False,
+                data=urlencode_postdata({
+                    'act': 'reload_audio',
+                    'al': '1',
+                    'ids': ','.join(audio_ids)
+                }))
+            if al_audio:
+                Audio = collections.namedtuple(
+                    'Audio', ['id', 'user_id', 'url', 'track', 'artist', 'duration'])
+                audios = self._parse_json(
+                    self._search_regex(
+                        r'<!json>(.+?)<!>', al_audio, 'audios', default='[]'),
+                    post_id, fatal=False, transform_source=unescapeHTML)
+                if isinstance(audios, list):
+                    for audio in audios:
+                        a = Audio._make(audio[:6])
+                        entries.append({
+                            'id': '%s_%s' % (a.user_id, a.id),
+                            'url': a.url,
+                            'title': '%s - %s' % (a.artist, a.track) if a.artist and a.track else a.id,
+                            'thumbnail': thumbnail,
+                            'duration': a.duration,
+                            'uploader': uploader,
+                            'artist': a.artist,
+                            'track': a.track,
+                        })
+
+        for video in re.finditer(
+                r'<a[^>]+href=(["\'])(?P<url>/video(?:-?[\d_]+).*?)\1', webpage):
+            entries.append(self.url_result(
+                compat_urlparse.urljoin(url, video.group('url')), VKIE.ie_key()))
+
+        title = 'Wall post %s' % post_id
+
+        return self.playlist_result(
+            orderedSet(entries), post_id,
+            '%s - %s' % (uploader, title) if uploader else title,
+            description)
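
Reviewer note on the VKIE format loop above: the new code first filters on the value (it must be a string that looks like a URL) and only then on the key. The following is a minimal standalone sketch of that classification, not the extractor itself. The function name `classify_vk_formats` and the sample URLs are hypothetical, plain `str` stands in for `compat_str`, and HLS URLs are only collected rather than expanded through `_extract_m3u8_formats`.

```python
import re


def classify_vk_formats(data):
    """Split a VK player-params dict into direct, HLS and RTMP entries."""
    direct, hls, rtmp = [], [], []
    for format_id, format_url in data.items():
        # Value check first: skip non-strings and strings that are not URLs.
        if not isinstance(format_url, str) or not format_url.startswith(
                ('http', '//', 'rtmp')):
            continue
        if format_id.startswith(('url', 'cache')) or format_id in (
                'extra_data', 'live_mp4'):
            # Keys such as url240 or cache480 carry the height in their name.
            m = re.match(r'(?:url|cache)(\d+)', format_id)
            direct.append({
                'format_id': format_id,
                'url': format_url,
                'height': int(m.group(1)) if m else None,
            })
        elif format_id == 'hls':
            hls.append(format_url)
        elif format_id == 'rtmp':
            rtmp.append({'format_id': format_id, 'url': format_url, 'ext': 'flv'})
        # Any other key (for example the 'jpg' thumbnail) is ignored.
    return direct, hls, rtmp
```

Checking the value before the key is what lets keys like `jpg` or `md_title` pass through the loop harmlessly: they either fail the URL test or match no branch.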
diff --git a/youtube_dl/extractor/vlive.py b/youtube_dl/extractor/vlive.py
index baf39bb2cea714fb1578d80b4b8a83c9cd67e568..acf9fda487f6143906b8162a158c1cf9f53fec68 100644
--- a/youtube_dl/extractor/vlive.py
+++ b/youtube_dl/extractor/vlive.py
@@ -1,11 +1,15 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..utils import (
      dict_get,
+    ExtractorError,
      float_or_none,
      int_or_none,
+    remove_start,
  )
  from ..compat import compat_urllib_parse_urlencode
  
@@ -13,17 +17,30 @@ from ..compat import compat_urllib_parse_urlencode
  class VLiveIE(InfoExtractor):
      IE_NAME = 'vlive'
      _VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.vlive.tv/video/1326',
          'md5': 'cc7314812855ce56de70a06a27314983',
          'info_dict': {
              'id': '1326',
              'ext': 'mp4',
-            'title': "[V] Girl's Day's Broadcast",
+            'title': "[V LIVE] Girl's Day's Broadcast",
              'creator': "Girl's Day",
              'view_count': int,
          },
-    }
+    }, {
+        'url': 'http://www.vlive.tv/video/16937',
+        'info_dict': {
+            'id': '16937',
+            'ext': 'mp4',
+            'title': '[V LIVE] 첸백시 걍방',
+            'creator': 'EXO',
+            'view_count': int,
+            'subtitles': 'mincount:12',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -31,16 +48,62 @@ class VLiveIE(InfoExtractor):
          webpage = self._download_webpage(
              'http://www.vlive.tv/video/%s' % video_id, video_id)
  
-        long_video_id = self._search_regex(
-            r'vlive\.tv\.video\.ajax\.request\.handler\.init\(\s*"[0-9]+"\s*,\s*"[^"]*"\s*,\s*"([^"]+)"',
-            webpage, 'long video id')
+        video_params = self._search_regex(
+            r'\bvlive\.video\.init\(([^)]+)\)',
+            webpage, 'video params')
+        status, _, _, live_params, long_video_id, key = re.split(
+            r'"\s*,\s*"', video_params)[2:8]
+        status = remove_start(status, 'PRODUCT_')
  
-        key = self._search_regex(
-            r'vlive\.tv\.video\.ajax\.request\.handler\.init\(\s*"[0-9]+"\s*,\s*"[^"]*"\s*,\s*"[^"]+"\s*,\s*"([^"]+)"',
-            webpage, 'key')
+        if status == 'LIVE_ON_AIR' or status == 'BIG_EVENT_ON_AIR':
+            live_params = self._parse_json('"%s"' % live_params, video_id)
+            live_params = self._parse_json(live_params, video_id)
+            return self._live(video_id, webpage, live_params)
+        elif status == 'VOD_ON_AIR' or status == 'BIG_EVENT_INTRO':
+            if long_video_id and key:
+                return self._replay(video_id, webpage, long_video_id, key)
+            else:
+                status = 'COMING_SOON'
  
+        if status == 'LIVE_END':
+            raise ExtractorError('Uploading for replay. Please wait...',
+                                 expected=True)
+        elif status == 'COMING_SOON':
+            raise ExtractorError('Coming soon!', expected=True)
+        elif status == 'CANCELED':
+            raise ExtractorError('We are sorry, '
+                                 'but the live broadcast has been canceled.',
+                                 expected=True)
+        else:
+            raise ExtractorError('Unknown status %s' % status)
+
+    def _get_common_fields(self, webpage):
          title = self._og_search_title(webpage)
+        creator = self._html_search_regex(
+            r'<div[^>]+class="info_area"[^>]*>\s*<a\s+[^>]*>([^<]+)',
+            webpage, 'creator', fatal=False)
+        thumbnail = self._og_search_thumbnail(webpage)
+        return {
+            'title': title,
+            'creator': creator,
+            'thumbnail': thumbnail,
+        }
+
+    def _live(self, video_id, webpage, live_params):
+        formats = []
+        for vid in live_params.get('resolutions', []):
+            formats.extend(self._extract_m3u8_formats(
+                vid['cdnUrl'], video_id, 'mp4',
+                m3u8_id=vid.get('name'),
+                fatal=False, live=True))
+        self._sort_formats(formats)
+
+        return dict(self._get_common_fields(webpage),
+                    id=video_id,
+                    formats=formats,
+                    is_live=True)
  
+    def _replay(self, video_id, webpage, long_video_id, key):
          playinfo = self._download_json(
              'http://global.apis.naver.com/rmcnmv/rmcnmv/vod_play_videoInfo.json?%s'
              % compat_urllib_parse_urlencode({
@@ -62,27 +125,18 @@ class VLiveIE(InfoExtractor):
          } for vid in playinfo.get('videos', {}).get('list', []) if vid.get('source')]
          self._sort_formats(formats)
  
-        thumbnail = self._og_search_thumbnail(webpage)
-        creator = self._html_search_regex(
-            r'<div[^>]+class="info_area"[^>]*>\s*<a\s+[^>]*>([^<]+)',
-            webpage, 'creator', fatal=False)
-
          view_count = int_or_none(playinfo.get('meta', {}).get('count'))
  
          subtitles = {}
          for caption in playinfo.get('captions', {}).get('list', []):
-            lang = dict_get(caption, ('language', 'locale', 'country', 'label'))
+            lang = dict_get(caption, ('locale', 'language', 'country', 'label'))
              if lang and caption.get('source'):
                  subtitles[lang] = [{
                      'ext': 'vtt',
                      'url': caption['source']}]
  
-        return {
-            'id': video_id,
-            'title': title,
-            'creator': creator,
-            'thumbnail': thumbnail,
-            'view_count': view_count,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
+        return dict(self._get_common_fields(webpage),
+                    id=video_id,
+                    formats=formats,
+                    view_count=view_count,
+                    subtitles=subtitles)
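
Reviewer note on the VLiveIE change above: the arguments of `vlive.video.init(...)` are split on the `", "` separators, fields 2 to 7 are unpacked, and the live parameters are decoded twice because they are JSON embedded inside a quoted JavaScript string. Below is a minimal sketch of those two steps using only the standard library. The helper names are hypothetical, `json.loads` stands in for `_parse_json`, and the argument layout is inferred from the diff rather than from the live site.

```python
import json
import re


def parse_init_params(video_params):
    """Unpack the six fields the extractor reads from the init() arguments."""
    fields = re.split(r'"\s*,\s*"', video_params)[2:8]
    status, _, _, live_params, long_video_id, key = fields
    # Equivalent of remove_start(status, 'PRODUCT_').
    if status.startswith('PRODUCT_'):
        status = status[len('PRODUCT_'):]
    return status, live_params, long_video_id, key


def decode_live_params(live_params):
    """Decode a JSON document that is itself escaped inside a JS string."""
    # First pass undoes the string escaping, second pass parses the JSON.
    unescaped = json.loads('"%s"' % live_params)
    return json.loads(unescaped)
```

The escaped quotes (`\"`) inside the live parameters do not match the split pattern, since a backslash sits between the comma and the following quote, so the field boundaries stay intact.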
diff --git a/youtube_dl/extractor/vodlocker.py b/youtube_dl/extractor/vodlocker.py
index a938a4007ead91a25ca84b43307ce16e1787e2e6..bbfa6e5f26f6043af52ae168b6e2cebb7463edfc 100644
--- a/youtube_dl/extractor/vodlocker.py
+++ b/youtube_dl/extractor/vodlocker.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -31,7 +31,8 @@ class VodlockerIE(InfoExtractor):
          if any(p in webpage for p in (
                  '>THIS FILE WAS DELETED<',
                  '>File Not Found<',
-                'The file you were looking for could not be found, sorry for any inconvenience.<')):
+                'The file you were looking for could not be found, sorry for any inconvenience.<',
+                '>The file was removed')):
              raise ExtractorError('Video %s does not exist' % video_id, expected=True)
  
          fields = self._hidden_inputs(webpage)
diff --git a/youtube_dl/extractor/vodplatform.py b/youtube_dl/extractor/vodplatform.py
new file mode 100644
index 0000000..2396443
--- /dev/null
+++ b/youtube_dl/extractor/vodplatform.py
@@ -0,0 +1,37 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import unescapeHTML
+
+
+class VODPlatformIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?vod-platform\.net/[eE]mbed/(?P<id>[^/?#]+)'
+    _TEST = {
+        # from http://www.lbcgroup.tv/watch/chapter/29143/52844/%D8%A7%D9%84%D9%86%D8%B5%D8%B1%D8%A9-%D9%81%D9%8A-%D8%B6%D9%8A%D8%A7%D9%81%D8%A9-%D8%A7%D9%84%D9%80-cnn/ar
+        'url': 'http://vod-platform.net/embed/RufMcytHDolTH1MuKHY9Fw',
+        'md5': '1db2b7249ce383d6be96499006e951fc',
+        'info_dict': {
+            'id': 'RufMcytHDolTH1MuKHY9Fw',
+            'ext': 'mp4',
+            'title': 'LBCi News_ النصرة في ضيافة الـ "سي.أن.أن"',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        title = unescapeHTML(self._og_search_title(webpage))
+        hidden_inputs = self._hidden_inputs(webpage)
+
+        formats = self._extract_wowza_formats(
+            hidden_inputs.get('HiddenmyhHlsLink') or hidden_inputs['HiddenmyDashLink'], video_id, skip_protocols=['f4m', 'smil'])
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': hidden_inputs.get('HiddenThumbnail') or self._og_search_thumbnail(webpage),
+            'formats': formats,
+        }
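
Reviewer note on VODPlatformIE above: the manifest URL is taken from the hidden form inputs, preferring `HiddenmyhHlsLink` and falling back to `HiddenmyDashLink`, which raises `KeyError` when neither is present. The sketch below approximates that lookup with the standard library only. The `HiddenInputs` parser and `pick_manifest_url` are hypothetical stand-ins for `_hidden_inputs` and may not match its behavior in every detail.

```python
from html.parser import HTMLParser


class HiddenInputs(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden':
            name = attrs.get('name') or attrs.get('id')
            if name:
                self.inputs[name] = attrs.get('value')


def pick_manifest_url(page):
    parser = HiddenInputs()
    parser.feed(page)
    inputs = parser.inputs
    # An empty or missing HLS link falls through to the DASH link.
    return inputs.get('HiddenmyhHlsLink') or inputs['HiddenmyDashLink']
```

Using `or` rather than a default argument to `.get()` matters here: an input that exists but has an empty value still triggers the fallback.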
diff --git a/youtube_dl/extractor/voicerepublic.py b/youtube_dl/extractor/voicerepublic.py
index 93d15a556dedb6e0589dc5f393e0a01ad5b4a8a0..4f1a99a8989d736c1de572e6372b022544102f87 100644
--- a/youtube_dl/extractor/voicerepublic.py
+++ b/youtube_dl/extractor/voicerepublic.py
@@ -3,7 +3,10 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_urlparse
+from ..compat import (
+    compat_str,
+    compat_urlparse,
+)
  from ..utils import (
      ExtractorError,
      determine_ext,
@@ -16,13 +19,13 @@ class VoiceRepublicIE(InfoExtractor):
      _VALID_URL = r'https?://voicerepublic\.com/(?:talks|embed)/(?P<id>[0-9a-z-]+)'
      _TESTS = [{
          'url': 'http://voicerepublic.com/talks/watching-the-watchers-building-a-sousveillance-state',
-        'md5': '0554a24d1657915aa8e8f84e15dc9353',
+        'md5': 'b9174d651323f17783000876347116e3',
          'info_dict': {
              'id': '2296',
              'display_id': 'watching-the-watchers-building-a-sousveillance-state',
              'ext': 'm4a',
              'title': 'Watching the Watchers: Building a Sousveillance State',
-            'description': 'md5:715ba964958afa2398df615809cfecb1',
+            'description': 'Secret surveillance programs have metadata too. The people and companies that operate secret surveillance programs can be surveilled.',
              'thumbnail': 're:^https?://.*\.(?:png|jpg)$',
              'duration': 1800,
              'view_count': int,
@@ -52,7 +55,7 @@ class VoiceRepublicIE(InfoExtractor):
          if data:
              title = data['title']
              description = data.get('teaser')
-            talk_id = data.get('talk_id') or display_id
+            talk_id = compat_str(data.get('talk_id') or display_id)
              talk = data['talk']
              duration = int_or_none(talk.get('duration'))
              formats = [{
diff --git a/youtube_dl/extractor/voxmedia.py b/youtube_dl/extractor/voxmedia.py
new file mode 100644
index 0000000..f8e3314
--- /dev/null
+++ b/youtube_dl/extractor/voxmedia.py
@@ -0,0 +1,142 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_unquote
+
+
+class VoxMediaIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?(?:theverge|vox|sbnation|eater|polygon|curbed|racked)\.com/(?:[^/]+/)*(?P<id>[^/?]+)'
+    _TESTS = [{
+        'url': 'http://www.theverge.com/2014/6/27/5849272/material-world-how-google-discovered-what-software-is-made-of',
+        'info_dict': {
+            'id': '11eXZobjrG8DCSTgrNjVinU-YmmdYjhe',
+            'ext': 'mp4',
+            'title': 'Google\'s new material design direction',
+            'description': 'md5:2f44f74c4d14a1f800ea73e1c6832ad2',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'add_ie': ['Ooyala'],
+    }, {
+        # data-ooyala-id
+        'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
+        'md5': 'd744484ff127884cd2ba09e3fa604e4b',
+        'info_dict': {
+            'id': 'RkZXU4cTphOCPDMZg5oEounJyoFI0g-B',
+            'ext': 'mp4',
+            'title': 'The Nexus 6: hands-on with Google\'s phablet',
+            'description': 'md5:87a51fe95ff8cea8b5bdb9ac7ae6a6af',
+        },
+        'add_ie': ['Ooyala'],
+    }, {
+        # volume embed
+        'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
+        'info_dict': {
+            'id': 'wydzk3dDpmRz7PQoXRsTIX6XTkPjYL0b',
+            'ext': 'mp4',
+            'title': 'The new frontier of LGBTQ civil rights, explained',
+            'description': 'md5:0dc58e94a465cbe91d02950f770eb93f',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'add_ie': ['Ooyala'],
+    }, {
+        # youtube embed
+        'url': 'http://www.vox.com/2016/3/24/11291692/robot-dance',
+        'md5': '83b3080489fb103941e549352d3e0977',
+        'info_dict': {
+            'id': 'FcNHTJU1ufM',
+            'ext': 'mp4',
+            'title': 'How "the robot" became the greatest novelty dance of all time',
+            'description': 'md5:b081c0d588b8b2085870cda55e6da176',
+            'upload_date': '20160324',
+            'uploader_id': 'voxdotcom',
+            'uploader': 'Vox',
+        },
+        'add_ie': ['Youtube'],
+    }, {
+        # SBN.VideoLinkset.entryGroup multiple ooyala embeds
+        'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
+        'info_dict': {
+            'id': 'national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
+            'title': '25 lies you will tell yourself on National Signing Day',
+            'description': 'It\'s the most self-delusional time of the year, and everyone\'s gonna tell the same lies together!',
+        },
+        'playlist': [{
+            'md5': '721fededf2ab74ae4176c8c8cbfe092e',
+            'info_dict': {
+                'id': 'p3cThlMjE61VDi_SD9JlIteSNPWVDBB9',
+                'ext': 'mp4',
+                'title': 'Buddy Hield vs Steph Curry (and the world)',
+                'description': 'Let’s dissect only the most important Final Four storylines.',
+            },
+        }, {
+            'md5': 'bf0c5cc115636af028be1bab79217ea9',
+            'info_dict': {
+                'id': 'BmbmVjMjE6esPHxdALGubTrouQ0jYLHj',
+                'ext': 'mp4',
+                'title': 'Chasing Cinderella 2016: Syracuse basketball',
+                'description': 'md5:e02d56b026d51aa32c010676765a690d',
+            },
+        }],
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = compat_urllib_parse_unquote(self._download_webpage(url, display_id))
+
+        def create_entry(provider_video_id, provider_video_type, title=None, description=None):
+            return {
+                '_type': 'url_transparent',
+                'url': provider_video_id if provider_video_type == 'youtube' else '%s:%s' % (provider_video_type, provider_video_id),
+                'title': title or self._og_search_title(webpage),
+                'description': description or self._og_search_description(webpage),
+            }
+
+        entries = []
+        entries_data = self._search_regex([
+            r'Chorus\.VideoContext\.addVideo\((\[{.+}\])\);',
+            r'var\s+entry\s*=\s*({.+});',
+            r'SBN\.VideoLinkset\.entryGroup\(\s*(\[.+\])',
+        ], webpage, 'video data', default=None)
+        if entries_data:
+            entries_data = self._parse_json(entries_data, display_id)
+            if isinstance(entries_data, dict):
+                entries_data = [entries_data]
+            for video_data in entries_data:
+                provider_video_id = video_data.get('provider_video_id')
+                provider_video_type = video_data.get('provider_video_type')
+                if provider_video_id and provider_video_type:
+                    entries.append(create_entry(
+                        provider_video_id, provider_video_type,
+                        video_data.get('title'), video_data.get('description')))
+
+        provider_video_id = self._search_regex(
+            r'data-ooyala-id="([^"]+)"', webpage, 'ooyala id', default=None)
+        if provider_video_id:
+            entries.append(create_entry(provider_video_id, 'ooyala'))
+
+        volume_uuid = self._search_regex(
+            r'data-volume-uuid="([^"]+)"', webpage, 'volume uuid', default=None)
+        if volume_uuid:
+            volume_webpage = self._download_webpage(
+                'http://volume.vox-cdn.com/embed/%s' % volume_uuid, volume_uuid)
+            video_data = self._parse_json(self._search_regex(
+                r'Volume\.createVideo\(({.+})\s*,\s*{.*}\s*,\s*\[.*\]\s*,\s*{.*}\);', volume_webpage, 'video data'), volume_uuid)
+            for provider_video_type in ('ooyala', 'youtube'):
+                provider_video_id = video_data.get('%s_id' % provider_video_type)
+                if provider_video_id:
+                    description = video_data.get('description_long') or video_data.get('description_short')
+                    entries.append(create_entry(
+                        provider_video_id, provider_video_type, video_data.get('title_short'), description))
+                    break
+
+        if len(entries) == 1:
+            return entries[0]
+        else:
+            return self.playlist_result(entries, display_id, self._og_search_title(webpage), self._og_search_description(webpage))
diff --git a/youtube_dl/extractor/vporn.py b/youtube_dl/extractor/vporn.py
index 92c90e5172e89b98c3309bb01dc9787f15c24859..1557a0e0406ebfb75c2b5b4583c74f05c5dd2cc7 100644
--- a/youtube_dl/extractor/vporn.py
+++ b/youtube_dl/extractor/vporn.py
@@ -4,6 +4,7 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
+    ExtractorError,
      parse_duration,
      str_to_int,
  )
@@ -27,7 +28,8 @@ class VpornIE(InfoExtractor):
                  'duration': 393,
                  'age_limit': 18,
                  'view_count': int,
-            }
+            },
+            'skip': 'video removed',
          },
          {
              'url': 'http://www.vporn.com/female/hana-shower/523564/',
@@ -40,7 +42,7 @@ class VpornIE(InfoExtractor):
                  'description': 'Hana showers at the bathroom.',
                  'thumbnail': 're:^https?://.*\.jpg$',
                  'uploader': 'Hmmmmm',
-                'categories': ['Big Boobs', 'Erotic', 'Teen', 'Female'],
+                'categories': ['Big Boobs', 'Erotic', 'Teen', 'Female', '720p'],
                  'duration': 588,
                  'age_limit': 18,
                  'view_count': int,
@@ -55,6 +57,10 @@ class VpornIE(InfoExtractor):
  
          webpage = self._download_webpage(url, display_id)
  
+        errmsg = 'This video has been deleted due to Copyright Infringement or by the account owner!'
+        if errmsg in webpage:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, errmsg), expected=True)
+
          title = self._html_search_regex(
              r'videoname\s*=\s*\'([^\']+)\'', webpage, 'title').strip()
          description = self._html_search_regex(
diff --git a/youtube_dl/extractor/vrt.py b/youtube_dl/extractor/vrt.py
index 2b6bae89bd2a450c9babe1ea77e299b806b752c1..00c72e34684f918e68fc859ad6ffb926efa04661 100644
--- a/youtube_dl/extractor/vrt.py
+++ b/youtube_dl/extractor/vrt.py
@@ -4,7 +4,9 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import float_or_none
+from ..utils import (
+    float_or_none,
+)
  
  
  class VRTIE(InfoExtractor):
@@ -22,7 +24,8 @@ class VRTIE(InfoExtractor):
                  'timestamp': 1414271750.949,
                  'upload_date': '20141025',
                  'duration': 929,
-            }
+            },
+            'skip': 'HTTP Error 404: Not Found',
          },
          # sporza.be
          {
@@ -36,7 +39,8 @@ class VRTIE(InfoExtractor):
                  'timestamp': 1413835980.560,
                  'upload_date': '20141020',
                  'duration': 3238,
-            }
+            },
+            'skip': 'HTTP Error 404: Not Found',
          },
          # cobra.be
          {
@@ -50,11 +54,38 @@ class VRTIE(InfoExtractor):
                  'timestamp': 1413967500.494,
                  'upload_date': '20141022',
                  'duration': 661,
-            }
+            },
+            'skip': 'HTTP Error 404: Not Found',
+        },
+        {
+            # YouTube video
+            'url': 'http://deredactie.be/cm/vrtnieuws/videozone/nieuws/cultuurenmedia/1.2622957',
+            'md5': 'b8b93da1df1cea6c8556255a796b7d61',
+            'info_dict': {
+                'id': 'Wji-BZ0oCwg',
+                'ext': 'mp4',
+                'title': 'ROGUE ONE: A STAR WARS STORY Official Teaser Trailer',
+                'description': 'md5:8e468944dce15567a786a67f74262583',
+                'uploader': 'Star Wars',
+                'uploader_id': 'starwars',
+                'upload_date': '20160407',
+            },
+            'add_ie': ['Youtube'],
          },
          {
              'url': 'http://cobra.canvas.be/cm/cobra/videozone/rubriek/film-videozone/1.2377055',
-            'only_matching': True,
+            'info_dict': {
+                'id': '2377055',
+                'ext': 'mp4',
+                'title': 'Cafe Derby',
+                'description': 'Lenny Van Wesemael debuteert met de langspeelfilm Café Derby. Een waar gebeurd maar ook verzonnen verhaal.',
+                'upload_date': '20150626',
+                'timestamp': 1435305240.769,
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            }
          }
      ]
  
@@ -66,7 +97,17 @@ class VRTIE(InfoExtractor):
          video_id = self._search_regex(
              r'data-video-id="([^"]+)_[^"]+"', webpage, 'video id', fatal=False)
  
+        src = self._search_regex(
+            r'data-video-src="([^"]+)"', webpage, 'video src', default=None)
+
+        video_type = self._search_regex(
+            r'data-video-type="([^"]+)"', webpage, 'video type', default=None)
+
+        if video_type == 'YouTubeVideo':
+            return self.url_result(src, 'Youtube')
+
          formats = []
+
          mobj = re.search(
              r'data-video-iphone-server="(?P<server>[^"]+)"\s+data-video-iphone-path="(?P<path>[^"]+)"',
              webpage)
@@ -74,11 +115,19 @@ class VRTIE(InfoExtractor):
              formats.extend(self._extract_m3u8_formats(
                  '%s/%s' % (mobj.group('server'), mobj.group('path')),
                  video_id, 'mp4', m3u8_id='hls', fatal=False))
-        mobj = re.search(r'data-video-src="(?P<src>[^"]+)"', webpage)
-        if mobj:
-            formats.extend(self._extract_f4m_formats(
-                '%s/manifest.f4m' % mobj.group('src'),
-                video_id, f4m_id='hds', fatal=False))
+
+        if src:
+            formats = self._extract_wowza_formats(src, video_id)
+            if 'data-video-geoblocking="true"' not in webpage:
+                for f in formats:
+                    if f['url'].startswith('rtsp://'):
+                        http_format = f.copy()
+                        http_format.update({
+                            'url': f['url'].replace('rtsp://', 'http://').replace('vod.', 'download.').replace('/_definst_/', '/').replace('mp4:', ''),
+                            'format_id': f['format_id'].replace('rtsp', 'http'),
+                            'protocol': 'http',
+                        })
+                        formats.append(http_format)
  
          if not formats and 'data-video-geoblocking="true"' in webpage:
              self.raise_geo_restricted('This video is only available in Belgium')
diff --git a/youtube_dl/extractor/vuclip.py b/youtube_dl/extractor/vuclip.py
index eaa888f005cc61c53b8f45c3e3b93633083b17ed..55e087bdb47bff8d01d06e73ff4e9587e7e5eea8 100644
--- a/youtube_dl/extractor/vuclip.py
+++ b/youtube_dl/extractor/vuclip.py
@@ -9,7 +9,7 @@ from ..compat import (
  from ..utils import (
      ExtractorError,
      parse_duration,
-    qualities,
+    remove_end,
  )
  
  
@@ -17,12 +17,12 @@ class VuClipIE(InfoExtractor):
      _VALID_URL = r'https?://(?:m\.)?vuclip\.com/w\?.*?cid=(?P<id>[0-9]+)'
  
      _TEST = {
-        'url': 'http://m.vuclip.com/w?cid=922692425&fid=70295&z=1010&nvar&frm=index.html',
+        'url': 'http://m.vuclip.com/w?cid=1129900602&bu=8589892792&frm=w&z=34801&op=0&oc=843169247&section=recommend',
          'info_dict': {
-            'id': '922692425',
+            'id': '1129900602',
              'ext': '3gp',
-            'title': 'The Toy Soldiers - Hollywood Movie Trailer',
-            'duration': 180,
+            'title': 'Top 10 TV Convicts',
+            'duration': 733,
          }
      }
  
@@ -46,34 +46,21 @@ class VuClipIE(InfoExtractor):
                  '%s said: %s' % (self.IE_NAME, error_msg), expected=True)
  
          # These clowns alternate between two page types
-        links_code = self._search_regex(
-            r'''(?xs)
-                (?:
-                    <img\s+src="[^"]*/play.gif".*?>|
-                    <!--\ player\ end\ -->\s*</div><!--\ thumb\ end-->
-                )
-                (.*?)
-                (?:
-                    <a\s+href="fblike|<div\s+class="social">
-                )
-            ''', webpage, 'links')
-        title = self._html_search_regex(
-            r'<title>(.*?)-\s*Vuclip</title>', webpage, 'title').strip()
+        video_url = self._search_regex(
+            r'<a[^>]+href="([^"]+)"[^>]*><img[^>]+src="[^"]*/play\.gif',
+            webpage, 'video URL', default=None)
+        if video_url:
+            formats = [{
+                'url': video_url,
+            }]
+        else:
+            formats = self._parse_html5_media_entries(url, webpage, video_id)[0]['formats']
  
-        quality_order = qualities(['Reg', 'Hi'])
-        formats = []
-        for url, q in re.findall(
-                r'<a\s+href="(?P<url>[^"]+)".*?>(?:<button[^>]*>)?(?P<q>[^<]+)(?:</button>)?</a>', links_code):
-            format_id = compat_urllib_parse_urlparse(url).scheme + '-' + q
-            formats.append({
-                'format_id': format_id,
-                'url': url,
-                'quality': quality_order(q),
-            })
-        self._sort_formats(formats)
+        title = remove_end(self._html_search_regex(
+            r'<title>(.*?)-\s*Vuclip</title>', webpage, 'title').strip(), ' - Video')
  
-        duration = parse_duration(self._search_regex(
-            r'\(([0-9:]+)\)</span>', webpage, 'duration', fatal=False))
+        duration = parse_duration(self._html_search_regex(
+            r'[(>]([0-9]+:[0-9]+)(?:<span|\))', webpage, 'duration', fatal=False))
  
          return {
              'id': video_id,
diff --git a/youtube_dl/extractor/vulture.py b/youtube_dl/extractor/vulture.py
deleted file mode 100644
index faa167e..0000000
--- a/youtube_dl/extractor/vulture.py
+++ /dev/null
@@ -1,69 +0,0 @@
-from __future__ import unicode_literals
-
-import json
-import os.path
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    parse_iso8601,
-)
-
-
-class VultureIE(InfoExtractor):
-    IE_NAME = 'vulture.com'
-    _VALID_URL = r'https?://video\.vulture\.com/video/(?P<display_id>[^/]+)/'
-    _TEST = {
-        'url': 'http://video.vulture.com/video/Mindy-Kaling-s-Harvard-Speech/player?layout=compact&read_more=1',
-        'md5': '8d997845642a2b5152820f7257871bc8',
-        'info_dict': {
-            'id': '6GHRQL3RV7MSD1H4',
-            'ext': 'mp4',
-            'title': 'kaling-speech-2-MAGNIFY STANDARD CONTAINER REVISED',
-            'uploader_id': 'Sarah',
-            'thumbnail': 're:^http://.*\.jpg$',
-            'timestamp': 1401288564,
-            'upload_date': '20140528',
-            'description': 'Uplifting and witty, as predicted.',
-            'duration': 1015,
-        }
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('display_id')
-
-        webpage = self._download_webpage(url, display_id)
-        query_string = self._search_regex(
-            r"queryString\s*=\s*'([^']+)'", webpage, 'query string')
-        video_id = self._search_regex(
-            r'content=([^&]+)', query_string, 'video ID')
-        query_url = 'http://video.vulture.com/embed/player/container/1000/1000/?%s' % query_string
-
-        query_webpage = self._download_webpage(
-            query_url, display_id, note='Downloading query page')
-        params_json = self._search_regex(
-            r'(?sm)new MagnifyEmbeddablePlayer\({.*?contentItem:\s*(\{.*?\})\n?,\n',
-            query_webpage,
-            'player params')
-        params = json.loads(params_json)
-
-        upload_timestamp = parse_iso8601(params['posted'].replace(' ', 'T'))
-        uploader_id = params.get('user', {}).get('handle')
-
-        media_item = params['media_item']
-        title = os.path.splitext(media_item['title'])[0]
-        duration = int_or_none(media_item.get('duration_seconds'))
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'url': media_item['pipeline_xid'],
-            'title': title,
-            'timestamp': upload_timestamp,
-            'thumbnail': params.get('thumbnail_url'),
-            'uploader_id': uploader_id,
-            'description': params.get('description'),
-            'duration': duration,
-        }
diff --git a/youtube_dl/extractor/vyborymos.py b/youtube_dl/extractor/vyborymos.py
new file mode 100644
index 0000000..9e703c4
--- /dev/null
+++ b/youtube_dl/extractor/vyborymos.py
@@ -0,0 +1,55 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+
+
+class VyboryMosIE(InfoExtractor):
+    _VALID_URL = r'https?://vybory\.mos\.ru/(?:#precinct/|account/channels\?.*?\bstation_id=)(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://vybory.mos.ru/#precinct/13636',
+        'info_dict': {
+            'id': '13636',
+            'ext': 'mp4',
+            'title': 're:^Участковая избирательная комиссия №2231 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+            'description': 'Россия, Москва, улица Введенского, 32А',
+            'is_live': True,
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        'url': 'http://vybory.mos.ru/account/channels?station_id=13636',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        station_id = self._match_id(url)
+
+        channels = self._download_json(
+            'http://vybory.mos.ru/account/channels?station_id=%s' % station_id,
+            station_id, 'Downloading channels JSON')
+
+        formats = []
+        for cam_num, (sid, hosts, name, _) in enumerate(channels, 1):
+            for num, host in enumerate(hosts, 1):
+                formats.append({
+                    'url': 'http://%s/master.m3u8?sid=%s' % (host, sid),
+                    'ext': 'mp4',
+                    'format_id': 'camera%d-host%d' % (cam_num, num),
+                    'format_note': '%s, %s' % (name, host),
+                })
+
+        info = self._download_json(
+            'http://vybory.mos.ru/json/voting_stations/%s/%s.json'
+            % (compat_str(station_id)[:3], station_id),
+            station_id, 'Downloading station JSON', fatal=False)
+
+        return {
+            'id': station_id,
+            'title': self._live_title(info['name'] if info else station_id),
+            'description': info.get('address') if info else None,
+            'is_live': True,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/vzaar.py b/youtube_dl/extractor/vzaar.py
new file mode 100644
index 0000000..b270f08
--- /dev/null
+++ b/youtube_dl/extractor/vzaar.py
@@ -0,0 +1,55 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    float_or_none,
+)
+
+
+class VzaarIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:(?:www|view)\.)?vzaar\.com/(?:videos/)?(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://vzaar.com/videos/1152805',
+        'md5': 'bde5ddfeb104a6c56a93a06b04901dbf',
+        'info_dict': {
+            'id': '1152805',
+            'ext': 'mp4',
+            'title': 'sample video (public)',
+        },
+    }, {
+        'url': 'https://view.vzaar.com/27272/player',
+        'md5': '3b50012ac9bbce7f445550d54e0508f2',
+        'info_dict': {
+            'id': '27272',
+            'ext': 'mp3',
+            'title': 'MP3',
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_json(
+            'http://view.vzaar.com/v2/%s/video' % video_id, video_id)
+        source_url = video_data['sourceUrl']
+
+        info = {
+            'id': video_id,
+            'title': video_data['videoTitle'],
+            'url': source_url,
+            'thumbnail': self._proto_relative_url(video_data.get('poster')),
+            'duration': float_or_none(video_data.get('videoDuration')),
+        }
+        if 'audio' in source_url:
+            info.update({
+                'vcodec': 'none',
+                'ext': 'mp3',
+            })
+        else:
+            info.update({
+                'width': int_or_none(video_data.get('width')),
+                'height': int_or_none(video_data.get('height')),
+                'ext': 'mp4',
+            })
+        return info
diff --git a/youtube_dl/extractor/washingtonpost.py b/youtube_dl/extractor/washingtonpost.py
index ec8b999983f6ae89a3bf53909e9d70a463f87f52..839cad986cbbf4edc8f73ca5639e780f210163c2 100644
--- a/youtube_dl/extractor/washingtonpost.py
+++ b/youtube_dl/extractor/washingtonpost.py
@@ -11,7 +11,96 @@ from ..utils import (
  
  
  class WashingtonPostIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?washingtonpost\.com/.*?/(?P<id>[^/]+)/(?:$|[?#])'
+    IE_NAME = 'washingtonpost'
+    _VALID_URL = r'(?:washingtonpost:|https?://(?:www\.)?washingtonpost\.com/video/(?:[^/]+/)*)(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+    _TEST = {
+        'url': 'https://www.washingtonpost.com/video/c/video/480ba4ee-1ec7-11e6-82c2-a7dcb313287d',
+        'md5': '6f537e1334b714eb15f9563bd4b9cdfa',
+        'info_dict': {
+            'id': '480ba4ee-1ec7-11e6-82c2-a7dcb313287d',
+            'ext': 'mp4',
+            'title': 'Egypt finds belongings, debris from plane crash',
+            'description': 'md5:a17ceee432f215a5371388c1f680bd86',
+            'upload_date': '20160520',
+            'uploader': 'Reuters',
+            'timestamp': 1463778452,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_json(
+            'http://www.washingtonpost.com/posttv/c/videojson/%s?resType=jsonp' % video_id,
+            video_id, transform_source=strip_jsonp)[0]['contentConfig']
+        title = video_data['title']
+
+        urls = []
+        formats = []
+        for s in video_data.get('streams', []):
+            s_url = s.get('url')
+            if not s_url or s_url in urls:
+                continue
+            urls.append(s_url)
+            video_type = s.get('type')
+            if video_type == 'smil':
+                continue
+            elif video_type in ('ts', 'hls') and ('_master.m3u8' in s_url or '_mobile.m3u8' in s_url):
+                m3u8_formats = self._extract_m3u8_formats(
+                    s_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
+                for m3u8_format in m3u8_formats:
+                    width = m3u8_format.get('width')
+                    if not width:
+                        continue
+                    vbr = self._search_regex(
+                        r'%d_%d_(\d+)' % (width, m3u8_format['height']), m3u8_format['url'], 'vbr', default=None)
+                    if vbr:
+                        m3u8_format.update({
+                            'vbr': int_or_none(vbr),
+                        })
+                formats.extend(m3u8_formats)
+            else:
+                width = int_or_none(s.get('width'))
+                vbr = int_or_none(s.get('bitrate'))
+                has_width = width != 0
+                formats.append({
+                    'format_id': (
+                        '%s-%d-%d' % (video_type, width, vbr)
+                        if width
+                        else video_type),
+                    'vbr': vbr if has_width else None,
+                    'width': width,
+                    'height': int_or_none(s.get('height')),
+                    'acodec': s.get('audioCodec'),
+                    'vcodec': s.get('videoCodec') if has_width else 'none',
+                    'filesize': int_or_none(s.get('fileSize')),
+                    'url': s_url,
+                    'ext': 'mp4',
+                    'protocol': 'm3u8_native' if video_type in ('ts', 'hls') else None,
+                })
+        source_media_url = video_data.get('sourceMediaURL')
+        if source_media_url:
+            formats.append({
+                'format_id': 'source_media',
+                'url': source_media_url,
+            })
+        self._sort_formats(
+            formats, ('width', 'height', 'vbr', 'filesize', 'tbr', 'format_id'))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('blurb'),
+            'uploader': video_data.get('credits', {}).get('source'),
+            'formats': formats,
+            'duration': int_or_none(video_data.get('videoDuration'), 100),
+            'timestamp': int_or_none(
+                video_data.get('dateConfig', {}).get('dateFirstPublished'), 1000),
+        }
+
+
+class WashingtonPostArticleIE(InfoExtractor):
+    IE_NAME = 'washingtonpost:article'
+    _VALID_URL = r'https?://(?:www\.)?washingtonpost\.com/(?:[^/]+/)*(?P<id>[^/?#]+)'
      _TESTS = [{
          'url': 'http://www.washingtonpost.com/sf/national/2014/03/22/sinkhole-of-bureaucracy/',
          'info_dict': {
@@ -63,6 +152,10 @@ class WashingtonPostIE(InfoExtractor):
          }]
      }]
  
+    @classmethod
+    def suitable(cls, url):
+        return False if WashingtonPostIE.suitable(url) else super(WashingtonPostArticleIE, cls).suitable(url)
+
      def _real_extract(self, url):
          page_id = self._match_id(url)
          webpage = self._download_webpage(url, page_id)
@@ -74,54 +167,7 @@ class WashingtonPostIE(InfoExtractor):
                  <div\s+class="posttv-video-embed[^>]*?data-uuid=|
                  data-video-uuid=
              )"([^"]+)"''', webpage)
-        entries = []
-        for i, uuid in enumerate(uuids, start=1):
-            vinfo_all = self._download_json(
-                'http://www.washingtonpost.com/posttv/c/videojson/%s?resType=jsonp' % uuid,
-                page_id,
-                transform_source=strip_jsonp,
-                note='Downloading information of video %d/%d' % (i, len(uuids))
-            )
-            vinfo = vinfo_all[0]['contentConfig']
-            uploader = vinfo.get('credits', {}).get('source')
-            timestamp = int_or_none(
-                vinfo.get('dateConfig', {}).get('dateFirstPublished'), 1000)
-
-            formats = [{
-                'format_id': (
-                    '%s-%s-%s' % (s.get('type'), s.get('width'), s.get('bitrate'))
-                    if s.get('width')
-                    else s.get('type')),
-                'vbr': s.get('bitrate') if s.get('width') != 0 else None,
-                'width': s.get('width'),
-                'height': s.get('height'),
-                'acodec': s.get('audioCodec'),
-                'vcodec': s.get('videoCodec') if s.get('width') != 0 else 'none',
-                'filesize': s.get('fileSize'),
-                'url': s.get('url'),
-                'ext': 'mp4',
-                'preference': -100 if s.get('type') == 'smil' else None,
-                'protocol': {
-                    'MP4': 'http',
-                    'F4F': 'f4m',
-                }.get(s.get('type')),
-            } for s in vinfo.get('streams', [])]
-            source_media_url = vinfo.get('sourceMediaURL')
-            if source_media_url:
-                formats.append({
-                    'format_id': 'source_media',
-                    'url': source_media_url,
-                })
-            self._sort_formats(formats)
-            entries.append({
-                'id': uuid,
-                'title': vinfo['title'],
-                'description': vinfo.get('blurb'),
-                'uploader': uploader,
-                'formats': formats,
-                'duration': int_or_none(vinfo.get('videoDuration'), 100),
-                'timestamp': timestamp,
-            })
+        entries = [self.url_result('washingtonpost:%s' % uuid, 'WashingtonPost', uuid) for uuid in uuids]
  
          return {
              '_type': 'playlist',
diff --git a/youtube_dl/extractor/wat.py b/youtube_dl/extractor/wat.py
index 5227bb5ad9a2cd4f71c156cd8ca9bb3f5fbd5d17..20fef1f04ea776ba21869dfca9e46bd6af591c9f 100644
--- a/youtube_dl/extractor/wat.py
+++ b/youtube_dl/extractor/wat.py
@@ -2,25 +2,26 @@
  from __future__ import unicode_literals
  
  import re
-import hashlib
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
      ExtractorError,
      unified_strdate,
+    HEADRequest,
+    int_or_none,
  )
  
  
  class WatIE(InfoExtractor):
-    _VALID_URL = r'(?:wat:(?P<real_id>\d{8})|https?://www\.wat\.tv/video/(?P<display_id>.*)-(?P<short_id>.*?)_.*?\.html)'
+    _VALID_URL = r'(?:wat:|https?://(?:www\.)?wat\.tv/video/.*-)(?P<id>[0-9a-z]+)'
      IE_NAME = 'wat.tv'
      _TESTS = [
          {
              'url': 'http://www.wat.tv/video/soupe-figues-l-orange-aux-epices-6z1uz_2hvf7_.html',
-            'md5': 'ce70e9223945ed26a8056d413ca55dc9',
+            'md5': '83d882d9de5c9d97f0bb2c6273cde56a',
              'info_dict': {
                  'id': '11713067',
-                'display_id': 'soupe-figues-l-orange-aux-epices',
                  'ext': 'mp4',
                  'title': 'Soupe de figues à l\'orange et aux épices',
                  'description': 'Retrouvez l\'émission "Petits plats en équilibre", diffusée le 18 août 2014.',
@@ -30,110 +31,136 @@ class WatIE(InfoExtractor):
          },
          {
              'url': 'http://www.wat.tv/video/gregory-lemarchal-voix-ange-6z1v7_6ygkj_.html',
-            'md5': 'fbc84e4378165278e743956d9c1bf16b',
+            'md5': '34bdfa5ca9fd3c7eb88601b635b0424c',
              'info_dict': {
                  'id': '11713075',
-                'display_id': 'gregory-lemarchal-voix-ange',
                  'ext': 'mp4',
                  'title': 'Grégory Lemarchal, une voix d\'ange depuis 10 ans (1/3)',
-                'description': 'md5:b7a849cf16a2b733d9cd10c52906dee3',
                  'upload_date': '20140816',
-                'duration': 2910,
              },
-            'skip': "Ce contenu n'est pas disponible pour l'instant.",
+            'expected_warnings': ["Ce contenu n'est pas disponible pour l'instant."],
          },
      ]
  
-    def download_video_info(self, real_id):
-        # 'contentv4' is used in the website, but it also returns the related
-        # videos, we don't need them
-        info = self._download_json('http://www.wat.tv/interface/contentv3/' + real_id, real_id)
-        return info['media']
+    _FORMATS = (
+        (200, 416, 234),
+        (400, 480, 270),
+        (600, 640, 360),
+        (1200, 640, 360),
+        (1800, 960, 540),
+        (2500, 1280, 720),
+    )
  
      def _real_extract(self, url):
-        def real_id_for_chapter(chapter):
-            return chapter['tc_start'].split('-')[0]
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('display_id')
-        real_id = mobj.group('real_id')
-        if not real_id:
-            short_id = mobj.group('short_id')
-            webpage = self._download_webpage(url, display_id or short_id)
-            real_id = self._search_regex(r'xtpage = ".*-(.*?)";', webpage, 'real id')
-
-        video_info = self.download_video_info(real_id)
+        video_id = self._match_id(url)
+        video_id = video_id if video_id.isdigit() and len(video_id) > 6 else compat_str(int(video_id, 36))
+
+        # The website uses 'contentv4', but that also returns the related
+        # videos, which we don't need
+        video_data = self._download_json(
+            'http://www.wat.tv/interface/contentv4s/' + video_id, video_id)
+        video_info = video_data['media']
  
          error_desc = video_info.get('error_desc')
          if error_desc:
-            raise ExtractorError(
-                '%s returned error: %s' % (self.IE_NAME, error_desc), expected=True)
-
-        geo_list = video_info.get('geoList')
-        country = geo_list[0] if geo_list else ''
+            self.report_warning(
+                '%s returned error: %s' % (self.IE_NAME, error_desc))
  
          chapters = video_info['chapters']
-        first_chapter = chapters[0]
+        if chapters:
+            first_chapter = chapters[0]
+
+            def video_id_for_chapter(chapter):
+                return chapter['tc_start'].split('-')[0]
+
+            if video_id_for_chapter(first_chapter) != video_id:
+                self.to_screen('Multipart video detected')
+                entries = [self.url_result('wat:%s' % video_id_for_chapter(chapter)) for chapter in chapters]
+                return self.playlist_result(entries, video_id, video_info['title'])
+            # Otherwise we can continue and extract just one part; we have to use
+            # the video id to get the video URL
+        else:
+            first_chapter = video_info
+
+        title = first_chapter['title']
+
+        def extract_url(path_template, url_type):
+            req_url = 'http://www.wat.tv/get/%s' % (path_template % video_id)
+            head = self._request_webpage(HEADRequest(req_url), video_id, 'Extracting %s url' % url_type, fatal=False)
+            if head:
+                red_url = head.geturl()
+                if req_url != red_url:
+                    return red_url
+            return None
+
+        def remove_bitrate_limit(manifest_url):
+            return re.sub(r'(?:max|min)_bitrate=\d+&?', '', manifest_url)
+
+        formats = []
+        try:
+            manifest_urls = self._download_json(
+                'http://www.wat.tv/get/webhtml/' + video_id, video_id)
+            m3u8_url = manifest_urls.get('hls')
+            if m3u8_url:
+                m3u8_url = remove_bitrate_limit(m3u8_url)
+                m3u8_formats = self._extract_m3u8_formats(
+                    m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
+                if m3u8_formats:
+                    formats.extend(m3u8_formats)
+                    formats.extend(self._extract_f4m_formats(
+                        m3u8_url.replace('ios', 'web').replace('.m3u8', '.f4m'),
+                        video_id, f4m_id='hds', fatal=False))
+                    http_url = extract_url('android5/%s.mp4', 'http')
+                    if http_url:
+                        for m3u8_format in m3u8_formats:
+                            vbr, abr = m3u8_format.get('vbr'), m3u8_format.get('abr')
+                            if not vbr or not abr:
+                                continue
+                            format_id = m3u8_format['format_id'].replace('hls', 'http')
+                            fmt_url = re.sub(r'%s-\d+00-\d+' % video_id, '%s-%d00-%d' % (video_id, round(vbr / 100), round(abr)), http_url)
+                            if self._is_valid_url(fmt_url, video_id, format_id):
+                                f = m3u8_format.copy()
+                                f.update({
+                                    'url': fmt_url,
+                                    'format_id': format_id,
+                                    'protocol': 'http',
+                                })
+                                formats.append(f)
+            mpd_url = manifest_urls.get('mpd')
+            if mpd_url:
+                formats.extend(self._extract_mpd_formats(remove_bitrate_limit(
+                    mpd_url), video_id, mpd_id='dash', fatal=False))
+            self._sort_formats(formats)
+        except ExtractorError:
+            abr = 64
+            for vbr, width, height in self._FORMATS:
+                tbr = vbr + abr
+                format_id = 'http-%s' % tbr
+                fmt_url = 'http://dnl.adv.tf1.fr/2/USP-0x0/%s/%s/%s/ssm/%s-%s-64k.mp4' % (video_id[-4:-2], video_id[-2:], video_id, video_id, vbr)
+                if self._is_valid_url(fmt_url, video_id, format_id):
+                    formats.append({
+                        'format_id': format_id,
+                        'url': fmt_url,
+                        'vbr': vbr,
+                        'abr': abr,
+                        'width': width,
+                        'height': height,
+                    })
+
+        date_diffusion = first_chapter.get('date_diffusion') or video_data.get('configv4', {}).get('estatS4')
+        upload_date = unified_strdate(date_diffusion) if date_diffusion else None
+        duration = None
          files = video_info['files']
-        first_file = files[0]
-
-        if real_id_for_chapter(first_chapter) != real_id:
-            self.to_screen('Multipart video detected')
-            chapter_urls = []
-            for chapter in chapters:
-                chapter_id = real_id_for_chapter(chapter)
-                # Yes, when we this chapter is processed by WatIE,
-                # it will download the info again
-                chapter_info = self.download_video_info(chapter_id)
-                chapter_urls.append(chapter_info['url'])
-            entries = [self.url_result(chapter_url) for chapter_url in chapter_urls]
-            return self.playlist_result(entries, real_id, video_info['title'])
-
-        upload_date = None
-        if 'date_diffusion' in first_chapter:
-            upload_date = unified_strdate(first_chapter['date_diffusion'])
-        # Otherwise we can continue and extract just one part, we have to use
-        # the short id for getting the video url
-
-        formats = [{
-            'url': 'http://wat.tv/get/android5/%s.mp4' % real_id,
-            'format_id': 'Mobile',
-        }]
-
-        fmts = [('SD', 'web')]
-        if first_file.get('hasHD'):
-            fmts.append(('HD', 'webhd'))
-
-        def compute_token(param):
-            timestamp = '%08x' % int(self._download_webpage(
-                'http://www.wat.tv/servertime', real_id,
-                'Downloading server time').split('|')[0])
-            magic = '9b673b13fa4682ed14c3cfa5af5310274b514c4133e9b3a81e6e3aba009l2564'
-            return '%s/%s' % (hashlib.md5((magic + param + timestamp).encode('ascii')).hexdigest(), timestamp)
-
-        for fmt in fmts:
-            webid = '/%s/%s' % (fmt[1], real_id)
-            video_url = self._download_webpage(
-                'http://www.wat.tv/get%s?token=%s&getURL=1&country=%s' % (webid, compute_token(webid), country),
-                real_id,
-                'Downloading %s video URL' % fmt[0],
-                'Failed to download %s video URL' % fmt[0],
-                False)
-            if not video_url:
-                continue
-            formats.append({
-                'url': video_url,
-                'ext': 'mp4',
-                'format_id': fmt[0],
-            })
+        if files:
+            duration = int_or_none(files[0].get('duration'))
  
          return {
-            'id': real_id,
-            'display_id': display_id,
-            'title': first_chapter['title'],
-            'thumbnail': first_chapter['preview'],
-            'description': first_chapter['description'],
-            'view_count': video_info['views'],
+            'id': video_id,
+            'title': title,
+            'thumbnail': first_chapter.get('preview'),
+            'description': first_chapter.get('description'),
+            'view_count': int_or_none(video_info.get('views')),
              'upload_date': upload_date,
-            'duration': first_file['duration'],
+            'duration': duration,
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/watchindianporn.py b/youtube_dl/extractor/watchindianporn.py
new file mode 100644
index 0000000..5d3b5bd
--- /dev/null
+++ b/youtube_dl/extractor/watchindianporn.py
@@ -0,0 +1,90 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    unified_strdate,
+    parse_duration,
+    int_or_none,
+)
+
+
+class WatchIndianPornIE(InfoExtractor):
+    IE_DESC = 'Watch Indian Porn'
+    _VALID_URL = r'https?://(?:www\.)?watchindianporn\.net/(?:[^/]+/)*video/(?P<display_id>[^/]+)-(?P<id>[a-zA-Z0-9]+)\.html'
+    _TEST = {
+        'url': 'http://www.watchindianporn.net/video/hot-milf-from-kerala-shows-off-her-gorgeous-large-breasts-on-camera-RZa2avywNPa.html',
+        'md5': '249589a164dde236ec65832bfce17440',
+        'info_dict': {
+            'id': 'RZa2avywNPa',
+            'display_id': 'hot-milf-from-kerala-shows-off-her-gorgeous-large-breasts-on-camera',
+            'ext': 'mp4',
+            'title': 'Hot milf from kerala shows off her gorgeous large breasts on camera',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'LoveJay',
+            'upload_date': '20160428',
+            'duration': 226,
+            'view_count': int,
+            'comment_count': int,
+            'categories': list,
+            'age_limit': 18,
+        }
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id')
+
+        webpage = self._download_webpage(url, display_id)
+
+        video_url = self._html_search_regex(
+            r"url: escape\('([^']+)'\)", webpage, 'url')
+
+        title = self._html_search_regex(
+            r'<h2 class="he2"><span>(.*?)</span>',
+            webpage, 'title')
+        thumbnail = self._html_search_regex(
+            r'<span id="container"><img\s+src="([^"]+)"',
+            webpage, 'thumbnail', fatal=False)
+
+        uploader = self._html_search_regex(
+            r'class="aupa">\s*(.*?)</a>',
+            webpage, 'uploader')
+        upload_date = unified_strdate(self._html_search_regex(
+            r'Added: <strong>(.+?)</strong>', webpage, 'upload date', fatal=False))
+
+        duration = parse_duration(self._search_regex(
+            r'<td>Time:\s*</td>\s*<td align="right"><span>\s*(.+?)\s*</span>',
+            webpage, 'duration', fatal=False))
+
+        view_count = int_or_none(self._search_regex(
+            r'<td>Views:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
+            webpage, 'view count', fatal=False))
+        comment_count = int_or_none(self._search_regex(
+            r'<td>Comments:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
+            webpage, 'comment count', fatal=False))
+
+        categories = re.findall(
+            r'<a href="[^"]+/search/video/desi"><span>([^<]+)</span></a>',
+            webpage)
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'url': video_url,
+            'http_headers': {
+                'Referer': url,
+            },
+            'title': title,
+            'thumbnail': thumbnail,
+            'uploader': uploader,
+            'upload_date': upload_date,
+            'duration': duration,
+            'view_count': view_count,
+            'comment_count': comment_count,
+            'categories': categories,
+            'age_limit': 18,
+        }
diff --git a/youtube_dl/extractor/wayofthemaster.py b/youtube_dl/extractor/wayofthemaster.py
deleted file mode 100644
index af7bb8b..0000000
--- a/youtube_dl/extractor/wayofthemaster.py
+++ /dev/null
@@ -1,52 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class WayOfTheMasterIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.wayofthemaster\.com/([^/?#]*/)*(?P<id>[^/?#]+)\.s?html(?:$|[?#])'
-
-    _TEST = {
-        'url': 'http://www.wayofthemaster.com/hbks.shtml',
-        'md5': '5316b57487ada8480606a93cb3d18d24',
-        'info_dict': {
-            'id': 'hbks',
-            'ext': 'mp4',
-            'title': 'Intelligent Design vs. Evolution',
-        },
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._search_regex(
-            r'<img src="images/title_[^"]+".*?alt="([^"]+)"',
-            webpage, 'title', default=None)
-        if title is None:
-            title = self._html_search_regex(
-                r'<title>(.*?)</title>', webpage, 'page title')
-
-        url_base = self._search_regex(
-            r'<param\s+name="?movie"?\s+value=".*?/wotm_videoplayer_highlow[0-9]*\.swf\?vid=([^"]+)"',
-            webpage, 'URL base')
-        formats = [{
-            'format_id': 'low',
-            'quality': 1,
-            'url': url_base + '_low.mp4',
-        }, {
-            'format_id': 'high',
-            'quality': 2,
-            'url': url_base + '_high.mp4',
-        }]
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/wdr.py b/youtube_dl/extractor/wdr.py
index 31c90430327da895ffc974c1d489cb4c92689d2f..f7e6360a33e8b6d2cc3096232bfa1d2c458ab3c7 100644
--- a/youtube_dl/extractor/wdr.py
+++ b/youtube_dl/extractor/wdr.py
@@ -1,215 +1,236 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
-import itertools
  import re
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_parse_qs,
-    compat_urlparse,
-)
  from ..utils import (
+    determine_ext,
+    ExtractorError,
+    js_to_json,
+    strip_jsonp,
      unified_strdate,
-    qualities,
+    update_url_query,
+    urlhandle_detect_ext,
  )
  
  
-class WDRIE(InfoExtractor):
-    _PLAYER_REGEX = '-(?:video|audio)player(?:_size-[LMS])?'
-    _VALID_URL = r'(?P<url>https?://www\d?\.(?:wdr\d?|funkhauseuropa)\.de/)(?P<id>.+?)(?P<player>%s)?\.html' % _PLAYER_REGEX
+class WDRBaseIE(InfoExtractor):
+    def _extract_wdr_video(self, webpage, display_id):
+        # for wdr.de the data-extension is in a tag with the class "mediaLink"
+        # for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
+        # for wdrmaus its in a link to the page in a multiline "videoLink"-tag
+        json_metadata = self._html_search_regex(
+            r'class=(?:"(?:mediaLink|wdrrPlayerPlayBtn)\b[^"]*"[^>]+|"videoLink\b[^"]*"[\s]*>\n[^\n]*)data-extension="([^"]+)"',
+            webpage, 'media link', default=None, flags=re.MULTILINE)
+
+        if not json_metadata:
+            return
+
+        media_link_obj = self._parse_json(json_metadata, display_id,
+                                          transform_source=js_to_json)
+        jsonp_url = media_link_obj['mediaObj']['url']
+
+        metadata = self._download_json(
+            jsonp_url, 'metadata', transform_source=strip_jsonp)
+
+        metadata_tracker_data = metadata['trackerData']
+        metadata_media_resource = metadata['mediaResource']
+
+        formats = []
+
+        # check if the metadata contains a direct URL to a file
+        for kind, media_resource in metadata_media_resource.items():
+            if kind not in ('dflt', 'alt'):
+                continue
+
+            for tag_name, medium_url in media_resource.items():
+                if tag_name not in ('videoURL', 'audioURL'):
+                    continue
+
+                ext = determine_ext(medium_url)
+                if ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        medium_url, display_id, 'mp4', 'm3u8_native',
+                        m3u8_id='hls'))
+                elif ext == 'f4m':
+                    manifest_url = update_url_query(
+                        medium_url, {'hdcore': '3.2.0', 'plugin': 'aasp-3.2.0.77.18'})
+                    formats.extend(self._extract_f4m_formats(
+                        manifest_url, display_id, f4m_id='hds', fatal=False))
+                elif ext == 'smil':
+                    formats.extend(self._extract_smil_formats(
+                        medium_url, 'stream', fatal=False))
+                else:
+                    a_format = {
+                        'url': medium_url
+                    }
+                    if ext == 'unknown_video':
+                        urlh = self._request_webpage(
+                            medium_url, display_id, note='Determining extension')
+                        ext = urlhandle_detect_ext(urlh)
+                        a_format['ext'] = ext
+                    formats.append(a_format)
+
+        self._sort_formats(formats)
+
+        subtitles = {}
+        caption_url = metadata_media_resource.get('captionURL')
+        if caption_url:
+            subtitles['de'] = [{
+                'url': caption_url,
+                'ext': 'ttml',
+            }]
+
+        title = metadata_tracker_data['trackerClipTitle']
+
+        return {
+            'id': metadata_tracker_data.get('trackerClipId', display_id),
+            'display_id': display_id,
+            'title': title,
+            'alt_title': metadata_tracker_data.get('trackerClipSubcategory'),
+            'formats': formats,
+            'subtitles': subtitles,
+            'upload_date': unified_strdate(metadata_tracker_data.get('trackerClipAirTime')),
+        }
+
+
+class WDRIE(WDRBaseIE):
+    _CURRENT_MAUS_URL = r'https?://(?:www\.)wdrmaus.de/(?:[^/]+/){1,2}[^/?#]+\.php5'
+    _PAGE_REGEX = r'/(?:mediathek/)?[^/]+/(?P<type>[^/]+)/(?P<display_id>.+)\.html'
+    _VALID_URL = r'(?P<page_url>https?://(?:www\d\.)?wdr\d?\.de)' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
  
      _TESTS = [
          {
-            'url': 'http://www1.wdr.de/mediathek/video/sendungen/servicezeit/videoservicezeit560-videoplayer_size-L.html',
+            'url': 'http://www1.wdr.de/mediathek/video/sendungen/doku-am-freitag/video-geheimnis-aachener-dom-100.html',
+            # HDS download, MD5 is unstable
              'info_dict': {
-                'id': 'mdb-362427',
+                'id': 'mdb-1058683',
                  'ext': 'flv',
-                'title': 'Servicezeit',
-                'description': 'md5:c8f43e5e815eeb54d0b96df2fba906cb',
-                'upload_date': '20140310',
-                'is_live': False
+                'display_id': 'doku-am-freitag/video-geheimnis-aachener-dom-100',
+                'title': 'Geheimnis Aachener Dom',
+                'alt_title': 'Doku am Freitag',
+                'upload_date': '20160304',
+                'description': 'md5:87be8ff14d8dfd7a7ee46f0299b52318',
+                'is_live': False,
+                'subtitles': {'de': [{
+                    'url': 'http://ondemand-ww.wdr.de/medp/fsk0/105/1058683/1058683_12220974.xml',
+                    'ext': 'ttml',
+                }]},
              },
-            'params': {
-                'skip_download': True,
+        },
+        {
+            'url': 'http://www1.wdr.de/mediathek/audio/wdr3/wdr3-gespraech-am-samstag/audio-schriftstellerin-juli-zeh-100.html',
+            'md5': 'f4c1f96d01cf285240f53ea4309663d8',
+            'info_dict': {
+                'id': 'mdb-1072000',
+                'ext': 'mp3',
+                'display_id': 'wdr3-gespraech-am-samstag/audio-schriftstellerin-juli-zeh-100',
+                'title': 'Schriftstellerin Juli Zeh',
+                'alt_title': 'WDR 3 Gespräch am Samstag',
+                'upload_date': '20160312',
+                'description': 'md5:e127d320bc2b1f149be697ce044a3dd7',
+                'is_live': False,
+                'subtitles': {}
              },
-            'skip': 'Page Not Found',
          },
          {
-            'url': 'http://www1.wdr.de/themen/av/videomargaspiegelisttot101-videoplayer.html',
+            'url': 'http://www1.wdr.de/mediathek/video/live/index.html',
              'info_dict': {
-                'id': 'mdb-363194',
-                'ext': 'flv',
-                'title': 'Marga Spiegel ist tot',
-                'description': 'md5:2309992a6716c347891c045be50992e4',
-                'upload_date': '20140311',
-                'is_live': False
+                'id': 'mdb-103364',
+                'ext': 'mp4',
+                'display_id': 'index',
+                'title': r're:^WDR Fernsehen im Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+                'alt_title': 'WDR Fernsehen Live',
+                'upload_date': None,
+                'description': 'md5:ae2ff888510623bf8d4b115f95a9b7c9',
+                'is_live': True,
+                'subtitles': {}
              },
              'params': {
-                'skip_download': True,
+                'skip_download': True,  # m3u8 download
              },
-            'skip': 'Page Not Found',
          },
          {
-            'url': 'http://www1.wdr.de/themen/kultur/audioerlebtegeschichtenmargaspiegel100-audioplayer.html',
-            'md5': '83e9e8fefad36f357278759870805898',
+            'url': 'http://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html',
+            'playlist_mincount': 8,
              'info_dict': {
-                'id': 'mdb-194332',
-                'ext': 'mp3',
-                'title': 'Erlebte Geschichten: Marga Spiegel (29.11.2009)',
-                'description': 'md5:2309992a6716c347891c045be50992e4',
-                'upload_date': '20091129',
-                'is_live': False
+                'id': 'aktuelle-stunde/aktuelle-stunde-120',
              },
          },
          {
-            'url': 'http://www.funkhauseuropa.de/av/audioflaviacoelhoamaramar100-audioplayer.html',
-            'md5': '99a1443ff29af19f6c52cf6f4dc1f4aa',
+            'url': 'http://www.wdrmaus.de/aktuelle-sendung/index.php5',
              'info_dict': {
-                'id': 'mdb-478135',
-                'ext': 'mp3',
-                'title': 'Flavia Coelho: Amar é Amar',
-                'description': 'md5:7b29e97e10dfb6e265238b32fa35b23a',
-                'upload_date': '20140717',
-                'is_live': False
+                'id': 'mdb-1096487',
+                'ext': 'flv',
+                'upload_date': 're:^[0-9]{8}$',
+                'title': 're:^Die Sendung mit der Maus vom [0-9.]{10}$',
+                'description': '- Die Sendung mit der Maus -',
              },
-            'skip': 'Page Not Found',
+            'skip': 'The id changes from week to week because of the new episode'
          },
          {
-            'url': 'http://www1.wdr.de/mediathek/video/sendungen/quarks_und_co/filterseite-quarks-und-co100.html',
-            'playlist_mincount': 146,
+            'url': 'http://www.wdrmaus.de/sachgeschichten/sachgeschichten/achterbahn.php5',
+            'md5': '803138901f6368ee497b4d195bb164f2',
              'info_dict': {
-                'id': 'mediathek/video/sendungen/quarks_und_co/filterseite-quarks-und-co100',
-            }
+                'id': 'mdb-186083',
+                'ext': 'mp4',
+                'upload_date': '20130919',
+                'title': 'Sachgeschichte - Achterbahn ',
+                'description': '- Die Sendung mit der Maus -',
+            },
          },
          {
-            'url': 'http://www1.wdr.de/mediathek/video/livestream/index.html',
+            'url': 'http://www1.wdr.de/radio/player/radioplayer116~_layout-popupVersion.html',
+            # Live stream, MD5 unstable
              'info_dict': {
-                'id': 'mdb-103364',
-                'title': 're:^WDR Fernsehen Live [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
-                'description': 'md5:ae2ff888510623bf8d4b115f95a9b7c9',
+                'id': 'mdb-869971',
                  'ext': 'flv',
-                'upload_date': '20150101',
-                'is_live': True
-            },
-            'params': {
-                'skip_download': True,
+                'title': 'Funkhaus Europa Livestream',
+                'description': 'md5:2309992a6716c347891c045be50992e4',
+                'upload_date': '20160101',
              },
          }
      ]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
-        page_url = mobj.group('url')
-        page_id = mobj.group('id')
+        url_type = mobj.group('type')
+        page_url = mobj.group('page_url')
+        display_id = mobj.group('display_id')
+        webpage = self._download_webpage(url, display_id)
  
-        webpage = self._download_webpage(url, page_id)
+        info_dict = self._extract_wdr_video(webpage, display_id)
  
-        if mobj.group('player') is None:
+        if not info_dict:
              entries = [
-                self.url_result(page_url + href, 'WDR')
+                self.url_result(page_url + href[0], 'WDR')
                  for href in re.findall(
-                    r'<a href="/?(.+?%s\.html)" rel="nofollow"' % self._PLAYER_REGEX,
+                    r'<a href="(%s)"[^>]+data-extension=' % self._PAGE_REGEX,
                      webpage)
              ]
  
              if entries:  # Playlist page
-                return self.playlist_result(entries, page_id)
+                return self.playlist_result(entries, playlist_id=display_id)
  
-            # Overview page
-            entries = []
-            for page_num in itertools.count(2):
-                hrefs = re.findall(
-                    r'<li class="mediathekvideo"\s*>\s*<img[^>]*>\s*<a href="(/mediathek/video/[^"]+)"',
-                    webpage)
-                entries.extend(
-                    self.url_result(page_url + href, 'WDR')
-                    for href in hrefs)
-                next_url_m = re.search(
-                    r'<li class="nextToLast">\s*<a href="([^"]+)"', webpage)
-                if not next_url_m:
-                    break
-                next_url = page_url + next_url_m.group(1)
-                webpage = self._download_webpage(
-                    next_url, page_id,
-                    note='Downloading playlist page %d' % page_num)
-            return self.playlist_result(entries, page_id)
-
-        flashvars = compat_parse_qs(self._html_search_regex(
-            r'<param name="flashvars" value="([^"]+)"', webpage, 'flashvars'))
+            raise ExtractorError('No downloadable streams found', expected=True)
  
-        page_id = flashvars['trackerClipId'][0]
-        video_url = flashvars['dslSrc'][0]
-        title = flashvars['trackerClipTitle'][0]
-        thumbnail = flashvars['startPicture'][0] if 'startPicture' in flashvars else None
-        is_live = flashvars.get('isLive', ['0'])[0] == '1'
+        is_live = url_type == 'live'
  
          if is_live:
-            title = self._live_title(title)
-
-        if 'trackerClipAirTime' in flashvars:
-            upload_date = flashvars['trackerClipAirTime'][0]
-        else:
-            upload_date = self._html_search_meta(
-                'DC.Date', webpage, 'upload date')
-
-        if upload_date:
-            upload_date = unified_strdate(upload_date)
-
-        formats = []
-        preference = qualities(['S', 'M', 'L', 'XL'])
-
-        if video_url.endswith('.f4m'):
-            formats.extend(self._extract_f4m_formats(
-                video_url + '?hdcore=3.2.0&plugin=aasp-3.2.0.77.18', page_id,
-                f4m_id='hds', fatal=False))
-        elif video_url.endswith('.smil'):
-            formats.extend(self._extract_smil_formats(
-                video_url, page_id, False, {
-                    'hdcore': '3.3.0',
-                    'plugin': 'aasp-3.3.0.99.43',
-                }))
-        else:
-            formats.append({
-                'url': video_url,
-                'http_headers': {
-                    'User-Agent': 'mobile',
-                },
+            info_dict.update({
+                'title': self._live_title(info_dict['title']),
+                'upload_date': None,
              })
+        elif 'upload_date' not in info_dict:
+            info_dict['upload_date'] = unified_strdate(self._html_search_meta('DC.Date', webpage, 'upload date'))
  
-        m3u8_url = self._search_regex(
-            r'rel="adaptiv"[^>]+href="([^"]+)"',
-            webpage, 'm3u8 url', default=None)
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, page_id, 'mp4', 'm3u8_native',
-                m3u8_id='hls', fatal=False))
+        info_dict.update({
+            'description': self._html_search_meta('Description', webpage),
+            'is_live': is_live,
+        })
  
-        direct_urls = re.findall(
-            r'rel="web(S|M|L|XL)"[^>]+href="([^"]+)"', webpage)
-        if direct_urls:
-            for quality, video_url in direct_urls:
-                formats.append({
-                    'url': video_url,
-                    'preference': preference(quality),
-                    'http_headers': {
-                        'User-Agent': 'mobile',
-                    },
-                })
-
-        self._sort_formats(formats)
-
-        description = self._html_search_meta('Description', webpage, 'description')
-
-        return {
-            'id': page_id,
-            'formats': formats,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'upload_date': upload_date,
-            'is_live': is_live
-        }
+        return info_dict
  
  
  class WDRMobileIE(InfoExtractor):
@@ -241,81 +262,3 @@ class WDRMobileIE(InfoExtractor):
                  'User-Agent': 'mobile',
              },
          }
-
-
-class WDRMausIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?wdrmaus\.de/(?:[^/]+/){,2}(?P<id>[^/?#]+)(?:/index\.php5|(?<!index)\.php5|/(?:$|[?#]))'
-    IE_DESC = 'Sendung mit der Maus'
-    _TESTS = [{
-        'url': 'http://www.wdrmaus.de/aktuelle-sendung/index.php5',
-        'info_dict': {
-            'id': 'aktuelle-sendung',
-            'ext': 'mp4',
-            'thumbnail': 're:^http://.+\.jpg',
-            'upload_date': 're:^[0-9]{8}$',
-            'title': 're:^[0-9.]{10} - Aktuelle Sendung$',
-        }
-    }, {
-        'url': 'http://www.wdrmaus.de/sachgeschichten/sachgeschichten/40_jahre_maus.php5',
-        'md5': '3b1227ca3ed28d73ec5737c65743b2a3',
-        'info_dict': {
-            'id': '40_jahre_maus',
-            'ext': 'mp4',
-            'thumbnail': 're:^http://.+\.jpg',
-            'upload_date': '20131007',
-            'title': '12.03.2011 - 40 Jahre Maus',
-        }
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-        param_code = self._html_search_regex(
-            r'<a href="\?startVideo=1&amp;([^"]+)"', webpage, 'parameters')
-
-        title_date = self._search_regex(
-            r'<div class="sendedatum"><p>Sendedatum:\s*([0-9\.]+)</p>',
-            webpage, 'air date')
-        title_str = self._html_search_regex(
-            r'<h1>(.*?)</h1>', webpage, 'title')
-        title = '%s - %s' % (title_date, title_str)
-        upload_date = unified_strdate(
-            self._html_search_meta('dc.date', webpage))
-
-        fields = compat_parse_qs(param_code)
-        video_url = fields['firstVideo'][0]
-        thumbnail = compat_urlparse.urljoin(url, fields['startPicture'][0])
-
-        formats = [{
-            'format_id': 'rtmp',
-            'url': video_url,
-        }]
-
-        jscode = self._download_webpage(
-            'http://www.wdrmaus.de/codebase/js/extended-medien.min.js',
-            video_id, fatal=False,
-            note='Downloading URL translation table',
-            errnote='Could not download URL translation table')
-        if jscode:
-            for m in re.finditer(
-                    r"stream:\s*'dslSrc=(?P<stream>[^']+)',\s*download:\s*'(?P<dl>[^']+)'\s*\}",
-                    jscode):
-                if video_url.startswith(m.group('stream')):
-                    http_url = video_url.replace(
-                        m.group('stream'), m.group('dl'))
-                    formats.append({
-                        'format_id': 'http',
-                        'url': http_url,
-                    })
-                    break
-
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-            'thumbnail': thumbnail,
-            'upload_date': upload_date,
-        }
diff --git a/youtube_dl/extractor/weibo.py b/youtube_dl/extractor/weibo.py
deleted file mode 100644
index 20bb039..0000000
--- a/youtube_dl/extractor/weibo.py
+++ /dev/null
@@ -1,49 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class WeiboIE(InfoExtractor):
-    """
-    The videos in Weibo come from different sites, this IE just finds the link
-    to the external video and returns it.
-    """
-    _VALID_URL = r'https?://video\.weibo\.com/v/weishipin/t_(?P<id>.+?)\.htm'
-
-    _TEST = {
-        'url': 'http://video.weibo.com/v/weishipin/t_zjUw2kZ.htm',
-        'info_dict': {
-            'id': '98322879',
-            'ext': 'flv',
-            'title': '魔声耳机最新广告“All Eyes On Us”',
-        },
-        'params': {
-            'skip_download': True,
-        },
-        'add_ie': ['Sina'],
-    }
-
-    # Additional example videos from different sites
-    # Youku: http://video.weibo.com/v/weishipin/t_zQGDWQ8.htm
-    # 56.com: http://video.weibo.com/v/weishipin/t_zQ44HxN.htm
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url, flags=re.VERBOSE)
-        video_id = mobj.group('id')
-        info_url = 'http://video.weibo.com/?s=v&a=play_list&format=json&mix_video_id=t_%s' % video_id
-        info = self._download_json(info_url, video_id)
-
-        videos_urls = map(lambda v: v['play_page_url'], info['result']['data'])
-        # Prefer sina video since they have thumbnails
-        videos_urls = sorted(videos_urls, key=lambda u: 'video.sina.com' in u)
-        player_url = videos_urls[-1]
-        m_sina = re.match(r'https?://video\.sina\.com\.cn/v/b/(\d+)-\d+\.html',
-                          player_url)
-        if m_sina is not None:
-            self.to_screen('Sina video detected')
-            sina_id = m_sina.group(1)
-            player_url = 'http://you.video.sina.com.cn/swf/quotePlayer.swf?vid=%s' % sina_id
-        return self.url_result(player_url)
diff --git a/youtube_dl/extractor/weiqitv.py b/youtube_dl/extractor/weiqitv.py
index 3dafbeec2c5f7ba0b2e18ec621c67966214d3307..8e09156c26c58b4cc184dbe97e679ee9b8dfa47f 100644
--- a/youtube_dl/extractor/weiqitv.py
+++ b/youtube_dl/extractor/weiqitv.py
@@ -6,7 +6,7 @@ from .common import InfoExtractor
  
  class WeiqiTVIE(InfoExtractor):
      IE_DESC = 'WQTV'
-    _VALID_URL = r'https?://www\.weiqitv\.com/index/video_play\?videoId=(?P<id>[A-Za-z0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?weiqitv\.com/index/video_play\?videoId=(?P<id>[A-Za-z0-9]+)'
  
      _TESTS = [{
          'url': 'http://www.weiqitv.com/index/video_play?videoId=53c744f09874f0e76a8b46f3',
diff --git a/youtube_dl/extractor/wimp.py b/youtube_dl/extractor/wimp.py
index 828c03dc38c4d4d4668f6dfb66e4cc29c51fd7e5..54eb5142793827f8b733592d22b979d326593bee 100644
--- a/youtube_dl/extractor/wimp.py
+++ b/youtube_dl/extractor/wimp.py
@@ -1,29 +1,33 @@
  from __future__ import unicode_literals
  
-from .common import InfoExtractor
  from .youtube import YoutubeIE
+from .jwplatform import JWPlatformBaseIE
  
  
-class WimpIE(InfoExtractor):
+class WimpIE(JWPlatformBaseIE):
      _VALID_URL = r'https?://(?:www\.)?wimp\.com/(?P<id>[^/]+)'
      _TESTS = [{
-        'url': 'http://www.wimp.com/maruexhausted/',
+        'url': 'http://www.wimp.com/maru-is-exhausted/',
          'md5': 'ee21217ffd66d058e8b16be340b74883',
          'info_dict': {
-            'id': 'maruexhausted',
+            'id': 'maru-is-exhausted',
              'ext': 'mp4',
              'title': 'Maru is exhausted.',
              'description': 'md5:57e099e857c0a4ea312542b684a869b8',
          }
      }, {
          'url': 'http://www.wimp.com/clowncar/',
-        'md5': '4e2986c793694b55b37cf92521d12bb4',
+        'md5': '5c31ad862a90dc5b1f023956faec13fe',
          'info_dict': {
-            'id': 'clowncar',
+            'id': 'cG4CEr2aiSg',
              'ext': 'webm',
-            'title': 'It\'s like a clown car.',
-            'description': 'md5:0e56db1370a6e49c5c1d19124c0d2fb2',
+            'title': 'Basset hound clown car...incredible!',
+            'description': '5 of my Bassets crawled in this dog loo! www.bellinghambassets.com\n\nFor licensing/usage please contact: licensing(at)jukinmediadotcom',
+            'upload_date': '20140303',
+            'uploader': 'Gretchen Hoey',
+            'uploader_id': 'gretchenandjeff1',
          },
+        'add_ie': ['Youtube'],
      }]
  
      def _real_extract(self, url):
@@ -41,14 +45,13 @@ class WimpIE(InfoExtractor):
                  'ie_key': YoutubeIE.ie_key(),
              }
  
-        video_url = self._search_regex(
-            r'<video[^>]+>\s*<source[^>]+src=(["\'])(?P<url>.+?)\1',
-            webpage, 'video URL', group='url')
+        info_dict = self._extract_jwplayer_data(
+            webpage, video_id, require_title=False)
  
-        return {
+        info_dict.update({
              'id': video_id,
-            'url': video_url,
              'title': self._og_search_title(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
              'description': self._og_search_description(webpage),
-        }
+        })
+
+        return info_dict
diff --git a/youtube_dl/extractor/wistia.py b/youtube_dl/extractor/wistia.py
index 8b14840a2dba606951f1f7d80694f1e7f0cca8d6..c634b8decddf8fdb15649b05e8f49ad9efc36254 100644
--- a/youtube_dl/extractor/wistia.py
+++ b/youtube_dl/extractor/wistia.py
@@ -3,16 +3,17 @@ from __future__ import unicode_literals
  from .common import InfoExtractor
  from ..utils import (
      ExtractorError,
-    sanitized_Request,
      int_or_none,
+    float_or_none,
  )
  
  
  class WistiaIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:fast\.)?wistia\.net/embed/iframe/(?P<id>[a-z0-9]+)'
-    _API_URL = 'http://fast.wistia.com/embed/medias/{0:}.json'
+    _VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.net/embed/iframe/)(?P<id>[a-z0-9]+)'
+    _API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
+    _IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://fast.wistia.net/embed/iframe/sh7fpupwlt',
          'md5': 'cafeb56ec0c53c18c97405eecb3133df',
          'info_dict': {
@@ -24,36 +25,54 @@ class WistiaIE(InfoExtractor):
              'timestamp': 1386185018,
              'duration': 117,
          },
-    }
+    }, {
+        'url': 'wistia:sh7fpupwlt',
+        'only_matching': True,
+    }, {
+        # with hls video
+        'url': 'wistia:807fafadvk',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        request = sanitized_Request(self._API_URL.format(video_id))
-        request.add_header('Referer', url)  # Some videos require this.
-        data_json = self._download_json(request, video_id)
+        data_json = self._download_json(
+            self._API_URL % video_id, video_id,
+            # Some videos require this.
+            headers={
+                'Referer': url if url.startswith('http') else self._IFRAME_URL % video_id,
+            })
+
          if data_json.get('error'):
-            raise ExtractorError('Error while getting the playlist',
-                                 expected=True)
+            raise ExtractorError(
+                'Error while getting the playlist', expected=True)
+
          data = data_json['media']
          title = data['name']
  
          formats = []
          thumbnails = []
          for a in data['assets']:
+            aurl = a.get('url')
+            if not aurl:
+                continue
              astatus = a.get('status')
              atype = a.get('type')
-            if (astatus is not None and astatus != 2) or atype == 'preview':
+            if (astatus is not None and astatus != 2) or atype in ('preview', 'storyboard'):
                  continue
              elif atype in ('still', 'still_image'):
                  thumbnails.append({
-                    'url': a['url'],
-                    'resolution': '%dx%d' % (a['width'], a['height']),
+                    'url': aurl,
+                    'width': int_or_none(a.get('width')),
+                    'height': int_or_none(a.get('height')),
                  })
              else:
+                aext = a.get('ext')
+                is_m3u8 = a.get('container') == 'm3u8' or aext == 'm3u8'
                  formats.append({
                      'format_id': atype,
-                    'url': a['url'],
+                    'url': aurl,
                      'tbr': int_or_none(a.get('bitrate')),
                      'vbr': int_or_none(a.get('opt_vbitrate')),
                      'width': int_or_none(a.get('width')),
@@ -61,7 +80,8 @@ class WistiaIE(InfoExtractor):
                      'filesize': int_or_none(a.get('size')),
                      'vcodec': a.get('codec'),
                      'container': a.get('container'),
-                    'ext': a.get('ext'),
+                    'ext': 'mp4' if is_m3u8 else aext,
+                    'protocol': 'm3u8' if is_m3u8 else None,
                      'preference': 1 if atype == 'original' else None,
                  })
  
@@ -73,6 +93,6 @@ class WistiaIE(InfoExtractor):
              'description': data.get('seoDescription'),
              'formats': formats,
              'thumbnails': thumbnails,
-            'duration': int_or_none(data.get('duration')),
+            'duration': float_or_none(data.get('duration')),
              'timestamp': int_or_none(data.get('createdAt')),
          }
diff --git a/youtube_dl/extractor/wrzuta.py b/youtube_dl/extractor/wrzuta.py
index c427649211079715a5510eef3eaf35981bdb1034..0f53f1bcb85f409a71d77712b006a57146a9d513 100644
--- a/youtube_dl/extractor/wrzuta.py
+++ b/youtube_dl/extractor/wrzuta.py
@@ -1,12 +1,14 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
  
  from .common import InfoExtractor
  from ..utils import (
+    ExtractorError,
      int_or_none,
      qualities,
+    remove_start,
  )
  
  
@@ -26,16 +28,17 @@ class WrzutaIE(InfoExtractor):
              'uploader_id': 'laboratoriumdextera',
              'description': 'md5:7fb5ef3c21c5893375fda51d9b15d9cd',
          },
+        'skip': 'Redirected to wrzuta.pl',
      }, {
-        'url': 'http://jolka85.wrzuta.pl/audio/063jOPX5ue2/liber_natalia_szroeder_-_teraz_ty',
-        'md5': 'bc78077859bea7bcfe4295d7d7fc9025',
+        'url': 'http://vexling.wrzuta.pl/audio/01xBFabGXu6/james_horner_-_into_the_na_39_vi_world_bonus',
+        'md5': 'f80564fb5a2ec6ec59705ae2bf2ba56d',
          'info_dict': {
-            'id': '063jOPX5ue2',
-            'ext': 'ogg',
-            'title': 'Liber & Natalia Szroeder - Teraz Ty',
-            'duration': 203,
-            'uploader_id': 'jolka85',
-            'description': 'md5:2d2b6340f9188c8c4cd891580e481096',
+            'id': '01xBFabGXu6',
+            'ext': 'mp3',
+            'title': 'James Horner - Into The Na\'vi World [Bonus]',
+            'description': 'md5:30a70718b2cd9df3120fce4445b0263b',
+            'duration': 95,
+            'uploader_id': 'vexling',
          },
      }]
  
@@ -45,7 +48,10 @@ class WrzutaIE(InfoExtractor):
          typ = mobj.group('typ')
          uploader = mobj.group('uploader')
  
-        webpage = self._download_webpage(url, video_id)
+        webpage, urlh = self._download_webpage_handle(url, video_id)
+
+        if urlh.geturl() == 'http://www.wrzuta.pl/':
+            raise ExtractorError('Video removed', expected=True)
  
          quality = qualities(['SD', 'MQ', 'HQ', 'HD'])
  
@@ -80,3 +86,73 @@ class WrzutaIE(InfoExtractor):
              'description': self._og_search_description(webpage),
              'age_limit': embedpage.get('minimalAge', 0),
          }
+
+
+class WrzutaPlaylistIE(InfoExtractor):
+    """
+        this class covers extraction of wrzuta playlist entries
+        the extraction process bases on following steps:
+        * collect information of playlist size
+        * download all entries provided on
+          the playlist webpage (the playlist is split
+          on two pages: first directly reached from webpage
+          second: downloaded on demand by ajax call and rendered
+          using the ajax call response)
+        * in case size of extracted entries not reached total number of entries
+          use the ajax call to collect the remaining entries
+    """
+
+    IE_NAME = 'wrzuta.pl:playlist'
+    _VALID_URL = r'https?://(?P<uploader>[0-9a-zA-Z]+)\.wrzuta\.pl/playlista/(?P<id>[0-9a-zA-Z]+)'
+    _TESTS = [{
+        'url': 'http://miromak71.wrzuta.pl/playlista/7XfO4vE84iR/moja_muza',
+        'playlist_mincount': 14,
+        'info_dict': {
+            'id': '7XfO4vE84iR',
+            'title': 'Moja muza',
+        },
+    }, {
+        'url': 'http://heroesf70.wrzuta.pl/playlista/6Nj3wQHx756/lipiec_-_lato_2015_muzyka_swiata',
+        'playlist_mincount': 144,
+        'info_dict': {
+            'id': '6Nj3wQHx756',
+            'title': 'Lipiec - Lato 2015 Muzyka Świata',
+        },
+    }, {
+        'url': 'http://miromak71.wrzuta.pl/playlista/7XfO4vE84iR',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        playlist_id = mobj.group('id')
+        uploader = mobj.group('uploader')
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        playlist_size = int_or_none(self._html_search_regex(
+            (r'<div[^>]+class=["\']playlist-counter["\'][^>]*>\d+/(\d+)',
+             r'<div[^>]+class=["\']all-counter["\'][^>]*>(.+?)</div>'),
+            webpage, 'playlist size', default=None))
+
+        playlist_title = remove_start(
+            self._og_search_title(webpage), 'Playlista: ')
+
+        entries = []
+        if playlist_size:
+            entries = [
+                self.url_result(entry_url)
+                for _, entry_url in re.findall(
+                    r'<a[^>]+href=(["\'])(http.+?)\1[^>]+class=["\']playlist-file-page',
+                    webpage)]
+            if playlist_size > len(entries):
+                playlist_content = self._download_json(
+                    'http://%s.wrzuta.pl/xhr/get_playlist_offset/%s' % (uploader, playlist_id),
+                    playlist_id,
+                    'Downloading playlist JSON',
+                    'Unable to download playlist JSON')
+                entries.extend([
+                    self.url_result(entry['filelink'])
+                    for entry in playlist_content.get('files', []) if entry.get('filelink')])
+
+        return self.playlist_result(entries, playlist_id, playlist_title)
diff --git a/youtube_dl/extractor/wsj.py b/youtube_dl/extractor/wsj.py
index 5a897371d1d69a95e08f7b4da4d457b3236e09cc..deb7483ae51699df4670675db4622503320f1cbc 100644
--- a/youtube_dl/extractor/wsj.py
+++ b/youtube_dl/extractor/wsj.py
@@ -1,19 +1,25 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
  from ..utils import (
      int_or_none,
+    float_or_none,
      unified_strdate,
  )
  
  
  class WSJIE(InfoExtractor):
-    _VALID_URL = r'https?://video-api\.wsj\.com/api-video/player/iframe\.html\?guid=(?P<id>[a-zA-Z0-9-]+)'
+    _VALID_URL = r'''(?x)https?://
+        (?:
+            video-api\.wsj\.com/api-video/player/iframe\.html\?guid=|
+            (?:www\.)?wsj\.com/video/[^/]+/
+        )
+        (?P<id>[a-zA-Z0-9-]+)'''
      IE_DESC = 'Wall Street Journal'
-    _TEST = {
+    _TESTS = [{
          'url': 'http://video-api.wsj.com/api-video/player/iframe.html?guid=1BD01A4C-BFE8-40A5-A42F-8A8AF9898B1A',
-        'md5': '9747d7a6ebc2f4df64b981e1dde9efa9',
+        'md5': 'e230a5bb249075e40793b655a54a02e4',
          'info_dict': {
              'id': '1BD01A4C-BFE8-40A5-A42F-8A8AF9898B1A',
              'ext': 'mp4',
@@ -24,65 +30,60 @@ class WSJIE(InfoExtractor):
              'duration': 90,
              'title': 'Bills Coach Rex Ryan Updates His Old Jets Tattoo',
          },
-    }
+    }, {
+        'url': 'http://www.wsj.com/video/can-alphabet-build-a-smarter-city/359DDAA8-9AC1-489C-82E6-0429C1E430E0.html',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        bitrates = [128, 174, 264, 320, 464, 664, 1264]
          api_url = (
              'http://video-api.wsj.com/api-video/find_all_videos.asp?'
-            'type=guid&count=1&query=%s&'
-            'fields=hls,adZone,thumbnailList,guid,state,secondsUntilStartTime,'
-            'author,description,name,linkURL,videoStillURL,duration,videoURL,'
-            'adCategory,catastrophic,linkShortURL,doctypeID,youtubeID,'
-            'titletag,rssURL,wsj-section,wsj-subsection,allthingsd-section,'
-            'allthingsd-subsection,sm-section,sm-subsection,provider,'
-            'formattedCreationDate,keywords,keywordsOmniture,column,editor,'
-            'emailURL,emailPartnerID,showName,omnitureProgramName,'
-            'omnitureVideoFormat,linkRelativeURL,touchCastID,'
-            'omniturePublishDate,%s') % (
-                video_id, ','.join('video%dkMP4Url' % br for br in bitrates))
+            'type=guid&count=1&query=%s&fields=type,hls,videoMP4List,'
+            'thumbnailList,author,description,name,duration,videoURL,'
+            'titletag,formattedCreationDate,keywords,editor' % video_id)
          info = self._download_json(api_url, video_id)['items'][0]
-
-        # Thumbnails are conveniently in the correct format already
-        thumbnails = info.get('thumbnailList')
-        creator = info.get('author')
-        uploader_id = info.get('editor')
-        categories = info.get('keywords')
-        duration = int_or_none(info.get('duration'))
-        upload_date = unified_strdate(
-            info.get('formattedCreationDate'), day_first=False)
          title = info.get('name', info.get('titletag'))
  
-        formats = [{
-            'format_id': 'f4m',
-            'format_note': 'f4m (meta URL)',
-            'url': info['videoURL'],
-        }]
-        if info.get('hls'):
+        formats = []
+
+        f4m_url = info.get('videoURL')
+        if f4m_url:
+            formats.extend(self._extract_f4m_formats(
+                f4m_url, video_id, f4m_id='hds', fatal=False))
+
+        m3u8_url = info.get('hls')
+        if m3u8_url:
              formats.extend(self._extract_m3u8_formats(
                  info['hls'], video_id, ext='mp4',
-                preference=0, entry_protocol='m3u8_native'))
-        for br in bitrates:
-            field = 'video%dkMP4Url' % br
-            if info.get(field):
-                formats.append({
-                    'format_id': 'mp4-%d' % br,
-                    'container': 'mp4',
-                    'tbr': br,
-                    'url': info[field],
-                })
+                entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
+
+        for v in info.get('videoMP4List', []):
+            mp4_url = v.get('url')
+            if not mp4_url:
+                continue
+            tbr = int_or_none(v.get('bitrate'))
+            formats.append({
+                'url': mp4_url,
+                'format_id': 'http' + ('-%d' % tbr if tbr else ''),
+                'tbr': tbr,
+                'width': int_or_none(v.get('width')),
+                'height': int_or_none(v.get('height')),
+                'fps': float_or_none(v.get('fps')),
+            })
          self._sort_formats(formats)
  
          return {
              'id': video_id,
              'formats': formats,
-            'thumbnails': thumbnails,
-            'creator': creator,
-            'uploader_id': uploader_id,
-            'duration': duration,
-            'upload_date': upload_date,
+            # Thumbnails are conveniently in the correct format already
+            'thumbnails': info.get('thumbnailList'),
+            'creator': info.get('author'),
+            'uploader_id': info.get('editor'),
+            'duration': int_or_none(info.get('duration')),
+            'upload_date': unified_strdate(info.get(
+                'formattedCreationDate'), day_first=False),
              'title': title,
-            'categories': categories,
+            'categories': info.get('keywords'),
          }
diff --git a/youtube_dl/extractor/xboxclips.py b/youtube_dl/extractor/xboxclips.py
index 236ff403bd08f941a2eb023cd41c3bb21c49d4c3..d9c277bc3cb0221cd926c54a64f95bcec928bd3d 100644
--- a/youtube_dl/extractor/xboxclips.py
+++ b/youtube_dl/extractor/xboxclips.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -12,7 +12,7 @@ from ..utils import (
  class XboxClipsIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?xboxclips\.com/(?:video\.php\?.*vid=|[^/]+/)(?P<id>[\w-]{36})'
      _TEST = {
-        'url': 'https://xboxclips.com/video.php?uid=2533274823424419&gamertag=Iabdulelah&vid=074a69a9-5faf-46aa-b93b-9909c1720325',
+        'url': 'http://xboxclips.com/video.php?uid=2533274823424419&gamertag=Iabdulelah&vid=074a69a9-5faf-46aa-b93b-9909c1720325',
          'md5': 'fbe1ec805e920aeb8eced3c3e657df5d',
          'info_dict': {
              'id': '074a69a9-5faf-46aa-b93b-9909c1720325',
diff --git a/youtube_dl/extractor/xfileshare.py b/youtube_dl/extractor/xfileshare.py
index 2d1504eaacd36eb564da06eb541d2b6b8eabaa44..de344bad25309c03b1d7378ceb6b3968c2d4c47a 100644
--- a/youtube_dl/extractor/xfileshare.py
+++ b/youtube_dl/extractor/xfileshare.py
@@ -5,29 +5,44 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
+    decode_packed_codes,
      ExtractorError,
      int_or_none,
+    NO_DEFAULT,
      sanitized_Request,
      urlencode_postdata,
  )
  
  
  class XFileShareIE(InfoExtractor):
-    IE_DESC = 'XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me'
-    _VALID_URL = r'''(?x)
-        https?://(?P<host>(?:www\.)?
-            (?:daclips\.in|gorillavid\.in|movpod\.in|fastvideo\.in|realvid\.net|filehoot\.com|vidto\.me|powerwatch\.pw))/
-        (?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:-[0-9]+x[0-9]+\.html)?
-    '''
-
-    _FILE_NOT_FOUND_REGEX = r'>(?:404 - )?File Not Found<'
+    _SITES = (
+        ('daclips.in', 'DaClips'),
+        ('filehoot.com', 'FileHoot'),
+        ('gorillavid.in', 'GorillaVid'),
+        ('movpod.in', 'MovPod'),
+        ('powerwatch.pw', 'PowerWatch'),
+        ('rapidvideo.ws', 'Rapidvideo.ws'),
+        ('thevideobee.to', 'TheVideoBee'),
+        ('vidto.me', 'Vidto'),
+        ('streamin.to', 'Streamin.To'),
+        ('xvidstage.com', 'XVIDSTAGE'),
+    )
+
+    IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1])
+    _VALID_URL = (r'https?://(?P<host>(?:www\.)?(?:%s))/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
+                  % '|'.join(re.escape(site) for site in list(zip(*_SITES))[0]))
+
+    _FILE_NOT_FOUND_REGEXES = (
+        r'>(?:404 - )?File Not Found<',
+        r'>The file was removed by administrator<',
+    )
  
      _TESTS = [{
          'url': 'http://gorillavid.in/06y9juieqpmi',
          'md5': '5ae4a3580620380619678ee4875893ba',
          'info_dict': {
              'id': '06y9juieqpmi',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ',
              'thumbnail': 're:http://.*\.jpg',
          },
@@ -43,25 +58,6 @@ class XFileShareIE(InfoExtractor):
              'title': 'Micro Pig piglets ready on 16th July 2009-bG0PdrCdxUc',
              'thumbnail': 're:http://.*\.jpg',
          }
-    }, {
-        # video with countdown timeout
-        'url': 'http://fastvideo.in/1qmdn1lmsmbw',
-        'md5': '8b87ec3f6564a3108a0e8e66594842ba',
-        'info_dict': {
-            'id': '1qmdn1lmsmbw',
-            'ext': 'mp4',
-            'title': 'Man of Steel - Trailer',
-            'thumbnail': 're:http://.*\.jpg',
-        },
-    }, {
-        'url': 'http://realvid.net/ctn2y6p2eviw',
-        'md5': 'b2166d2cf192efd6b6d764c18fd3710e',
-        'info_dict': {
-            'id': 'ctn2y6p2eviw',
-            'ext': 'flv',
-            'title': 'rdx 1955',
-            'thumbnail': 're:http://.*\.jpg',
-        },
      }, {
          'url': 'http://movpod.in/0wguyyxi1yca',
          'only_matching': True,
@@ -72,7 +68,8 @@ class XFileShareIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'youtube-dl test video \'äBaW_jenozKc.mp4.mp4',
              'thumbnail': 're:http://.*\.jpg',
-        }
+        },
+        'skip': 'Video removed',
      }, {
          'url': 'http://vidto.me/ku5glz52nqe1.html',
          'info_dict': {
@@ -87,6 +84,17 @@ class XFileShareIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Big Buck Bunny trailer',
          },
+    }, {
+        'url': 'http://xvidstage.com/e0qcnl03co6z',
+        'info_dict': {
+            'id': 'e0qcnl03co6z',
+            'ext': 'mp4',
+            'title': 'Chucky Prank 2015.mp4',
+        },
+    }, {
+        # removed by administrator
+        'url': 'http://xvidstage.com/amfy7atlkx25',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -96,7 +104,7 @@ class XFileShareIE(InfoExtractor):
          url = 'http://%s/%s' % (mobj.group('host'), video_id)
          webpage = self._download_webpage(url, video_id)
  
-        if re.search(self._FILE_NOT_FOUND_REGEX, webpage) is not None:
+        if any(re.search(p, webpage) for p in self._FILE_NOT_FOUND_REGEXES):
              raise ExtractorError('Video %s does not exist' % video_id, expected=True)
  
          fields = self._hidden_inputs(webpage)
@@ -116,16 +124,31 @@ class XFileShareIE(InfoExtractor):
              webpage = self._download_webpage(req, video_id, 'Downloading video page')
  
          title = (self._search_regex(
-            [r'style="z-index: [0-9]+;">([^<]+)</span>',
+            (r'style="z-index: [0-9]+;">([^<]+)</span>',
               r'<td nowrap>([^<]+)</td>',
               r'h4-fine[^>]*>([^<]+)<',
               r'>Watch (.+) ',
-             r'<h2 class="video-page-head">([^<]+)</h2>'],
-            webpage, 'title', default=None) or self._og_search_title(webpage)).strip()
-        video_url = self._search_regex(
-            [r'file\s*:\s*["\'](http[^"\']+)["\'],',
-             r'file_link\s*=\s*\'(https?:\/\/[0-9a-zA-z.\/\-_]+)'],
-            webpage, 'file url')
+             r'<h2 class="video-page-head">([^<]+)</h2>',
+             r'<h2 style="[^"]*color:#403f3d[^"]*"[^>]*>([^<]+)<'),  # streamin.to
+            webpage, 'title', default=None) or self._og_search_title(
+            webpage, default=None) or video_id).strip()
+
+        def extract_video_url(default=NO_DEFAULT):
+            return self._search_regex(
+                (r'file\s*:\s*(["\'])(?P<url>http.+?)\1,',
+                 r'file_link\s*=\s*(["\'])(?P<url>http.+?)\1',
+                 r'addVariable\((\\?["\'])file\1\s*,\s*(\\?["\'])(?P<url>http.+?)\2\)',
+                 r'<embed[^>]+src=(["\'])(?P<url>http.+?)\1'),
+                webpage, 'file url', default=default, group='url')
+
+        video_url = extract_video_url(default=None)
+
+        if not video_url:
+            webpage = decode_packed_codes(self._search_regex(
+                r"(}\('(.+)',(\d+),(\d+),'[^']*\b(?:file|embed)\b[^']*'\.split\('\|'\))",
+                webpage, 'packed code'))
+            video_url = extract_video_url()
+
          thumbnail = self._search_regex(
              r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', default=None)
  
diff --git a/youtube_dl/extractor/xhamster.py b/youtube_dl/extractor/xhamster.py
index b3547174dd92beffafaf8f220b50b94a25f2fa2b..bd8e1af2e0f6c25fc44aea36c23b813b092b4438 100644
--- a/youtube_dl/extractor/xhamster.py
+++ b/youtube_dl/extractor/xhamster.py
@@ -12,37 +12,52 @@ from ..utils import (
  
  
  class XHamsterIE(InfoExtractor):
-    _VALID_URL = r'(?P<proto>https?)://(?:.+?\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.+?)\.html(?:\?.*)?'
-    _TESTS = [
-        {
-            'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
-            'info_dict': {
-                'id': '1509445',
-                'ext': 'mp4',
-                'title': 'FemaleAgent Shy beauty takes the bait',
-                'upload_date': '20121014',
-                'uploader': 'Ruseful2011',
-                'duration': 893.52,
-                'age_limit': 18,
-            }
+    _VALID_URL = r'(?P<proto>https?)://(?:.+?\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.*?)\.html(?:\?.*)?'
+    _TESTS = [{
+        'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
+        'md5': '8281348b8d3c53d39fffb377d24eac4e',
+        'info_dict': {
+            'id': '1509445',
+            'ext': 'mp4',
+            'title': 'FemaleAgent Shy beauty takes the bait',
+            'upload_date': '20121014',
+            'uploader': 'Ruseful2011',
+            'duration': 893.52,
+            'age_limit': 18,
          },
-        {
-            'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
-            'info_dict': {
-                'id': '2221348',
-                'ext': 'mp4',
-                'title': 'Britney Spears  Sexy Booty',
-                'upload_date': '20130914',
-                'uploader': 'jojo747400',
-                'duration': 200.48,
-                'age_limit': 18,
-            }
+    }, {
+        'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
+        'info_dict': {
+            'id': '2221348',
+            'ext': 'mp4',
+            'title': 'Britney Spears  Sexy Booty',
+            'upload_date': '20130914',
+            'uploader': 'jojo747400',
+            'duration': 200.48,
+            'age_limit': 18,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # empty seo
+        'url': 'http://xhamster.com/movies/5667973/.html',
+        'info_dict': {
+            'id': '5667973',
+            'ext': 'mp4',
+            'title': '....',
+            'upload_date': '20160208',
+            'uploader': 'parejafree',
+            'duration': 72.0,
+            'age_limit': 18,
          },
-        {
-            'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html',
-            'only_matching': True,
+        'params': {
+            'skip_download': True,
          },
-    ]
+    }, {
+        'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          def extract_video_url(webpage, name):
@@ -170,7 +185,7 @@ class XHamsterEmbedIE(InfoExtractor):
          webpage = self._download_webpage(url, video_id)
  
          video_url = self._search_regex(
-            r'href="(https?://xhamster\.com/movies/%s/[^"]+\.html[^"]*)"' % video_id,
+            r'href="(https?://xhamster\.com/movies/%s/[^"]*\.html[^"]*)"' % video_id,
              webpage, 'xhamster url', default=None)
  
          if not video_url:
diff --git a/youtube_dl/extractor/xiami.py b/youtube_dl/extractor/xiami.py
new file mode 100644
index 0000000..86abef2
--- /dev/null
+++ b/youtube_dl/extractor/xiami.py
@@ -0,0 +1,169 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_unquote
+from ..utils import int_or_none
+
+
+class XiamiBaseIE(InfoExtractor):
+    _API_BASE_URL = 'http://www.xiami.com/song/playlist/cat/json/id'
+
+    def _download_webpage(self, *args, **kwargs):
+        webpage = super(XiamiBaseIE, self)._download_webpage(*args, **kwargs)
+        if '>Xiami is currently not available in your country.<' in webpage:
+            self.raise_geo_restricted('Xiami is currently not available in your country')
+        return webpage
+
+    def _extract_track(self, track, track_id=None):
+        title = track['title']
+        track_url = self._decrypt(track['location'])
+
+        subtitles = {}
+        lyrics_url = track.get('lyric_url') or track.get('lyric')
+        if lyrics_url and lyrics_url.startswith('http'):
+            subtitles['origin'] = [{'url': lyrics_url}]
+
+        return {
+            'id': track.get('song_id') or track_id,
+            'url': track_url,
+            'title': title,
+            'thumbnail': track.get('pic') or track.get('album_pic'),
+            'duration': int_or_none(track.get('length')),
+            'creator': track.get('artist', '').split(';')[0],
+            'track': title,
+            'album': track.get('album_name'),
+            'artist': track.get('artist'),
+            'subtitles': subtitles,
+        }
+
+    def _extract_tracks(self, item_id, typ=None):
+        playlist = self._download_json(
+            '%s/%s%s' % (self._API_BASE_URL, item_id, '/type/%s' % typ if typ else ''), item_id)
+        return [
+            self._extract_track(track, item_id)
+            for track in playlist['data']['trackList']]
+
+    @staticmethod
+    def _decrypt(origin):
+        n = int(origin[0])
+        origin = origin[1:]
+        short_lenth = len(origin) // n
+        long_num = len(origin) - short_lenth * n
+        l = tuple()
+        for i in range(0, n):
+            length = short_lenth
+            if i < long_num:
+                length += 1
+            l += (origin[0:length], )
+            origin = origin[length:]
+        ans = ''
+        for i in range(0, short_lenth + 1):
+            for j in range(0, n):
+                if len(l[j]) > i:
+                    ans += l[j][i]
+        return compat_urllib_parse_unquote(ans).replace('^', '0')
+
+
+class XiamiSongIE(XiamiBaseIE):
+    IE_NAME = 'xiami:song'
+    IE_DESC = '虾米音乐'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://www.xiami.com/song/1775610518',
+        'md5': '521dd6bea40fd5c9c69f913c232cb57e',
+        'info_dict': {
+            'id': '1775610518',
+            'ext': 'mp3',
+            'title': 'Woman',
+            'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
+            'duration': 265,
+            'creator': 'HONNE',
+            'track': 'Woman',
+            'album': 'Woman',
+            'artist': 'HONNE',
+            'subtitles': {
+                'origin': [{
+                    'ext': 'lrc',
+                }],
+            },
+        },
+        'skip': 'Georestricted',
+    }, {
+        'url': 'http://www.xiami.com/song/1775256504',
+        'md5': '932a3abd45c6aa2b1fdbe028fcb4c4fc',
+        'info_dict': {
+            'id': '1775256504',
+            'ext': 'mp3',
+            'title': '悟空',
+            'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
+            'duration': 200,
+            'creator': '戴荃',
+            'track': '悟空',
+            'album': '悟空',
+            'artist': '戴荃',
+            'subtitles': {
+                'origin': [{
+                    'ext': 'lrc',
+                }],
+            },
+        },
+        'skip': 'Georestricted',
+    }]
+
+    def _real_extract(self, url):
+        return self._extract_tracks(self._match_id(url))[0]
+
+
+class XiamiPlaylistBaseIE(XiamiBaseIE):
+    def _real_extract(self, url):
+        item_id = self._match_id(url)
+        return self.playlist_result(self._extract_tracks(item_id, self._TYPE), item_id)
+
+
+class XiamiAlbumIE(XiamiPlaylistBaseIE):
+    IE_NAME = 'xiami:album'
+    IE_DESC = '虾米音乐 - 专辑'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/album/(?P<id>[0-9]+)'
+    _TYPE = '1'
+    _TESTS = [{
+        'url': 'http://www.xiami.com/album/2100300444',
+        'info_dict': {
+            'id': '2100300444',
+        },
+        'playlist_count': 10,
+        'skip': 'Georestricted',
+    }, {
+        'url': 'http://www.xiami.com/album/512288?spm=a1z1s.6843761.1110925389.6.hhE9p9',
+        'only_matching': True,
+    }]
+
+
+class XiamiArtistIE(XiamiPlaylistBaseIE):
+    IE_NAME = 'xiami:artist'
+    IE_DESC = '虾米音乐 - 歌手'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/artist/(?P<id>[0-9]+)'
+    _TYPE = '2'
+    _TEST = {
+        'url': 'http://www.xiami.com/artist/2132?spm=0.0.0.0.dKaScp',
+        'info_dict': {
+            'id': '2132',
+        },
+        'playlist_count': 20,
+        'skip': 'Georestricted',
+    }
+
+
+class XiamiCollectionIE(XiamiPlaylistBaseIE):
+    IE_NAME = 'xiami:collection'
+    IE_DESC = '虾米音乐 - 精选集'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/collect/(?P<id>[0-9]+)'
+    _TYPE = '3'
+    _TEST = {
+        'url': 'http://www.xiami.com/collect/156527391?spm=a1z1s.2943601.6856193.12.4jpBnr',
+        'info_dict': {
+            'id': '156527391',
+        },
+        'playlist_mincount': 29,
+        'skip': 'Georestricted',
+    }
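
Note on `_decrypt` in the xiami.py hunk above: the helper appears to undo a simple grid scramble. The first character of the string gives a row count, the rest is cut into that many rows, and the rows are read column by column. Below is a minimal standalone sketch of the same steps, assuming Python 3 and using `urllib.parse.unquote` in place of `compat_urllib_parse_unquote`. The name `decrypt_location` and the sample strings are hypothetical, not taken from the commit.

```python
from urllib.parse import unquote


def decrypt_location(origin):
    """Undo the row/column scramble on a track location string (sketch)."""
    rows = int(origin[0])              # first character is the row count
    body = origin[1:]
    short_len, long_rows = divmod(len(body), rows)

    # Cut the body into `rows` chunks; the first `long_rows` chunks
    # are one character longer than the rest.
    chunks = []
    pos = 0
    for i in range(rows):
        length = short_len + (1 if i < long_rows else 0)
        chunks.append(body[pos:pos + length])
        pos += length

    # Read the grid column by column to restore the original order.
    chars = []
    for col in range(short_len + 1):
        for chunk in chunks:
            if col < len(chunk):
                chars.append(chunk[col])

    # Percent-decode first, then map '^' back to '0', as the hunk does.
    return unquote(''.join(chars)).replace('^', '0')


# Hypothetical inputs, built by hand to exercise both loops.
print(decrypt_location('3adgbehcf'))
print(decrypt_location('2a2b%F^'))
```

Building the result in a list and joining once avoids repeated string concatenation; the behaviour is otherwise intended to match the tuple-based loop in the diff.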
diff --git a/youtube_dl/extractor/xminus.py b/youtube_dl/extractor/xminus.py
index 7c9d8af6f2585207347d58d08fc607ebf4d28900..36e5ead1e690db9bb0c1c1a64650a69c784bbe76 100644
--- a/youtube_dl/extractor/xminus.py
+++ b/youtube_dl/extractor/xminus.py
@@ -2,15 +2,15 @@
  from __future__ import unicode_literals
  
  import re
+import time
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_chr,
      compat_ord,
  )
  from ..utils import (
      int_or_none,
-    parse_filesize,
+    parse_duration,
  )
  
  
@@ -22,7 +22,7 @@ class XMinusIE(InfoExtractor):
          'info_dict': {
              'id': '4542',
              'ext': 'mp3',
-            'title': 'Леонид Агутин-Песенка шофера',
+            'title': 'Леонид Агутин-Песенка шофёра',
              'duration': 156,
              'tbr': 320,
              'filesize_approx': 5900000,
@@ -36,38 +36,41 @@ class XMinusIE(InfoExtractor):
          webpage = self._download_webpage(url, video_id)
  
          artist = self._html_search_regex(
-            r'minus_track\.artist="(.+?)"', webpage, 'artist')
+            r'<a[^>]+href="/artist/\d+">([^<]+)</a>', webpage, 'artist')
          title = artist + '-' + self._html_search_regex(
-            r'minus_track\.title="(.+?)"', webpage, 'title')
-        duration = int_or_none(self._html_search_regex(
-            r'minus_track\.dur_sec=\'([0-9]*?)\'',
+            r'<span[^>]+class="minustrack-full-title(?:\s+[^"]+)?"[^>]*>([^<]+)', webpage, 'title')
+        duration = parse_duration(self._html_search_regex(
+            r'<span[^>]+class="player-duration(?:\s+[^"]+)?"[^>]*>([^<]+)',
              webpage, 'duration', fatal=False))
-        filesize_approx = parse_filesize(self._html_search_regex(
-            r'<div id="finfo"[^>]*>\s*↓\s*([0-9.]+\s*[a-zA-Z][bB])',
-            webpage, 'approximate filesize', fatal=False))
-        tbr = int_or_none(self._html_search_regex(
-            r'<div class="quality[^"]*"></div>\s*([0-9]+)\s*kbps',
-            webpage, 'bitrate', fatal=False))
+        mobj = re.search(
+            r'<div[^>]+class="dw-info(?:\s+[^"]+)?"[^>]*>(?P<tbr>\d+)\s*кбит/c\s+(?P<filesize>[0-9.]+)\s*мб</div>',
+            webpage)
+        tbr = filesize_approx = None
+        if mobj:
+            filesize_approx = float(mobj.group('filesize')) * 1000000
+            tbr = float(mobj.group('tbr'))
          view_count = int_or_none(self._html_search_regex(
-            r'<div class="quality.*?► ([0-9]+)',
+            r'<span><[^>]+class="icon-chart-bar".*?>(\d+)</span>',
              webpage, 'view count', fatal=False))
          description = self._html_search_regex(
-            r'(?s)<div id="song_texts">(.*?)</div><br',
+            r'(?s)<pre[^>]+id="lyrics-original"[^>]*>(.*?)</pre>',
              webpage, 'song lyrics', fatal=False)
          if description:
              description = re.sub(' *\r *', '\n', description)
  
-        enc_token = self._html_search_regex(
-            r'minus_track\.s?tkn="(.+?)"', webpage, 'enc_token')
-        token = ''.join(
-            c if pos == 3 else compat_chr(compat_ord(c) - 1)
-            for pos, c in enumerate(reversed(enc_token)))
-        video_url = 'http://x-minus.org/dwlf/%s/%s.mp3' % (video_id, token)
+        k = self._search_regex(
+            r'<div[^>]+id="player-bottom"[^>]+data-k="([^"]+)">', webpage,
+            'encoded data')
+        h = time.time() / 3600
+        a = sum(map(int, [compat_ord(c) for c in k])) + int(video_id) + h
+        video_url = 'http://x-minus.me/dl/minus?id=%s&tkn2=%df%d' % (video_id, a, h)
  
          return {
              'id': video_id,
              'title': title,
              'url': video_url,
+            # The extension is unknown until actual downloading
+            'ext': 'mp3',
              'duration': duration,
              'filesize_approx': filesize_approx,
              'tbr': tbr,
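
Note on the `tkn2` value in the xminus.py hunk above: the token is built from the sum of the character codes in `data-k`, the numeric video id, and the current time in hours, formatted with `%d`, which truncates the float. A minimal sketch of that arithmetic follows, assuming Python 3. `build_token` and its `now` parameter are hypothetical names added so the result can be checked with a fixed timestamp; taking whole hours up front should give the same digits as the float arithmetic in the hunk for positive timestamps.

```python
import time


def build_token(video_id, key, now=None):
    """Return a tkn2-style string from the page key and video id (sketch)."""
    if now is None:
        now = time.time()
    hours = int(now // 3600)           # whole hours since the epoch
    checksum = sum(ord(c) for c in key) + int(video_id) + hours
    return '%df%d' % (checksum, hours)


# Fixed timestamp so the output is reproducible: 7200 s is 2 hours.
print(build_token('4542', 'ab', now=7200))
```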
diff --git a/youtube_dl/extractor/xnxx.py b/youtube_dl/extractor/xnxx.py
index 5a41f8ffa0c5a46a3d0431a6aac8e93ba8ca1cb9..e0a6255dc4df8f2a2bd56ffcf1363089a08e6aea 100644
--- a/youtube_dl/extractor/xnxx.py
+++ b/youtube_dl/extractor/xnxx.py
@@ -1,4 +1,4 @@
-# encoding: utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
@@ -6,17 +6,23 @@ from ..compat import compat_urllib_parse_unquote
  
  
  class XNXXIE(InfoExtractor):
-    _VALID_URL = r'^https?://(?:video|www)\.xnxx\.com/video(?P<id>[0-9]+)/(.*)'
-    _TEST = {
-        'url': 'http://video.xnxx.com/video1135332/lida_naked_funny_actress_5_',
-        'md5': '0831677e2b4761795f68d417e0b7b445',
+    _VALID_URL = r'https?://(?:video|www)\.xnxx\.com/video-?(?P<id>[0-9a-z]+)/'
+    _TESTS = [{
+        'url': 'http://www.xnxx.com/video-55awb78/skyrim_test_video',
+        'md5': 'ef7ecee5af78f8b03dca2cf31341d3a0',
          'info_dict': {
-            'id': '1135332',
+            'id': '55awb78',
              'ext': 'flv',
-            'title': 'lida » Naked Funny Actress  (5)',
+            'title': 'Skyrim Test Video',
              'age_limit': 18,
-        }
-    }
+        },
+    }, {
+        'url': 'http://video.xnxx.com/video1135332/lida_naked_funny_actress_5_',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.xnxx.com/video-55awb78/',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
diff --git a/youtube_dl/extractor/xtube.py b/youtube_dl/extractor/xtube.py
index 4075b8a4f8a705cf29aa1430656146350a8d07aa..83bc1fef2095b322a67199c60e27fcc6f8f1bcbc 100644
--- a/youtube_dl/extractor/xtube.py
+++ b/youtube_dl/extractor/xtube.py
@@ -4,17 +4,23 @@ import itertools
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
  from ..utils import (
      int_or_none,
      orderedSet,
+    parse_duration,
      sanitized_Request,
      str_to_int,
  )
  
  
  class XTubeIE(InfoExtractor):
-    _VALID_URL = r'(?:xtube:|https?://(?:www\.)?xtube\.com/(?:watch\.php\?.*\bv=|video-watch/(?P<display_id>[^/]+)-))(?P<id>[^/?&#]+)'
+    _VALID_URL = r'''(?x)
+                        (?:
+                            xtube:|
+                            https?://(?:www\.)?xtube\.com/(?:watch\.php\?.*\bv=|video-watch/(?P<display_id>[^/]+)-)
+                        )
+                        (?P<id>[^/?&#]+)
+                    '''
  
      _TESTS = [{
          # old URL schema
@@ -27,6 +33,8 @@ class XTubeIE(InfoExtractor):
              'description': 'contains:an ET kind of thing',
              'uploader': 'greenshowers',
              'duration': 450,
+            'view_count': int,
+            'comment_count': int,
              'age_limit': 18,
          }
      }, {
@@ -51,21 +59,30 @@ class XTubeIE(InfoExtractor):
          req.add_header('Cookie', 'age_verified=1; cookiesAccepted=1')
          webpage = self._download_webpage(req, display_id)
  
-        flashvars = self._parse_json(
-            self._search_regex(
-                r'xt\.playerOps\s*=\s*({.+?});', webpage, 'player ops'),
-            video_id)['flashvars']
-
-        title = flashvars.get('title') or self._search_regex(
-            r'<h1>([^<]+)</h1>', webpage, 'title')
-        video_url = compat_urllib_parse_unquote(flashvars['video_url'])
-        duration = int_or_none(flashvars.get('video_duration'))
-
-        uploader = self._search_regex(
-            r'<input[^>]+name="contentOwnerId"[^>]+value="([^"]+)"',
-            webpage, 'uploader', fatal=False)
+        sources = self._parse_json(self._search_regex(
+            r'sources\s*:\s*({.+?}),', webpage, 'sources'), video_id)
+
+        formats = []
+        for format_id, format_url in sources.items():
+            formats.append({
+                'url': format_url,
+                'format_id': format_id,
+                'height': int_or_none(format_id),
+            })
+        self._sort_formats(formats)
+
+        title = self._search_regex(
+            (r'<h1>(?P<title>[^<]+)</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
+            webpage, 'title', group='title')
          description = self._search_regex(
              r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False)
+        uploader = self._search_regex(
+            (r'<input[^>]+name="contentOwnerId"[^>]+value="([^"]+)"',
+             r'<span[^>]+class="nickname"[^>]*>([^<]+)'),
+            webpage, 'uploader', fatal=False)
+        duration = parse_duration(self._search_regex(
+            r'<dt>Runtime:</dt>\s*<dd>([^<]+)</dd>',
+            webpage, 'duration', fatal=False))
          view_count = str_to_int(self._search_regex(
              r'<dt>Views:</dt>\s*<dd>([\d,\.]+)</dd>',
              webpage, 'view count', fatal=False))
@@ -76,7 +93,6 @@ class XTubeIE(InfoExtractor):
          return {
              'id': video_id,
              'display_id': display_id,
-            'url': video_url,
              'title': title,
              'description': description,
              'uploader': uploader,
@@ -84,6 +100,7 @@ class XTubeIE(InfoExtractor):
              'view_count': view_count,
              'comment_count': comment_count,
              'age_limit': 18,
+            'formats': formats,
          }
  
  
diff --git a/youtube_dl/extractor/xuite.py b/youtube_dl/extractor/xuite.py
index 2466410faaba4e0047fe26099ee936b68dcb9e34..4b9c1ee9c5222f48c5634184f703baa062cf3ae9 100644
--- a/youtube_dl/extractor/xuite.py
+++ b/youtube_dl/extractor/xuite.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  import base64
@@ -66,6 +66,21 @@ class XuiteIE(InfoExtractor):
              'uploader_id': '242127761',
              'categories': ['電玩動漫'],
          },
+        'skip': 'Video removed',
+    }, {
+        # Video with encoded media id
+        # from http://forgetfulbc.blogspot.com/2016/06/date.html
+        'url': 'http://vlog.xuite.net/embed/cE1xbENoLTI3NDQ3MzM2LmZsdg==?ar=0&as=0',
+        'info_dict': {
+            'id': 'cE1xbENoLTI3NDQ3MzM2LmZsdg==',
+            'ext': 'mp4',
+            'title': '男女平權只是口號？專家解釋約會時男生是否該幫女生付錢 (中字)',
+            'description': 'md5:f0abdcb69df300f522a5442ef3146f2a',
+            'timestamp': 1466160960,
+            'upload_date': '20160617',
+            'uploader': 'B.C. & Lowy',
+            'uploader_id': '232279340',
+        },
      }, {
          'url': 'http://vlog.xuite.net/play/S1dDUjdyLTMyOTc3NjcuZmx2/%E5%AD%AB%E7%87%95%E5%A7%BF-%E7%9C%BC%E6%B7%9A%E6%88%90%E8%A9%A9',
          'only_matching': True,
@@ -79,10 +94,9 @@ class XuiteIE(InfoExtractor):
      def base64_encode_utf8(data):
          return base64.b64encode(data.encode('utf-8')).decode('utf-8')
  
-    def _extract_flv_config(self, media_id):
-        base64_media_id = self.base64_encode_utf8(media_id)
+    def _extract_flv_config(self, encoded_media_id):
          flv_config = self._download_xml(
-            'http://vlog.xuite.net/flash/player?media=%s' % base64_media_id,
+            'http://vlog.xuite.net/flash/player?media=%s' % encoded_media_id,
              'flv config')
          prop_dict = {}
          for prop in flv_config.findall('./property'):
@@ -107,9 +121,14 @@ class XuiteIE(InfoExtractor):
                  '%s returned error: %s' % (self.IE_NAME, error_msg),
                  expected=True)
  
-        video_id = self._html_search_regex(
-            r'data-mediaid="(\d+)"', webpage, 'media id')
-        flv_config = self._extract_flv_config(video_id)
+        encoded_media_id = self._search_regex(
+            r'attributes\.name\s*=\s*"([^"]+)"', webpage,
+            'encoded media id', default=None)
+        if encoded_media_id is None:
+            video_id = self._html_search_regex(
+                r'data-mediaid="(\d+)"', webpage, 'media id')
+            encoded_media_id = self.base64_encode_utf8(video_id)
+        flv_config = self._extract_flv_config(encoded_media_id)
  
          FORMATS = {
              'audio': 'mp3',
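
Note on the media id handling in the xuite.py hunk above: the extractor now prefers an already-encoded id found in the page and only falls back to base64-encoding the numeric `data-mediaid`. A minimal sketch of that fallback, assuming Python 3. `pick_media_param` is a hypothetical helper and the numeric id is made up for illustration; `base64_encode_utf8` mirrors the static method shown in the diff.

```python
import base64


def base64_encode_utf8(data):
    """Base64-encode a text string and return the result as text."""
    return base64.b64encode(data.encode('utf-8')).decode('utf-8')


def pick_media_param(encoded_from_page, numeric_id):
    """Prefer the encoded id from the page; otherwise encode the numeric id."""
    if encoded_from_page is not None:
        return encoded_from_page
    return base64_encode_utf8(numeric_id)


# Encoded id taken from the test URL in the hunk; it is passed through as-is.
print(pick_media_param('cE1xbENoLTI3NDQ3MzM2LmZsdg==', None))
# Hypothetical numeric id, used only when the page has no encoded id.
print(pick_media_param(None, '27447336'))
```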
diff --git a/youtube_dl/extractor/xvideos.py b/youtube_dl/extractor/xvideos.py
index 710ad5041988b0e1c932b135af91a27036dfd664..30825daae956e12f450e32bc66aea737a5c293e0 100644
--- a/youtube_dl/extractor/xvideos.py
+++ b/youtube_dl/extractor/xvideos.py
@@ -8,7 +8,6 @@ from ..utils import (
      clean_html,
      ExtractorError,
      determine_ext,
-    sanitized_Request,
  )
  
  
@@ -16,17 +15,15 @@ class XVideosIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?xvideos\.com/video(?P<id>[0-9]+)(?:.*)'
      _TEST = {
          'url': 'http://www.xvideos.com/video4588838/biker_takes_his_girl',
-        'md5': '4b46ae6ea5e6e9086e714d883313c0c9',
+        'md5': '14cea69fcb84db54293b1e971466c2e1',
          'info_dict': {
              'id': '4588838',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Biker Takes his Girl',
              'age_limit': 18,
          }
      }
  
-    _ANDROID_USER_AGENT = 'Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19'
-
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
@@ -35,31 +32,34 @@ class XVideosIE(InfoExtractor):
          if mobj:
              raise ExtractorError('%s said: %s' % (self.IE_NAME, clean_html(mobj.group(1))), expected=True)
  
-        video_url = compat_urllib_parse_unquote(
-            self._search_regex(r'flv_url=(.+?)&', webpage, 'video URL'))
          video_title = self._html_search_regex(
              r'<title>(.*?)\s+-\s+XVID', webpage, 'title')
          video_thumbnail = self._search_regex(
              r'url_bigthumb=(.+?)&amp', webpage, 'thumbnail', fatal=False)
  
-        formats = [{
-            'url': video_url,
-        }]
+        formats = []
  
-        android_req = sanitized_Request(url)
-        android_req.add_header('User-Agent', self._ANDROID_USER_AGENT)
-        android_webpage = self._download_webpage(android_req, video_id, fatal=False)
+        video_url = compat_urllib_parse_unquote(self._search_regex(
+            r'flv_url=(.+?)&', webpage, 'video URL', default=''))
+        if video_url:
+            formats.append({
+                'url': video_url,
+                'format_id': 'flv',
+            })
  
-        if android_webpage is not None:
-            player_params_str = self._search_regex(
-                'mobileReplacePlayerDivTwoQual\(([^)]+)\)',
-                android_webpage, 'player parameters', default='')
-            player_params = list(map(lambda s: s.strip(' \''), player_params_str.split(',')))
-            if player_params:
-                formats.extend([{
-                    'url': param,
-                    'preference': -10,
-                } for param in player_params if determine_ext(param) == 'mp4'])
+        for kind, _, format_url in re.findall(
+                r'setVideo([^(]+)\((["\'])(http.+?)\2\)', webpage):
+            format_id = kind.lower()
+            if format_id == 'hls':
+                formats.extend(self._extract_m3u8_formats(
+                    format_url, video_id, 'mp4',
+                    entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
+            elif format_id in ('urllow', 'urlhigh'):
+                formats.append({
+                    'url': format_url,
+                    'format_id': '%s-%s' % (determine_ext(format_url, 'mp4'), format_id[3:]),
+                    'quality': -2 if format_id.endswith('low') else None,
+                })
  
          self._sort_formats(formats)
  
@@ -67,7 +67,6 @@ class XVideosIE(InfoExtractor):
              'id': video_id,
              'formats': formats,
              'title': video_title,
-            'ext': 'flv',
              'thumbnail': video_thumbnail,
              'age_limit': 18,
          }
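
Note on the `setVideo...` pattern in the xvideos.py hunk above: the regex captures the suffix of the setter name, the quote character, and the URL, and the backreference `\2` requires the closing quote to match the opening one. A minimal sketch of how the pattern behaves, assuming Python 3. The regex is copied from the hunk; `find_sources`, the `html5player` prefix and the example.com URLs are hypothetical and only serve as sample input.

```python
import re

# Pattern copied from the hunk: name suffix, quote char, URL, same quote.
SET_VIDEO_RE = r'setVideo([^(]+)\((["\'])(http.+?)\2\)'


def find_sources(webpage):
    """Map lower-cased setter suffixes (e.g. 'urllow', 'hls') to URLs."""
    sources = {}
    for kind, _quote, url in re.findall(SET_VIDEO_RE, webpage):
        sources[kind.lower()] = url
    return sources


# Made-up page fragment, not real site markup.
sample = (
    "html5player.setVideoUrlLow('http://example.com/low.mp4');\n"
    'html5player.setVideoHLS("http://example.com/master.m3u8");\n'
)
print(find_sources(sample))
```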
diff --git a/youtube_dl/extractor/yahoo.py b/youtube_dl/extractor/yahoo.py
index b2d8f4b48daddcf734d3a1fb461d1b92736bcfd1..4951414e91ffbc34dc83629403c8b64ffb5e5682 100644
--- a/youtube_dl/extractor/yahoo.py
+++ b/youtube_dl/extractor/yahoo.py
@@ -8,7 +8,6 @@ import re
  from .common import InfoExtractor, SearchInfoExtractor
  from ..compat import (
      compat_urllib_parse,
-    compat_urllib_parse_urlencode,
      compat_urlparse,
  )
  from ..utils import (
@@ -17,14 +16,19 @@ from ..utils import (
      ExtractorError,
      int_or_none,
      mimetype2ext,
+    determine_ext,
  )
  
+from .brightcove import (
+    BrightcoveLegacyIE,
+    BrightcoveNewIE,
+)
  from .nbc import NBCSportsVPlayerIE
  
  
  class YahooIE(InfoExtractor):
      IE_DESC = 'Yahoo screen and movies'
-    _VALID_URL = r'(?P<url>(?P<host>https?://(?:[a-zA-Z]{2}\.)?[\da-zA-Z_-]+\.yahoo\.com)/(?:[^/]+/)*(?P<display_id>.+)?-(?P<id>[0-9]+)(?:-[a-z]+)?\.html)'
+    _VALID_URL = r'(?P<url>(?P<host>https?://(?:[a-zA-Z]{2}\.)?[\da-zA-Z_-]+\.yahoo\.com)/(?:[^/]+/)*(?P<display_id>.+)?-(?P<id>[0-9]+)(?:-[a-z]+)?(?:\.html)?)'
      _TESTS = [
          {
              'url': 'http://screen.yahoo.com/julian-smith-travis-legg-watch-214727115.html',
@@ -38,7 +42,7 @@ class YahooIE(InfoExtractor):
          },
          {
              'url': 'http://screen.yahoo.com/wired/codefellas-s1-ep12-cougar-lies-103000935.html',
-            'md5': 'd6e6fc6e1313c608f316ddad7b82b306',
+            'md5': '251af144a19ebc4a033e8ba91ac726bb',
              'info_dict': {
                  'id': 'd1dedf8c-d58c-38c3-8963-e899929ae0a9',
                  'ext': 'mp4',
@@ -49,7 +53,7 @@ class YahooIE(InfoExtractor):
          },
          {
              'url': 'https://screen.yahoo.com/community/community-sizzle-reel-203225340.html?format=embed',
-            'md5': '60e8ac193d8fb71997caa8fce54c6460',
+            'md5': '7993e572fac98e044588d0b5260f4352',
              'info_dict': {
                  'id': '4fe78544-8d48-39d8-97cd-13f205d9fcdb',
                  'ext': 'mp4',
@@ -59,22 +63,22 @@ class YahooIE(InfoExtractor):
              }
          },
          {
-            'url': 'https://tw.screen.yahoo.com/election-2014-askmayor/敢問市長-黃秀霜批賴清德-非常高傲-033009720.html',
-            'md5': '3a09cf59349cfaddae1797acc3c087fc',
+            'url': 'https://tw.news.yahoo.com/%E6%95%A2%E5%95%8F%E5%B8%82%E9%95%B7%20%E9%BB%83%E7%A7%80%E9%9C%9C%E6%89%B9%E8%B3%B4%E6%B8%85%E5%BE%B7%20%E9%9D%9E%E5%B8%B8%E9%AB%98%E5%82%B2-034024051.html',
+            'md5': '45c024bad51e63e9b6f6fad7a43a8c23',
              'info_dict': {
                  'id': 'cac903b3-fcf4-3c14-b632-643ab541712f',
                  'ext': 'mp4',
                  'title': '敢問市長／黃秀霜批賴清德「非常高傲」',
                  'description': '直言台南沒捷運 交通居五都之末',
                  'duration': 396,
-            }
+            },
          },
          {
              'url': 'https://uk.screen.yahoo.com/editor-picks/cute-raccoon-freed-drain-using-091756545.html',
-            'md5': '0b51660361f0e27c9789e7037ef76f4b',
+            'md5': '71298482f7c64cbb7fa064e4553ff1c1',
              'info_dict': {
                  'id': 'b3affa53-2e14-3590-852b-0e0db6cd1a58',
-                'ext': 'mp4',
+                'ext': 'webm',
                  'title': 'Cute Raccoon Freed From Drain\u00a0Using Angle Grinder',
                  'description': 'md5:f66c890e1490f4910a9953c941dee944',
                  'duration': 97,
@@ -89,17 +93,32 @@ class YahooIE(InfoExtractor):
                  'title': 'Program that makes hockey more affordable not offered in Manitoba',
                  'description': 'md5:c54a609f4c078d92b74ffb9bf1f496f4',
                  'duration': 121,
-            }
+            },
+            'skip': 'Video gone',
          }, {
              'url': 'https://ca.finance.yahoo.com/news/hackers-sony-more-trouble-well-154609075.html',
-            'md5': '226a895aae7e21b0129e2a2006fe9690',
              'info_dict': {
-                'id': 'e624c4bc-3389-34de-9dfc-025f74943409',
-                'ext': 'mp4',
-                'title': '\'The Interview\' TV Spot: War',
-                'description': 'The Interview',
-                'duration': 30,
-            }
+                'id': '154609075',
+            },
+            'playlist': [{
+                'md5': '000887d0dc609bc3a47c974151a40fb8',
+                'info_dict': {
+                    'id': 'e624c4bc-3389-34de-9dfc-025f74943409',
+                    'ext': 'mp4',
+                    'title': '\'The Interview\' TV Spot: War',
+                    'description': 'The Interview',
+                    'duration': 30,
+                },
+            }, {
+                'md5': '81bc74faf10750fe36e4542f9a184c66',
+                'info_dict': {
+                    'id': '1fc8ada0-718e-3abe-a450-bf31f246d1a9',
+                    'ext': 'mp4',
+                    'title': '\'The Interview\' TV Spot: Guys',
+                    'description': 'The Interview',
+                    'duration': 30,
+                },
+            }],
          }, {
              'url': 'http://news.yahoo.com/video/china-moses-crazy-blues-104538833.html',
              'md5': '88e209b417f173d86186bef6e4d1f160',
@@ -119,10 +138,11 @@ class YahooIE(InfoExtractor):
                  'title': 'Connect the Dots: Dark Side of Virgo',
                  'description': 'md5:1428185051cfd1949807ad4ff6d3686a',
                  'duration': 201,
-            }
+            },
+            'skip': 'Domain name in.lifestyle.yahoo.com gone',
          }, {
              'url': 'https://www.yahoo.com/movies/v/true-story-trailer-173000497.html',
-            'md5': '989396ae73d20c6f057746fb226aa215',
+            'md5': '2a9752f74cb898af5d1083ea9f661b58',
              'info_dict': {
                  'id': '071c4013-ce30-3a93-a5b2-e0413cd4a9d1',
                  'ext': 'mp4',
@@ -141,6 +161,9 @@ class YahooIE(InfoExtractor):
                  'ext': 'flv',
                  'description': 'md5:df390f70a9ba7c95ff1daace988f0d8d',
                  'title': 'Tyler Kalinoski hits buzzer-beater to lift Davidson',
+                'upload_date': '20150313',
+                'uploader': 'NBCU-SPORTS',
+                'timestamp': 1426270238,
              }
          }, {
              'url': 'https://tw.news.yahoo.com/-100120367.html',
@@ -166,6 +189,44 @@ class YahooIE(InfoExtractor):
                  'description': 'While they play feuding fathers in \'Daddy\'s Home,\' star Will Ferrell & Mark Wahlberg share their true feelings on parenthood.',
              },
          },
+        {
+            # config['models']['applet_model']['data']['sapi'] has no query
+            'url': 'https://www.yahoo.com/music/livenation/event/galactic-2016',
+            'md5': 'dac0c72d502bc5facda80c9e6d5c98db',
+            'info_dict': {
+                'id': 'a6015640-e9e5-3efb-bb60-05589a183919',
+                'ext': 'mp4',
+                'description': 'Galactic',
+                'title': 'Dolla Diva (feat. Maggie Koerner)',
+            },
+            'skip': 'redirect to https://www.yahoo.com/music',
+        },
+        {
+            # yahoo://article/
+            'url': 'https://www.yahoo.com/movies/video/true-story-trailer-173000497.html',
+            'info_dict': {
+                'id': '071c4013-ce30-3a93-a5b2-e0413cd4a9d1',
+                'ext': 'mp4',
+                'title': "'True Story' Trailer",
+                'description': 'True Story',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        {
+            # ytwnews://cavideo/
+            'url': 'https://tw.video.yahoo.com/movie-tw/單車天使-中文版預-092316541.html',
+            'info_dict': {
+                'id': 'ba133ff2-0793-3510-b636-59dfe9ff6cff',
+                'ext': 'mp4',
+                'title': '單車天使 - 中文版預',
+                'description': '中文版預',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
      ]
  
      def _real_extract(self, url):
@@ -174,23 +235,32 @@ class YahooIE(InfoExtractor):
          page_id = mobj.group('id')
          url = mobj.group('url')
          host = mobj.group('host')
-        webpage = self._download_webpage(url, display_id)
+        webpage, urlh = self._download_webpage_handle(url, display_id)
+        if 'err=404' in urlh.geturl():
+            raise ExtractorError('Video gone', expected=True)
  
          # Look for iframed media first
-        iframe_m = re.search(r'<iframe[^>]+src="(/video/.+?-\d+\.html\?format=embed.*?)"', webpage)
-        if iframe_m:
-            iframepage = self._download_webpage(
-                host + iframe_m.group(1), display_id, 'Downloading iframe webpage')
-            items_json = self._search_regex(
-                r'mediaItems: (\[.+?\])$', iframepage, 'items', flags=re.MULTILINE, default=None)
-            if items_json:
-                items = json.loads(items_json)
-                video_id = items[0]['id']
-                return self._get_info(video_id, display_id, webpage)
+        entries = []
+        iframe_urls = re.findall(r'<iframe[^>]+src="(/video/.+?-\d+\.html\?format=embed.*?)"', webpage)
+        for idx, iframe_url in enumerate(iframe_urls):
+            entries.append(self.url_result(host + iframe_url, 'Yahoo'))
+        if entries:
+            return self.playlist_result(entries, page_id)
+
          # Look for NBCSports iframes
          nbc_sports_url = NBCSportsVPlayerIE._extract_url(webpage)
          if nbc_sports_url:
-            return self.url_result(nbc_sports_url, 'NBCSportsVPlayer')
+            return self.url_result(nbc_sports_url, NBCSportsVPlayerIE.ie_key())
+
+        # Look for Brightcove Legacy Studio embeds
+        bc_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
+        if bc_url:
+            return self.url_result(bc_url, BrightcoveLegacyIE.ie_key())
+
+        # Look for Brightcove New Studio embeds
+        bc_url = BrightcoveNewIE._extract_url(webpage)
+        if bc_url:
+            return self.url_result(bc_url, BrightcoveNewIE.ie_key())
  
          # Query result is often embedded in webpage as JSON. Sometimes explicit requests
          # to video API results in a failure with geo restriction reason therefore using
@@ -202,8 +272,10 @@ class YahooIE(InfoExtractor):
              config = self._parse_json(config_json, display_id, fatal=False)
              if config:
                  sapi = config.get('models', {}).get('applet_model', {}).get('data', {}).get('sapi')
-                if sapi:
-                    return self._extract_info(display_id, sapi, webpage)
+                if sapi and 'query' in sapi:
+                    info = self._extract_info(display_id, sapi, webpage)
+                    self._sort_formats(info['formats'])
+                    return info
  
          items_json = self._search_regex(
              r'mediaItems: ({.*?})$', webpage, 'items', flags=re.MULTILINE,
@@ -223,7 +295,8 @@ class YahooIE(InfoExtractor):
                      r'"first_videoid"\s*:\s*"([^"]+)"',
                      r'%s[^}]*"ccm_id"\s*:\s*"([^"]+)"' % re.escape(page_id),
                      r'<article[^>]data-uuid=["\']([^"\']+)',
-                    r'yahoo://article/view\?.*\buuid=([^&"\']+)',
+                    r'<meta[^<>]+yahoo://article/view\?.*\buuid=([^&"\']+)',
+                    r'<meta[^<>]+["\']ytwnews://cavideo/(?:[^/]+/)+([\da-fA-F-]+)[&"\']',
                  ]
                  video_id = self._search_regex(
                      CONTENT_ID_REGEXES, webpage, 'content ID')
@@ -249,15 +322,17 @@ class YahooIE(InfoExtractor):
  
          formats = []
          for s in info['streams']:
+            tbr = int_or_none(s.get('bitrate'))
              format_info = {
                  'width': int_or_none(s.get('width')),
                  'height': int_or_none(s.get('height')),
-                'tbr': int_or_none(s.get('bitrate')),
+                'tbr': tbr,
              }
  
              host = s['host']
              path = s['path']
              if host.startswith('rtmp'):
+                fmt = 'rtmp'
                  format_info.update({
                      'url': host,
                      'play_path': path,
@@ -265,14 +340,18 @@ class YahooIE(InfoExtractor):
                  })
              else:
                  if s.get('format') == 'm3u8_playlist':
-                    format_info['protocol'] = 'm3u8_native'
-                    format_info['ext'] = 'mp4'
+                    fmt = 'hls'
+                    format_info.update({
+                        'protocol': 'm3u8_native',
+                        'ext': 'mp4',
+                    })
+                else:
+                    fmt = format_info['ext'] = determine_ext(path)
                  format_url = compat_urlparse.urljoin(host, path)
                  format_info['url'] = format_url
+            format_info['format_id'] = fmt + ('-%d' % tbr if tbr else '')
              formats.append(format_info)
  
-        self._sort_formats(formats)
-
          closed_captions = self._html_search_regex(
              r'"closedcaptions":(\[[^\]]+\])', webpage, 'closed captions',
              default='[]')
@@ -303,17 +382,25 @@ class YahooIE(InfoExtractor):
      def _get_info(self, video_id, display_id, webpage):
          region = self._search_regex(
              r'\\?"region\\?"\s*:\s*\\?"([^"]+?)\\?"',
-            webpage, 'region', fatal=False, default='US')
-        data = compat_urllib_parse_urlencode({
-            'protocol': 'http',
-            'region': region,
-        })
-        query_url = (
-            'https://video.media.yql.yahoo.com/v1/video/sapi/streams/'
-            '{id}?{data}'.format(id=video_id, data=data))
-        query_result = self._download_json(
-            query_url, display_id, 'Downloading video info')
-        return self._extract_info(display_id, query_result, webpage)
+            webpage, 'region', fatal=False, default='US').upper()
+        formats = []
+        info = {}
+        for fmt in ('webm', 'mp4'):
+            query_result = self._download_json(
+                'https://video.media.yql.yahoo.com/v1/video/sapi/streams/' + video_id,
+                display_id, 'Downloading %s video info' % fmt, query={
+                    'protocol': 'http',
+                    'region': region,
+                    'format': fmt,
+                })
+            info = self._extract_info(display_id, query_result, webpage)
+            formats.extend(info['formats'])
+        formats.extend(self._extract_m3u8_formats(
+            'http://video.media.yql.yahoo.com/v1/hls/%s?region=%s' % (video_id, region),
+            video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+        self._sort_formats(formats)
+        info['formats'] = formats
+        return info
  
  
  class YahooSearchIE(SearchInfoExtractor):
diff --git a/youtube_dl/extractor/yam.py b/youtube_dl/extractor/yam.py
index 63bbc06346a04b385c722eaae22d0ff5c41445f4..ef553554736884ea4b43cb424028b5302c6a872e 100644
--- a/youtube_dl/extractor/yam.py
+++ b/youtube_dl/extractor/yam.py
@@ -15,7 +15,7 @@ from ..utils import (
  
  class YamIE(InfoExtractor):
      IE_DESC = '蕃薯藤yam天空部落'
-    _VALID_URL = r'https?://mymedia.yam.com/m/(?P<id>\d+)'
+    _VALID_URL = r'https?://mymedia\.yam\.com/m/(?P<id>\d+)'
  
      _TESTS = [{
          # An audio hosted on Yam
diff --git a/youtube_dl/extractor/yandexmusic.py b/youtube_dl/extractor/yandexmusic.py
index 025716958caa292a9060290d194547054c93ca2b..fd6268ba4119d02988172a4771514cc34603db1a 100644
--- a/youtube_dl/extractor/yandexmusic.py
+++ b/youtube_dl/extractor/yandexmusic.py
@@ -10,17 +10,35 @@ from ..utils import (
      ExtractorError,
      int_or_none,
      float_or_none,
-    sanitized_Request,
-    urlencode_postdata,
  )
  
  
  class YandexMusicBaseIE(InfoExtractor):
      @staticmethod
      def _handle_error(response):
-        error = response.get('error')
-        if error:
-            raise ExtractorError(error, expected=True)
+        if isinstance(response, dict):
+            error = response.get('error')
+            if error:
+                raise ExtractorError(error, expected=True)
+            if response.get('type') == 'captcha' or 'captcha' in response:
+                YandexMusicBaseIE._raise_captcha()
+
+    @staticmethod
+    def _raise_captcha():
+        raise ExtractorError(
+            'YandexMusic has considered youtube-dl requests automated and '
+            'asks you to solve a CAPTCHA. You can either wait for some '
+            'time until unblocked and optionally use --sleep-interval '
+            'in future or alternatively you can go to https://music.yandex.ru/ '
+            'solve CAPTCHA, then export cookies and pass cookie file to '
+            'youtube-dl with --cookies',
+            expected=True)
+
+    def _download_webpage(self, *args, **kwargs):
+        webpage = super(YandexMusicBaseIE, self)._download_webpage(*args, **kwargs)
+        if 'Нам очень жаль, но&nbsp;запросы, поступившие с&nbsp;вашего IP-адреса, похожи на&nbsp;автоматические.' in webpage:
+            self._raise_captcha()
+        return webpage
  
      def _download_json(self, *args, **kwargs):
          response = super(YandexMusicBaseIE, self)._download_json(*args, **kwargs)
@@ -39,10 +57,16 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
          'info_dict': {
              'id': '4878838',
              'ext': 'mp3',
-            'title': 'Carlo Ambrosio - Gypsy Eyes 1',
+            'title': 'Carlo Ambrosio & Fabio Di Bari, Carlo Ambrosio - Gypsy Eyes 1',
              'filesize': 4628061,
              'duration': 193.04,
-        }
+            'track': 'Gypsy Eyes 1',
+            'album': 'Gypsy Soul',
+            'album_artist': 'Carlo Ambrosio',
+            'artist': 'Carlo Ambrosio & Fabio Di Bari, Carlo Ambrosio',
+            'release_year': '2009',
+        },
+        'skip': 'Travis CI servers blocked by YandexMusic',
      }
  
      def _get_track_url(self, storage_dir, track_id):
@@ -51,6 +75,12 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
              % storage_dir,
              track_id, 'Downloading track location JSON')
  
+        # Each string is now wrapped in a list; this is probably only temporary, so
+        # both scenarios are supported (see https://github.com/rg3/youtube-dl/issues/10193)
+        for k, v in data.items():
+            if v and isinstance(v, list):
+                data[k] = v[0]
+
          key = hashlib.md5(('XGRlBW9FXlekgbPrRHuSiA' + data['path'][1:] + data['s']).encode('utf-8')).hexdigest()
          storage = storage_dir.split('.')
  
@@ -64,16 +94,45 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
              thumbnail = cover_uri.replace('%%', 'orig')
              if not thumbnail.startswith('http'):
                  thumbnail = 'http://' + thumbnail
-        return {
+
+        track_title = track['title']
+        track_info = {
              'id': track['id'],
              'ext': 'mp3',
              'url': self._get_track_url(track['storageDir'], track['id']),
-            'title': '%s - %s' % (track['artists'][0]['name'], track['title']),
              'filesize': int_or_none(track.get('fileSize')),
              'duration': float_or_none(track.get('durationMs'), 1000),
              'thumbnail': thumbnail,
+            'track': track_title,
          }
  
+        def extract_artist(artist_list):
+            if artist_list and isinstance(artist_list, list):
+                artists_names = [a['name'] for a in artist_list if a.get('name')]
+                if artists_names:
+                    return ', '.join(artists_names)
+
+        albums = track.get('albums')
+        if albums and isinstance(albums, list):
+            album = albums[0]
+            if isinstance(album, dict):
+                year = album.get('year')
+                track_info.update({
+                    'album': album.get('title'),
+                    'album_artist': extract_artist(album.get('artists')),
+                    'release_year': compat_str(year) if year else None,
+                })
+
+        track_artist = extract_artist(track.get('artists'))
+        if track_artist:
+            track_info.update({
+                'artist': track_artist,
+                'title': '%s - %s' % (track_artist, track_title),
+            })
+        else:
+            track_info['title'] = track_title
+        return track_info
+
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          album_id, track_id = mobj.group('album_id'), mobj.group('id')
@@ -105,6 +164,7 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
              'title': 'Carlo Ambrosio - Gypsy Soul (2009)',
          },
          'playlist_count': 50,
+        'skip': 'Travis CI servers blocked by YandexMusic',
      }
  
      def _real_extract(self, url):
@@ -127,7 +187,7 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
  class YandexMusicPlaylistIE(YandexMusicPlaylistBaseIE):
      IE_NAME = 'yandexmusic:playlist'
      IE_DESC = 'Яндекс.Музыка - Плейлист'
-    _VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/users/[^/]+/playlists/(?P<id>\d+)'
+    _VALID_URL = r'https?://music\.yandex\.(?P<tld>ru|kz|ua|by)/users/(?P<user>[^/]+)/playlists/(?P<id>\d+)'
  
      _TESTS = [{
          'url': 'http://music.yandex.ru/users/music.partners/playlists/1245',
@@ -137,6 +197,7 @@ class YandexMusicPlaylistIE(YandexMusicPlaylistBaseIE):
              'description': 'md5:3b9f27b0efbe53f2ee1e844d07155cc9',
          },
          'playlist_count': 6,
+        'skip': 'Travis CI servers blocked by YandexMusic',
      }, {
          # playlist exceeding the limit of 150 tracks shipped with webpage (see
          # https://github.com/rg3/youtube-dl/issues/6666)
@@ -145,46 +206,64 @@ class YandexMusicPlaylistIE(YandexMusicPlaylistBaseIE):
              'id': '1036',
              'title': 'Музыка 90-х',
          },
-        'playlist_count': 310,
+        'playlist_mincount': 300,
+        'skip': 'Travis CI servers blocked by YandexMusic',
      }]
  
      def _real_extract(self, url):
-        playlist_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, playlist_id)
-
-        mu = self._parse_json(
-            self._search_regex(
-                r'var\s+Mu\s*=\s*({.+?});\s*</script>', webpage, 'player'),
-            playlist_id)
-
-        playlist = mu['pageData']['playlist']
-        tracks, track_ids = playlist['tracks'], playlist['trackIds']
-
-        # tracks dictionary shipped with webpage is limited to 150 tracks,
+        mobj = re.match(self._VALID_URL, url)
+        tld = mobj.group('tld')
+        user = mobj.group('user')
+        playlist_id = mobj.group('id')
+
+        playlist = self._download_json(
+            'https://music.yandex.%s/handlers/playlist.jsx' % tld,
+            playlist_id, 'Downloading missing tracks JSON',
+            fatal=False,
+            headers={
+                'Referer': url,
+                'X-Requested-With': 'XMLHttpRequest',
+                'X-Retpath-Y': url,
+            },
+            query={
+                'owner': user,
+                'kinds': playlist_id,
+                'light': 'true',
+                'lang': tld,
+                'external-domain': 'music.yandex.%s' % tld,
+                'overembed': 'false',
+            })['playlist']
+
+        tracks, track_ids = playlist['tracks'], map(compat_str, playlist['trackIds'])
+
+        # tracks dictionary shipped with playlist.jsx API is limited to 150 tracks,
          # missing tracks should be retrieved manually.
          if len(tracks) < len(track_ids):
-            present_track_ids = set([compat_str(track['id']) for track in tracks if track.get('id')])
-            missing_track_ids = set(map(compat_str, track_ids)) - set(present_track_ids)
-            request = sanitized_Request(
-                'https://music.yandex.ru/handlers/track-entries.jsx',
-                urlencode_postdata({
+            present_track_ids = set([
+                compat_str(track['id'])
+                for track in tracks if track.get('id')])
+            missing_track_ids = [
+                track_id for track_id in track_ids
+                if track_id not in present_track_ids]
+            missing_tracks = self._download_json(
+                'https://music.yandex.%s/handlers/track-entries.jsx' % tld,
+                playlist_id, 'Downloading missing tracks JSON',
+                fatal=False,
+                headers={
+                    'Referer': url,
+                    'X-Requested-With': 'XMLHttpRequest',
+                },
+                query={
                      'entries': ','.join(missing_track_ids),
-                    'lang': mu.get('settings', {}).get('lang', 'en'),
-                    'external-domain': 'music.yandex.ru',
+                    'lang': tld,
+                    'external-domain': 'music.yandex.%s' % tld,
                      'overembed': 'false',
-                    'sign': mu.get('authData', {}).get('user', {}).get('sign'),
                      'strict': 'true',
-                }))
-            request.add_header('Referer', url)
-            request.add_header('X-Requested-With', 'XMLHttpRequest')
-
-            missing_tracks = self._download_json(
-                request, playlist_id, 'Downloading missing tracks JSON', fatal=False)
+                })
              if missing_tracks:
                  tracks.extend(missing_tracks)
  
          return self.playlist_result(
              self._build_playlist(tracks),
              compat_str(playlist_id),
-            playlist['title'], playlist.get('description'))
+            playlist.get('title'), playlist.get('description'))
diff --git a/youtube_dl/extractor/youjizz.py b/youtube_dl/extractor/youjizz.py
index 4150b28daffad5c8cae227c4bf76a125733ced73..b50f34e9bb30e47c679940ca1577ea8cc6683934 100644
--- a/youtube_dl/extractor/youjizz.py
+++ b/youtube_dl/extractor/youjizz.py
@@ -1,61 +1,39 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-)
  
  
  class YouJizzIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:\w+\.)?youjizz\.com/videos/[^/#?]+-(?P<id>[0-9]+)\.html(?:$|[?#])'
-    _TEST = {
+    _VALID_URL = r'https?://(?:\w+\.)?youjizz\.com/videos/(?:[^/#?]+)?-(?P<id>[0-9]+)\.html(?:$|[?#])'
+    _TESTS = [{
          'url': 'http://www.youjizz.com/videos/zeichentrick-1-2189178.html',
-        'md5': '07e15fa469ba384c7693fd246905547c',
+        'md5': '78fc1901148284c69af12640e01c6310',
          'info_dict': {
              'id': '2189178',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Zeichentrick 1',
              'age_limit': 18,
          }
-    }
+    }, {
+        'url': 'http://www.youjizz.com/videos/-2189178.html',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
+        # YouJizz's HTML5 player has invalid HTML
+        webpage = webpage.replace('"controls', '" controls')
          age_limit = self._rta_search(webpage)
          video_title = self._html_search_regex(
              r'<title>\s*(.*)\s*</title>', webpage, 'title')
  
-        embed_page_url = self._search_regex(
-            r'(https?://www.youjizz.com/videos/embed/[0-9]+)',
-            webpage, 'embed page')
-        webpage = self._download_webpage(
-            embed_page_url, video_id, note='downloading embed page')
-
-        # Get the video URL
-        m_playlist = re.search(r'so.addVariable\("playlist", ?"(?P<playlist>.+?)"\);', webpage)
-        if m_playlist is not None:
-            playlist_url = m_playlist.group('playlist')
-            playlist_page = self._download_webpage(playlist_url, video_id,
-                                                   'Downloading playlist page')
-            m_levels = list(re.finditer(r'<level bitrate="(\d+?)" file="(.*?)"', playlist_page))
-            if len(m_levels) == 0:
-                raise ExtractorError('Unable to extract video url')
-            videos = [(int(m.group(1)), m.group(2)) for m in m_levels]
-            (_, video_url) = sorted(videos)[0]
-            video_url = video_url.replace('%252F', '%2F')
-        else:
-            video_url = self._search_regex(r'so.addVariable\("file",encodeURIComponent\("(?P<source>[^"]+)"\)\);',
-                                           webpage, 'video URL')
+        info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
  
-        return {
+        info_dict.update({
              'id': video_id,
-            'url': video_url,
              'title': video_title,
-            'ext': 'flv',
-            'format': 'flv',
-            'player_url': embed_page_url,
              'age_limit': age_limit,
-        }
+        })
+
+        return info_dict
diff --git a/youtube_dl/extractor/youku.py b/youtube_dl/extractor/youku.py
index fd7eb5a6d52f8c2ec348d3cfc908dee3d4743f0d..e37f237c76c6880eb1d442e8302dcef558d0e9d8 100644
--- a/youtube_dl/extractor/youku.py
+++ b/youtube_dl/extractor/youku.py
@@ -2,7 +2,9 @@
  from __future__ import unicode_literals
  
  import base64
+import itertools
  import random
+import re
  import string
  import time
  
@@ -13,7 +15,7 @@ from ..compat import (
  )
  from ..utils import (
      ExtractorError,
-    sanitized_Request,
+    get_element_by_attribute,
  )
  
  
@@ -64,6 +66,14 @@ class YoukuIE(InfoExtractor):
          'params': {
              'videopassword': '100600',
          },
+    }, {
+        # /play/get.json contains streams with "channel_type":"tail"
+        'url': 'http://v.youku.com/v_show/id_XOTUxMzg4NDMy.html',
+        'info_dict': {
+            'id': 'XOTUxMzg4NDMy',
+            'title': '我的世界☆明月庄主☆车震猎杀☆杀人艺术Minecraft',
+        },
+        'playlist_count': 6,
      }]
  
      def construct_video_urls(self, data):
@@ -92,6 +102,8 @@ class YoukuIE(InfoExtractor):
  
          fileid_dict = {}
          for stream in data['stream']:
+            if stream.get('channel_type') == 'tail':
+                continue
              format = stream.get('stream_type')
              fileid = stream['stream_fileid']
              fileid_dict[format] = fileid
@@ -117,6 +129,8 @@ class YoukuIE(InfoExtractor):
          # generate video_urls
          video_urls_dict = {}
          for stream in data['stream']:
+            if stream.get('channel_type') == 'tail':
+                continue
              format = stream.get('stream_type')
              video_urls = []
              for dt in stream['segs']:
@@ -203,14 +217,10 @@ class YoukuIE(InfoExtractor):
              headers = {
                  'Referer': req_url,
              }
+            headers.update(self.geo_verification_headers())
              self._set_cookie('youku.com', 'xreferrer', 'http://www.youku.com')
-            req = sanitized_Request(req_url, headers=headers)
  
-            cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
-            if cn_verification_proxy:
-                req.add_header('Ytdl-request-proxy', cn_verification_proxy)
-
-            raw_data = self._download_json(req, video_id, note=note)
+            raw_data = self._download_json(req_url, video_id, note=note, headers=headers)
  
              return raw_data['data']
  
@@ -253,6 +263,8 @@ class YoukuIE(InfoExtractor):
              # which one has all
          } for i in range(max(len(v.get('segs')) for v in data['stream']))]
          for stream in data['stream']:
+            if stream.get('channel_type') == 'tail':
+                continue
              fm = stream.get('stream_type')
              video_urls = video_urls_dict[fm]
              for video_url, seg, entry in zip(video_urls, stream['segs'], entries):
@@ -261,6 +273,8 @@ class YoukuIE(InfoExtractor):
                      'format_id': self.get_format_name(fm),
                      'ext': self.parse_ext_l(fm),
                      'filesize': int(seg['size']),
+                    'width': stream.get('width'),
+                    'height': stream.get('height'),
                  })
  
          return {
@@ -269,3 +283,52 @@ class YoukuIE(InfoExtractor):
              'title': title,
              'entries': entries,
          }
+
+
+class YoukuShowIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?youku\.com/show_page/id_(?P<id>[0-9a-z]+)\.html'
+    IE_NAME = 'youku:show'
+
+    _TEST = {
+        'url': 'http://www.youku.com/show_page/id_zc7c670be07ff11e48b3f.html',
+        'info_dict': {
+            'id': 'zc7c670be07ff11e48b3f',
+            'title': '花千骨 未删减版',
+            'description': 'md5:578d4f2145ae3f9128d9d4d863312910',
+        },
+        'playlist_count': 50,
+    }
+
+    _PAGE_SIZE = 40
+
+    def _find_videos_in_page(self, webpage):
+        videos = re.findall(
+            r'<li><a[^>]+href="(?P<url>https?://v\.youku\.com/[^"]+)"[^>]+title="(?P<title>[^"]+)"', webpage)
+        return [
+            self.url_result(video_url, YoukuIE.ie_key(), title)
+            for video_url, title in videos]
+
+    def _real_extract(self, url):
+        show_id = self._match_id(url)
+        webpage = self._download_webpage(url, show_id)
+
+        entries = self._find_videos_in_page(webpage)
+
+        playlist_title = self._html_search_regex(
+            r'<span[^>]+class="name">([^<]+)</span>', webpage, 'playlist title', fatal=False)
+        detail_div = get_element_by_attribute('class', 'detail', webpage) or ''
+        playlist_description = self._html_search_regex(
+            r'<span[^>]+style="display:none"[^>]*>([^<]+)</span>',
+            detail_div, 'playlist description', fatal=False)
+
+        for idx in itertools.count(1):
+            episodes_page = self._download_webpage(
+                'http://www.youku.com/show_episode/id_%s.html' % show_id,
+                show_id, query={'divid': 'reload_%d' % (idx * self._PAGE_SIZE + 1)},
+                note='Downloading episodes page %d' % idx)
+            new_entries = self._find_videos_in_page(episodes_page)
+            entries.extend(new_entries)
+            if len(new_entries) < self._PAGE_SIZE:
+                break
+
+        return self.playlist_result(entries, show_id, playlist_title, playlist_description)
diff --git a/youtube_dl/extractor/youporn.py b/youtube_dl/extractor/youporn.py
index 1124fe6c280cb0e23bee3a41ea323165ec714dce..0265a64a7d3c014001b2d0e81789f0e904b32d62 100644
--- a/youtube_dl/extractor/youporn.py
+++ b/youtube_dl/extractor/youporn.py
@@ -17,7 +17,7 @@ class YouPornIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?youporn\.com/watch/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
      _TESTS = [{
          'url': 'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
-        'md5': '71ec5fcfddacf80f495efa8b6a8d9a89',
+        'md5': '3744d24c50438cf5b6f6d59feb5055c2',
          'info_dict': {
              'id': '505835',
              'display_id': 'sex-ed-is-it-safe-to-masturbate-daily',
@@ -35,7 +35,7 @@ class YouPornIE(InfoExtractor):
              'age_limit': 18,
          },
      }, {
-        # Anonymous User uploader
+        # Unknown uploader
          'url': 'http://www.youporn.com/watch/561726/big-tits-awesome-brunette-on-amazing-webcam-show/?from=related3&al=2&from_id=561726&pos=4',
          'info_dict': {
              'id': '561726',
@@ -44,7 +44,7 @@ class YouPornIE(InfoExtractor):
              'title': 'Big Tits Awesome Brunette On amazing webcam show',
              'description': 'http://sweetlivegirls.com Big Tits Awesome Brunette On amazing webcam show.mp4',
              'thumbnail': 're:^https?://.*\.jpg$',
-            'uploader': 'Anonymous User',
+            'uploader': 'Unknown',
              'upload_date': '20111125',
              'average_rating': int,
              'view_count': int,
@@ -121,36 +121,36 @@ class YouPornIE(InfoExtractor):
              webpage, 'thumbnail', fatal=False, group='thumbnail')
  
          uploader = self._html_search_regex(
-            r'(?s)<div[^>]+class=["\']videoInfoBy(?:\s+[^"\']+)?["\'][^>]*>\s*By:\s*</div>(.+?)</(?:a|div)>',
+            r'(?s)<div[^>]+class=["\']submitByLink["\'][^>]*>(.+?)</div>',
              webpage, 'uploader', fatal=False)
          upload_date = unified_strdate(self._html_search_regex(
-            r'(?s)<div[^>]+class=["\']videoInfoTime["\'][^>]*>(.+?)</div>',
+            r'(?s)<div[^>]+class=["\']videoInfo(?:Date|Time)["\'][^>]*>(.+?)</div>',
              webpage, 'upload date', fatal=False))
  
          age_limit = self._rta_search(webpage)
  
          average_rating = int_or_none(self._search_regex(
-            r'<div[^>]+class=["\']videoInfoRating["\'][^>]*>\s*<div[^>]+class=["\']videoRatingPercentage["\'][^>]*>(\d+)%</div>',
+            r'<div[^>]+class=["\']videoRatingPercentage["\'][^>]*>(\d+)%</div>',
              webpage, 'average rating', fatal=False))
  
          view_count = str_to_int(self._search_regex(
-            r'(?s)<div[^>]+class=["\']videoInfoViews["\'][^>]*>.*?([\d,.]+)\s*</div>',
-            webpage, 'view count', fatal=False))
+            r'(?s)<div[^>]+class=(["\']).*?\bvideoInfoViews\b.*?\1[^>]*>.*?(?P<count>[\d,.]+)<',
+            webpage, 'view count', fatal=False, group='count'))
          comment_count = str_to_int(self._search_regex(
              r'>All [Cc]omments? \(([\d,.]+)\)',
              webpage, 'comment count', fatal=False))
  
-        def extract_tag_box(title):
-            tag_box = self._search_regex(
-                (r'<div[^>]+class=["\']tagBoxTitle["\'][^>]*>\s*%s\b.*?</div>\s*'
-                 '<div[^>]+class=["\']tagBoxContent["\']>(.+?)</div>') % re.escape(title),
-                webpage, '%s tag box' % title, default=None)
+        def extract_tag_box(regex, title):
+            tag_box = self._search_regex(regex, webpage, title, default=None)
              if not tag_box:
                  return []
              return re.findall(r'<a[^>]+href=[^>]+>([^<]+)', tag_box)
  
-        categories = extract_tag_box('Category')
-        tags = extract_tag_box('Tags')
+        categories = extract_tag_box(
+            r'(?s)Categories:.*?</[^>]+>(.+?)</div>', 'categories')
+        tags = extract_tag_box(
+            r'(?s)Tags:.*?</div>\s*<div[^>]+class=["\']tagBoxContent["\'][^>]*>(.+?)</div>',
+            'tags')
  
          return {
              'id': video_id,
diff --git a/youtube_dl/extractor/youtube.py b/youtube_dl/extractor/youtube.py
index 28355bf4607b76c9b75da617803ff58d248e4b82..7ccb875a5035dfd192e5362f1334e5b830f247e3 100644
--- a/youtube_dl/extractor/youtube.py
+++ b/youtube_dl/extractor/youtube.py
@@ -53,6 +53,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
      """Provide base functions for Youtube extractors"""
      _LOGIN_URL = 'https://accounts.google.com/ServiceLogin'
      _TWOFACTOR_URL = 'https://accounts.google.com/signin/challenge'
+    _PASSWORD_CHALLENGE_URL = 'https://accounts.google.com/signin/challenge/sl/password'
      _NETRC_MACHINE = 'youtube'
      # If True it will raise an error if no login info is provided
      _LOGIN_REQUIRED = False
@@ -90,48 +91,34 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
          if login_page is False:
              return
  
-        galx = self._search_regex(r'(?s)<input.+?name="GALX".+?value="(.+?)"',
-                                  login_page, 'Login GALX parameter')
+        login_form = self._hidden_inputs(login_page)
  
-        # Log in
-        login_form_strs = {
-            'continue': 'https://www.youtube.com/signin?action_handle_signin=true&feature=sign_in_button&hl=en_US&nomobiletemp=1',
+        login_form.update({
+            'checkConnection': 'youtube',
              'Email': username,
-            'GALX': galx,
              'Passwd': password,
+        })
  
-            'PersistentCookie': 'yes',
-            '_utf8': '霱',
-            'bgresponse': 'js_disabled',
-            'checkConnection': '',
-            'checkedDomains': 'youtube',
-            'dnConn': '',
-            'pstMsg': '0',
-            'rmShown': '1',
-            'secTok': '',
-            'signIn': 'Sign in',
-            'timeStmp': '',
-            'service': 'youtube',
-            'uilel': '3',
-            'hl': 'en_US',
-        }
-
-        login_data = urlencode_postdata(login_form_strs)
-
-        req = sanitized_Request(self._LOGIN_URL, login_data)
          login_results = self._download_webpage(
-            req, None,
-            note='Logging in', errnote='unable to log in', fatal=False)
+            self._PASSWORD_CHALLENGE_URL, None,
+            note='Logging in', errnote='unable to log in', fatal=False,
+            data=urlencode_postdata(login_form))
          if login_results is False:
              return False
  
+        error_msg = self._html_search_regex(
+            r'<[^>]+id="errormsg_0_Passwd"[^>]*>([^<]+)<',
+            login_results, 'error message', default=None)
+        if error_msg:
+            raise ExtractorError('Unable to login: %s' % error_msg, expected=True)
+
          if re.search(r'id="errormsg_0_Passwd"', login_results) is not None:
              raise ExtractorError('Please use your account password and a two-factor code instead of an application-specific password.', expected=True)
  
          # Two-Factor
          # TODO add SMS and phone call support - these require making a request and then prompting the user
  
-        if re.search(r'(?i)<form[^>]* id="challenge"', login_results) is not None:
+        if re.search(r'(?i)<form[^>]+id="challenge"', login_results) is not None:
              tfa_code = self._get_tfa_info('2-step verification code')
  
              if not tfa_code:
@@ -159,17 +146,17 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
              if tfa_results is False:
                  return False
  
-            if re.search(r'(?i)<form[^>]* id="challenge"', tfa_results) is not None:
+            if re.search(r'(?i)<form[^>]+id="challenge"', tfa_results) is not None:
                  self._downloader.report_warning('Two-factor code expired or invalid. Please try again, or use a one-use backup code instead.')
                  return False
-            if re.search(r'(?i)<form[^>]* id="gaia_loginform"', tfa_results) is not None:
+            if re.search(r'(?i)<form[^>]+id="gaia_loginform"', tfa_results) is not None:
                  self._downloader.report_warning('unable to log in - did the page structure change?')
                  return False
              if re.search(r'smsauth-interstitial-reviewsettings', tfa_results) is not None:
                  self._downloader.report_warning('Your Google account has a security notice. Please log in on your web browser, resolve the notice, and try again.')
                  return False
  
-        if re.search(r'(?i)<form[^>]* id="gaia_loginform"', login_results) is not None:
+        if re.search(r'(?i)<form[^>]+id="gaia_loginform"', login_results) is not None:
              self._downloader.report_warning('unable to log in: bad username or password')
              return False
          return True
@@ -270,13 +257,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                           ))
                           |(?:
                              youtu\.be|                                        # just youtu.be/xxxx
-                            vid\.plus                                         # or vid.plus/xxxx
+                            vid\.plus|                                        # or vid.plus/xxxx
+                            zwearz\.com/watch|                                # or zwearz.com/watch/xxxx
                           )/
                           |(?:www\.)?cleanvideosearch\.com/media/action/yt/watch\?videoId=
                           )
                       )?                                                       # all until now is optional -> you can pass the naked ID
                       ([0-9A-Za-z_-]{11})                                      # here is it! the YouTube video ID
-                     (?!.*?&list=)                                            # combined list/video URLs are handled by the playlist IE
+                     (?!.*?\blist=)                                            # combined list/video URLs are handled by the playlist IE
                       (?(1).+)?                                                # if we found the ID, everything can follow
                       $"""
      _NEXT_URL_RE = r'[\?&]next_url=([^&]+)'
@@ -337,6 +325,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          '139': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 48, 'preference': -50, 'container': 'm4a_dash'},
          '140': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 128, 'preference': -50, 'container': 'm4a_dash'},
          '141': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 256, 'preference': -50, 'container': 'm4a_dash'},
+        '256': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'preference': -50, 'container': 'm4a_dash'},
+        '258': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'preference': -50, 'container': 'm4a_dash'},
  
          # Dash webm
          '167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
@@ -379,7 +369,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
      IE_NAME = 'youtube'
      _TESTS = [
          {
-            'url': 'http://www.youtube.com/watch?v=BaW_jenozKc&t=1s&end=9',
+            'url': 'https://www.youtube.com/watch?v=BaW_jenozKc&t=1s&end=9',
              'info_dict': {
                  'id': 'BaW_jenozKc',
                  'ext': 'mp4',
@@ -399,7 +389,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              }
          },
          {
-            'url': 'http://www.youtube.com/watch?v=UxxajLWwzqY',
+            'url': 'https://www.youtube.com/watch?v=UxxajLWwzqY',
              'note': 'Test generic use_cipher_signature video (#897)',
              'info_dict': {
                  'id': 'UxxajLWwzqY',
@@ -453,7 +443,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              }
          },
          {
-            'url': 'http://www.youtube.com/watch?v=BaW_jenozKc&v=UxxajLWwzqY',
+            'url': 'https://www.youtube.com/watch?v=BaW_jenozKc&v=UxxajLWwzqY',
              'note': 'Use the first video ID in the URL',
              'info_dict': {
                  'id': 'BaW_jenozKc',
@@ -475,7 +465,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              },
          },
          {
-            'url': 'http://www.youtube.com/watch?v=a9LDPn-MO4I',
+            'url': 'https://www.youtube.com/watch?v=a9LDPn-MO4I',
              'note': '256k DASH audio (format 141) via DASH manifest',
              'info_dict': {
                  'id': 'a9LDPn-MO4I',
@@ -492,6 +482,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'youtube_include_dash_manifest': True,
                  'format': '141',
              },
+            'skip': 'format 141 not served anymore',
          },
          # DASH manifest with encrypted signature
          {
@@ -508,7 +499,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              },
              'params': {
                  'youtube_include_dash_manifest': True,
-                'format': '141',
+                'format': '141/bestaudio[ext=m4a]',
              },
          },
          # JS player signature function name containing $
@@ -528,7 +519,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              },
              'params': {
                  'youtube_include_dash_manifest': True,
-                'format': '141',
+                'format': '141/bestaudio[ext=m4a]',
              },
          },
          # Controversy video
@@ -548,7 +539,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          },
          # Normal age-gate video (No vevo, embed allowed)
          {
-            'url': 'http://youtube.com/watch?v=HtVdAasjOgU',
+            'url': 'https://youtube.com/watch?v=HtVdAasjOgU',
              'info_dict': {
                  'id': 'HtVdAasjOgU',
                  'ext': 'mp4',
@@ -564,7 +555,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          },
          # Age-gate video with encrypted signature
          {
-            'url': 'http://www.youtube.com/watch?v=6kLq3WMV1nU',
+            'url': 'https://www.youtube.com/watch?v=6kLq3WMV1nU',
              'info_dict': {
                  'id': '6kLq3WMV1nU',
                  'ext': 'mp4',
@@ -609,7 +600,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/olympic',
                  'license': 'Standard YouTube License',
                  'description': 'HO09  - Women -  GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games',
-                'uploader': 'Olympics',
+                'uploader': 'Olympic',
                  'title': 'Hockey - Women -  GER-AUS - London 2012 Olympic Games',
              },
              'params': {
@@ -662,7 +653,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
                  'uploader': 'dorappi2000',
                  'license': 'Standard YouTube License',
-                'formats': 'mincount:33',
+                'formats': 'mincount:32',
              },
          },
          # DASH manifest with segment_list
@@ -682,7 +673,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              'params': {
                  'youtube_include_dash_manifest': True,
                  'format': '135',  # bestvideo
-            }
+            },
+            'skip': 'This live event has ended.',
          },
          {
              # Multifeed videos (multiple cameras), URL is for Main Camera
@@ -753,9 +745,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': 'DevConf.cz 2016 Day 2 Workshops 1 14:00 - 15:30',
              },
              'playlist_count': 2,
+            'skip': 'Not multifeed anymore',
+        },
+        {
+            'url': 'https://vid.plus/FlRa-iH7PGw',
+            'only_matching': True,
          },
          {
-            'url': 'http://vid.plus/FlRa-iH7PGw',
+            'url': 'https://zwearz.com/watch/9lWxNJF-ufM/electra-woman-dyna-girl-official-trailer-grace-helbig.html',
              'only_matching': True,
          },
          {
@@ -801,6 +798,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              'params': {
                  'skip_download': True,
              },
+            'skip': 'This video does not exist.',
          },
          {
              # Video licensed under Creative Commons
@@ -841,6 +839,29 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          {
              'url': 'https://www.youtube.com/watch?feature=player_embedded&amp;amp;v=V36LpHqtcDY',
              'only_matching': True,
+        },
+        {
+            # YouTube Red paid video (https://github.com/rg3/youtube-dl/issues/10059)
+            'url': 'https://www.youtube.com/watch?v=i1Ko8UG-Tdo',
+            'only_matching': True,
+        },
+        {
+            # Rental video preview
+            'url': 'https://www.youtube.com/watch?v=yYr8q0y5Jfg',
+            'info_dict': {
+                'id': 'uGpuVWrhIzE',
+                'ext': 'mp4',
+                'title': 'Piku - Trailer',
+                'description': 'md5:c36bd60c3fd6f1954086c083c72092eb',
+                'upload_date': '20150811',
+                'uploader': 'FlixMatrix',
+                'uploader_id': 'FlixMatrixKaravan',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/FlixMatrixKaravan',
+                'license': 'Standard YouTube License',
+            },
+            'params': {
+                'skip_download': True,
+            },
          }
      ]
  
@@ -1251,6 +1272,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      # Convert to the same format returned by compat_parse_qs
                      video_info = dict((k, [v]) for k, v in args.items())
                      add_dash_mpd(video_info)
+                # Rental video is not rented but preview is available (e.g.
+                # https://www.youtube.com/watch?v=yYr8q0y5Jfg,
+                # https://github.com/rg3/youtube-dl/issues/10532)
+                if not video_info and args.get('ypc_vid'):
+                    return self.url_result(
+                        args['ypc_vid'], YoutubeIE.ie_key(), video_id=args['ypc_vid'])
                  if args.get('livestream') == '1' or args.get('live_playback') == 1:
                      is_live = True
              if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
@@ -1315,10 +1342,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          if video_description:
              video_description = re.sub(r'''(?x)
                  <a\s+
-                    (?:[a-zA-Z-]+="[^"]+"\s+)*?
+                    (?:[a-zA-Z-]+="[^"]*"\s+)*?
                      (?:title|href)="([^"]+)"\s+
-                    (?:[a-zA-Z-]+="[^"]+"\s+)*?
-                    class="(?:yt-uix-redirect-link|yt-uix-sessionlink[^"]*)"[^>]*>
+                    (?:[a-zA-Z-]+="[^"]*"\s+)*?
+                    class="[^"]*"[^>]*>
                  [^<]+\.{3}\s*
                  </a>
              ''', r'\1', video_description)
@@ -1713,16 +1740,52 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          }
  
  
+class YoutubeSharedVideoIE(InfoExtractor):
+    _VALID_URL = r'(?:https?:)?//(?:www\.)?youtube\.com/shared\?.*\bci=(?P<id>[0-9A-Za-z_-]{11})'
+    IE_NAME = 'youtube:shared'
+
+    _TEST = {
+        'url': 'https://www.youtube.com/shared?ci=1nEzmT-M4fU',
+        'info_dict': {
+            'id': 'uPDB5I9wfp8',
+            'ext': 'webm',
+            'title': 'Pocoyo: 90 minutos de episódios completos Português para crianças - PARTE 3',
+            'description': 'md5:d9e4d9346a2dfff4c7dc4c8cec0f546d',
+            'upload_date': '20160219',
+            'uploader': 'Pocoyo - Português (BR)',
+            'uploader_id': 'PocoyoBrazil',
+        },
+        'add_ie': ['Youtube'],
+        'params': {
+            # There are already too many Youtube downloads
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        real_video_id = self._html_search_meta(
+            'videoId', webpage, 'YouTube video id', fatal=True)
+
+        return self.url_result(real_video_id, YoutubeIE.ie_key())
+
+
  class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
      IE_DESC = 'YouTube.com playlists'
      _VALID_URL = r"""(?x)(?:
                          (?:https?://)?
                          (?:\w+\.)?
-                        youtube\.com/
                          (?:
-                           (?:course|view_play_list|my_playlists|artist|playlist|watch|embed/videoseries)
-                           \? (?:.*?[&;])*? (?:p|a|list)=
-                        |  p/
+                            youtube\.com/
+                            (?:
+                               (?:course|view_play_list|my_playlists|artist|playlist|watch|embed/videoseries)
+                               \? (?:.*?[&;])*? (?:p|a|list)=
+                            |  p/
+                            )|
+                            youtu\.be/[0-9A-Za-z_-]{11}\?.*?\blist=
                          )
                          (
                              (?:PL|LL|EC|UU|FL|RD|UL)?[0-9A-Za-z-_]{10,}
@@ -1783,7 +1846,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
          'playlist_count': 2,
      }, {
          'note': 'embedded',
-        'url': 'http://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
+        'url': 'https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
          'playlist_count': 4,
          'info_dict': {
              'title': 'JODA15',
@@ -1791,7 +1854,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
          }
      }, {
          'note': 'Embedded SWF player',
-        'url': 'http://www.youtube.com/p/YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ?hl=en_US&fs=1&rel=0',
+        'url': 'https://www.youtube.com/p/YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ?hl=en_US&fs=1&rel=0',
          'playlist_count': 4,
          'info_dict': {
              'title': 'JODA7',
@@ -1804,7 +1867,53 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'title': 'Uploads from Interstellar Movie',
              'id': 'UUXw-G3eDE9trcvY2sBMM_aA',
          },
-        'playlist_mincout': 21,
+        'playlist_mincount': 21,
+    }, {
+        # Playlist URL that does not actually serve a playlist
+        'url': 'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4',
+        'info_dict': {
+            'id': 'FqZTN594JQw',
+            'ext': 'webm',
+            'title': "Smiley's People 01 detective, Adventure Series, Action",
+            'uploader': 'STREEM',
+            'uploader_id': 'UCyPhqAZgwYWZfxElWVbVJng',
+            'uploader_url': 're:https?://(?:www\.)?youtube\.com/channel/UCyPhqAZgwYWZfxElWVbVJng',
+            'upload_date': '20150526',
+            'license': 'Standard YouTube License',
+            'description': 'md5:507cdcb5a49ac0da37a920ece610be80',
+            'categories': ['People & Blogs'],
+            'tags': list,
+            'like_count': int,
+            'dislike_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': [YoutubeIE.ie_key()],
+    }, {
+        'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5',
+        'info_dict': {
+            'id': 'yeWKywCrFtk',
+            'ext': 'mp4',
+            'title': 'Small Scale Baler and Braiding Rugs',
+            'uploader': 'Backus-Page House Museum',
+            'uploader_id': 'backuspagemuseum',
+            'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
+            'upload_date': '20161008',
+            'license': 'Standard YouTube License',
+            'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
+            'categories': ['Nonprofits & Activism'],
+            'tags': list,
+            'like_count': int,
+            'dislike_count': int,
+        },
+        'params': {
+            'noplaylist': True,
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
+        'only_matching': True,
      }]
  
      def _real_initialize(self):
@@ -1813,20 +1922,32 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
      def _extract_mix(self, playlist_id):
          # The mixes are generated from a single video
          # the id of the playlist is just 'RD' + video_id
-        url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[-11:], playlist_id)
-        webpage = self._download_webpage(
-            url, playlist_id, 'Downloading Youtube mix')
+        ids = []
+        last_id = playlist_id[-11:]
+        for n in itertools.count(1):
+            url = 'https://youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
+            webpage = self._download_webpage(
+                url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
+            new_ids = orderedSet(re.findall(
+                r'''(?xs)data-video-username=".*?".*?
+                           href="/watch\?v=([0-9A-Za-z_-]{11})&amp;[^"]*?list=%s''' % re.escape(playlist_id),
+                webpage))
+            # Fetch new pages until all the videos are repeated, it seems that
+            # there are always 51 unique videos.
+            new_ids = [_id for _id in new_ids if _id not in ids]
+            if not new_ids:
+                break
+            ids.extend(new_ids)
+            last_id = ids[-1]
+
+        url_results = self._ids_to_results(ids)
+
          search_title = lambda class_name: get_element_by_attribute('class', class_name, webpage)
          title_span = (
              search_title('playlist-title') or
              search_title('title long-title') or
              search_title('title'))
          title = clean_html(title_span)
-        ids = orderedSet(re.findall(
-            r'''(?xs)data-video-username=".*?".*?
-                       href="/watch\?v=([0-9A-Za-z_-]{11})&amp;[^"]*?list=%s''' % re.escape(playlist_id),
-            webpage))
-        url_results = self._ids_to_results(ids)
  
          return self.playlist_result(url_results, playlist_id, title)
  
@@ -1853,20 +1974,35 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
  
          playlist_title = self._html_search_regex(
              r'(?s)<h1 class="pl-header-title[^"]*"[^>]*>\s*(.*?)\s*</h1>',
-            page, 'title')
+            page, 'title', default=None)
+
+        has_videos = True
+
+        if not playlist_title:
+            try:
+                # Some playlist URLs don't actually serve a playlist (e.g.
+                # https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4)
+                next(self._entries(page, playlist_id))
+            except StopIteration:
+                has_videos = False
  
-        return self.playlist_result(self._entries(page, playlist_id), playlist_id, playlist_title)
+        return has_videos, self.playlist_result(
+            self._entries(page, playlist_id), playlist_id, playlist_title)
  
      def _check_download_just_video(self, url, playlist_id):
          # Check if it's a video-specific URL
          query_dict = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
-        if 'v' in query_dict:
-            video_id = query_dict['v'][0]
+        video_id = query_dict.get('v', [None])[0] or self._search_regex(
+            r'(?:^|//)youtu\.be/([0-9A-Za-z_-]{11})', url,
+            'video id', default=None)
+        if video_id:
              if self._downloader.params.get('noplaylist'):
                  self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
-                return self.url_result(video_id, 'Youtube', video_id=video_id)
+                return video_id, self.url_result(video_id, 'Youtube', video_id=video_id)
              else:
                  self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
+                return video_id, None
+        return None, None
  
      def _real_extract(self, url):
          # Extract playlist id
@@ -1875,15 +2011,23 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              raise ExtractorError('Invalid URL: %s' % url)
          playlist_id = mobj.group(1) or mobj.group(2)
  
-        video = self._check_download_just_video(url, playlist_id)
+        video_id, video = self._check_download_just_video(url, playlist_id)
          if video:
              return video
  
-        if playlist_id.startswith('RD') or playlist_id.startswith('UL'):
+        if playlist_id.startswith(('RD', 'UL', 'PU')):
              # Mixes require a custom extraction process
              return self._extract_mix(playlist_id)
  
-        return self._extract_playlist(playlist_id)
+        has_videos, playlist = self._extract_playlist(playlist_id)
+        if has_videos or not video_id:
+            return playlist
+
+        # Some playlist URLs don't actually serve a playlist (see
+        # https://github.com/rg3/youtube-dl/issues/10537).
+        # Fall back to plain video extraction if there is a video id
+        # along with playlist id.
+        return self.url_result(video_id, 'Youtube', video_id=video_id)
  
  
  class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
@@ -1916,10 +2060,13 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
          return (False if YoutubePlaylistsIE.suitable(url) or YoutubeLiveIE.suitable(url)
                  else super(YoutubeChannelIE, cls).suitable(url))
  
+    def _build_template_url(self, url, channel_id):
+        return self._TEMPLATE_URL % channel_id
+
      def _real_extract(self, url):
          channel_id = self._match_id(url)
  
-        url = self._TEMPLATE_URL % channel_id
+        url = self._build_template_url(url, channel_id)
  
          # Channel by page listing is restricted to 35 pages of 30 items, i.e. 1050 videos total (see #5778)
          # Workaround by extracting as a playlist if managed to obtain channel playlist URL
@@ -1933,9 +2080,13 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
              channel_playlist_id = self._html_search_meta(
                  'channelId', channel_page, 'channel id', default=None)
              if not channel_playlist_id:
-                channel_playlist_id = self._search_regex(
-                    r'data-(?:channel-external-|yt)id="([^"]+)"',
-                    channel_page, 'channel id', default=None)
+                channel_url = self._html_search_meta(
+                    ('al:ios:url', 'twitter:app:url:iphone', 'twitter:app:url:ipad'),
+                    channel_page, 'channel url', default=None)
+                if channel_url:
+                    channel_playlist_id = self._search_regex(
+                        r'vnd\.youtube://user/([0-9A-Za-z_-]+)',
+                        channel_url, 'channel id', default=None)
          if channel_playlist_id and channel_playlist_id.startswith('UC'):
              playlist_id = 'UU' + channel_playlist_id[2:]
              return self.url_result(
@@ -1958,44 +2109,77 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
                  for video_id, video_title in self.extract_videos_from_page(channel_page)]
              return self.playlist_result(entries, channel_id)
  
+        try:
+            next(self._entries(channel_page, channel_id))
+        except StopIteration:
+            alert_message = self._html_search_regex(
+                r'(?s)<div[^>]+class=(["\']).*?\byt-alert-message\b.*?\1[^>]*>(?P<alert>[^<]+)</div>',
+                channel_page, 'alert', default=None, group='alert')
+            if alert_message:
+                raise ExtractorError('Youtube said: %s' % alert_message, expected=True)
+
          return self.playlist_result(self._entries(channel_page, channel_id), channel_id)
  
  
  class YoutubeUserIE(YoutubeChannelIE):
      IE_DESC = 'YouTube.com user videos (URL or "ytuser" keyword)'
-    _VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:user/)?(?!(?:attribution_link|watch|results)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
-    _TEMPLATE_URL = 'https://www.youtube.com/user/%s/videos'
+    _VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
+    _TEMPLATE_URL = 'https://www.youtube.com/%s/%s/videos'
      IE_NAME = 'youtube:user'
  
      _TESTS = [{
          'url': 'https://www.youtube.com/user/TheLinuxFoundation',
          'playlist_mincount': 320,
          'info_dict': {
-            'title': 'TheLinuxFoundation',
+            'id': 'UUfX55Sx5hEFjoC3cNs6mCUQ',
+            'title': 'Uploads from The Linux Foundation',
+        }
+    }, {
+        # Only available via https://www.youtube.com/c/12minuteathlete/videos
+        # but not https://www.youtube.com/user/12minuteathlete/videos
+        'url': 'https://www.youtube.com/c/12minuteathlete/videos',
+        'playlist_mincount': 249,
+        'info_dict': {
+            'id': 'UUVjM-zV6_opMDx7WYxnjZiQ',
+            'title': 'Uploads from 12 Minute Athlete',
          }
      }, {
          'url': 'ytuser:phihag',
          'only_matching': True,
+    }, {
+        'url': 'https://www.youtube.com/c/gametrailers',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.youtube.com/gametrailers',
+        'only_matching': True,
+    }, {
+        # This channel is not available.
+        'url': 'https://www.youtube.com/user/kananishinoSMEJ/videos',
+        'only_matching': True,
      }]
  
      @classmethod
      def suitable(cls, url):
          # Don't return True if the url can be extracted with other youtube
          # extractor, the regex is too permissive and it would match.
-        other_ies = iter(klass for (name, klass) in globals().items() if name.endswith('IE') and klass is not cls)
-        if any(ie.suitable(url) for ie in other_ies):
+        other_yt_ies = iter(klass for (name, klass) in globals().items() if name.startswith('Youtube') and name.endswith('IE') and klass is not cls)
+        if any(ie.suitable(url) for ie in other_yt_ies):
              return False
          else:
              return super(YoutubeUserIE, cls).suitable(url)
  
+    def _build_template_url(self, url, channel_id):
+        mobj = re.match(self._VALID_URL, url)
+        return self._TEMPLATE_URL % (mobj.group('user') or 'user', mobj.group('id'))
+
  
  class YoutubeLiveIE(YoutubeBaseInfoExtractor):
      IE_DESC = 'YouTube.com live streams'
-    _VALID_URL = r'(?P<base_url>https?://(?:\w+\.)?youtube\.com/(?:user|channel)/(?P<id>[^/]+))/live'
+    _VALID_URL = r'(?P<base_url>https?://(?:\w+\.)?youtube\.com/(?:(?:user|channel|c)/)?(?P<id>[^/]+))/live'
      IE_NAME = 'youtube:live'
  
      _TESTS = [{
-        'url': 'http://www.youtube.com/user/TheYoungTurks/live',
+        'url': 'https://www.youtube.com/user/TheYoungTurks/live',
          'info_dict': {
              'id': 'a48o2S1cPoo',
              'ext': 'mp4',
@@ -2015,7 +2199,13 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
              'skip_download': True,
          },
      }, {
-        'url': 'http://www.youtube.com/channel/UC1yBKRuGpC1tSM73A0ZjYjQ/live',
+        'url': 'https://www.youtube.com/channel/UC1yBKRuGpC1tSM73A0ZjYjQ/live',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.youtube.com/c/CommanderVideoHq/live',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.youtube.com/TheYoungTurks/live',
          'only_matching': True,
      }]
  
@@ -2040,7 +2230,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
      IE_NAME = 'youtube:playlists'
  
      _TESTS = [{
-        'url': 'http://www.youtube.com/user/ThirstForScience/playlists',
+        'url': 'https://www.youtube.com/user/ThirstForScience/playlists',
          'playlist_mincount': 4,
          'info_dict': {
              'id': 'ThirstForScience',
@@ -2048,7 +2238,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
          },
      }, {
          # with "Load more" button
-        'url': 'http://www.youtube.com/user/igorkle1/playlists?view=1&sort=dd',
+        'url': 'https://www.youtube.com/user/igorkle1/playlists?view=1&sort=dd',
          'playlist_mincount': 70,
          'info_dict': {
              'id': 'igorkle1',
@@ -2116,10 +2306,11 @@ class YoutubeSearchDateIE(YoutubeSearchIE):
      _EXTRA_QUERY_ARGS = {'search_sort': 'video_date_uploaded'}
  
  
-class YoutubeSearchURLIE(InfoExtractor):
+class YoutubeSearchURLIE(YoutubePlaylistBaseInfoExtractor):
      IE_DESC = 'YouTube.com search URLs'
      IE_NAME = 'youtube:search_url'
      _VALID_URL = r'https?://(?:www\.)?youtube\.com/results\?(.*?&)?(?:search_query|q)=(?P<query>[^&]+)(?:[&]|$)'
+    _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?'
      _TESTS = [{
          'url': 'https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video',
          'playlist_mincount': 5,
@@ -2134,37 +2325,13 @@ class YoutubeSearchURLIE(InfoExtractor):
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          query = compat_urllib_parse_unquote_plus(mobj.group('query'))
-
          webpage = self._download_webpage(url, query)
-        result_code = self._search_regex(
-            r'(?s)<ol[^>]+class="item-section"(.*?)</ol>', webpage, 'result HTML')
-
-        part_codes = re.findall(
-            r'(?s)<h3[^>]+class="[^"]*yt-lockup-title[^"]*"[^>]*>(.*?)</h3>', result_code)
-        entries = []
-        for part_code in part_codes:
-            part_title = self._html_search_regex(
-                [r'(?s)title="([^"]+)"', r'>([^<]+)</a>'], part_code, 'item title', fatal=False)
-            part_url_snippet = self._html_search_regex(
-                r'(?s)href="([^"]+)"', part_code, 'item URL')
-            part_url = compat_urlparse.urljoin(
-                'https://www.youtube.com/', part_url_snippet)
-            entries.append({
-                '_type': 'url',
-                'url': part_url,
-                'title': part_title,
-            })
-
-        return {
-            '_type': 'playlist',
-            'entries': entries,
-            'title': query,
-        }
+        return self.playlist_result(self._process_page(webpage), playlist_title=query)
  
  
  class YoutubeShowIE(YoutubePlaylistsBaseInfoExtractor):
      IE_DESC = 'YouTube.com (multi-season) shows'
-    _VALID_URL = r'https?://www\.youtube\.com/show/(?P<id>[^?#]*)'
+    _VALID_URL = r'https?://(?:www\.)?youtube\.com/show/(?P<id>[^?#]*)'
      IE_NAME = 'youtube:show'
      _TESTS = [{
          'url': 'https://www.youtube.com/show/airdisasters',
@@ -2233,7 +2400,7 @@ class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
  class YoutubeWatchLaterIE(YoutubePlaylistIE):
      IE_NAME = 'youtube:watchlater'
      IE_DESC = 'Youtube watch later list, ":ytwatchlater" for short (requires authentication)'
-    _VALID_URL = r'https?://www\.youtube\.com/(?:feed/watch_later|(?:playlist|watch)\?(?:.+&)?list=WL)|:ytwatchlater'
+    _VALID_URL = r'https?://(?:www\.)?youtube\.com/(?:feed/watch_later|(?:playlist|watch)\?(?:.+&)?list=WL)|:ytwatchlater'
  
      _TESTS = [{
          'url': 'https://www.youtube.com/playlist?list=WL',
@@ -2244,16 +2411,17 @@ class YoutubeWatchLaterIE(YoutubePlaylistIE):
      }]
  
      def _real_extract(self, url):
-        video = self._check_download_just_video(url, 'WL')
+        _, video = self._check_download_just_video(url, 'WL')
          if video:
              return video
-        return self._extract_playlist('WL')
+        _, playlist = self._extract_playlist('WL')
+        return playlist
  
  
  class YoutubeFavouritesIE(YoutubeBaseInfoExtractor):
      IE_NAME = 'youtube:favorites'
      IE_DESC = 'YouTube.com favourite videos, ":ytfav" for short (requires authentication)'
-    _VALID_URL = r'https?://www\.youtube\.com/my_favorites|:ytfav(?:ou?rites)?'
+    _VALID_URL = r'https?://(?:www\.)?youtube\.com/my_favorites|:ytfav(?:ou?rites)?'
      _LOGIN_REQUIRED = True
  
      def _real_extract(self, url):
@@ -2264,21 +2432,21 @@ class YoutubeFavouritesIE(YoutubeBaseInfoExtractor):
  
  class YoutubeRecommendedIE(YoutubeFeedsInfoExtractor):
      IE_DESC = 'YouTube.com recommended videos, ":ytrec" for short (requires authentication)'
-    _VALID_URL = r'https?://www\.youtube\.com/feed/recommended|:ytrec(?:ommended)?'
+    _VALID_URL = r'https?://(?:www\.)?youtube\.com/feed/recommended|:ytrec(?:ommended)?'
      _FEED_NAME = 'recommended'
      _PLAYLIST_TITLE = 'Youtube Recommended videos'
  
  
  class YoutubeSubscriptionsIE(YoutubeFeedsInfoExtractor):
      IE_DESC = 'YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)'
-    _VALID_URL = r'https?://www\.youtube\.com/feed/subscriptions|:ytsubs(?:criptions)?'
+    _VALID_URL = r'https?://(?:www\.)?youtube\.com/feed/subscriptions|:ytsubs(?:criptions)?'
      _FEED_NAME = 'subscriptions'
      _PLAYLIST_TITLE = 'Youtube Subscriptions'
  
  
  class YoutubeHistoryIE(YoutubeFeedsInfoExtractor):
      IE_DESC = 'Youtube watch history, ":ythistory" for short (requires authentication)'
-    _VALID_URL = 'https?://www\.youtube\.com/feed/history|:ythistory'
+    _VALID_URL = r'https?://(?:www\.)?youtube\.com/feed/history|:ythistory'
      _FEED_NAME = 'history'
      _PLAYLIST_TITLE = 'Youtube History'
  
@@ -2303,10 +2471,10 @@ class YoutubeTruncatedURLIE(InfoExtractor):
      '''
  
      _TESTS = [{
-        'url': 'http://www.youtube.com/watch?annotation_id=annotation_3951667041',
+        'url': 'https://www.youtube.com/watch?annotation_id=annotation_3951667041',
          'only_matching': True,
      }, {
-        'url': 'http://www.youtube.com/watch?',
+        'url': 'https://www.youtube.com/watch?',
          'only_matching': True,
      }, {
          'url': 'https://www.youtube.com/watch?x-yt-cl=84503534',
@@ -2327,7 +2495,7 @@ class YoutubeTruncatedURLIE(InfoExtractor):
              'Did you forget to quote the URL? Remember that & is a meta '
              'character in most shells, so you want to put the URL in quotes, '
              'like  youtube-dl '
-            '"http://www.youtube.com/watch?feature=foo&v=BaW_jenozKc" '
+            '"https://www.youtube.com/watch?feature=foo&v=BaW_jenozKc" '
              ' or simply  youtube-dl BaW_jenozKc  .',
              expected=True)
  
diff --git a/youtube_dl/extractor/zdf.py b/youtube_dl/extractor/zdf.py
index 81c22a6270f99eb08efd5dc5cfb798a860c4c6c8..2ef17727592405b7bb20b378403d82470b52ce2f 100644
--- a/youtube_dl/extractor/zdf.py
+++ b/youtube_dl/extractor/zdf.py
@@ -85,6 +85,13 @@ class ZDFIE(InfoExtractor):
          uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader')
          uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id')
          upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date'))
+        subtitles = {}
+        captions_url = doc.find('.//caption/url')
+        if captions_url is not None:
+            subtitles['de'] = [{
+                'url': captions_url.text,
+                'ext': 'ttml',
+            }]
  
          def xml_to_thumbnails(fnode):
              thumbnails = []
@@ -190,6 +197,7 @@ class ZDFIE(InfoExtractor):
              'uploader_id': uploader_id,
              'upload_date': upload_date,
              'formats': formats,
+            'subtitles': subtitles,
          }
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/zingmp3.py b/youtube_dl/extractor/zingmp3.py
index 437eecb6737161c9d730bf9a93eaed1bdb541799..0f0e9d0eb9b1ac945934b11a134d143d82b19fb0 100644
--- a/youtube_dl/extractor/zingmp3.py
+++ b/youtube_dl/extractor/zingmp3.py
@@ -1,16 +1,20 @@
-# coding=utf-8
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
  
  from .common import InfoExtractor
-from ..utils import ExtractorError
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    update_url_query,
+)
  
  
  class ZingMp3BaseInfoExtractor(InfoExtractor):
  
-    def _extract_item(self, item, fatal=True):
-        error_message = item.find('./errormessage').text
+    def _extract_item(self, item, page_type, fatal=True):
+        error_message = item.get('msg')
          if error_message:
              if not fatal:
                  return
@@ -18,25 +22,48 @@ class ZingMp3BaseInfoExtractor(InfoExtractor):
                  '%s returned error: %s' % (self.IE_NAME, error_message),
                  expected=True)
  
-        title = item.find('./title').text.strip()
-        source = item.find('./source').text
-        extension = item.attrib['type']
-        thumbnail = item.find('./backimage').text
+        formats = []
+        for quality, source_url in zip(item.get('qualities') or item.get('quality', []), item.get('source_list') or item.get('source', [])):
+            if not source_url or source_url == 'require vip':
+                continue
+            if not re.match(r'https?://', source_url):
+                source_url = '//' + source_url
+            source_url = self._proto_relative_url(source_url, 'http:')
+            quality_num = int_or_none(quality)
+            f = {
+                'format_id': quality,
+                'url': source_url,
+            }
+            if page_type == 'video':
+                f.update({
+                    'height': quality_num,
+                    'ext': 'mp4',
+                })
+            else:
+                f.update({
+                    'abr': quality_num,
+                    'ext': 'mp3',
+                })
+            formats.append(f)
+
+        cover = item.get('cover')
  
          return {
-            'title': title,
-            'url': source,
-            'ext': extension,
-            'thumbnail': thumbnail,
+            'title': (item.get('name') or item.get('title')).strip(),
+            'formats': formats,
+            'thumbnail': 'http:/' + cover if cover else None,
+            'artist': item.get('artist'),
          }
  
-    def _extract_player_xml(self, player_xml_url, id, playlist_title=None):
-        player_xml = self._download_xml(player_xml_url, id, 'Downloading Player XML')
-        items = player_xml.findall('./item')
+    def _extract_player_json(self, player_json_url, id, page_type, playlist_title=None):
+        player_json = self._download_json(player_json_url, id, 'Downloading Player JSON')
+        items = player_json['data']
+        if 'item' in items:
+            items = items['item']
  
          if len(items) == 1:
              # one single song
-            data = self._extract_item(items[0])
+            data = self._extract_item(items[0], page_type)
              data['id'] = id
  
              return data
@@ -45,7 +72,7 @@ class ZingMp3BaseInfoExtractor(InfoExtractor):
              entries = []
  
              for i, item in enumerate(items, 1):
-                entry = self._extract_item(item, fatal=False)
+                entry = self._extract_item(item, page_type, fatal=False)
                  if not entry:
                      continue
                  entry['id'] = '%s-%d' % (id, i)
@@ -59,8 +86,8 @@ class ZingMp3BaseInfoExtractor(InfoExtractor):
              }
  
  
-class ZingMp3SongIE(ZingMp3BaseInfoExtractor):
-    _VALID_URL = r'https?://mp3\.zing\.vn/bai-hat/(?P<slug>[^/]+)/(?P<song_id>\w+)\.html'
+class ZingMp3IE(ZingMp3BaseInfoExtractor):
+    _VALID_URL = r'https?://mp3\.zing\.vn/(?:bai-hat|album|playlist|video-clip)/[^/]+/(?P<id>\w+)\.html'
      _TESTS = [{
          'url': 'http://mp3.zing.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
          'md5': 'ead7ae13693b3205cbc89536a077daed',
@@ -70,51 +97,47 @@ class ZingMp3SongIE(ZingMp3BaseInfoExtractor):
              'ext': 'mp3',
              'thumbnail': 're:^https?://.*\.jpg$',
          },
-    }]
-    IE_NAME = 'zingmp3:song'
-    IE_DESC = 'mp3.zing.vn songs'
-
-    def _real_extract(self, url):
-        matched = re.match(self._VALID_URL, url)
-        slug = matched.group('slug')
-        song_id = matched.group('song_id')
-
-        webpage = self._download_webpage(
-            'http://mp3.zing.vn/bai-hat/%s/%s.html' % (slug, song_id), song_id)
-
-        player_xml_url = self._search_regex(
-            r'&amp;xmlURL=(?P<xml_url>[^&]+)&', webpage, 'player xml url')
-
-        return self._extract_player_xml(player_xml_url, song_id)
-
-
-class ZingMp3AlbumIE(ZingMp3BaseInfoExtractor):
-    _VALID_URL = r'https?://mp3\.zing\.vn/(?:album|playlist)/(?P<slug>[^/]+)/(?P<album_id>\w+)\.html'
-    _TESTS = [{
+    }, {
+        'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html',
+        'md5': '870295a9cd8045c0e15663565902618d',
+        'info_dict': {
+            'id': 'ZW6BAEA0',
+            'title': 'Let It Go (Frozen OST)',
+            'ext': 'mp4',
+        },
+    }, {
          'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
          'info_dict': {
              '_type': 'playlist',
              'id': 'ZWZBWDAF',
-            'title': 'Lâu Đài Tình Ái - Bằng Kiều ft. Minh Tuyết | Album 320 lossless',
+            'title': 'Lâu Đài Tình Ái - Bằng Kiều,Minh Tuyết | Album 320 lossless',
          },
          'playlist_count': 10,
+        'skip': 'removed at the request of the owner',
      }, {
          'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
          'only_matching': True,
      }]
-    IE_NAME = 'zingmp3:album'
-    IE_DESC = 'mp3.zing.vn albums'
+    IE_NAME = 'zingmp3'
+    IE_DESC = 'mp3.zing.vn'
  
      def _real_extract(self, url):
-        matched = re.match(self._VALID_URL, url)
-        slug = matched.group('slug')
-        album_id = matched.group('album_id')
-
-        webpage = self._download_webpage(
-            'http://mp3.zing.vn/album/%s/%s.html' % (slug, album_id), album_id)
-        player_xml_url = self._search_regex(
-            r'&amp;xmlURL=(?P<xml_url>[^&]+)&', webpage, 'player xml url')
-
-        return self._extract_player_xml(
-            player_xml_url, album_id,
-            playlist_title=self._og_search_title(webpage))
+        page_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, page_id)
+
+        player_json_url = self._search_regex([
+            r'data-xml="([^"]+)',
+            r'&amp;xmlURL=([^&]+)&'
+        ], webpage, 'player xml url')
+
+        playlist_title = None
+        page_type = self._search_regex(r'/(?:html5)?xml/([^/-]+)', player_json_url, 'page type')
+        if page_type == 'video':
+            player_json_url = update_url_query(player_json_url, {'format': 'json'})
+        else:
+            player_json_url = player_json_url.replace('/xml/', '/html5xml/')
+            if page_type == 'album':
+                playlist_title = self._og_search_title(webpage)
+
+        return self._extract_player_json(player_json_url, page_id, page_type, playlist_title)
diff --git a/youtube_dl/extractor/zippcast.py b/youtube_dl/extractor/zippcast.py
deleted file mode 100644
index de81937..0000000
--- a/youtube_dl/extractor/zippcast.py
+++ /dev/null
@@ -1,94 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    determine_ext,
-    str_to_int,
-)
-
-
-class ZippCastIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?zippcast\.com/(?:video/|videoview\.php\?.*\bvplay=)(?P<id>[0-9a-zA-Z]+)'
-    _TESTS = [{
-        # m3u8, hq direct link
-        'url': 'http://www.zippcast.com/video/c9cfd5c7e44dbc29c81',
-        'md5': '5ea0263b5606866c4d6cda0fc5e8c6b6',
-        'info_dict': {
-            'id': 'c9cfd5c7e44dbc29c81',
-            'ext': 'mp4',
-            'title': '[Vinesauce] Vinny - Digital Space Traveler',
-            'description': 'Muted on youtube, but now uploaded in it\'s original form.',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'uploader': 'vinesauce',
-            'view_count': int,
-            'categories': ['Entertainment'],
-            'tags': list,
-        },
-    }, {
-        # f4m, lq ipod direct link
-        'url': 'http://www.zippcast.com/video/b79c0a233e9c6581775',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.zippcast.com/videoview.php?vplay=c9cfd5c7e44dbc29c81&auto=no',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(
-            'http://www.zippcast.com/video/%s' % video_id, video_id)
-
-        formats = []
-        video_url = self._search_regex(
-            r'<source[^>]+src=(["\'])(?P<url>.+?)\1', webpage,
-            'video url', default=None, group='url')
-        if video_url:
-            formats.append({
-                'url': video_url,
-                'format_id': 'http',
-                'preference': 0,  # direct link is almost always of worse quality
-            })
-        src_url = self._search_regex(
-            r'src\s*:\s*(?:escape\()?(["\'])(?P<url>http://.+?)\1',
-            webpage, 'src', default=None, group='url')
-        ext = determine_ext(src_url)
-        if ext == 'm3u8':
-            formats.extend(self._extract_m3u8_formats(
-                src_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                m3u8_id='hls', fatal=False))
-        elif ext == 'f4m':
-            formats.extend(self._extract_f4m_formats(
-                src_url, video_id, f4m_id='hds', fatal=False))
-        self._sort_formats(formats)
-
-        title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage) or self._html_search_meta(
-            'description', webpage)
-        uploader = self._search_regex(
-            r'<a[^>]+href="https?://[^/]+/profile/[^>]+>([^<]+)</a>',
-            webpage, 'uploader', fatal=False)
-        thumbnail = self._og_search_thumbnail(webpage)
-        view_count = str_to_int(self._search_regex(
-            r'>([\d,.]+) views!', webpage, 'view count', fatal=False))
-
-        categories = re.findall(
-            r'<a[^>]+href="https?://[^/]+/categories/[^"]+">([^<]+),?<',
-            webpage)
-        tags = re.findall(
-            r'<a[^>]+href="https?://[^/]+/search/tags/[^"]+">([^<]+),?<',
-            webpage)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'view_count': view_count,
-            'categories': categories,
-            'tags': tags,
-            'formats': formats,
-        }
diff --git a/youtube_dl/jsinterp.py b/youtube_dl/jsinterp.py
index a7440c58242079ea1c6874e1bed0abe756fdc814..a8df4aef0a2553222d45b9f38131a2945470d412 100644
--- a/youtube_dl/jsinterp.py
+++ b/youtube_dl/jsinterp.py
@@ -198,12 +198,12 @@ class JSInterpreter(object):
              return opfunc(x, y)
  
          m = re.match(
-            r'^(?P<func>%s)\((?P<args>[a-zA-Z0-9_$,]+)\)$' % _NAME_RE, expr)
+            r'^(?P<func>%s)\((?P<args>[a-zA-Z0-9_$,]*)\)$' % _NAME_RE, expr)
          if m:
              fname = m.group('func')
              argvals = tuple([
                  int(v) if v.isdigit() else local_vars[v]
-                for v in m.group('args').split(',')])
+                for v in m.group('args').split(',')]) if len(m.group('args')) > 0 else tuple()
              if fname not in self._functions:
                  self._functions[fname] = self.extract_function(fname)
              return self._functions[fname](argvals)
@@ -232,7 +232,7 @@ class JSInterpreter(object):
      def extract_function(self, funcname):
          func_m = re.search(
              r'''(?x)
-                (?:function\s+%s|[{;,]%s\s*=\s*function|var\s+%s\s*=\s*function)\s*
+                (?:function\s+%s|[{;,]\s*%s\s*=\s*function|var\s+%s\s*=\s*function)\s*
                  \((?P<args>[^)]*)\)\s*
                  \{(?P<code>[^}]+)\}''' % (
                  re.escape(funcname), re.escape(funcname), re.escape(funcname)),
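
Note on the first jsinterp.py hunk above: changing `+` to `*` in the `args` group lets the call pattern match zero-argument calls such as `f()`, and the added length check avoids splitting an empty string (which would yield `['']` and then fail on the `local_vars['']` lookup). The following is a minimal standalone sketch of that behaviour, not the interpreter itself; `NAME_RE`, `OLD_CALL_RE`, `NEW_CALL_RE` and `parse_call` are hypothetical names introduced for illustration, and `NAME_RE` is only assumed to resemble jsinterp's `_NAME_RE`.

```python
import re

# Assumption: a simplified identifier pattern standing in for jsinterp's _NAME_RE.
NAME_RE = r'[a-zA-Z_$][a-zA-Z_$0-9]*'

# Call patterns before and after the hunk (only the quantifier on args differs).
OLD_CALL_RE = r'^(?P<func>%s)\((?P<args>[a-zA-Z0-9_$,]+)\)$' % NAME_RE
NEW_CALL_RE = r'^(?P<func>%s)\((?P<args>[a-zA-Z0-9_$,]*)\)$' % NAME_RE


def parse_call(expr, local_vars, pattern=NEW_CALL_RE):
    """Return (function name, argument tuple), or None if expr is not a call."""
    m = re.match(pattern, expr)
    if not m:
        return None
    args = m.group('args')
    # ''.split(',') yields [''], so an empty argument list needs its own branch.
    argvals = tuple(
        int(v) if v.isdigit() else local_vars[v]
        for v in args.split(',')) if args else ()
    return m.group('func'), argvals
```

With the old pattern `f()` is not recognised as a call at all; with the new one it parses to an empty argument tuple.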
diff --git a/youtube_dl/options.py b/youtube_dl/options.py
index 7819f14ab0b36786e06a432f15cb9e0429320288..53497fbc6f60a945b6350ce36e352a8eb6ef1f2c 100644
--- a/youtube_dl/options.py
+++ b/youtube_dl/options.py
@@ -2,6 +2,7 @@ from __future__ import unicode_literals
  
  import os.path
  import optparse
+import re
  import sys
  
  from .downloader.external import list_external_downloaders
@@ -26,9 +27,11 @@ def parseOpts(overrideArguments=None):
          except IOError:
              return default  # silently skip if file is not present
          try:
-            res = []
-            for l in optionf:
-                res += compat_shlex_split(l, comments=True)
+            # FIXME: https://github.com/rg3/youtube-dl/commit/dfe5fa49aed02cf36ba9f743b11b0903554b5e56
+            contents = optionf.read()
+            if sys.version_info < (3,):
+                contents = contents.decode(preferredencoding())
+            res = compat_shlex_split(contents, comments=True)
          finally:
              optionf.close()
          return res
@@ -91,8 +94,18 @@ def parseOpts(overrideArguments=None):
          setattr(parser.values, option.dest, value.split(','))
  
      def _hide_login_info(opts):
-        opts = list(opts)
-        for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
+        PRIVATE_OPTS = ['-p', '--password', '-u', '--username', '--video-password', '--ap-password', '--ap-username']
+        eqre = re.compile('^(?P<key>' + ('|'.join(re.escape(po) for po in PRIVATE_OPTS)) + ')=.+$')
+
+        def _scrub_eq(o):
+            m = eqre.match(o)
+            if m:
+                return m.group('key') + '=PRIVATE'
+            else:
+                return o
+
+        opts = list(map(_scrub_eq, opts))
+        for private_opt in PRIVATE_OPTS:
              try:
                  i = opts.index(private_opt)
                  opts[i + 1] = 'PRIVATE'
@@ -188,7 +201,10 @@ def parseOpts(overrideArguments=None):
      network.add_option(
          '--proxy', dest='proxy',
          default=None, metavar='URL',
-        help='Use the specified HTTP/HTTPS proxy. Pass in an empty string (--proxy "") for direct connection')
+        help='Use the specified HTTP/HTTPS/SOCKS proxy. To enable experimental '
+             'SOCKS proxy, specify a proper scheme. For example '
+             'socks5://127.0.0.1:1080/. Pass in an empty string (--proxy "") '
+             'for direct connection')
      network.add_option(
          '--socket-timeout',
          dest='socket_timeout', type=float, default=None, metavar='SECONDS',
@@ -208,11 +224,16 @@ def parseOpts(overrideArguments=None):
          action='store_const', const='::', dest='source_address',
          help='Make all connections via IPv6 (experimental)',
      )
+    network.add_option(
+        '--geo-verification-proxy',
+        dest='geo_verification_proxy', default=None, metavar='URL',
+        help='Use this proxy to verify the IP address for some geo-restricted sites. '
+        'The default proxy specified by --proxy (or none, if the option is not present) is used for the actual downloading. (experimental)'
+    )
      network.add_option(
          '--cn-verification-proxy',
          dest='cn_verification_proxy', default=None, metavar='URL',
-        help='Use this proxy to verify the IP address for some Chinese sites. '
-        'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading. (experimental)'
+        help=optparse.SUPPRESS_HELP,
      )
  
      selection = optparse.OptionGroup(parser, 'Video Selection')
@@ -330,6 +351,24 @@ def parseOpts(overrideArguments=None):
          dest='videopassword', metavar='PASSWORD',
          help='Video password (vimeo, smotri, youku)')
  
+    adobe_pass = optparse.OptionGroup(parser, 'Adobe Pass Options')
+    adobe_pass.add_option(
+        '--ap-mso',
+        dest='ap_mso', metavar='MSO',
+        help='Adobe Pass multiple-system operator (TV provider) identifier, use --ap-list-mso for a list of available MSOs')
+    adobe_pass.add_option(
+        '--ap-username',
+        dest='ap_username', metavar='USERNAME',
+        help='Multiple-system operator account login')
+    adobe_pass.add_option(
+        '--ap-password',
+        dest='ap_password', metavar='PASSWORD',
+        help='Multiple-system operator account password. If this option is left out, youtube-dl will ask interactively.')
+    adobe_pass.add_option(
+        '--ap-list-mso',
+        action='store_true', dest='ap_list_mso', default=False,
+        help='List all supported multiple-system operators')
+
      video_format = optparse.OptionGroup(parser, 'Video Format Options')
      video_format.add_option(
          '-f', '--format',
@@ -392,8 +431,8 @@ def parseOpts(overrideArguments=None):
  
      downloader = optparse.OptionGroup(parser, 'Download Options')
      downloader.add_option(
-        '-r', '--rate-limit',
-        dest='ratelimit', metavar='LIMIT',
+        '-r', '--limit-rate', '--rate-limit',
+        dest='ratelimit', metavar='RATE',
          help='Maximum download rate in bytes per second (e.g. 50K or 4.2M)')
      downloader.add_option(
          '-R', '--retries',
@@ -402,7 +441,15 @@ def parseOpts(overrideArguments=None):
      downloader.add_option(
          '--fragment-retries',
          dest='fragment_retries', metavar='RETRIES', default=10,
-        help='Number of retries for a fragment (default is %default), or "infinite" (DASH only)')
+        help='Number of retries for a fragment (default is %default), or "infinite" (DASH and hlsnative only)')
+    downloader.add_option(
+        '--skip-unavailable-fragments',
+        action='store_true', dest='skip_unavailable_fragments', default=True,
+        help='Skip unavailable fragments (DASH and hlsnative only)')
+    general.add_option(
+        '--abort-on-unavailable-fragment',
+        action='store_false', dest='skip_unavailable_fragments',
+        help='Abort downloading when some fragment is not available')
      downloader.add_option(
          '--buffer-size',
          dest='buffersize', metavar='SIZE', default='1024',
@@ -425,8 +472,12 @@ def parseOpts(overrideArguments=None):
          help='Set file xattribute ytdl.filesize with expected filesize (experimental)')
      downloader.add_option(
          '--hls-prefer-native',
-        dest='hls_prefer_native', action='store_true',
-        help='Use the native HLS downloader instead of ffmpeg (experimental)')
+        dest='hls_prefer_native', action='store_true', default=None,
+        help='Use the native HLS downloader instead of ffmpeg')
+    downloader.add_option(
+        '--hls-prefer-ffmpeg',
+        dest='hls_prefer_native', action='store_false', default=None,
+        help='Use ffmpeg instead of the native HLS downloader')
      downloader.add_option(
          '--hls-use-mpegts',
          dest='hls_use_mpegts', action='store_true',
@@ -474,9 +525,20 @@ def parseOpts(overrideArguments=None):
          dest='bidi_workaround', action='store_true',
          help='Work around terminals that lack bidirectional text support. Requires bidiv or fribidi executable in PATH')
      workarounds.add_option(
-        '--sleep-interval', metavar='SECONDS',
+        '--sleep-interval', '--min-sleep-interval', metavar='SECONDS',
          dest='sleep_interval', type=float,
-        help='Number of seconds to sleep before each download.')
+        help=(
+            'Number of seconds to sleep before each download when used alone '
+            'or a lower bound of a range for randomized sleep before each download '
+            '(minimum possible number of seconds to sleep) when used along with '
+            '--max-sleep-interval.'))
+    workarounds.add_option(
+        '--max-sleep-interval', metavar='SECONDS',
+        dest='max_sleep_interval', type=float,
+        help=(
+            'Upper bound of a range for randomized sleep before each download '
+            '(maximum possible number of seconds to sleep). Must only be used '
+            'along with --min-sleep-interval.'))
  
      verbosity = optparse.OptionGroup(parser, 'Verbosity / Simulation Options')
      verbosity.add_option(
@@ -592,22 +654,7 @@ def parseOpts(overrideArguments=None):
      filesystem.add_option(
          '-o', '--output',
          dest='outtmpl', metavar='TEMPLATE',
-        help=('Output filename template. Use %(title)s to get the title, '
-              '%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, '
-              '%(autonumber)s to get an automatically incremented number, '
-              '%(ext)s for the filename extension, '
-              '%(format)s for the format description (like "22 - 1280x720" or "HD"), '
-              '%(format_id)s for the unique id of the format (like YouTube\'s itags: "137"), '
-              '%(upload_date)s for the upload date (YYYYMMDD), '
-              '%(extractor)s for the provider (youtube, metacafe, etc), '
-              '%(id)s for the video id, '
-              '%(playlist_title)s, %(playlist_id)s, or %(playlist)s (=title if present, ID otherwise) for the playlist the video is in, '
-              '%(playlist_index)s for the position in the playlist. '
-              '%(height)s and %(width)s for the width and height of the video format. '
-              '%(resolution)s for a textual description of the resolution of the video format. '
-              '%% for a literal percent. '
-              'Use - to output to stdout. Can also be used to download to a different directory, '
-              'for example with -o \'/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s\' .'))
+        help=('Output filename template, see the "OUTPUT TEMPLATE" for all the info'))
      filesystem.add_option(
          '--autonumber-size',
          dest='autonumber_size', metavar='NUMBER',
@@ -661,7 +708,7 @@ def parseOpts(overrideArguments=None):
          action='store_true', dest='writeannotations', default=False,
          help='Write video annotations to a .annotations.xml file')
      filesystem.add_option(
-        '--load-info',
+        '--load-info-json', '--load-info',
          dest='load_info_filename', metavar='FILE',
          help='JSON file containing the video information (created with the "--write-info-json" option)')
      filesystem.add_option(
@@ -784,6 +831,7 @@ def parseOpts(overrideArguments=None):
      parser.add_option_group(video_format)
      parser.add_option_group(subtitles)
      parser.add_option_group(authentication)
+    parser.add_option_group(adobe_pass)
      parser.add_option_group(postproc)
  
      if overrideArguments is not None:
@@ -802,11 +850,11 @@ def parseOpts(overrideArguments=None):
              system_conf = []
              user_conf = []
          else:
-            system_conf = compat_conf(_readOptions('/etc/youtube-dl.conf'))
+            system_conf = _readOptions('/etc/youtube-dl.conf')
              if '--ignore-config' in system_conf:
                  user_conf = []
              else:
-                user_conf = compat_conf(_readUserConf())
+                user_conf = _readUserConf()
          argv = system_conf + user_conf + command_line_conf
  
          opts, args = parser.parse_args(argv)
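
Note on the `_hide_login_info` hunk above: the new code scrubs credentials given in `--option=value` form in addition to the existing `--option value` form, and extends the list with the Adobe Pass options. The sketch below is a standalone approximation for illustration only. `hide_login_info`, `scrub_eq` and `EQ_RE` are names chosen here, the `return opts` tail is assumed because it lies outside the hunk, and catching `IndexError` is an addition of this sketch rather than something shown in the diff.

```python
import re

PRIVATE_OPTS = ['-p', '--password', '-u', '--username',
                '--video-password', '--ap-password', '--ap-username']
# Matches "<private option>=<anything>" so the value can be masked.
EQ_RE = re.compile(
    '^(?P<key>' + '|'.join(re.escape(po) for po in PRIVATE_OPTS) + ')=.+$')


def hide_login_info(opts):
    """Return a copy of opts with credential values replaced by 'PRIVATE'."""
    def scrub_eq(o):
        m = EQ_RE.match(o)
        return m.group('key') + '=PRIVATE' if m else o

    # First pass: the "--option=value" form.
    opts = [scrub_eq(o) for o in opts]
    # Second pass: the "--option value" form (first occurrence of each option).
    for private_opt in PRIVATE_OPTS:
        try:
            i = opts.index(private_opt)
            opts[i + 1] = 'PRIVATE'
        except (ValueError, IndexError):
            # Option absent, or given last with no value following it.
            # (IndexError handling is an assumption of this sketch.)
            pass
    return opts
```

For example, `hide_login_info(['--username=bob', '-p', 'secret', 'URL'])` masks both credentials and leaves the URL untouched.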
diff --git a/youtube_dl/postprocessor/embedthumbnail.py b/youtube_dl/postprocessor/embedthumbnail.py
index 3bad5a266b6d51aaf0c92224a94986957da230f2..e606a58de886533fb5239b9bb958fbff9606a4ee 100644
--- a/youtube_dl/postprocessor/embedthumbnail.py
+++ b/youtube_dl/postprocessor/embedthumbnail.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
  from __future__ import unicode_literals
  
  
@@ -40,7 +40,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
                  'Skipping embedding the thumbnail because the file is missing.')
              return [], info
  
-        if info['ext'] in ('mp3', 'mkv'):
+        if info['ext'] == 'mp3':
              options = [
                  '-c', 'copy', '-map', '0', '-map', '1',
                  '-metadata:s:v', 'title="Album cover"', '-metadata:s:v', 'comment="Cover (Front)"']
diff --git a/youtube_dl/postprocessor/execafterdownload.py b/youtube_dl/postprocessor/execafterdownload.py
index 74f66d669c0679a9eece06b1924ecc9f5dae00d2..90630c2d7391de9fd288662c8f207433702f8c99 100644
--- a/youtube_dl/postprocessor/execafterdownload.py
+++ b/youtube_dl/postprocessor/execafterdownload.py
@@ -3,7 +3,7 @@ from __future__ import unicode_literals
  import subprocess
  
  from .common import PostProcessor
-from ..compat import shlex_quote
+from ..compat import compat_shlex_quote
  from ..utils import PostProcessingError
  
  
@@ -17,7 +17,7 @@ class ExecAfterDownloadPP(PostProcessor):
          if '{}' not in cmd:
              cmd += ' {}'
  
-        cmd = cmd.replace('{}', shlex_quote(information['filepath']))
+        cmd = cmd.replace('{}', compat_shlex_quote(information['filepath']))
  
          self._downloader.to_screen('[exec] Executing command: %s' % cmd)
          retCode = subprocess.call(cmd, shell=True)
diff --git a/youtube_dl/postprocessor/ffmpeg.py b/youtube_dl/postprocessor/ffmpeg.py
index 06b8c05482013c1c521267c7d4406c0124f0c972..1881f4849e23c749d51da2e45d655ed4e6a68314 100644
--- a/youtube_dl/postprocessor/ffmpeg.py
+++ b/youtube_dl/postprocessor/ffmpeg.py
@@ -139,6 +139,30 @@ class FFmpegPostProcessor(PostProcessor):
      def probe_executable(self):
          return self._paths[self.probe_basename]
  
+    def get_audio_codec(self, path):
+        if not self.probe_available:
+            raise PostProcessingError('ffprobe or avprobe not found. Please install one.')
+        try:
+            cmd = [
+                encodeFilename(self.probe_executable, True),
+                encodeArgument('-show_streams'),
+                encodeFilename(self._ffmpeg_filename_argument(path), True)]
+            if self._downloader.params.get('verbose', False):
+                self._downloader.to_screen('[debug] %s command line: %s' % (self.basename, shell_quote(cmd)))
+            handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE, stdin=subprocess.PIPE)
+            output = handle.communicate()[0]
+            if handle.wait() != 0:
+                return None
+        except (IOError, OSError):
+            return None
+        audio_codec = None
+        for line in output.decode('ascii', 'ignore').split('\n'):
+            if line.startswith('codec_name='):
+                audio_codec = line.split('=')[1].strip()
+            elif line.strip() == 'codec_type=audio' and audio_codec is not None:
+                return audio_codec
+        return None
+
      def run_ffmpeg_multiple_files(self, input_paths, out_path, opts):
          self.check_version()
  
@@ -175,7 +199,8 @@ class FFmpegPostProcessor(PostProcessor):
          # Always use 'file:' because the filename may contain ':' (ffmpeg
          # interprets that as a protocol) or can start with '-' (-- is broken in
          # ffmpeg, see https://ffmpeg.org/trac/ffmpeg/ticket/2127 for details)
-        return 'file:' + fn
+        # Also leave '-' intact in order not to break streaming to stdout.
+        return 'file:' + fn if fn != '-' else fn
  
  
  class FFmpegExtractAudioPP(FFmpegPostProcessor):
@@ -187,31 +212,6 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
          self._preferredquality = preferredquality
          self._nopostoverwrites = nopostoverwrites
  
-    def get_audio_codec(self, path):
-
-        if not self.probe_available:
-            raise PostProcessingError('ffprobe or avprobe not found. Please install one.')
-        try:
-            cmd = [
-                encodeFilename(self.probe_executable, True),
-                encodeArgument('-show_streams'),
-                encodeFilename(self._ffmpeg_filename_argument(path), True)]
-            if self._downloader.params.get('verbose', False):
-                self._downloader.to_screen('[debug] %s command line: %s' % (self.basename, shell_quote(cmd)))
-            handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE, stdin=subprocess.PIPE)
-            output = handle.communicate()[0]
-            if handle.wait() != 0:
-                return None
-        except (IOError, OSError):
-            return None
-        audio_codec = None
-        for line in output.decode('ascii', 'ignore').split('\n'):
-            if line.startswith('codec_name='):
-                audio_codec = line.split('=')[1].strip()
-            elif line.strip() == 'codec_type=audio' and audio_codec is not None:
-                return audio_codec
-        return None
-
      def run_ffmpeg(self, path, out_path, codec, more_opts):
          if codec is None:
              acodec_opts = []
@@ -279,6 +279,9 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
          prefix, sep, ext = path.rpartition('.')  # not os.path.splitext, since the latter does not work on unicode in all setups
          new_path = prefix + sep + extension
  
+        information['filepath'] = new_path
+        information['ext'] = extension
+
          # If we download foo.mp3 and convert it to... foo.mp3, then don't delete foo.mp3, silly.
          if (new_path == path or
                  (self._nopostoverwrites and os.path.exists(encodeFilename(new_path)))):
@@ -300,9 +303,6 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
                  new_path, time.time(), information['filetime'],
                  errnote='Cannot update utime of audio file')
  
-        information['filepath'] = new_path
-        information['ext'] = extension
-
          return [path], information
  
  
@@ -388,23 +388,30 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
  class FFmpegMetadataPP(FFmpegPostProcessor):
      def run(self, info):
          metadata = {}
-        if info.get('title') is not None:
-            metadata['title'] = info['title']
-        if info.get('upload_date') is not None:
-            metadata['date'] = info['upload_date']
-        if info.get('artist') is not None:
-            metadata['artist'] = info['artist']
-        elif info.get('uploader') is not None:
-            metadata['artist'] = info['uploader']
-        elif info.get('uploader_id') is not None:
-            metadata['artist'] = info['uploader_id']
-        if info.get('description') is not None:
-            metadata['description'] = info['description']
-            metadata['comment'] = info['description']
-        if info.get('webpage_url') is not None:
-            metadata['purl'] = info['webpage_url']
-        if info.get('album') is not None:
-            metadata['album'] = info['album']
+
+        def add(meta_list, info_list=None):
+            if not info_list:
+                info_list = meta_list
+            if not isinstance(meta_list, (list, tuple)):
+                meta_list = (meta_list,)
+            if not isinstance(info_list, (list, tuple)):
+                info_list = (info_list,)
+            for info_f in info_list:
+                if info.get(info_f) is not None:
+                    for meta_f in meta_list:
+                        metadata[meta_f] = info[info_f]
+                    break
+
+        add('title', ('track', 'title'))
+        add('date', 'upload_date')
+        add(('description', 'comment'), 'description')
+        add('purl', 'webpage_url')
+        add('track', 'track_number')
+        add('artist', ('artist', 'creator', 'uploader', 'uploader_id'))
+        add('genre')
+        add('album')
+        add('album_artist')
+        add('disc', 'disc_number')
  
          if not metadata:
              self._downloader.to_screen('[ffmpeg] There isn\'t any metadata to add')
@@ -496,15 +503,15 @@ class FFmpegFixupM4aPP(FFmpegPostProcessor):
  class FFmpegFixupM3u8PP(FFmpegPostProcessor):
      def run(self, info):
          filename = info['filepath']
-        temp_filename = prepend_extension(filename, 'temp')
+        if self.get_audio_codec(filename) == 'aac':
+            temp_filename = prepend_extension(filename, 'temp')
  
-        options = ['-c', 'copy', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
-        self._downloader.to_screen('[ffmpeg] Fixing malformated aac bitstream in "%s"' % filename)
-        self.run_ffmpeg(filename, temp_filename, options)
-
-        os.remove(encodeFilename(filename))
-        os.rename(encodeFilename(temp_filename), encodeFilename(filename))
+            options = ['-c', 'copy', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
+            self._downloader.to_screen('[ffmpeg] Fixing malformed aac bitstream in "%s"' % filename)
+            self.run_ffmpeg(filename, temp_filename, options)
  
+            os.remove(encodeFilename(filename))
+            os.rename(encodeFilename(temp_filename), encodeFilename(filename))
          return [], info
  
  
@@ -536,7 +543,7 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
              sub_filenames.append(old_file)
              new_file = subtitles_filename(filename, lang, new_ext)
  
-            if ext == 'dfxp' or ext == 'ttml':
+            if ext == 'dfxp' or ext == 'ttml' or ext == 'tt':
                  self._downloader.report_warning(
                      'You have requested to convert dfxp (TTML) subtitles into another format, '
                      'which results in style information loss')
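Reviewer note on the `FFmpegMetadataPP.run` hunk above: the nested `add` helper replaces the chain of `if info.get(...)` checks with a "first non-None source wins" mapping. The sketch below lifts that helper into a standalone function so the precedence rules can be tried in isolation. The wrapper name `build_metadata` and the sample dictionary are assumptions for illustration only and are not part of the commit.

```python
def build_metadata(info):
    """Map info-dict fields to metadata keys; the first non-None source wins.

    Hypothetical standalone wrapper around the nested add() helper from the diff.
    """
    metadata = {}

    def add(meta_keys, info_keys=None):
        # With a single argument, the metadata key and the info key are the same.
        if not info_keys:
            info_keys = meta_keys
        if not isinstance(meta_keys, (list, tuple)):
            meta_keys = (meta_keys,)
        if not isinstance(info_keys, (list, tuple)):
            info_keys = (info_keys,)
        for info_key in info_keys:
            if info.get(info_key) is not None:
                # Copy the value to every requested metadata key, then stop
                # so that lower-priority sources are ignored.
                for meta_key in meta_keys:
                    metadata[meta_key] = info[info_key]
                break

    add('title', ('track', 'title'))
    add(('description', 'comment'), 'description')
    add('artist', ('artist', 'creator', 'uploader', 'uploader_id'))
    add('album')
    return metadata


example = build_metadata({
    'title': 'Video title',
    'track': 'Song name',
    'uploader': 'Some channel',
    'description': 'About this video',
})
```

In this sample, `track` takes precedence over `title`, the description is written to both `description` and `comment`, and `album` is skipped because the source field is absent.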
diff --git a/youtube_dl/postprocessor/metadatafromtitle.py b/youtube_dl/postprocessor/metadatafromtitle.py
index 42377fa0f0bde0d3fa6ae578c6eaa0fe4a73474d..920573da9d8f472b8fdd8681cab0be1c6331afb7 100644
--- a/youtube_dl/postprocessor/metadatafromtitle.py
+++ b/youtube_dl/postprocessor/metadatafromtitle.py
@@ -3,11 +3,6 @@ from __future__ import unicode_literals
  import re
  
  from .common import PostProcessor
-from ..utils import PostProcessingError
-
-
-class MetadataFromTitlePPError(PostProcessingError):
-    pass
  
  
  class MetadataFromTitlePP(PostProcessor):
@@ -38,7 +33,8 @@ class MetadataFromTitlePP(PostProcessor):
          title = info['title']
          match = re.match(self._titleregex, title)
          if match is None:
-            raise MetadataFromTitlePPError('Could not interpret title of video as "%s"' % self._titleformat)
+            self._downloader.to_screen('[fromtitle] Could not interpret title of video as "%s"' % self._titleformat)
+            return [], info
          for attribute, value in match.groupdict().items():
              value = match.group(attribute)
              info[attribute] = value
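Reviewer note on the `MetadataFromTitlePP` hunk above: a title that does not match the pattern is now reported and skipped instead of raising. A minimal sketch of that behaviour follows, assuming a plain function `parse_title` and a hand-written regex with named groups; both are hypothetical stand-ins for the post-processor and its generated `_titleregex`.

```python
import re


def parse_title(title_regex, info):
    """Copy named regex groups from info['title'] into info.

    Hypothetical helper; on a mismatch it reports and returns info unchanged,
    mirroring the non-fatal path introduced by the diff.
    """
    match = re.match(title_regex, info['title'])
    if match is None:
        print('[fromtitle] Could not interpret title of video')
        return info
    for attribute, value in match.groupdict().items():
        info[attribute] = value
    return info


pattern = r'(?P<artist>.+) - (?P<title>.+)'
matched = parse_title(pattern, {'title': 'Artist - Song'})
unmatched = parse_title(pattern, {'title': 'NoSeparator'})
```

The second call leaves the dictionary untouched, so later post-processors still see the original title.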
diff --git a/youtube_dl/postprocessor/xattrpp.py b/youtube_dl/postprocessor/xattrpp.py
index e39ca60aa08326b6f05814ff800bb09c75755e48..fbdfa02acc88ff8ba82684a2e5545aebe3fce5da 100644
--- a/youtube_dl/postprocessor/xattrpp.py
+++ b/youtube_dl/postprocessor/xattrpp.py
@@ -1,37 +1,15 @@
  from __future__ import unicode_literals
  
-import os
-import subprocess
-import sys
-import errno
-
  from .common import PostProcessor
  from ..compat import compat_os_name
  from ..utils import (
-    check_executable,
      hyphenate_date,
-    version_tuple,
-    PostProcessingError,
-    encodeArgument,
-    encodeFilename,
+    write_xattr,
+    XAttrMetadataError,
+    XAttrUnavailableError,
  )
  
  
-class XAttrMetadataError(PostProcessingError):
-    def __init__(self, code=None, msg='Unknown error'):
-        super(XAttrMetadataError, self).__init__(msg)
-        self.code = code
-
-        # Parsing code and msg
-        if (self.code in (errno.ENOSPC, errno.EDQUOT) or
-                'No space left' in self.msg or 'Disk quota excedded' in self.msg):
-            self.reason = 'NO_SPACE'
-        elif self.code == errno.E2BIG or 'Argument list too long' in self.msg:
-            self.reason = 'VALUE_TOO_LONG'
-        else:
-            self.reason = 'NOT_SUPPORTED'
-
-
  class XAttrMetadataPP(PostProcessor):
  
      #
@@ -48,88 +26,6 @@ class XAttrMetadataPP(PostProcessor):
      def run(self, info):
          """ Set extended attributes on downloaded file (if xattr support is found). """
  
-        # This mess below finds the best xattr tool for the job and creates a
-        # "write_xattr" function.
-        try:
-            # try the pyxattr module...
-            import xattr
-
-            # Unicode arguments are not supported in python-pyxattr until
-            # version 0.5.0
-            # See https://github.com/rg3/youtube-dl/issues/5498
-            pyxattr_required_version = '0.5.0'
-            if version_tuple(xattr.__version__) < version_tuple(pyxattr_required_version):
-                self._downloader.report_warning(
-                    'python-pyxattr is detected but is too old. '
-                    'youtube-dl requires %s or above while your version is %s. '
-                    'Falling back to other xattr implementations' % (
-                        pyxattr_required_version, xattr.__version__))
-
-                raise ImportError
-
-            def write_xattr(path, key, value):
-                try:
-                    xattr.set(path, key, value)
-                except EnvironmentError as e:
-                    raise XAttrMetadataError(e.errno, e.strerror)
-
-        except ImportError:
-            if compat_os_name == 'nt':
-                # Write xattrs to NTFS Alternate Data Streams:
-                # http://en.wikipedia.org/wiki/NTFS#Alternate_data_streams_.28ADS.29
-                def write_xattr(path, key, value):
-                    assert ':' not in key
-                    assert os.path.exists(path)
-
-                    ads_fn = path + ':' + key
-                    try:
-                        with open(ads_fn, 'wb') as f:
-                            f.write(value)
-                    except EnvironmentError as e:
-                        raise XAttrMetadataError(e.errno, e.strerror)
-            else:
-                user_has_setfattr = check_executable('setfattr', ['--version'])
-                user_has_xattr = check_executable('xattr', ['-h'])
-
-                if user_has_setfattr or user_has_xattr:
-
-                    def write_xattr(path, key, value):
-                        value = value.decode('utf-8')
-                        if user_has_setfattr:
-                            executable = 'setfattr'
-                            opts = ['-n', key, '-v', value]
-                        elif user_has_xattr:
-                            executable = 'xattr'
-                            opts = ['-w', key, value]
-
-                        cmd = ([encodeFilename(executable, True)] +
-                               [encodeArgument(o) for o in opts] +
-                               [encodeFilename(path, True)])
-
-                        try:
-                            p = subprocess.Popen(
-                                cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
-                        except EnvironmentError as e:
-                            raise XAttrMetadataError(e.errno, e.strerror)
-                        stdout, stderr = p.communicate()
-                        stderr = stderr.decode('utf-8', 'replace')
-                        if p.returncode != 0:
-                            raise XAttrMetadataError(p.returncode, stderr)
-
-                else:
-                    # On Unix, and can't find pyxattr, setfattr, or xattr.
-                    if sys.platform.startswith('linux'):
-                        self._downloader.report_error(
-                            "Couldn't find a tool to set the xattrs. "
-                            "Install either the python 'pyxattr' or 'xattr' "
-                            "modules, or the GNU 'attr' package "
-                            "(which contains the 'setfattr' tool).")
-                    else:
-                        self._downloader.report_error(
-                            "Couldn't find a tool to set the xattrs. "
-                            "Install either the python 'xattr' module, "
-                            "or the 'xattr' binary.")
-
          # Write the metadata to the file's xattrs
          self._downloader.to_screen('[metadata] Writing metadata to file\'s xattrs')
  
@@ -159,6 +55,10 @@ class XAttrMetadataPP(PostProcessor):
  
              return [], info
  
+        except XAttrUnavailableError as e:
+            self._downloader.report_error(str(e))
+            return [], info
+
          except XAttrMetadataError as e:
              if e.reason == 'NO_SPACE':
                  self._downloader.report_warning(
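Reviewer note on the xattr changes above: `XAttrMetadataError` derives a coarse `reason` from either the errno code or the message text, and the post-processor branches on that reason. The sketch below reproduces only that classification as a free function. The name `classify_xattr_failure` is an assumption, and the message check uses the spelling "Disk quota exceeded" that the C library emits.

```python
import errno


def classify_xattr_failure(code=None, msg='Unknown error'):
    """Return 'NO_SPACE', 'VALUE_TOO_LONG' or 'NOT_SUPPORTED'.

    Hypothetical free-function version of the reason logic in the diff.
    """
    if (code in (errno.ENOSPC, errno.EDQUOT) or
            'No space left' in msg or 'Disk quota exceeded' in msg):
        return 'NO_SPACE'
    if code == errno.E2BIG or 'Argument list too long' in msg:
        return 'VALUE_TOO_LONG'
    # Anything else is treated as "this filesystem cannot store xattrs".
    return 'NOT_SUPPORTED'


by_code = classify_xattr_failure(errno.ENOSPC, 'write failed')
by_message = classify_xattr_failure(1, 'setfattr: Argument list too long')
fallback = classify_xattr_failure()
```

Checking the message as well as the code matters for the `setfattr`/`xattr` command-line fallbacks, where only the process exit status and stderr text are available.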
diff --git a/youtube_dl/socks.py b/youtube_dl/socks.py
new file mode 100644
index 0000000..63d19b3
--- /dev/null
+++ b/youtube_dl/socks.py
@@ -0,0 +1,272 @@
+# Public Domain SOCKS proxy protocol implementation
+# Adapted from https://gist.github.com/bluec0re/cafd3764412967417fd3
+
+from __future__ import unicode_literals
+
+# References:
+# SOCKS4 protocol http://www.openssh.com/txt/socks4.protocol
+# SOCKS4A protocol http://www.openssh.com/txt/socks4a.protocol
+# SOCKS5 protocol https://tools.ietf.org/html/rfc1928
+# SOCKS5 username/password authentication https://tools.ietf.org/html/rfc1929
+
+import collections
+import socket
+
+from .compat import (
+    compat_ord,
+    compat_struct_pack,
+    compat_struct_unpack,
+)
+
+__author__ = 'Timo Schmid <coding@timoschmid.de>'
+
+SOCKS4_VERSION = 4
+SOCKS4_REPLY_VERSION = 0x00
+# Excerpt from SOCKS4A protocol:
+# if the client cannot resolve the destination host's domain name to find its
+# IP address, it should set the first three bytes of DSTIP to NULL and the last
+# byte to a non-zero value.
+SOCKS4_DEFAULT_DSTIP = compat_struct_pack('!BBBB', 0, 0, 0, 0xFF)
+
+SOCKS5_VERSION = 5
+SOCKS5_USER_AUTH_VERSION = 0x01
+SOCKS5_USER_AUTH_SUCCESS = 0x00
+
+
+class Socks4Command(object):
+    CMD_CONNECT = 0x01
+    CMD_BIND = 0x02
+
+
+class Socks5Command(Socks4Command):
+    CMD_UDP_ASSOCIATE = 0x03
+
+
+class Socks5Auth(object):
+    AUTH_NONE = 0x00
+    AUTH_GSSAPI = 0x01
+    AUTH_USER_PASS = 0x02
+    AUTH_NO_ACCEPTABLE = 0xFF  # For server response
+
+
+class Socks5AddressType(object):
+    ATYP_IPV4 = 0x01
+    ATYP_DOMAINNAME = 0x03
+    ATYP_IPV6 = 0x04
+
+
+class ProxyError(IOError):
+    ERR_SUCCESS = 0x00
+
+    def __init__(self, code=None, msg=None):
+        if code is not None and msg is None:
+            msg = self.CODES.get(code) or 'unknown error'
+        super(ProxyError, self).__init__(code, msg)
+
+
+class InvalidVersionError(ProxyError):
+    def __init__(self, expected_version, got_version):
+        msg = ('Invalid response version from server. Expected {0:02x} got '
+               '{1:02x}'.format(expected_version, got_version))
+        super(InvalidVersionError, self).__init__(0, msg)
+
+
+class Socks4Error(ProxyError):
+    ERR_SUCCESS = 90
+
+    CODES = {
+        91: 'request rejected or failed',
+        92: 'request rejected because SOCKS server cannot connect to identd on the client',
+        93: 'request rejected because the client program and identd report different user-ids'
+    }
+
+
+class Socks5Error(ProxyError):
+    ERR_GENERAL_FAILURE = 0x01
+
+    CODES = {
+        0x01: 'general SOCKS server failure',
+        0x02: 'connection not allowed by ruleset',
+        0x03: 'Network unreachable',
+        0x04: 'Host unreachable',
+        0x05: 'Connection refused',
+        0x06: 'TTL expired',
+        0x07: 'Command not supported',
+        0x08: 'Address type not supported',
+        0xFE: 'unknown username or invalid password',
+        0xFF: 'all offered authentication methods were rejected'
+    }
+
+
+class ProxyType(object):
+    SOCKS4 = 0
+    SOCKS4A = 1
+    SOCKS5 = 2
+
+
+Proxy = collections.namedtuple('Proxy', (
+    'type', 'host', 'port', 'username', 'password', 'remote_dns'))
+
+
+class sockssocket(socket.socket):
+    def __init__(self, *args, **kwargs):
+        self._proxy = None
+        super(sockssocket, self).__init__(*args, **kwargs)
+
+    def setproxy(self, proxytype, addr, port, rdns=True, username=None, password=None):
+        assert proxytype in (ProxyType.SOCKS4, ProxyType.SOCKS4A, ProxyType.SOCKS5)
+
+        self._proxy = Proxy(proxytype, addr, port, username, password, rdns)
+
+    def recvall(self, cnt):
+        data = b''
+        while len(data) < cnt:
+            cur = self.recv(cnt - len(data))
+            if not cur:
+                raise IOError('{0} bytes missing'.format(cnt - len(data)))
+            data += cur
+        return data
+
+    def _recv_bytes(self, cnt):
+        data = self.recvall(cnt)
+        return compat_struct_unpack('!{0}B'.format(cnt), data)
+
+    @staticmethod
+    def _len_and_data(data):
+        return compat_struct_pack('!B', len(data)) + data
+
+    def _check_response_version(self, expected_version, got_version):
+        if got_version != expected_version:
+            self.close()
+            raise InvalidVersionError(expected_version, got_version)
+
+    def _resolve_address(self, destaddr, default, use_remote_dns):
+        try:
+            return socket.inet_aton(destaddr)
+        except socket.error:
+            if use_remote_dns and self._proxy.remote_dns:
+                return default
+            else:
+                return socket.inet_aton(socket.gethostbyname(destaddr))
+
+    def _setup_socks4(self, address, is_4a=False):
+        destaddr, port = address
+
+        ipaddr = self._resolve_address(destaddr, SOCKS4_DEFAULT_DSTIP, use_remote_dns=is_4a)
+
+        packet = compat_struct_pack('!BBH', SOCKS4_VERSION, Socks4Command.CMD_CONNECT, port) + ipaddr
+
+        username = (self._proxy.username or '').encode('utf-8')
+        packet += username + b'\x00'
+
+        if is_4a and self._proxy.remote_dns:
+            packet += destaddr.encode('utf-8') + b'\x00'
+
+        self.sendall(packet)
+
+        version, resp_code, dstport, dsthost = compat_struct_unpack('!BBHI', self.recvall(8))
+
+        self._check_response_version(SOCKS4_REPLY_VERSION, version)
+
+        if resp_code != Socks4Error.ERR_SUCCESS:
+            self.close()
+            raise Socks4Error(resp_code)
+
+        return (dsthost, dstport)
+
+    def _setup_socks4a(self, address):
+        self._setup_socks4(address, is_4a=True)
+
+    def _socks5_auth(self):
+        packet = compat_struct_pack('!B', SOCKS5_VERSION)
+
+        auth_methods = [Socks5Auth.AUTH_NONE]
+        if self._proxy.username and self._proxy.password:
+            auth_methods.append(Socks5Auth.AUTH_USER_PASS)
+
+        packet += compat_struct_pack('!B', len(auth_methods))
+        packet += compat_struct_pack('!{0}B'.format(len(auth_methods)), *auth_methods)
+
+        self.sendall(packet)
+
+        version, method = self._recv_bytes(2)
+
+        self._check_response_version(SOCKS5_VERSION, version)
+
+        if method == Socks5Auth.AUTH_NO_ACCEPTABLE:
+            self.close()
+            raise Socks5Error(method)
+
+        if method == Socks5Auth.AUTH_USER_PASS:
+            username = self._proxy.username.encode('utf-8')
+            password = self._proxy.password.encode('utf-8')
+            packet = compat_struct_pack('!B', SOCKS5_USER_AUTH_VERSION)
+            packet += self._len_and_data(username) + self._len_and_data(password)
+            self.sendall(packet)
+
+            version, status = self._recv_bytes(2)
+
+            self._check_response_version(SOCKS5_USER_AUTH_VERSION, version)
+
+            if status != SOCKS5_USER_AUTH_SUCCESS:
+                self.close()
+                raise Socks5Error(Socks5Error.ERR_GENERAL_FAILURE)
+
+    def _setup_socks5(self, address):
+        destaddr, port = address
+
+        ipaddr = self._resolve_address(destaddr, None, use_remote_dns=True)
+
+        self._socks5_auth()
+
+        reserved = 0
+        packet = compat_struct_pack('!BBB', SOCKS5_VERSION, Socks5Command.CMD_CONNECT, reserved)
+        if ipaddr is None:
+            destaddr = destaddr.encode('utf-8')
+            packet += compat_struct_pack('!B', Socks5AddressType.ATYP_DOMAINNAME)
+            packet += self._len_and_data(destaddr)
+        else:
+            packet += compat_struct_pack('!B', Socks5AddressType.ATYP_IPV4) + ipaddr
+        packet += compat_struct_pack('!H', port)
+
+        self.sendall(packet)
+
+        version, status, reserved, atype = self._recv_bytes(4)
+
+        self._check_response_version(SOCKS5_VERSION, version)
+
+        if status != Socks5Error.ERR_SUCCESS:
+            self.close()
+            raise Socks5Error(status)
+
+        if atype == Socks5AddressType.ATYP_IPV4:
+            destaddr = self.recvall(4)
+        elif atype == Socks5AddressType.ATYP_DOMAINNAME:
+            alen = compat_ord(self.recv(1))
+            destaddr = self.recvall(alen)
+        elif atype == Socks5AddressType.ATYP_IPV6:
+            destaddr = self.recvall(16)
+        destport = compat_struct_unpack('!H', self.recvall(2))[0]
+
+        return (destaddr, destport)
+
+    def _make_proxy(self, connect_func, address):
+        if not self._proxy:
+            return connect_func(self, address)
+
+        result = connect_func(self, (self._proxy.host, self._proxy.port))
+        if result != 0 and result is not None:
+            return result
+        setup_funcs = {
+            ProxyType.SOCKS4: self._setup_socks4,
+            ProxyType.SOCKS4A: self._setup_socks4a,
+            ProxyType.SOCKS5: self._setup_socks5,
+        }
+        setup_funcs[self._proxy.type](address)
+        return result
+
+    def connect(self, address):
+        self._make_proxy(socket.socket.connect, address)
+
+    def connect_ex(self, address):
+        return self._make_proxy(socket.socket.connect_ex, address)
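Reviewer note on `socks.py` above: the wire format built by `_setup_socks4` and `_setup_socks5` is easier to review as raw bytes. The sketch below builds the two CONNECT requests with the standard `struct` module in place of the `compat_struct_pack` wrapper. The function names are hypothetical, and it covers only the remote-DNS (hostname) case with no SOCKS5 authentication step.

```python
import struct

SOCKS4_VERSION = 4
SOCKS5_VERSION = 5
CMD_CONNECT = 0x01
ATYP_DOMAINNAME = 0x03


def socks4a_connect_request(host, port, username=''):
    """SOCKS4A CONNECT: VN, CD, DSTPORT, DSTIP, USERID\\0, HOSTNAME\\0."""
    packet = struct.pack('!BBH', SOCKS4_VERSION, CMD_CONNECT, port)
    # DSTIP 0.0.0.x with x != 0 tells the proxy that a hostname follows.
    packet += struct.pack('!BBBB', 0, 0, 0, 0xFF)
    packet += username.encode('utf-8') + b'\x00'
    packet += host.encode('utf-8') + b'\x00'
    return packet


def socks5_connect_request(host, port):
    """SOCKS5 CONNECT (RFC 1928): VER, CMD, RSV, ATYP, length-prefixed host, port."""
    host_bytes = host.encode('utf-8')
    packet = struct.pack('!BBB', SOCKS5_VERSION, CMD_CONNECT, 0)
    packet += struct.pack('!BB', ATYP_DOMAINNAME, len(host_bytes)) + host_bytes
    packet += struct.pack('!H', port)
    return packet


v4a = socks4a_connect_request('example.com', 80)
v5 = socks5_connect_request('example.com', 443)
```

Both requests put the port in network byte order. SOCKS4 places it before the address, while SOCKS5 places it after.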
diff --git a/youtube_dl/swfinterp.py b/youtube_dl/swfinterp.py
index 06c1d6cc1755ef022aa78967d4b651e21fd66618..0c71585753134e93fba8d8de5cee003d31f050c9 100644
--- a/youtube_dl/swfinterp.py
+++ b/youtube_dl/swfinterp.py
@@ -4,10 +4,12 @@ import collections
  import io
  import zlib
  
-from .compat import compat_str
+from .compat import (
+    compat_str,
+    compat_struct_unpack,
+)
  from .utils import (
      ExtractorError,
-    struct_unpack,
  )
  
  
@@ -23,17 +25,17 @@ def _extract_tags(file_contents):
              file_contents[:1])
  
      # Determine number of bits in framesize rectangle
-    framesize_nbits = struct_unpack('!B', content[:1])[0] >> 3
+    framesize_nbits = compat_struct_unpack('!B', content[:1])[0] >> 3
      framesize_len = (5 + 4 * framesize_nbits + 7) // 8
  
      pos = framesize_len + 2 + 2
      while pos < len(content):
-        header16 = struct_unpack('<H', content[pos:pos + 2])[0]
+        header16 = compat_struct_unpack('<H', content[pos:pos + 2])[0]
          pos += 2
          tag_code = header16 >> 6
          tag_len = header16 & 0x3f
          if tag_len == 0x3f:
-            tag_len = struct_unpack('<I', content[pos:pos + 4])[0]
+            tag_len = compat_struct_unpack('<I', content[pos:pos + 4])[0]
              pos += 4
          assert pos + tag_len <= len(content), \
              ('Tag %d ends at %d+%d - that\'s longer than the file (%d)'
@@ -101,7 +103,7 @@ def _read_int(reader):
      for _ in range(5):
          buf = reader.read(1)
          assert len(buf) == 1
-        b = struct_unpack('<B', buf)[0]
+        b = compat_struct_unpack('<B', buf)[0]
          res = res | ((b & 0x7f) << shift)
          if b & 0x80 == 0:
              break
@@ -113,6 +115,8 @@ def _u30(reader):
      res = _read_int(reader)
      assert res & 0xf0000000 == 0
      return res
+
+
  _u32 = _read_int
  
  
@@ -127,7 +131,7 @@ def _s24(reader):
      bs = reader.read(3)
      assert len(bs) == 3
      last_byte = b'\xff' if (ord(bs[2:3]) >= 0x80) else b'\x00'
-    return struct_unpack('<i', bs + last_byte)[0]
+    return compat_struct_unpack('<i', bs + last_byte)[0]
  
  
  def _read_string(reader):
@@ -146,7 +150,7 @@ def _read_bytes(count, reader):
  
  def _read_byte(reader):
      resb = _read_bytes(1, reader=reader)
-    res = struct_unpack('<B', resb)[0]
+    res = compat_struct_unpack('<B', resb)[0]
      return res
  
  
@@ -174,6 +178,7 @@ class _Undefined(object):
          return 'undefined'
      __repr__ = __str__
  
+
  undefined = _Undefined()
  
  
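Reviewer note on the `swfinterp.py` hunks above: `_read_int` decodes a little-endian variable-length integer, seven payload bits per byte with the high bit as a continuation flag. The sketch below is a self-contained version using `struct.unpack` directly. The name `read_varint` is hypothetical, and the `shift += 7` step and final `return` are assumed from the standard encoding because they fall outside the lines shown in the hunk.

```python
import io
import struct


def read_varint(reader):
    """Decode up to five bytes of a 7-bit-per-byte little-endian integer."""
    res = 0
    shift = 0
    for _ in range(5):
        buf = reader.read(1)
        assert len(buf) == 1
        b = struct.unpack('<B', buf)[0]
        # The low seven bits carry data; the high bit means "more bytes follow".
        res |= (b & 0x7f) << shift
        if b & 0x80 == 0:
            break
        shift += 7
    return res


single = read_varint(io.BytesIO(b'\x7f'))
double = read_varint(io.BytesIO(b'\x80\x01'))
triple = read_varint(io.BytesIO(b'\xe5\x8e\x26'))
```

The loop bound of five bytes is what limits the result to 35 payload bits, which is why `_u30` additionally asserts that the top four bits are clear.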
diff --git a/youtube_dl/update.py b/youtube_dl/update.py
index 676ebe1c42d1d6b54eb50bfc3f087e6fee8e20f0..ebce9666a21465b53b93ccd0bd263b29349720a0 100644
--- a/youtube_dl/update.py
+++ b/youtube_dl/update.py
@@ -83,11 +83,8 @@ def update_self(to_screen, verbose, opener):
  
      print_notes(to_screen, versions_info['versions'])
  
-    filename = sys.argv[0]
-    # Py2EXE: Filename could be different
-    if hasattr(sys, 'frozen') and not os.path.isfile(filename):
-        if os.path.isfile(filename + '.exe'):
-            filename += '.exe'
+    # sys.executable is set to the full pathname of the exe-file for py2exe
+    filename = sys.executable if hasattr(sys, 'frozen') else sys.argv[0]
  
      if not os.access(filename, os.W_OK):
          to_screen('ERROR: no write permissions on %s' % filename)
@@ -95,7 +92,7 @@ def update_self(to_screen, verbose, opener):
  
      # Py2EXE
      if hasattr(sys, 'frozen'):
-        exe = os.path.abspath(filename)
+        exe = filename
          directory = os.path.dirname(exe)
          if not os.access(directory, os.W_OK):
              to_screen('ERROR: no write permissions on %s' % directory)
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index 6d27b80c02912b01b7f64ef85f20097f8216abe4..9595bcf9f120ea4d24133e3f7399e637d14ac035 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -1,5 +1,5 @@
  #!/usr/bin/env python
-# -*- coding: utf-8 -*-
+# coding: utf-8
  
  from __future__ import unicode_literals
  
@@ -14,8 +14,8 @@ import email.utils
  import errno
  import functools
  import gzip
-import itertools
  import io
+import itertools
  import json
  import locale
  import math
@@ -24,9 +24,8 @@ import os
  import pipes
  import platform
  import re
-import ssl
  import socket
-import struct
+import ssl
  import subprocess
  import sys
  import tempfile
@@ -40,27 +39,46 @@ from .compat import (
      compat_chr,
      compat_etree_fromstring,
      compat_html_entities,
+    compat_html_entities_html5,
      compat_http_client,
      compat_kwargs,
+    compat_os_name,
      compat_parse_qs,
+    compat_shlex_quote,
      compat_socket_create_connection,
      compat_str,
+    compat_struct_pack,
+    compat_struct_unpack,
      compat_urllib_error,
      compat_urllib_parse,
      compat_urllib_parse_urlencode,
      compat_urllib_parse_urlparse,
+    compat_urllib_parse_unquote_plus,
      compat_urllib_request,
      compat_urlparse,
      compat_xpath,
-    shlex_quote,
+)
+
+from .socks import (
+    ProxyType,
+    sockssocket,
  )
  
  
+def register_socks_protocols():
+    # "Register" SOCKS protocols
+    # In Python < 2.6.5, urlsplit() suffers from bug https://bugs.python.org/issue7904
+    # URLs with protocols not in urlparse.uses_netloc are not handled correctly
+    for scheme in ('socks', 'socks4', 'socks4a', 'socks5'):
+        if scheme not in compat_urlparse.uses_netloc:
+            compat_urlparse.uses_netloc.append(scheme)
+
+
  # This is not clearly defined otherwise
  compiled_regex_type = type(re.compile(''))
  
  std_headers = {
-    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/44.0 (Chrome)',
+    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)',
      'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Encoding': 'gzip, deflate',
@@ -74,6 +92,13 @@ ENGLISH_MONTH_NAMES = [
      'January', 'February', 'March', 'April', 'May', 'June',
      'July', 'August', 'September', 'October', 'November', 'December']
  
+MONTH_NAMES = {
+    'en': ENGLISH_MONTH_NAMES,
+    'fr': [
+        'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
+        'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'],
+}
+
  KNOWN_EXTENSIONS = (
      'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'aac',
      'flv', 'f4v', 'f4a', 'f4b',
@@ -89,6 +114,59 @@ KNOWN_EXTENSIONS = (
      'wav',
      'f4f', 'f4m', 'm3u8', 'smil')
  
+# needed for sanitizing filenames in restricted mode
+ACCENT_CHARS = dict(zip('ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖŐØŒÙÚÛÜŰÝÞßàáâãäåæçèéêëìíîïðñòóôõöőøœùúûüűýþÿ',
+                        itertools.chain('AAAAAA', ['AE'], 'CEEEEIIIIDNOOOOOOO', ['OE'], 'UUUUUYP', ['ss'],
+                                        'aaaaaa', ['ae'], 'ceeeeiiiionooooooo', ['oe'], 'uuuuuypy')))
+
+DATE_FORMATS = (
+    '%d %B %Y',
+    '%d %b %Y',
+    '%B %d %Y',
+    '%b %d %Y',
+    '%b %dst %Y %I:%M',
+    '%b %dnd %Y %I:%M',
+    '%b %dth %Y %I:%M',
+    '%Y %m %d',
+    '%Y-%m-%d',
+    '%Y/%m/%d',
+    '%Y/%m/%d %H:%M',
+    '%Y/%m/%d %H:%M:%S',
+    '%Y-%m-%d %H:%M:%S',
+    '%Y-%m-%d %H:%M:%S.%f',
+    '%d.%m.%Y %H:%M',
+    '%d.%m.%Y %H.%M',
+    '%Y-%m-%dT%H:%M:%SZ',
+    '%Y-%m-%dT%H:%M:%S.%fZ',
+    '%Y-%m-%dT%H:%M:%S.%f0Z',
+    '%Y-%m-%dT%H:%M:%S',
+    '%Y-%m-%dT%H:%M:%S.%f',
+    '%Y-%m-%dT%H:%M',
+    '%b %d %Y at %H:%M',
+    '%b %d %Y at %H:%M:%S',
+)
+
+DATE_FORMATS_DAY_FIRST = list(DATE_FORMATS)
+DATE_FORMATS_DAY_FIRST.extend([
+    '%d-%m-%Y',
+    '%d.%m.%Y',
+    '%d.%m.%y',
+    '%d/%m/%Y',
+    '%d/%m/%y',
+    '%d/%m/%Y %H:%M:%S',
+])
+
+DATE_FORMATS_MONTH_FIRST = list(DATE_FORMATS)
+DATE_FORMATS_MONTH_FIRST.extend([
+    '%m-%d-%Y',
+    '%m.%d.%Y',
+    '%m/%d/%Y',
+    '%m/%d/%y',
+    '%m/%d/%Y %H:%M:%S',
+])
+
+PACKED_CODES_RE = r"}\('(.+)',(\d+),(\d+),'([^']+)'\.split\('\|'\)"
+
  
  def preferredencoding():
      """Get preferred encoding.
@@ -246,18 +324,26 @@ def get_element_by_id(id, html):
      return get_element_by_attribute('id', id, html)
  
  
-def get_element_by_attribute(attribute, value, html):
+def get_element_by_class(class_name, html):
+    return get_element_by_attribute(
+        'class', r'[^\'"]*\b%s\b[^\'"]*' % re.escape(class_name),
+        html, escape_value=False)
+
+
+def get_element_by_attribute(attribute, value, html, escape_value=True):
      """Return the content of the tag with the specified attribute in the passed HTML document"""
  
+    value = re.escape(value) if escape_value else value
+
      m = re.search(r'''(?xs)
          <([a-zA-Z0-9:._-]+)
-         (?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]+|="[^"]+"|='[^']+'))*?
+         (?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'))*?
           \s+%s=['"]?%s['"]?
-         (?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]+|="[^"]+"|='[^']+'))*?
+         (?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'))*?
          \s*>
          (?P<content>.*?)
          </\1>
-    ''' % (re.escape(attribute), re.escape(value)), html)
+    ''' % (re.escape(attribute), value), html)
  
      if not m:
          return None
@@ -365,6 +451,8 @@ def sanitize_filename(s, restricted=False, is_id=False):
      Set is_id if this is not an arbitrary string, but an ID that should be kept if possible
      """
      def replace_insane(char):
+        if restricted and char in ACCENT_CHARS:
+            return ACCENT_CHARS[char]
          if char == '?' or ord(char) < 32 or ord(char) == 127:
              return ''
          elif char == '"':
@@ -434,12 +522,19 @@ def orderedSet(iterable):
      return res
  
  
-def _htmlentity_transform(entity):
+def _htmlentity_transform(entity_with_semicolon):
      """Transforms an HTML entity to a character."""
+    entity = entity_with_semicolon[:-1]
+
      # Known non-numeric HTML entity
      if entity in compat_html_entities.name2codepoint:
          return compat_chr(compat_html_entities.name2codepoint[entity])
  
+    # TODO: HTML5 allows entities without a semicolon. For example,
+    # '&Eacuteric' should be decoded as 'Éric'.
+    if entity_with_semicolon in compat_html_entities_html5:
+        return compat_html_entities_html5[entity_with_semicolon]
+
      mobj = re.match(r'#(x[0-9a-fA-F]+|[0-9]+)', entity)
      if mobj is not None:
          numstr = mobj.group(1)
@@ -464,7 +559,7 @@ def unescapeHTML(s):
      assert type(s) == compat_str
  
      return re.sub(
-        r'&([^;]+);', lambda m: _htmlentity_transform(m.group(1)), s)
+        r'&([^;]+;)', lambda m: _htmlentity_transform(m.group(1)), s)
  
  
  def get_subprocess_encoding():
@@ -685,6 +780,26 @@ class ContentTooShortError(Exception):
          self.expected = expected
  
  
+class XAttrMetadataError(Exception):
+    def __init__(self, code=None, msg='Unknown error'):
+        super(XAttrMetadataError, self).__init__(msg)
+        self.code = code
+        self.msg = msg
+
+        # Parsing code and msg
+        if (self.code in (errno.ENOSPC, errno.EDQUOT) or
+                'No space left' in self.msg or 'Disk quota exceeded' in self.msg):
+            self.reason = 'NO_SPACE'
+        elif self.code == errno.E2BIG or 'Argument list too long' in self.msg:
+            self.reason = 'VALUE_TOO_LONG'
+        else:
+            self.reason = 'NOT_SUPPORTED'
+
+
+class XAttrUnavailableError(Exception):
+    pass
+
+
  def _create_http_connection(ydl_handler, http_class, is_https, *args, **kwargs):
      # Working around python 2 bug (see http://bugs.python.org/issue17849) by limiting
      # expected HTTP responses to meet HTTP/1.0 or later (see also
@@ -745,8 +860,15 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
          self._params = params
  
      def http_open(self, req):
+        conn_class = compat_http_client.HTTPConnection
+
+        socks_proxy = req.headers.get('Ytdl-socks-proxy')
+        if socks_proxy:
+            conn_class = make_socks_conn_class(conn_class, socks_proxy)
+            del req.headers['Ytdl-socks-proxy']
+
          return self.do_open(functools.partial(
-            _create_http_connection, self, compat_http_client.HTTPConnection, False),
+            _create_http_connection, self, conn_class, False),
              req)
  
      @staticmethod
@@ -778,12 +900,7 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
  
          # Substitute URL if any change after escaping
          if url != url_escaped:
-            req_type = HEADRequest if req.get_method() == 'HEAD' else compat_urllib_request.Request
-            new_req = req_type(
-                url_escaped, data=req.data, headers=req.headers,
-                origin_req_host=req.origin_req_host, unverifiable=req.unverifiable)
-            new_req.timeout = req.timeout
-            req = new_req
+            req = update_Request(req, url=url_escaped)
  
          for h, v in std_headers.items():
              # Capitalize is needed because of Python bug 2275: http://bugs.python.org/issue2275
@@ -837,9 +954,13 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
                  # As of RFC 2616 default charset is iso-8859-1 that is respected by python 3
                  if sys.version_info >= (3, 0):
                      location = location.encode('iso-8859-1').decode('utf-8')
+                else:
+                    location = location.decode('utf-8')
                  location_escaped = escape_url(location)
                  if location != location_escaped:
                      del resp.headers['Location']
+                    if sys.version_info < (3, 0):
+                        location_escaped = location_escaped.encode('utf-8')
                      resp.headers['Location'] = location_escaped
          return resp
  
@@ -847,6 +968,49 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
      https_response = http_response
  
  
+def make_socks_conn_class(base_class, socks_proxy):
+    assert issubclass(base_class, (
+        compat_http_client.HTTPConnection, compat_http_client.HTTPSConnection))
+
+    url_components = compat_urlparse.urlparse(socks_proxy)
+    if url_components.scheme.lower() == 'socks5':
+        socks_type = ProxyType.SOCKS5
+    elif url_components.scheme.lower() in ('socks', 'socks4'):
+        socks_type = ProxyType.SOCKS4
+    elif url_components.scheme.lower() == 'socks4a':
+        socks_type = ProxyType.SOCKS4A
+
+    def unquote_if_non_empty(s):
+        if not s:
+            return s
+        return compat_urllib_parse_unquote_plus(s)
+
+    proxy_args = (
+        socks_type,
+        url_components.hostname, url_components.port or 1080,
+        True,  # Remote DNS
+        unquote_if_non_empty(url_components.username),
+        unquote_if_non_empty(url_components.password),
+    )
+
+    class SocksConnection(base_class):
+        def connect(self):
+            self.sock = sockssocket()
+            self.sock.setproxy(*proxy_args)
+            if type(self.timeout) in (int, float):
+                self.sock.settimeout(self.timeout)
+            self.sock.connect((self.host, self.port))
+
+            if isinstance(self, compat_http_client.HTTPSConnection):
+                if hasattr(self, '_context'):  # Python > 2.6
+                    self.sock = self._context.wrap_socket(
+                        self.sock, server_hostname=self.host)
+                else:
+                    self.sock = ssl.wrap_socket(self.sock)
+
+    return SocksConnection
+
+
  class YoutubeDLHTTPSHandler(compat_urllib_request.HTTPSHandler):
      def __init__(self, params, https_conn_class=None, *args, **kwargs):
          compat_urllib_request.HTTPSHandler.__init__(self, *args, **kwargs)
@@ -855,12 +1019,20 @@ class YoutubeDLHTTPSHandler(compat_urllib_request.HTTPSHandler):
  
      def https_open(self, req):
          kwargs = {}
+        conn_class = self._https_conn_class
+
          if hasattr(self, '_context'):  # python > 2.6
              kwargs['context'] = self._context
          if hasattr(self, '_check_hostname'):  # python 3.x
              kwargs['check_hostname'] = self._check_hostname
+
+        socks_proxy = req.headers.get('Ytdl-socks-proxy')
+        if socks_proxy:
+            conn_class = make_socks_conn_class(conn_class, socks_proxy)
+            del req.headers['Ytdl-socks-proxy']
+
          return self.do_open(functools.partial(
-            _create_http_connection, self, self._https_conn_class, True),
+            _create_http_connection, self, conn_class, True),
              req, **kwargs)
  
  
@@ -888,6 +1060,24 @@ class YoutubeDLCookieProcessor(compat_urllib_request.HTTPCookieProcessor):
      https_response = http_response
  
  
+def extract_timezone(date_str):
+    m = re.search(
+        r'^.{8,}?(?P<tz>Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
+        date_str)
+    if not m:
+        timezone = datetime.timedelta()
+    else:
+        date_str = date_str[:-len(m.group('tz'))]
+        if not m.group('sign'):
+            timezone = datetime.timedelta()
+        else:
+            sign = 1 if m.group('sign') == '+' else -1
+            timezone = datetime.timedelta(
+                hours=sign * int(m.group('hours')),
+                minutes=sign * int(m.group('minutes')))
+    return timezone, date_str
+
+
  def parse_iso8601(date_str, delimiter='T', timezone=None):
      """ Return a UNIX timestamp from the given date """
  
@@ -897,20 +1087,8 @@ def parse_iso8601(date_str, delimiter='T', timezone=None):
      date_str = re.sub(r'\.[0-9]+', '', date_str)
  
      if timezone is None:
-        m = re.search(
-            r'(?:Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
-            date_str)
-        if not m:
-            timezone = datetime.timedelta()
-        else:
-            date_str = date_str[:-len(m.group(0))]
-            if not m.group('sign'):
-                timezone = datetime.timedelta()
-            else:
-                sign = 1 if m.group('sign') == '+' else -1
-                timezone = datetime.timedelta(
-                    hours=sign * int(m.group('hours')),
-                    minutes=sign * int(m.group('minutes')))
+        timezone, date_str = extract_timezone(date_str)
+
      try:
          date_format = '%Y-%m-%d{0}%H:%M:%S'.format(delimiter)
          dt = datetime.datetime.strptime(date_str, date_format) - timezone
@@ -919,6 +1097,10 @@ def parse_iso8601(date_str, delimiter='T', timezone=None):
          pass
  
  
+def date_formats(day_first=True):
+    return DATE_FORMATS_DAY_FIRST if day_first else DATE_FORMATS_MONTH_FIRST
+
+
  def unified_strdate(date_str, day_first=True):
      """Return a string with the date in the format YYYYMMDD"""
  
@@ -927,52 +1109,11 @@ def unified_strdate(date_str, day_first=True):
      upload_date = None
      # Replace commas
      date_str = date_str.replace(',', ' ')
-    # %z (UTC offset) is only supported in python>=3.2
-    if not re.match(r'^[0-9]{1,2}-[0-9]{1,2}-[0-9]{4}$', date_str):
-        date_str = re.sub(r' ?(\+|-)[0-9]{2}:?[0-9]{2}$', '', date_str)
      # Remove AM/PM + timezone
      date_str = re.sub(r'(?i)\s*(?:AM|PM)(?:\s+[A-Z]+)?', '', date_str)
+    _, date_str = extract_timezone(date_str)
  
-    format_expressions = [
-        '%d %B %Y',
-        '%d %b %Y',
-        '%B %d %Y',
-        '%b %d %Y',
-        '%b %dst %Y %I:%M',
-        '%b %dnd %Y %I:%M',
-        '%b %dth %Y %I:%M',
-        '%Y %m %d',
-        '%Y-%m-%d',
-        '%Y/%m/%d',
-        '%Y/%m/%d %H:%M:%S',
-        '%Y-%m-%d %H:%M:%S',
-        '%Y-%m-%d %H:%M:%S.%f',
-        '%d.%m.%Y %H:%M',
-        '%d.%m.%Y %H.%M',
-        '%Y-%m-%dT%H:%M:%SZ',
-        '%Y-%m-%dT%H:%M:%S.%fZ',
-        '%Y-%m-%dT%H:%M:%S.%f0Z',
-        '%Y-%m-%dT%H:%M:%S',
-        '%Y-%m-%dT%H:%M:%S.%f',
-        '%Y-%m-%dT%H:%M',
-    ]
-    if day_first:
-        format_expressions.extend([
-            '%d-%m-%Y',
-            '%d.%m.%Y',
-            '%d/%m/%Y',
-            '%d/%m/%y',
-            '%d/%m/%Y %H:%M:%S',
-        ])
-    else:
-        format_expressions.extend([
-            '%m-%d-%Y',
-            '%m.%d.%Y',
-            '%m/%d/%Y',
-            '%m/%d/%y',
-            '%m/%d/%Y %H:%M:%S',
-        ])
-    for expression in format_expressions:
+    for expression in date_formats(day_first):
          try:
              upload_date = datetime.datetime.strptime(date_str, expression).strftime('%Y%m%d')
          except ValueError:
@@ -980,11 +1121,37 @@ def unified_strdate(date_str, day_first=True):
      if upload_date is None:
          timetuple = email.utils.parsedate_tz(date_str)
          if timetuple:
-            upload_date = datetime.datetime(*timetuple[:6]).strftime('%Y%m%d')
+            try:
+                upload_date = datetime.datetime(*timetuple[:6]).strftime('%Y%m%d')
+            except ValueError:
+                pass
      if upload_date is not None:
          return compat_str(upload_date)
  
  
+def unified_timestamp(date_str, day_first=True):
+    if date_str is None:
+        return None
+
+    date_str = date_str.replace(',', ' ')
+
+    pm_delta = 12 if re.search(r'(?i)PM', date_str) else 0
+    timezone, date_str = extract_timezone(date_str)
+
+    # Remove AM/PM + timezone
+    date_str = re.sub(r'(?i)\s*(?:AM|PM)(?:\s+[A-Z]+)?', '', date_str)
+
+    for expression in date_formats(day_first):
+        try:
+            dt = datetime.datetime.strptime(date_str, expression) - timezone + datetime.timedelta(hours=pm_delta)
+            return calendar.timegm(dt.timetuple())
+        except ValueError:
+            pass
+    timetuple = email.utils.parsedate_tz(date_str)
+    if timetuple:
+        return calendar.timegm(timetuple) + pm_delta * 3600
+
+
  def determine_ext(url, default_ext='unknown_video'):
      if url is None:
          return default_ext
@@ -1191,7 +1358,7 @@ def bytes_to_intlist(bs):
  def intlist_to_bytes(xs):
      if not xs:
          return b''
-    return struct_pack('%dB' % len(xs), *xs)
+    return compat_struct_pack('%dB' % len(xs), *xs)
  
  
  # Cross-platform file locking
@@ -1319,6 +1486,8 @@ def shell_quote(args):
  def smuggle_url(url, data):
      """ Pass additional data in a URL for internal use. """
  
+    url, idata = unsmuggle_url(url, {})
+    data.update(idata)
      sdata = compat_urllib_parse_urlencode(
          {'__youtubedl_smuggle': json.dumps(data)})
      return url + '#' + sdata
@@ -1367,38 +1536,63 @@ def parse_filesize(s):
      _UNIT_TABLE = {
          'B': 1,
          'b': 1,
+        'bytes': 1,
          'KiB': 1024,
          'KB': 1000,
          'kB': 1024,
          'Kb': 1000,
+        'kb': 1000,
+        'kilobytes': 1000,
+        'kibibytes': 1024,
          'MiB': 1024 ** 2,
          'MB': 1000 ** 2,
          'mB': 1024 ** 2,
          'Mb': 1000 ** 2,
+        'mb': 1000 ** 2,
+        'megabytes': 1000 ** 2,
+        'mebibytes': 1024 ** 2,
          'GiB': 1024 ** 3,
          'GB': 1000 ** 3,
          'gB': 1024 ** 3,
          'Gb': 1000 ** 3,
+        'gb': 1000 ** 3,
+        'gigabytes': 1000 ** 3,
+        'gibibytes': 1024 ** 3,
          'TiB': 1024 ** 4,
          'TB': 1000 ** 4,
          'tB': 1024 ** 4,
          'Tb': 1000 ** 4,
+        'tb': 1000 ** 4,
+        'terabytes': 1000 ** 4,
+        'tebibytes': 1024 ** 4,
          'PiB': 1024 ** 5,
          'PB': 1000 ** 5,
          'pB': 1024 ** 5,
          'Pb': 1000 ** 5,
+        'pb': 1000 ** 5,
+        'petabytes': 1000 ** 5,
+        'pebibytes': 1024 ** 5,
          'EiB': 1024 ** 6,
          'EB': 1000 ** 6,
          'eB': 1024 ** 6,
          'Eb': 1000 ** 6,
+        'eb': 1000 ** 6,
+        'exabytes': 1000 ** 6,
+        'exbibytes': 1024 ** 6,
          'ZiB': 1024 ** 7,
          'ZB': 1000 ** 7,
          'zB': 1024 ** 7,
          'Zb': 1000 ** 7,
+        'zb': 1000 ** 7,
+        'zettabytes': 1000 ** 7,
+        'zebibytes': 1024 ** 7,
          'YiB': 1024 ** 8,
          'YB': 1000 ** 8,
          'yB': 1024 ** 8,
          'Yb': 1000 ** 8,
+        'yb': 1000 ** 8,
+        'yottabytes': 1000 ** 8,
+        'yobibytes': 1024 ** 8,
      }
  
      return lookup_unit_table(_UNIT_TABLE, s)
@@ -1425,11 +1619,13 @@ def parse_count(s):
      return lookup_unit_table(_UNIT_TABLE, s)
  
  
-def month_by_name(name):
+def month_by_name(name, lang='en'):
      """ Return the number of a month by (locale-independently) English name """
  
+    month_names = MONTH_NAMES.get(lang, MONTH_NAMES['en'])
+
      try:
-        return ENGLISH_MONTH_NAMES.index(name) + 1
+        return month_names.index(name) + 1
      except ValueError:
          return None
  
@@ -1474,15 +1670,11 @@ def setproctitle(title):
  
  
  def remove_start(s, start):
-    if s.startswith(start):
-        return s[len(start):]
-    return s
+    return s[len(start):] if s is not None and s.startswith(start) else s
  
  
  def remove_end(s, end):
-    if s.endswith(end):
-        return s[:-len(end)]
-    return s
+    return s[:-len(end)] if s is not None and s.endswith(end) else s
  
  
  def remove_quotes(s):
@@ -1499,11 +1691,20 @@ def url_basename(url):
      return path.strip('/').split('/')[-1]
  
  
+def base_url(url):
+    return re.match(r'https?://[^?#&]+/', url).group()
+
+
  class HEADRequest(compat_urllib_request.Request):
      def get_method(self):
          return 'HEAD'
  
  
+class PUTRequest(compat_urllib_request.Request):
+    def get_method(self):
+        return 'PUT'
+
+
  def int_or_none(v, scale=1, default=None, get_attr=None, invscale=1):
      if get_attr:
          if v is not None:
@@ -1539,50 +1740,56 @@ def float_or_none(v, scale=1, invscale=1, default=None):
          return default
  
  
+def strip_or_none(v):
+    return None if v is None else v.strip()
+
+
  def parse_duration(s):
      if not isinstance(s, compat_basestring):
          return None
  
      s = s.strip()
  
-    m = re.match(
-        r'''(?ix)(?:P?T)?
-        (?:
-            (?P<only_mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*|
-            (?P<only_hours>[0-9.]+)\s*(?:hours?)|
-
-            \s*(?P<hours_reversed>[0-9]+)\s*(?:[:h]|hours?)\s*(?P<mins_reversed>[0-9]+)\s*(?:[:m]|mins?\.?|minutes?)\s*|
-            (?:
+    days, hours, mins, secs, ms = [None] * 5
+    m = re.match(r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?$', s)
+    if m:
+        days, hours, mins, secs, ms = m.groups()
+    else:
+        m = re.match(
+            r'''(?ix)(?:P?T)?
                  (?:
-                    (?:(?P<days>[0-9]+)\s*(?:[:d]|days?)\s*)?
-                    (?P<hours>[0-9]+)\s*(?:[:h]|hours?)\s*
+                    (?P<days>[0-9]+)\s*d(?:ays?)?\s*
                  )?
-                (?P<mins>[0-9]+)\s*(?:[:m]|mins?|minutes?)\s*
-            )?
-            (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*(?:s|secs?|seconds?)?
-        )$''', s)
-    if not m:
-        return None
-    res = 0
-    if m.group('only_mins'):
-        return float_or_none(m.group('only_mins'), invscale=60)
-    if m.group('only_hours'):
-        return float_or_none(m.group('only_hours'), invscale=60 * 60)
-    if m.group('secs'):
-        res += int(m.group('secs'))
-    if m.group('mins_reversed'):
-        res += int(m.group('mins_reversed')) * 60
-    if m.group('mins'):
-        res += int(m.group('mins')) * 60
-    if m.group('hours'):
-        res += int(m.group('hours')) * 60 * 60
-    if m.group('hours_reversed'):
-        res += int(m.group('hours_reversed')) * 60 * 60
-    if m.group('days'):
-        res += int(m.group('days')) * 24 * 60 * 60
-    if m.group('ms'):
-        res += float(m.group('ms'))
-    return res
+                (?:
+                    (?P<hours>[0-9]+)\s*h(?:ours?)?\s*
+                )?
+                (?:
+                    (?P<mins>[0-9]+)\s*m(?:in(?:ute)?s?)?\s*
+                )?
+                (?:
+                    (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*s(?:ec(?:ond)?s?)?\s*
+                )?$''', s)
+        if m:
+            days, hours, mins, secs, ms = m.groups()
+        else:
+            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)$', s)
+            if m:
+                hours, mins = m.groups()
+            else:
+                return None
+
+    duration = 0
+    if secs:
+        duration += float(secs)
+    if mins:
+        duration += float(mins) * 60
+    if hours:
+        duration += float(hours) * 60 * 60
+    if days:
+        duration += float(days) * 24 * 60 * 60
+    if ms:
+        duration += float(ms)
+    return duration
  
  
  def prepend_extension(filename, ext, expected_real_ext=None):
@@ -1615,8 +1822,12 @@ def get_exe_version(exe, args=['--version'],
      """ Returns the version of the specified executable,
      or False if the executable is not present """
      try:
+        # STDIN should be redirected too. On UNIX-like systems, ffmpeg triggers
+        # SIGTTOU if youtube-dl is run in the background.
+        # See https://github.com/rg3/youtube-dl/issues/955#issuecomment-209789656
          out, _ = subprocess.Popen(
              [encodeArgument(exe)] + args,
+            stdin=subprocess.PIPE,
              stdout=subprocess.PIPE, stderr=subprocess.STDOUT).communicate()
      except OSError:
          return False
@@ -1757,24 +1968,6 @@ def escape_url(url):
          fragment=escape_rfc3986(url_parsed.fragment)
      ).geturl()
  
-try:
-    struct.pack('!I', 0)
-except TypeError:
-    # In Python 2.6 and 2.7.x < 2.7.7, struct requires a bytes argument
-    # See https://bugs.python.org/issue19099
-    def struct_pack(spec, *args):
-        if isinstance(spec, compat_str):
-            spec = spec.encode('ascii')
-        return struct.pack(spec, *args)
-
-    def struct_unpack(spec, *args):
-        if isinstance(spec, compat_str):
-            spec = spec.encode('ascii')
-        return struct.unpack(spec, *args)
-else:
-    struct_pack = struct.pack
-    struct_unpack = struct.unpack
-
  
  def read_batch_urls(batch_fd):
      def fixup(url):
@@ -1797,6 +1990,8 @@ def urlencode_postdata(*args, **kargs):
  
  
  def update_url_query(url, query):
+    if not query:
+        return url
      parsed_url = compat_urlparse.urlparse(url)
      qs = compat_parse_qs(parsed_url.query)
      qs.update(query)
@@ -1804,6 +1999,26 @@ def update_url_query(url, query):
          query=compat_urllib_parse_urlencode(qs, True)))
  
  
+def update_Request(req, url=None, data=None, headers={}, query={}):
+    req_headers = req.headers.copy()
+    req_headers.update(headers)
+    req_data = data or req.data
+    req_url = update_url_query(url or req.get_full_url(), query)
+    req_get_method = req.get_method()
+    if req_get_method == 'HEAD':
+        req_type = HEADRequest
+    elif req_get_method == 'PUT':
+        req_type = PUTRequest
+    else:
+        req_type = compat_urllib_request.Request
+    new_req = req_type(
+        req_url, data=req_data, headers=req_headers,
+        origin_req_host=req.origin_req_host, unverifiable=req.unverifiable)
+    if hasattr(req, 'timeout'):
+        new_req.timeout = req.timeout
+    return new_req
+
+
  def dict_get(d, key_or_keys, default=None, skip_false_values=True):
      if isinstance(key_or_keys, (list, tuple)):
          for key in key_or_keys:
@@ -1814,6 +2029,16 @@ def dict_get(d, key_or_keys, default=None, skip_false_values=True):
      return d.get(key_or_keys, default)
  
  
+def try_get(src, getter, expected_type=None):
+    try:
+        v = getter(src)
+    except (AttributeError, KeyError, TypeError, IndexError):
+        pass
+    else:
+        if expected_type is None or isinstance(v, expected_type):
+            return v
+
+
  def encode_compat_str(string, encoding=preferredencoding(), errors='strict'):
      return string if isinstance(string, compat_str) else compat_str(string, encoding, errors)
  
@@ -1827,16 +2052,32 @@ US_RATINGS = {
  }
  
  
+TV_PARENTAL_GUIDELINES = {
+    'TV-Y': 0,
+    'TV-Y7': 7,
+    'TV-G': 0,
+    'TV-PG': 0,
+    'TV-14': 14,
+    'TV-MA': 17,
+}
+
+
  def parse_age_limit(s):
-    if s is None:
+    if type(s) == int:
+        return s if 0 <= s <= 21 else None
+    if not isinstance(s, compat_basestring):
          return None
      m = re.match(r'^(?P<age>\d{1,2})\+?$', s)
-    return int(m.group('age')) if m else US_RATINGS.get(s)
+    if m:
+        return int(m.group('age'))
+    if s in US_RATINGS:
+        return US_RATINGS[s]
+    return TV_PARENTAL_GUIDELINES.get(s)
  
  
  def strip_jsonp(code):
      return re.sub(
-        r'(?s)^[a-zA-Z0-9_.]+\s*\(\s*(.*)\);?\s*?(?://[^\n]*)*$', r'\1', code)
+        r'(?s)^[a-zA-Z0-9_.$]+\s*\(\s*(.*)\);?\s*?(?://[^\n]*)*$', r'\1', code)
  
  
  def js_to_json(code):
@@ -1844,24 +2085,38 @@ def js_to_json(code):
          v = m.group(0)
          if v in ('true', 'false', 'null'):
              return v
-        if v.startswith('"'):
-            v = re.sub(r"\\'", "'", v[1:-1])
-        elif v.startswith("'"):
-            v = v[1:-1]
-            v = re.sub(r"\\\\|\\'|\"", lambda m: {
-                '\\\\': '\\\\',
-                "\\'": "'",
+        elif v.startswith('/*') or v == ',':
+            return ""
+
+        if v[0] in ("'", '"'):
+            v = re.sub(r'(?s)\\.|"', lambda m: {
                  '"': '\\"',
-            }[m.group(0)], v)
+                "\\'": "'",
+                '\\\n': '',
+                '\\x': '\\u00',
+            }.get(m.group(0), m.group(0)), v[1:-1])
+
+        INTEGER_TABLE = (
+            (r'^(0[xX][0-9a-fA-F]+)\s*:?$', 16),
+            (r'^(0+[0-7]+)\s*:?$', 8),
+        )
+
+        for regex, base in INTEGER_TABLE:
+            im = re.match(regex, v)
+            if im:
+                i = int(im.group(1), base)
+                return '"%d":' % i if v.endswith(':') else '%d' % i
+
          return '"%s"' % v
  
-    res = re.sub(r'''(?x)
-        "(?:[^"\\]*(?:\\\\|\\['"nu]))*[^"\\]*"|
-        '(?:[^'\\]*(?:\\\\|\\['"nu]))*[^'\\]*'|
-        [a-zA-Z_][.a-zA-Z_0-9]*
+    return re.sub(r'''(?sx)
+        "(?:[^"\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^"\\]*"|
+        '(?:[^'\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^'\\]*'|
+        /\*.*?\*/|,(?=\s*[\]}])|
+        [a-zA-Z_][.a-zA-Z_0-9]*|
+        \b(?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:\s*:)?|
+        [0-9]+(?=\s*:)
          ''', fix_kv, code)
-    res = re.sub(r',(\s*[\]}])', lambda m: m.group(1), res)
-    return res
  
  
  def qualities(quality_ids):
@@ -1909,7 +2164,7 @@ def ytdl_is_updateable():
  
  def args_to_str(args):
      # Get a short string representation for a subprocess command
-    return ' '.join(shlex_quote(a) for a in args)
+    return ' '.join(compat_shlex_quote(a) for a in args)
  
  
  def error_to_compat_str(err):
@@ -1922,13 +2177,20 @@ def error_to_compat_str(err):
  
  
  def mimetype2ext(mt):
+    if mt is None:
+        return None
+
      ext = {
          'audio/mp4': 'm4a',
+        # Per RFC 3003, audio/mpeg can be .mp1, .mp2 or .mp3. Use .mp3 here as
+        # it is the most common of the three.
+        'audio/mpeg': 'mp3',
      }.get(mt)
      if ext is not None:
          return ext
  
      _, _, res = mt.rpartition('/')
+    res = res.split(';')[0].strip().lower()
  
      return {
          '3gpp': '3gp',
@@ -1940,15 +2202,56 @@ def mimetype2ext(mt):
          'x-flv': 'flv',
          'x-mp4-fragmented': 'mp4',
          'x-ms-wmv': 'wmv',
+        'mpegurl': 'm3u8',
+        'x-mpegurl': 'm3u8',
+        'vnd.apple.mpegurl': 'm3u8',
+        'dash+xml': 'mpd',
+        'f4m': 'f4m',
+        'f4m+xml': 'f4m',
+        'hds+xml': 'f4m',
+        'vnd.ms-sstr+xml': 'ism',
+        'quicktime': 'mov',
      }.get(res, res)
  
  
+def parse_codecs(codecs_str):
+    # http://tools.ietf.org/html/rfc6381
+    if not codecs_str:
+        return {}
+    splited_codecs = list(filter(None, map(
+        lambda s: s.strip(), codecs_str.strip().strip(',').split(','))))
+    vcodec, acodec = None, None
+    for full_codec in splited_codecs:
+        codec = full_codec.split('.')[0]
+        if codec in ('avc1', 'avc2', 'avc3', 'avc4', 'vp9', 'vp8', 'hev1', 'hev2', 'h263', 'h264', 'mp4v'):
+            if not vcodec:
+                vcodec = full_codec
+        elif codec in ('mp4a', 'opus', 'vorbis', 'mp3', 'aac', 'ac-3'):
+            if not acodec:
+                acodec = full_codec
+        else:
+            write_string('WARNING: Unknown codec %s\n' % full_codec, sys.stderr)
+    if not vcodec and not acodec:
+        if len(splited_codecs) == 2:
+            return {
+                'vcodec': splited_codecs[0],
+                'acodec': splited_codecs[1],
+            }
+        elif len(splited_codecs) == 1:
+            return {
+                'vcodec': 'none',
+                'acodec': splited_codecs[0],
+            }
+    else:
+        return {
+            'vcodec': vcodec or 'none',
+            'acodec': acodec or 'none',
+        }
+    return {}
+
+
  def urlhandle_detect_ext(url_handle):
-    try:
-        url_handle.headers
-        getheader = lambda h: url_handle.headers[h]
-    except AttributeError:  # Python < 3
-        getheader = url_handle.info().getheader
+    getheader = url_handle.headers.get
  
      cd = getheader('Content-Disposition')
      if cd:
@@ -2046,11 +2349,18 @@ def _match_one(filter_part, dct):
      m = operator_rex.search(filter_part)
      if m:
          op = COMPARISON_OPERATORS[m.group('op')]
-        if m.group('strval') is not None:
+        actual_value = dct.get(m.group('key'))
+        if (m.group('strval') is not None or
+            # If the original field is a string and the matching comparison
+            # value is a number, we should respect the type of the original
+            # field and process the comparison value as a string (see
+            # https://github.com/rg3/youtube-dl/issues/11082).
+            actual_value is not None and m.group('intval') is not None and
+                isinstance(actual_value, compat_str)):
              if m.group('op') not in ('=', '!='):
                  raise ValueError(
                      'Operator %s does not support string values!' % m.group('op'))
-            comparison_value = m.group('strval')
+            comparison_value = m.group('strval') or m.group('intval')
          else:
              try:
                  comparison_value = int(m.group('intval'))
@@ -2062,7 +2372,6 @@ def _match_one(filter_part, dct):
                      raise ValueError(
                          'Invalid integer value %r in filter part %r' % (
                              m.group('intval'), filter_part))
-        actual_value = dct.get(m.group('key'))
          if actual_value is None:
              return m.group('none_inclusive')
          return op(actual_value, comparison_value)
@@ -2122,6 +2431,7 @@ def dfxp2srt(dfxp_data):
      _x = functools.partial(xpath_with_ns, ns_map={
          'ttml': 'http://www.w3.org/ns/ttml',
          'ttaf1': 'http://www.w3.org/2006/10/ttaf1',
+        'ttaf1_0604': 'http://www.w3.org/2006/04/ttaf1',
      })
  
      class TTMLPElementParser(object):
@@ -2148,7 +2458,7 @@ def dfxp2srt(dfxp_data):
  
      dfxp = compat_etree_fromstring(dfxp_data.encode('utf-8'))
      out = []
-    paras = dfxp.findall(_x('.//ttml:p')) or dfxp.findall(_x('.//ttaf1:p')) or dfxp.findall('.//p')
+    paras = dfxp.findall(_x('.//ttml:p')) or dfxp.findall(_x('.//ttaf1:p')) or dfxp.findall(_x('.//ttaf1_0604:p')) or dfxp.findall('.//p')
  
      if not paras:
          raise ValueError('Invalid dfxp/TTML subtitle')
@@ -2174,6 +2484,8 @@ def dfxp2srt(dfxp_data):
  
  def cli_option(params, command_option, param):
      param = params.get(param)
+    if param:
+        param = compat_str(param)
      return [command_option, param] if param is not None else []
  
  
@@ -2677,6 +2989,10 @@ class PerRequestProxyHandler(compat_urllib_request.ProxyHandler):
  
          if proxy == '__noproxy__':
              return None  # No Proxy
+        if compat_urlparse.urlparse(proxy).scheme.lower() in ('socks', 'socks4', 'socks4a', 'socks5'):
+            req.add_header('Ytdl-socks-proxy', proxy)
+            # youtube-dl's http/https handlers wrap the socket with SOCKS themselves
+            return None
          return compat_urllib_request.ProxyHandler.proxy_open(
              self, req, proxy, type)
  
@@ -2717,9 +3033,7 @@ def encode_base_n(num, n, table=None):
  
  
  def decode_packed_codes(code):
-    mobj = re.search(
-        r"}\('(.+)',(\d+),(\d+),'([^']+)'\.split\('\|'\)",
-        code)
+    mobj = re.search(PACKED_CODES_RE, code)
      obfucasted_code, base, count, symbols = mobj.groups()
      base = int(base)
      count = int(count)
@@ -2734,3 +3048,207 @@ def decode_packed_codes(code):
      return re.sub(
          r'\b(\w+)\b', lambda mobj: symbol_table[mobj.group(0)],
          obfucasted_code)
+
+
+def parse_m3u8_attributes(attrib):
+    info = {}
+    for (key, val) in re.findall(r'(?P<key>[A-Z0-9-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)', attrib):
+        if val.startswith('"'):
+            val = val[1:-1]
+        info[key] = val
+    return info
+
+
+def urshift(val, n):
+    return val >> n if val >= 0 else (val + 0x100000000) >> n
+
+
+# Based on png2str() written by @gdkchan and improved by @yokrysty
+# Originally posted at https://github.com/rg3/youtube-dl/issues/9706
+def decode_png(png_data):
+    # Reference: https://www.w3.org/TR/PNG/
+    header = png_data[8:]
+
+    if png_data[:8] != b'\x89PNG\x0d\x0a\x1a\x0a' or header[4:8] != b'IHDR':
+        raise IOError('Not a valid PNG file.')
+
+    int_map = {1: '>B', 2: '>H', 4: '>I'}
+    unpack_integer = lambda x: compat_struct_unpack(int_map[len(x)], x)[0]
+
+    chunks = []
+
+    while header:
+        length = unpack_integer(header[:4])
+        header = header[4:]
+
+        chunk_type = header[:4]
+        header = header[4:]
+
+        chunk_data = header[:length]
+        header = header[length:]
+
+        header = header[4:]  # Skip CRC
+
+        chunks.append({
+            'type': chunk_type,
+            'length': length,
+            'data': chunk_data
+        })
+
+    ihdr = chunks[0]['data']
+
+    width = unpack_integer(ihdr[:4])
+    height = unpack_integer(ihdr[4:8])
+
+    idat = b''
+
+    for chunk in chunks:
+        if chunk['type'] == b'IDAT':
+            idat += chunk['data']
+
+    if not idat:
+        raise IOError('Unable to read PNG data.')
+
+    decompressed_data = bytearray(zlib.decompress(idat))
+
+    stride = width * 3
+    pixels = []
+
+    def _get_pixel(idx):
+        x = idx % stride
+        y = idx // stride
+        return pixels[y][x]
+
+    for y in range(height):
+        basePos = y * (1 + stride)
+        filter_type = decompressed_data[basePos]
+
+        current_row = []
+
+        pixels.append(current_row)
+
+        for x in range(stride):
+            color = decompressed_data[1 + basePos + x]
+            basex = y * stride + x
+            left = 0
+            up = 0
+
+            if x > 2:
+                left = _get_pixel(basex - 3)
+            if y > 0:
+                up = _get_pixel(basex - stride)
+
+            if filter_type == 1:  # Sub
+                color = (color + left) & 0xff
+            elif filter_type == 2:  # Up
+                color = (color + up) & 0xff
+            elif filter_type == 3:  # Average
+                color = (color + ((left + up) >> 1)) & 0xff
+            elif filter_type == 4:  # Paeth
+                a = left
+                b = up
+                c = 0
+
+                if x > 2 and y > 0:
+                    c = _get_pixel(basex - stride - 3)
+
+                p = a + b - c
+
+                pa = abs(p - a)
+                pb = abs(p - b)
+                pc = abs(p - c)
+
+                if pa <= pb and pa <= pc:
+                    color = (color + a) & 0xff
+                elif pb <= pc:
+                    color = (color + b) & 0xff
+                else:
+                    color = (color + c) & 0xff
+
+            current_row.append(color)
+
+    return width, height, pixels
+
+
+def write_xattr(path, key, value):
+    # This mess below finds the best xattr tool for the job
+    try:
+        # try the pyxattr module...
+        import xattr
+
+        if hasattr(xattr, 'set'):  # pyxattr
+            # Unicode arguments are not supported in python-pyxattr until
+            # version 0.5.0
+            # See https://github.com/rg3/youtube-dl/issues/5498
+            pyxattr_required_version = '0.5.0'
+            if version_tuple(xattr.__version__) < version_tuple(pyxattr_required_version):
+                # TODO: fallback to CLI tools
+                raise XAttrUnavailableError(
+                    'python-pyxattr is detected but is too old. '
+                    'youtube-dl requires %s or above while your version is %s. '
+                    'Falling back to other xattr implementations' % (
+                        pyxattr_required_version, xattr.__version__))
+
+            setxattr = xattr.set
+        else:  # xattr
+            setxattr = xattr.setxattr
+
+        try:
+            setxattr(path, key, value)
+        except EnvironmentError as e:
+            raise XAttrMetadataError(e.errno, e.strerror)
+
+    except ImportError:
+        if compat_os_name == 'nt':
+            # Write xattrs to NTFS Alternate Data Streams:
+            # http://en.wikipedia.org/wiki/NTFS#Alternate_data_streams_.28ADS.29
+            assert ':' not in key
+            assert os.path.exists(path)
+
+            ads_fn = path + ':' + key
+            try:
+                with open(ads_fn, 'wb') as f:
+                    f.write(value)
+            except EnvironmentError as e:
+                raise XAttrMetadataError(e.errno, e.strerror)
+        else:
+            user_has_setfattr = check_executable('setfattr', ['--version'])
+            user_has_xattr = check_executable('xattr', ['-h'])
+
+            if user_has_setfattr or user_has_xattr:
+
+                value = value.decode('utf-8')
+                if user_has_setfattr:
+                    executable = 'setfattr'
+                    opts = ['-n', key, '-v', value]
+                elif user_has_xattr:
+                    executable = 'xattr'
+                    opts = ['-w', key, value]
+
+                cmd = ([encodeFilename(executable, True)] +
+                       [encodeArgument(o) for o in opts] +
+                       [encodeFilename(path, True)])
+
+                try:
+                    p = subprocess.Popen(
+                        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
+                except EnvironmentError as e:
+                    raise XAttrMetadataError(e.errno, e.strerror)
+                stdout, stderr = p.communicate()
+                stderr = stderr.decode('utf-8', 'replace')
+                if p.returncode != 0:
+                    raise XAttrMetadataError(p.returncode, stderr)
+
+            else:
+                # On Unix, and can't find pyxattr, setfattr, or xattr.
+                if sys.platform.startswith('linux'):
+                    raise XAttrUnavailableError(
+                        "Couldn't find a tool to set the xattrs. "
+                        "Install either the python 'pyxattr' or 'xattr' "
+                        "modules, or the GNU 'attr' package "
+                        "(which contains the 'setfattr' tool).")
+                else:
+                    raise XAttrUnavailableError(
+                        "Couldn't find a tool to set the xattrs. "
+                        "Install either the python 'xattr' module, "
+                        "or the 'xattr' binary.")
diff --git a/youtube_dl/version.py b/youtube_dl/version.py

index 5daa7f4e8bce9914009d3284c7540dfa54a6d32e..ef9ccc08abde682dab3077c54a66f36ed1e6c958 100644 (file)
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
  from __future__ import unicode_literals
  
-__version__ = '2016.03.27'
+__version__ = '2016.11.18'
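
Note on the decode_png hunk above: the Paeth branch (filter type 4) inlines the predictor from the PNG specification. A minimal standalone sketch of the same rule follows for reference. The names paeth_predictor and unfilter_paeth_row and the bpp parameter are hypothetical and do not appear in the commit, and the sketch assumes 8-bit RGB rows (3 bytes per pixel), as the hunk does.

```python
def paeth_predictor(a, b, c):
    """Return whichever of a (left), b (up), c (upper-left) is closest to a + b - c.

    Ties are broken in the order a, b, c, matching the comparisons in the hunk.
    """
    p = a + b - c
    pa = abs(p - a)
    pb = abs(p - b)
    pc = abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def unfilter_paeth_row(filtered, previous=None, bpp=3):
    """Reverse the Paeth filter for one scanline.

    filtered: bytearray of filtered bytes, without the leading filter-type byte.
    previous: bytearray of the already reconstructed row above, or None for the first row.
    bpp: bytes per pixel (assumed to be 3, i.e. 8-bit RGB).
    """
    row = bytearray()
    for x, value in enumerate(filtered):
        # Neighbours that fall outside the image count as 0.
        left = row[x - bpp] if x >= bpp else 0
        up = previous[x] if previous is not None else 0
        upper_left = previous[x - bpp] if previous is not None and x >= bpp else 0
        row.append((value + paeth_predictor(left, up, upper_left)) & 0xff)
    return row
```

On the first row there is no row above, so the predictor always resolves to the left neighbour and the filter behaves like Sub.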