Muratcan Simsek
Evan Lu
flatgreen
+Brian Foley
+Vignesh Venkat
+Tom Gijselinck
+Founder Fang
+Andrew Alexeyew
preference, for example: "srt" or
"ass/srt/best"
--sub-lang LANGS Languages of the subtitles to download
- (optional) separated by commas, use IETF
- language tags like 'en,pt'
+ (optional) separated by commas, use --list-
+ subs for available language tags
## Authentication Options:
-u, --username USERNAME Login with this account ID
# FORMAT SELECTION
-By default youtube-dl tries to download the best quality, but sometimes you may want to download in a different format.
-The simplest case is requesting a specific format, for example `-f 22`. You can get the list of available formats using `--list-formats`, you can also use a file extension (currently it supports aac, m4a, mp3, mp4, ogg, wav, webm) or the special names `best`, `bestvideo`, `bestaudio` and `worst`.
+By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**.
-If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes, as in `-f 22/17/18`. You can also filter the video results by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`). This works for filesize, height, width, tbr, abr, vbr, asr, and fps and the comparisons <, <=, >, >=, =, != and for ext, acodec, vcodec, container, and protocol and the comparisons =, != . Formats for which the value is not known are excluded unless you put a question mark (?) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Use commas to download multiple formats, such as `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`. You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv), for example `-f bestvideo+bestaudio`. Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
+But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so-called *format selection*, with which you can explicitly specify the desired format, select formats based on one or more criteria, set up precedence and much more.
-Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see #5447, #5456). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some dash formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
+The general syntax for format selection is `--format FORMAT` or, shorter, `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes the format or formats you would like to download.
+
+The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for a particular video using `--list-formats` or `-F`. Note that these format codes are extractor-specific.
+
+You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format with that extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file.
+
+You can also use special names to select particular edge cases:
+ - `best`: Select the best quality format represented by a single file with video and audio
+ - `worst`: Select the worst quality format represented by a single file with video and audio
+ - `bestvideo`: Select the best quality video-only format (e.g. DASH video), may not be available
+ - `worstvideo`: Select the worst quality video-only format, may not be available
+ - `bestaudio`: Select the best quality audio-only format, may not be available
+ - `worstaudio`: Select the worst quality audio-only format, may not be available
+
+For example, to download the worst quality video-only format you can use `-f worstvideo`.
+
+If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that the slash is left-associative, i.e. formats on the left hand side are preferred; for example `-f 22/17/18` will download format 22 if it's available, otherwise format 17, otherwise format 18, and otherwise it will complain that no suitable formats are available for download.
+
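The left-associative fallback described above can be sketched in a few lines (a simplified illustration only; `pick_format` is a hypothetical helper, not youtube-dl's actual selector code):

```python
# Hypothetical sketch of slash-based precedence: try each alternative
# left to right and take the first one that is actually available.
def pick_format(selector, available):
    for alternative in selector.split('/'):
        if alternative in available:
            return alternative
    raise ValueError('requested format not available')

# Format 22 is missing here, so the selector falls through to 17.
print(pick_format('22/17/18', {'17', '18'}))
```

The real selector also understands commas, parentheses and filters, but the precedence rule is exactly this left-to-right fall-through.
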
+If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all three of these formats, of course only if they are available. A more sophisticated example combines this with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
+
+You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
+
+The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `>=`, `=` (equals), `!=` (not equals):
+ - `filesize`: The number of bytes, if known in advance
+ - `width`: Width of the video, if known
+ - `height`: Height of the video, if known
+ - `tbr`: Average bitrate of audio and video in KBit/s
+ - `abr`: Average audio bitrate in KBit/s
+ - `vbr`: Average video bitrate in KBit/s
+ - `asr`: Audio sampling rate in Hertz
+ - `fps`: Frame rate
+
+Filtering also works with the comparisons `=` (equals), `!=` (not equals), `^=` (begins with), `$=` (ends with), `*=` (contains) and the following string meta fields:
+ - `ext`: File extension
+ - `acodec`: Name of the audio codec in use
+ - `vcodec`: Name of the video codec in use
+ - `container`: Name of the container format
+ - `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
+
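These string comparisons behave like the `STR_OPERATORS` table this change adds to `YoutubeDL.py`; the dictionary below is a simplified standalone copy for illustration:

```python
# Simplified mirror of the STR_OPERATORS table from YoutubeDL.py:
# each operator maps to a predicate on (attribute, value).
STR_OPERATORS = {
    '=': lambda attr, value: attr == value,
    '!=': lambda attr, value: attr != value,
    '^=': lambda attr, value: attr.startswith(value),
    '$=': lambda attr, value: attr.endswith(value),
    '*=': lambda attr, value: value in attr,
}

# '[protocol^=http]' matches both http and https
print(STR_OPERATORS['^=']('https', 'http'))
print(STR_OPERATORS['$=']('m3u8_native', 'native'))
print(STR_OPERATORS['*=']('http_dash_segments', 'dash'))
```
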
+Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by a particular extractor, i.e. the metadata offered by the video hoster.
+
+Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
+
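The effect of the question mark can be sketched as follows (`matches` is a hypothetical helper for illustration, not the real filter parser):

```python
import operator

# Sketch of a single filter comparison: with tolerate_unknown (the '?'
# modifier) a format whose field is missing still passes the filter.
def matches(fmt, key, op, value, tolerate_unknown=False):
    actual = fmt.get(key)
    if actual is None:
        return tolerate_unknown
    return op(actual, value)

# '[height<=?720]' keeps a format whose height is unknown...
print(matches({'format_id': '22'}, 'height', operator.le, 720, tolerate_unknown=True))
# ...while plain '[height<=720]' excludes it.
print(matches({'format_id': '22'}, 'height', operator.le, 720))
```
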
+You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv to be installed), for example `-f bestvideo+bestaudio` will download the best video-only format and the best audio-only format and mux them together with ffmpeg/avconv.
+
+Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
+
+Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see #5447, #5456). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
+Examples (note on Windows you may need to use double quotes instead of single):
+```bash
+# Download best mp4 format available or any other best if no mp4 available
+$ youtube-dl -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best'
+
+# Download best format available but not better than 480p
+$ youtube-dl -f 'bestvideo[height<=480]+bestaudio/best[height<=480]'
+
+# Download best format available but no bigger than 50 MB
+$ youtube-dl -f 'best[filesize<50M]'
+
+# Download best format available via direct link over HTTP/HTTPS protocol
+$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
+```
+
+
# VIDEO SELECTION
Videos can be filtered by their upload date using the options `--date`, `--datebefore` or `--dateafter`. They accept dates in two formats:
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
-Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly.
+Passing cookies to youtube-dl is a good way to work around login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
### Can you add support for this anime video site, or site which shows current movies for free?
import datetime
import glob
-import io # For Python 2 compatibilty
+import io # For Python 2 compatibility
import os
import re
# Supported sites
- **1tv**: Первый канал
- **1up.com**
+ - **20min**
- **220.ro**
- **22tracks:genre**
- **22tracks:track**
- **AdobeTVShow**
- **AdobeTVVideo**
- **AdultSwim**
+ - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- **Aftonbladet**
- **AirMozilla**
- **AlJazeera**
- **ARD:mediathek**
- **arte.tv**
- **arte.tv:+7**
+ - **arte.tv:cinema**
- **arte.tv:concert**
- **arte.tv:creative**
- **arte.tv:ddc**
- **Beeg**
- **BehindKink**
- **Bet**
+ - **Bigflix**
- **Bild**: Bild.de
- **BiliBili**
- **BleacherReport**
- **CamdemyFolder**
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
+ - **Canvas**
- **CBS**
- **CBSNews**: CBS News
- **CBSSports**
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
- **culturebox.francetvinfo.fr**
+ - **CultureUnplugged**
+ - **CWTV**
- **dailymotion**
- **dailymotion:playlist**
- **dailymotion:user**
- **defense.gouv.fr**
- **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
+ - **Digiteka**
- **Discovery**
- **Dotsub**
- **DouyuTV**: 斗鱼
- **Helsinki**: helsinki.fi
- **HentaiStigma**
- **HistoricFilms**
- - **History**
- **hitbox**
- **hitbox:live**
- **HornBunny**
- **Instagram**
- **instagram:user**: Instagram user profile
- **InternetVideoArchive**
- - **IPrima**
+ - **IPrima** (Currently broken)
- **iqiyi**: 爱奇艺
- **Ir90Tv**
- **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations
+ - **ivideon**: Ivideon TV
- **Izlesene**
- **JadoreCettePub**
- **JeuxVideo**
- **la7.tv**
- **Laola1Tv**
- **Lecture2Go**
+ - **Lemonde**
- **Letv**: 乐视网
+ - **LetvCloud**: 乐视云
- **LetvPlaylist**
- **LetvTv**
- **Libsyn**
- **livestream**
- **livestream:original**
- **LnkGo**
+ - **LoveHomePorn**
- **lrt.lt**
- **lynda**: lynda.com videos
- **lynda:course**: lynda.com online courses
- **nowness**
- **nowness:playlist**
- **nowness:series**
- - **NowTV**
+ - **NowTV** (Currently broken)
- **NowTVList**
- **nowvideo**: NowVideo
- **npo**: npo.nl and ntr.nl
- **npo.nl:live**
- **npo.nl:radio**
- **npo.nl:radio:fragment**
+ - **Npr**
- **NRK**
- **NRKPlaylist**
- **NRKTV**: NRK TV and NRK Radio
- **RegioTV**
- **Restudy**
- **ReverbNation**
+ - **Revision3**
- **RingTV**
- **RottenTomatoes**
- **Roxwel**
- **RTBF**
- - **Rte**
+ - **rte**: Raidió Teilifís Éireann TV
+ - **rte:radio**: Raidió Teilifís Éireann radio
- **rtl.nl**: rtl.nl and rtlxl.nl
- **RTL2**
- **RTP**
- **rtve.es:live**: RTVE.es live streams
- **RTVNH**
- **RUHD**
+ - **RulePorn**
- **rutube**: Rutube videos
- **rutube:channel**: Rutube channels
- **rutube:embed**: Rutube embedded videos
- **TeleMB**
- **TeleTask**
- **TenPlay**
- - **TestTube**
- **TF1**
- **TheIntercept**
- **TheOnion**
- **ToypicsUser**: Toypics user profile
- **TrailerAddict** (Currently broken)
- **Trilulilu**
+ - **trollvids**
- **TruTube**
- **Tube8**
- **TubiTv**
- - **Tudou**
+ - **tudou**
+ - **tudou:album**
+ - **tudou:playlist**
- **Tumblr**
- **tunein:clip**
- **tunein:program**
- **udemy**
- **udemy:course**
- **UDNEmbed**: 聯合影音
- - **Ultimedia**
- **Unistra**
- **Urort**: NRK P3 Urørt
- **ustream**
- **video.mit.edu**
- **VideoDetective**
- **videofy.me**
- - **VideoMega**
+ - **VideoMega** (Currently broken)
- **videomore**
- **videomore:season**
- **videomore:video**
- **VideoPremium**
- - **VideoTt**: video.tt - Your True Tube
+ - **VideoTt**: video.tt - Your True Tube (Currently broken)
- **videoweed**: VideoWeed
- **Vidme**
- **Vidzi**
- **WebOfStories**
- **WebOfStoriesPlaylist**
- **Weibo**
+ - **WeiqiTV**: WQTV
- **wholecloud**: WholeCloud
- **Wimp**
- **Wistia**
- **ZDFChannel**
- **zingmp3:album**: mp3.zing.vn albums
- **zingmp3:song**: mp3.zing.vn songs
+ - **ZippCast**
from test.helper import FakeYDL, assertRegexpMatches
from youtube_dl import YoutubeDL
-from youtube_dl.compat import compat_str
+from youtube_dl.compat import compat_str, compat_urllib_error
from youtube_dl.extractor import YoutubeIE
from youtube_dl.postprocessor.common import PostProcessor
from youtube_dl.utils import ExtractorError, match_filter_func
result = get_ids({'playlist_items': '10'})
self.assertEqual(result, [])
+ def test_urlopen_no_file_protocol(self):
+ # see https://github.com/rg3/youtube-dl/issues/8227
+ ydl = YDL()
+ self.assertRaises(compat_urllib_error.URLError, ydl.urlopen, 'file:///etc/passwd')
+
if __name__ == '__main__':
unittest.main()
--- /dev/null
+#!/usr/bin/env python
+
+from __future__ import unicode_literals
+
+# Allow direct execution
+import os
+import sys
+import unittest
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+
+import json
+from youtube_dl.update import rsa_verify
+
+
+class TestUpdate(unittest.TestCase):
+ def test_rsa_verify(self):
+ UPDATES_RSA_KEY = (0x9d60ee4d8f805312fdb15a62f87b95bd66177b91df176765d13514a0f1754bcd2057295c5b6f1d35daa6742c3ffc9a82d3e118861c207995a8031e151d863c9927e304576bc80692bc8e094896fcf11b66f3e29e04e3a71e9a11558558acea1840aec37fc396fb6b65dc81a1c4144e03bd1c011de62e3f1357b327d08426fe93, 65537)
+ with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'versions.json'), 'rb') as f:
+ versions_info = f.read().decode()
+ versions_info = json.loads(versions_info)
+ signature = versions_info['signature']
+ del versions_info['signature']
+ self.assertTrue(rsa_verify(
+ json.dumps(versions_info, sort_keys=True).encode('utf-8'),
+ signature, UPDATES_RSA_KEY))
+
+
+if __name__ == '__main__':
+ unittest.main()
textTag = a.find('TEXT')
text = textTag.text
self.assertTrue(text in expected) # assertIn only added in python 2.7
- # remove the first occurance, there could be more than one annotation with the same text
+ # remove the first occurrence, there could be more than one annotation with the same text
expected.remove(text)
# We should have seen (and removed) all the expected annotation texts.
self.assertEqual(len(expected), 0, 'Not all expected annotations were found.')
--- /dev/null
+{
+ "latest": "2013.01.06",
+ "signature": "72158cdba391628569ffdbea259afbcf279bbe3d8aeb7492690735dc1cfa6afa754f55c61196f3871d429599ab22f2667f1fec98865527b32632e7f4b3675a7ef0f0fbe084d359256ae4bba68f0d33854e531a70754712f244be71d4b92e664302aa99653ee4df19800d955b6c4149cd2b3f24288d6e4b40b16126e01f4c8ce6",
+ "versions": {
+ "2013.01.02": {
+ "bin": [
+ "http://youtube-dl.org/downloads/2013.01.02/youtube-dl",
+ "f5b502f8aaa77675c4884938b1e4871ebca2611813a0c0e74f60c0fbd6dcca6b"
+ ],
+ "exe": [
+ "http://youtube-dl.org/downloads/2013.01.02/youtube-dl.exe",
+ "75fa89d2ce297d102ff27675aa9d92545bbc91013f52ec52868c069f4f9f0422"
+ ],
+ "tar": [
+ "http://youtube-dl.org/downloads/2013.01.02/youtube-dl-2013.01.02.tar.gz",
+ "6a66d022ac8e1c13da284036288a133ec8dba003b7bd3a5179d0c0daca8c8196"
+ ]
+ },
+ "2013.01.06": {
+ "bin": [
+ "http://youtube-dl.org/downloads/2013.01.06/youtube-dl",
+ "64b6ed8865735c6302e836d4d832577321b4519aa02640dc508580c1ee824049"
+ ],
+ "exe": [
+ "http://youtube-dl.org/downloads/2013.01.06/youtube-dl.exe",
+ "58609baf91e4389d36e3ba586e21dab882daaaee537e4448b1265392ae86ff84"
+ ],
+ "tar": [
+ "http://youtube-dl.org/downloads/2013.01.06/youtube-dl-2013.01.06.tar.gz",
+ "fe77ab20a95d980ed17a659aa67e371fdd4d656d19c4c7950e7b720b0c2f1a86"
+ ]
+ }
+ }
+}
\ No newline at end of file
DateRange,
DEFAULT_OUTTMPL,
determine_ext,
+ determine_protocol,
DownloadError,
encode_compat_str,
encodeFilename,
STR_OPERATORS = {
'=': operator.eq,
'!=': operator.ne,
+ '^=': lambda attr, value: attr.startswith(value),
+ '$=': lambda attr, value: attr.endswith(value),
+ '*=': lambda attr, value: value in attr,
}
str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol)
except (ValueError, OverflowError, OSError):
pass
+ # Auto generate title fields corresponding to the *_number fields when missing
+ # in order to always have clean titles. This is very common for TV series.
+ for field in ('chapter', 'season', 'episode'):
+ if info_dict.get('%s_number' % field) is not None and not info_dict.get(field):
+ info_dict[field] = '%s %d' % (field.capitalize(), info_dict['%s_number' % field])
+
subtitles = info_dict.get('subtitles')
if subtitles:
for _, subtitle in subtitles.items():
# Automatically determine file extension if missing
if 'ext' not in format:
format['ext'] = determine_ext(format['url']).lower()
+ # Automatically determine protocol if missing (useful for format
+ # selection purposes)
+ if 'protocol' not in format:
+ format['protocol'] = determine_protocol(format)
# Add HTTP headers, so that external programs can use them from the
# json output
full_format_info = info_dict.copy()
# only set the 'formats' fields if the original info_dict list them
# otherwise we end up with a circular reference, the first (and unique)
# element in the 'formats' field in info_dict is info_dict itself,
- # wich can't be exported to json
+ # which can't be exported to json
info_dict['formats'] = formats
if self.params.get('listformats'):
self.list_formats(info_dict)
https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
data_handler = compat_urllib_request_DataHandler()
+
+ # When passing our own FileHandler instance, build_opener won't add the
+ # default FileHandler and allows us to disable the file protocol, which
+ # can be used for malicious purposes (see
+ # https://github.com/rg3/youtube-dl/issues/8227)
+ file_handler = compat_urllib_request.FileHandler()
+
+ def file_open(*args, **kwargs):
+ raise compat_urllib_error.URLError('file:// scheme is explicitly disabled in youtube-dl for security reasons')
+ file_handler.file_open = file_open
+
opener = compat_urllib_request.build_opener(
- proxy_handler, https_handler, cookie_processor, ydlh, data_handler)
+ proxy_handler, https_handler, cookie_processor, ydlh, data_handler, file_handler)
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
else:
compat_getpass = getpass.getpass
-# Old 2.6 and 2.7 releases require kwargs to be bytes
+# Python < 2.6.5 requires kwargs to be bytes
try:
def _testfunc(x):
pass
def report_retry(self, count, retries):
"""Report retry in case of HTTP error 5xx"""
- self.to_screen('[download] Got server HTTP error. Retrying (attempt %d of %d)...' % (count, retries))
+ self.to_screen('[download] Got server HTTP error. Retrying (attempt %d of %.0f)...' % (count, retries))
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""
'filename': ctx['filename'],
'tmpfilename': ctx['tmpfilename'],
}
+
start = time.time()
- ctx['started'] = start
+ ctx.update({
+ 'started': start,
+ # Total complete fragments downloaded so far in bytes
+ 'complete_frags_downloaded_bytes': 0,
+ # Amount of fragment's bytes downloaded by the time of the previous
+ # frag progress hook invocation
+ 'prev_frag_downloaded_bytes': 0,
+ })
def frag_progress_hook(s):
if s['status'] not in ('downloading', 'finished'):
return
- frag_total_bytes = s.get('total_bytes', 0)
- if s['status'] == 'finished':
- state['downloaded_bytes'] += frag_total_bytes
- state['frag_index'] += 1
+ frag_total_bytes = s.get('total_bytes') or 0
estimated_size = (
- (state['downloaded_bytes'] + frag_total_bytes) /
+ (ctx['complete_frags_downloaded_bytes'] + frag_total_bytes) /
(state['frag_index'] + 1) * total_frags)
time_now = time.time()
state['total_bytes_estimate'] = estimated_size
state['elapsed'] = time_now - start
if s['status'] == 'finished':
- progress = self.calc_percent(state['frag_index'], total_frags)
+ state['frag_index'] += 1
+ state['downloaded_bytes'] += frag_total_bytes - ctx['prev_frag_downloaded_bytes']
+ ctx['complete_frags_downloaded_bytes'] = state['downloaded_bytes']
+ ctx['prev_frag_downloaded_bytes'] = 0
else:
frag_downloaded_bytes = s['downloaded_bytes']
- frag_progress = self.calc_percent(frag_downloaded_bytes,
- frag_total_bytes)
- progress = self.calc_percent(state['frag_index'], total_frags)
- progress += frag_progress / float(total_frags)
-
+ state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
state['eta'] = self.calc_eta(
- start, time_now, estimated_size, state['downloaded_bytes'] + frag_downloaded_bytes)
+ start, time_now, estimated_size,
+ state['downloaded_bytes'])
state['speed'] = s.get('speed')
+ ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
self._hook_progress(state)
ctx['dl'].add_progress_hook(frag_progress_hook)
self._debug_cmd(args)
- retval = subprocess.call(args)
+ proc = subprocess.Popen(args, stdin=subprocess.PIPE)
+ try:
+ retval = proc.wait()
+ except KeyboardInterrupt:
+        # subprocess.run would send the SIGKILL signal to ffmpeg and the
+ # mp4 file couldn't be played, but if we ask ffmpeg to quit it
+ # produces a file that is playable (this is mostly useful for live
+ # streams)
+ proc.communicate(b'q')
+ raise
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen('\r[%s] %s bytes' % (args[0], fsize))
AdobeTVVideoIE,
)
from .adultswim import AdultSwimIE
+from .aenetworks import AENetworksIE
from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
ArteTVCreativeIE,
ArteTVConcertIE,
ArteTVFutureIE,
+ ArteTVCinemaIE,
ArteTVDDCIE,
ArteTVEmbedIE,
)
from .behindkink import BehindKinkIE
from .beatportpro import BeatportProIE
from .bet import BetIE
+from .bigflix import BigflixIE
from .bild import BildIE
from .bilibili import BiliBiliIE
from .bleacherreport import (
)
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
+from .canvas import CanvasIE
from .cbs import CBSIE
from .cbsnews import CBSNewsIE
from .cbssports import CBSSportsIE
)
from .cspan import CSpanIE
from .ctsnews import CtsNewsIE
+from .cultureunplugged import CultureUnpluggedIE
+from .cwtv import CWTVIE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .historicfilms import HistoricFilmsIE
-from .history import HistoryIE
from .hitbox import HitboxIE, HitboxLiveIE
from .hornbunny import HornBunnyIE
from .hotnewhiphop import HotNewHipHopIE
IviIE,
IviCompilationIE
)
+from .ivideon import IvideonIE
from .izlesene import IzleseneIE
from .jadorecettepub import JadoreCettePubIE
from .jeuxvideo import JeuxVideoIE
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .lecture2go import Lecture2GoIE
+from .lemonde import LemondeIE
from .letv import (
LetvIE,
LetvTvIE,
- LetvPlaylistIE
+ LetvPlaylistIE,
+ LetvCloudIE,
)
from .libsyn import LibsynIE
from .lifenews import (
LivestreamShortenerIE,
)
from .lnkgo import LnkGoIE
+from .lovehomeporn import LoveHomePornIE
from .lrt import LRTIE
from .lynda import (
LyndaIE,
VPROIE,
WNLIE
)
+from .npr import NprIE
from .nrk import (
NRKIE,
NRKPlaylistIE,
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtbf import RTBFIE
-from .rte import RteIE
+from .rte import RteIE, RteRadioIE
from .rtlnl import RtlNlIE
from .rtl2 import RTL2IE
from .rtp import RTPIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE
from .rtvnh import RTVNHIE
from .ruhd import RUHDIE
+from .ruleporn import RulePornIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
+from .trollvids import TrollvidsIE
from .trutube import TruTubeIE
from .tube8 import Tube8IE
from .tubitv import TubiTvIE
-from .tudou import TudouIE
+from .tudou import (
+ TudouIE,
+ TudouPlaylistIE,
+ TudouAlbumIE,
+)
from .tumblr import TumblrIE
from .tunein import (
TuneInClipIE,
from .tvplay import TVPlayIE
from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE
+from .twentymin import TwentyMinutenIE
from .twentytwotracks import (
TwentyTwoTracksIE,
TwentyTwoTracksGenreIE
UdemyCourseIE
)
from .udn import UDNEmbedIE
-from .ultimedia import UltimediaIE
+from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .urort import UrortIE
from .ustream import UstreamIE, UstreamChannelIE
WebOfStoriesPlaylistIE,
)
from .weibo import WeiboIE
+from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
ZingMp3SongIE,
ZingMp3AlbumIE,
)
+from .zippcast import ZippCastIE
_ALL_CLASSES = [
klass
media_url = file_el.text
if determine_ext(media_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
- media_url, segment_title, 'mp4', preference=0, m3u8_id='hls'))
+ media_url, segment_title, 'mp4', preference=0,
+ m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': '%s_%s' % (bitrate, ftype),
--- /dev/null
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import smuggle_url
+
+
+class AENetworksIE(InfoExtractor):
+ IE_NAME = 'aenetworks'
+ IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
+ _VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
+
+ _TESTS = [{
+ 'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
+ 'info_dict': {
+ 'id': 'g12m5Gyt3fdR',
+ 'ext': 'mp4',
+ 'title': "Bet You Didn't Know: Valentine's Day",
+ 'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
+ },
+ 'params': {
+ # m3u8 download
+ 'skip_download': True,
+ },
+ 'add_ie': ['ThePlatform'],
+ 'expected_warnings': ['JSON-LD'],
+ }, {
+ 'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
+ 'info_dict': {
+ 'id': 'eg47EERs_JsZ',
+ 'ext': 'mp4',
+ 'title': "Winter Is Coming",
+ 'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
+ },
+ 'params': {
+ # m3u8 download
+ 'skip_download': True,
+ },
+ 'add_ie': ['ThePlatform'],
+ }, {
+ 'url': 'http://www.aetv.com/shows/duck-dynasty/video/inlawful-entry',
+ 'only_matching': True
+ }, {
+ 'url': 'http://www.fyi.tv/shows/tiny-house-nation/videos/207-sq-ft-minnesota-prairie-cottage',
+ 'only_matching': True
+ }, {
+ 'url': 'http://www.mylifetime.com/shows/project-runway-junior/video/season-1/episode-6/superstar-clients',
+ 'only_matching': True
+ }]
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, video_id)
+
+ video_url_re = [
+ r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
+ r"media_url\s*=\s*'([^']+)'"
+ ]
+ video_url = self._search_regex(video_url_re, webpage, 'video url')
+
+ info = self._search_json_ld(webpage, video_id, fatal=False)
+ info.update({
+ '_type': 'url_transparent',
+ 'url': smuggle_url(video_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}}),
+ })
+ return info
'thumbnails': thumbnails,
'timestamp': parse_iso8601(item.get('pubDate'), ' '),
'duration': int_or_none(media_content[0].get('@attributes', {}).get('duration')),
+ 'subtitles': subtitles,
'formats': formats,
}
from __future__ import unicode_literals
-import re
+from .nuevo import NuevoBaseIE
-from .common import InfoExtractor
-
-class AnitubeIE(InfoExtractor):
+class AnitubeIE(NuevoBaseIE):
IE_NAME = 'anitube.se'
_VALID_URL = r'https?://(?:www\.)?anitube\.se/video/(?P<id>\d+)'
}
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id')
+ video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
key = self._search_regex(
r'src=["\']https?://[^/]+/embed/([A-Za-z0-9_-]+)', webpage, 'key')
- config_xml = self._download_xml(
- 'http://www.anitube.se/nuevo/econfig.php?key=%s' % key, key)
-
- video_title = config_xml.find('title').text
- thumbnail = config_xml.find('image').text
- duration = float(config_xml.find('duration').text)
-
- formats = []
- video_url = config_xml.find('file')
- if video_url is not None:
- formats.append({
- 'format_id': 'sd',
- 'url': video_url.text,
- })
- video_url = config_xml.find('filehd')
- if video_url is not None:
- formats.append({
- 'format_id': 'hd',
- 'url': video_url.text,
- })
-
- return {
- 'id': video_id,
- 'title': video_title,
- 'thumbnail': thumbnail,
- 'duration': duration,
- 'formats': formats
- }
+ return self._extract_nuevo(
+ 'http://www.anitube.se/nuevo/econfig.php?key=%s' % key, video_id)
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future'
- _VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(thema|sujet)/.*?#article-anchor-(?P<id>\d+)'
+ _VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(?P<id>.+)'
- _TEST = {
- 'url': 'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
+ _TESTS = [{
+ 'url': 'http://future.arte.tv/fr/info-sciences/les-ecrevisses-aussi-sont-anxieuses',
'info_dict': {
- 'id': '5201',
+ 'id': '050940-028-A',
'ext': 'mp4',
- 'title': 'Les champignons au secours de la planète',
- 'upload_date': '20131101',
+ 'title': 'Les écrevisses aussi peuvent être anxieuses',
},
- }
-
- def _real_extract(self, url):
- anchor_id, lang = self._extract_url_info(url)
- webpage = self._download_webpage(url, anchor_id)
- row = self._search_regex(
- r'(?s)id="%s"[^>]*>.+?(<div[^>]*arte_vp_url[^>]*>)' % anchor_id,
- webpage, 'row')
- return self._extract_from_webpage(row, anchor_id, lang)
+ }, {
+ 'url': 'http://future.arte.tv/fr/la-science-est-elle-responsable',
+ 'only_matching': True,
+ }]
class ArteTVDDCIE(ArteTVPlus7IE):
}
+class ArteTVCinemaIE(ArteTVPlus7IE):
+ IE_NAME = 'arte.tv:cinema'
+ _VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>de|fr)/(?P<id>.+)'
+
+ _TEST = {
+ 'url': 'http://cinema.arte.tv/de/node/38291',
+ 'md5': '6b275511a5107c60bacbeeda368c3aa1',
+ 'info_dict': {
+ 'id': '055876-000_PWA12025-D',
+ 'ext': 'mp4',
+ 'title': 'Tod auf dem Nil',
+ 'upload_date': '20160122',
+ 'description': 'md5:7f749bbb77d800ef2be11d54529b96bc',
+ },
+ }
+
+
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
})
formats.append(format_info)
- m3u8_url = player.get('urlVideoHls')
- if m3u8_url:
- formats.extend(self._extract_m3u8_formats(
- m3u8_url, episode_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
-
timestamp = int_or_none(self._download_webpage(
self._TIME_API_URL,
video_id, 'Downloading timestamp', fatal=False), 1000, time.time())
},
'skip': 'Episode is no longer available on BBC iPlayer Radio',
}, {
- 'url': 'http://www.bbc.co.uk/music/clips/p02frcc3',
+ 'url': 'http://www.bbc.co.uk/music/clips/p022h44b',
'note': 'Audio',
'info_dict': {
- 'id': 'p02frcch',
+ 'id': 'p022h44j',
'ext': 'flv',
- 'title': 'Pete Tong, Past, Present and Future Special, Madeon - After Hours mix',
- 'description': 'French house superstar Madeon takes us out of the club and onto the after party.',
- 'duration': 3507,
+ 'title': 'BBC Proms Music Guides, Rachmaninov: Symphonic Dances',
+ 'description': "In this Proms Music Guide, Andrew McGregor looks at Rachmaninov's Symphonic Dances.",
+ 'duration': 227,
},
'params': {
# rtmp download
}, {
# iptv-all mediaset fails with geolocation, however there is no geo restriction
# for this programme at all
- 'url': 'http://www.bbc.co.uk/programmes/b06bp7lf',
+ 'url': 'http://www.bbc.co.uk/programmes/b06rkn85',
'info_dict': {
- 'id': 'b06bp7kf',
+ 'id': 'b06rkms3',
'ext': 'flv',
- 'title': "Annie Mac's Friday Night, B.Traits sits in for Annie",
- 'description': 'B.Traits sits in for Annie Mac with a Mini-Mix from Disclosure.',
- 'duration': 10800,
+ 'title': "Best of the Mini-Mixes 2015: Part 3, Annie Mac's Friday Night - BBC Radio 1",
+ 'description': "Annie has part three in the Best of the Mini-Mixes 2015, plus the year's Most Played!",
},
'params': {
# rtmp download
webpage = self._download_webpage(url, playlist_id)
- timestamp = None
- playlist_title = None
- playlist_description = None
-
- ld = self._parse_json(
- self._search_regex(
- r'(?s)<script type="application/ld\+json">(.+?)</script>',
- webpage, 'ld json', default='{}'),
- playlist_id, fatal=False)
- if ld:
- timestamp = parse_iso8601(ld.get('datePublished'))
- playlist_title = ld.get('headline')
- playlist_description = ld.get('articleBody')
+ json_ld_info = self._search_json_ld(webpage, playlist_id, default=None)
+ timestamp = json_ld_info.get('timestamp')
+ playlist_title = json_ld_info.get('title')
+ playlist_description = json_ld_info.get('description')
if not timestamp:
timestamp = parse_iso8601(self._search_regex(
video_id = self._match_id(url)
video = self._download_json(
- 'http://beeg.com/api/v5/video/%s' % video_id, video_id)
+ 'https://api.beeg.com/api/v5/video/%s' % video_id, video_id)
def split(o, e):
def cut(s, x):
def decrypt_url(encrypted_url):
encrypted_url = self._proto_relative_url(
- encrypted_url.replace('{DATA_MARKERS}', ''), 'http:')
+ encrypted_url.replace('{DATA_MARKERS}', ''), 'https:')
key = self._search_regex(
r'/key=(.*?)%2Cend=', encrypted_url, 'key', default=None)
if not key:
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import base64
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_unquote
+
+
+class BigflixIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)'
+ _TESTS = [{
+ 'url': 'http://www.bigflix.com/Hindi-movies/Action-movies/Singham-Returns/16537',
+ 'md5': 'ec76aa9b1129e2e5b301a474e54fab74',
+ 'info_dict': {
+ 'id': '16537',
+ 'ext': 'mp4',
+ 'title': 'Singham Returns',
+ 'description': 'md5:3d2ba5815f14911d5cc6a501ae0cf65d',
+ }
+ }, {
+ # 2 formats
+ 'url': 'http://www.bigflix.com/Tamil-movies/Drama-movies/Madarasapatinam/16070',
+ 'info_dict': {
+ 'id': '16070',
+ 'ext': 'mp4',
+ 'title': 'Madarasapatinam',
+ 'description': 'md5:63b9b8ed79189c6f0418c26d9a3452ca',
+ 'formats': 'mincount:2',
+ },
+ 'params': {
+ 'skip_download': True,
+ }
+ }, {
+ # multiple formats
+ 'url': 'http://www.bigflix.com/Malayalam-movies/Drama-movies/Indian-Rupee/15967',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, video_id)
+
+ title = self._html_search_regex(
+ r'<div[^>]+class=["\']pagetitle["\'][^>]*>(.+?)</div>',
+ webpage, 'title')
+
+ def decode_url(quoted_b64_url):
+ return base64.b64decode(compat_urllib_parse_unquote(
+ quoted_b64_url).encode('ascii')).decode('utf-8')
+
+ formats = []
+ for height, encoded_url in re.findall(
+ r'ContentURL_(\d{3,4})[pP][^=]+=([^&]+)', webpage):
+ video_url = decode_url(encoded_url)
+ f = {
+ 'url': video_url,
+ 'format_id': '%sp' % height,
+ 'height': int(height),
+ }
+ if video_url.startswith('rtmp'):
+ f['ext'] = 'flv'
+ formats.append(f)
+
+ file_url = self._search_regex(
+ r'file=([^&]+)', webpage, 'video url', default=None)
+ if file_url:
+ video_url = decode_url(file_url)
+ if all(f['url'] != video_url for f in formats):
+ formats.append({
+ 'url': video_url,
+ })
+
+ self._sort_formats(formats)
+
+ description = self._html_search_meta('description', webpage)
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ 'formats': formats
+ }
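The `decode_url` helper in the Bigflix extractor above reverses a percent-encoding-plus-base64 wrapping applied to media URLs. A minimal round-trip sketch of that technique (the sample URL is made up, not taken from the site):

```python
import base64
try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote  # Python 2

def decode_url(quoted_b64_url):
    # percent-decode first, then base64-decode to recover the plain URL
    return base64.b64decode(unquote(quoted_b64_url).encode('ascii')).decode('utf-8')

# hypothetical plain URL, wrapped the way the page embeds it
plain = 'http://media.example.com/Singham-Returns.mp4'
wrapped = quote(base64.b64encode(plain.encode('utf-8')).decode('ascii'))
assert decode_url(wrapped) == plain
```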
class Canalc2IE(InfoExtractor):
IE_NAME = 'canalc2.tv'
- _VALID_URL = r'https?://(?:www\.)?canalc2\.tv/video/(?P<id>\d+)'
+ _VALID_URL = r'https?://(?:(?:www\.)?canalc2\.tv/video/|archives-canalc2\.u-strasbg\.fr/video\.asp\?.*\bidVideo=)(?P<id>\d+)'
- _TEST = {
+ _TESTS = [{
'url': 'http://www.canalc2.tv/video/12163',
'md5': '060158428b650f896c542dfbb3d6487f',
'info_dict': {
'params': {
'skip_download': True, # Requires rtmpdump
}
- }
+ }, {
+ 'url': 'http://archives-canalc2.u-strasbg.fr/video.asp?idVideo=11427&voir=oui',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
video_id = self._match_id(url)
- webpage = self._download_webpage(url, video_id)
- video_url = self._search_regex(
- r'jwplayer\((["\'])Player\1\)\.setup\({[^}]*file\s*:\s*(["\'])(?P<file>.+?)\2',
- webpage, 'video_url', group='file')
- formats = [{'url': video_url}]
- if video_url.startswith('rtmp://'):
- rtmp = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>.+/))(?P<play_path>mp4:.+)$', video_url)
- formats[0].update({
- 'url': rtmp.group('url'),
- 'ext': 'flv',
- 'app': rtmp.group('app'),
- 'play_path': rtmp.group('play_path'),
- 'page_url': url,
- })
+
+ webpage = self._download_webpage(
+ 'http://www.canalc2.tv/video/%s' % video_id, video_id)
+
+ formats = []
+ for _, video_url in re.findall(r'file\s*=\s*(["\'])(.+?)\1', webpage):
+ if video_url.startswith('rtmp://'):
+ rtmp = re.search(
+ r'^(?P<url>rtmp://[^/]+/(?P<app>.+/))(?P<play_path>mp4:.+)$', video_url)
+ formats.append({
+ 'url': rtmp.group('url'),
+ 'format_id': 'rtmp',
+ 'ext': 'flv',
+ 'app': rtmp.group('app'),
+ 'play_path': rtmp.group('play_path'),
+ 'page_url': url,
+ })
+ else:
+ formats.append({
+ 'url': video_url,
+ 'format_id': 'http',
+ })
+ self._sort_formats(formats)
title = self._html_search_regex(
r'(?s)class="[^"]*col_description[^"]*">.*?<h3>(.*?)</h3>', webpage, 'title')
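The rtmp branch in the canalc2 hunk above splits a single rtmp URL into the connection URL, app and play path with one regex. A standalone sketch with a hypothetical URL in the shape the extractor expects:

```python
import re

# hypothetical rtmp URL in the extractor's expected shape
video_url = 'rtmp://vod-flash.example.com/fcs/vod/mp4:courts/12163.mp4'

rtmp = re.search(
    r'^(?P<url>rtmp://[^/]+/(?P<app>.+/))(?P<play_path>mp4:.+)$', video_url)

# the greedy app group backtracks so play_path keeps its mp4: prefix
connection_url = rtmp.group('url')   # 'rtmp://vod-flash.example.com/fcs/vod/'
app = rtmp.group('app')              # 'fcs/vod/'
play_path = rtmp.group('play_path')  # 'mp4:courts/12163.mp4'
```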
--- /dev/null
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import float_or_none
+
+
+class CanvasIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+ _TEST = {
+ 'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
+ 'md5': 'ea838375a547ac787d4064d8c7860a6c',
+ 'info_dict': {
+ 'id': 'mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
+ 'display_id': 'de-afspraak-veilt-voor-de-warmste-week',
+ 'ext': 'mp4',
+ 'title': 'De afspraak veilt voor de Warmste Week',
+ 'description': 'md5:24cb860c320dc2be7358e0e5aa317ba6',
+ 'thumbnail': 're:^https?://.*\.jpg$',
+ 'duration': 49.02,
+ }
+ }
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, display_id)
+
+ title = self._search_regex(
+ r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
+ webpage, 'title', default=None) or self._og_search_title(webpage)
+
+ video_id = self._html_search_regex(
+ r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id')
+
+ data = self._download_json(
+ 'https://mediazone.vrt.be/api/v1/canvas/assets/%s' % video_id, display_id)
+
+ formats = []
+ for target in data['targetUrls']:
+ format_url, format_type = target.get('url'), target.get('type')
+ if not format_url or not format_type:
+ continue
+ if format_type == 'HLS':
+ formats.extend(self._extract_m3u8_formats(
+ format_url, display_id, entry_protocol='m3u8_native',
+ ext='mp4', preference=0, fatal=False, m3u8_id=format_type))
+ elif format_type == 'HDS':
+ formats.extend(self._extract_f4m_formats(
+ format_url, display_id, f4m_id=format_type, fatal=False))
+ else:
+ formats.append({
+ 'format_id': format_type,
+ 'url': format_url,
+ })
+ self._sort_formats(formats)
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'title': title,
+ 'description': self._og_search_description(webpage),
+ 'formats': formats,
+ 'duration': float_or_none(data.get('duration'), 1000),
+ 'thumbnail': data.get('posterImageUrl'),
+ }
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205,
+ 'subtitles': {
+ 'en': [{
+ 'ext': 'ttml',
+ }],
+ },
},
'params': {
# rtmp download
fmt['ext'] = 'mp4'
formats.append(fmt)
+ subtitles = {}
+ if 'mpxRefId' in video_info:
+ subtitles['en'] = [{
+ 'ext': 'ttml',
+ 'url': 'http://www.cbsnews.com/videos/captions/%s.adb_xml' % video_info['mpxRefId'],
+ }]
+
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
+ 'subtitles': subtitles,
}
fix_xml_ampersands,
float_or_none,
int_or_none,
+ parse_iso8601,
RegexNotFoundError,
sanitize_filename,
sanitized_Request,
except ExtractorError:
raise
except compat_http_client.IncompleteRead as e:
- raise ExtractorError('A network error has occured.', cause=e, expected=True)
+ raise ExtractorError('A network error has occurred.', cause=e, expected=True)
except (KeyError, StopIteration) as e:
- raise ExtractorError('An extractor error has occured.', cause=e)
+ raise ExtractorError('An extractor error has occurred.', cause=e)
def set_downloader(self, downloader):
"""Sets the downloader for this IE."""
return self._html_search_meta('twitter:player', html,
'twitter card player')
+ def _search_json_ld(self, html, video_id, **kwargs):
+ json_ld = self._search_regex(
+ r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
+ html, 'JSON-LD', group='json_ld', **kwargs)
+ if not json_ld:
+ return {}
+ return self._json_ld(json_ld, video_id, fatal=kwargs.get('fatal', True))
+
+ def _json_ld(self, json_ld, video_id, fatal=True):
+ if isinstance(json_ld, compat_str):
+ json_ld = self._parse_json(json_ld, video_id, fatal=fatal)
+ if not json_ld:
+ return {}
+ info = {}
+ if json_ld.get('@context') == 'http://schema.org':
+ item_type = json_ld.get('@type')
+ if item_type == 'TVEpisode':
+ info.update({
+ 'episode': unescapeHTML(json_ld.get('name')),
+ 'episode_number': int_or_none(json_ld.get('episodeNumber')),
+ 'description': unescapeHTML(json_ld.get('description')),
+ })
+ part_of_season = json_ld.get('partOfSeason')
+ if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
+ info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
+ part_of_series = json_ld.get('partOfSeries')
+ if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
+ info['series'] = unescapeHTML(part_of_series.get('name'))
+ elif item_type == 'Article':
+ info.update({
+ 'timestamp': parse_iso8601(json_ld.get('datePublished')),
+ 'title': unescapeHTML(json_ld.get('headline')),
+ 'description': unescapeHTML(json_ld.get('articleBody')),
+ })
+ return dict((k, v) for k, v in info.items() if v is not None)
+
@staticmethod
def _hidden_inputs(html):
html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
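The `_search_json_ld`/`_json_ld` helpers added above pull a schema.org JSON-LD payload out of the page and map a few of its fields. A minimal standalone sketch of the same idea, limited to the `Article` branch (the HTML snippet is a made-up example):

```python
import json
import re

def search_json_ld(html):
    # pull the first JSON-LD <script> payload out of the page
    m = re.search(
        r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
        html)
    if not m:
        return {}
    data = json.loads(m.group('json_ld'))
    info = {}
    if data.get('@context') == 'http://schema.org' and data.get('@type') == 'Article':
        info['title'] = data.get('headline')
        info['description'] = data.get('articleBody')
    # drop fields the payload did not provide
    return dict((k, v) for k, v in info.items() if v is not None)

html = ('<script type="application/ld+json">'
        '{"@context": "http://schema.org", "@type": "Article",'
        ' "headline": "Sample headline", "articleBody": "Body text"}'
        '</script>')
```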
# TODO: it looks like the video codec does not necessarily go first
va_codecs = codecs.split(',')
if va_codecs[0]:
- f['vcodec'] = va_codecs[0].partition('.')[0]
+ f['vcodec'] = va_codecs[0]
if len(va_codecs) > 1 and va_codecs[1]:
- f['acodec'] = va_codecs[1].partition('.')[0]
+ f['acodec'] = va_codecs[1]
resolution = last_info.get('RESOLUTION')
if resolution:
width_str, height_str = resolution.split('x')
streamdata_req, video_id,
note='Downloading media info for %s' % video_format)
stream_info = streamdata.find('./{default}preload/stream_info')
- video_url = stream_info.find('./host').text
- video_play_path = stream_info.find('./file').text
+ video_url = xpath_text(stream_info, './host')
+ video_play_path = xpath_text(stream_info, './file')
+ if not video_url or not video_play_path:
+ continue
metadata = stream_info.find('./metadata')
format_info = {
'format': video_format,
--- /dev/null
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class CultureUnpluggedIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?cultureunplugged\.com/documentary/watch-online/play/(?P<id>\d+)(?:/(?P<display_id>[^/]+))?'
+ _TESTS = [{
+ 'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662/The-Next--Best-West',
+ 'md5': 'ac6c093b089f7d05e79934dcb3d228fc',
+ 'info_dict': {
+ 'id': '53662',
+ 'display_id': 'The-Next--Best-West',
+ 'ext': 'mp4',
+ 'title': 'The Next, Best West',
+ 'description': 'md5:0423cd00833dea1519cf014e9d0903b1',
+ 'thumbnail': 're:^https?://.*\.jpg$',
+ 'creator': 'Coldstream Creative',
+ 'duration': 2203,
+ 'view_count': int,
+ }
+ }, {
+ 'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ display_id = mobj.group('display_id') or video_id
+
+ movie_data = self._download_json(
+ 'http://www.cultureunplugged.com/movie-data/cu-%s.json' % video_id, display_id)
+
+ video_url = movie_data['url']
+ title = movie_data['title']
+
+ description = movie_data.get('synopsis')
+ creator = movie_data.get('producer')
+ duration = int_or_none(movie_data.get('duration'))
+ view_count = int_or_none(movie_data.get('views'))
+
+ thumbnails = [{
+ 'url': movie_data['%s_thumb' % size],
+ 'id': size,
+ 'preference': preference,
+ } for preference, size in enumerate((
+ 'small', 'large')) if movie_data.get('%s_thumb' % size)]
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'url': video_url,
+ 'title': title,
+ 'description': description,
+ 'creator': creator,
+ 'duration': duration,
+ 'view_count': view_count,
+ 'thumbnails': thumbnails,
+ }
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+ int_or_none,
+ parse_iso8601,
+)
+
+
+class CWTVIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/shows/(?:[^/]+/){2}\?play=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
+ _TESTS = [{
+ 'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
+ 'info_dict': {
+ 'id': '6b15e985-9345-4f60-baf8-56e96be57c63',
+ 'ext': 'mp4',
+ 'title': 'Legends of Yesterday',
+ 'description': 'Oliver and Barry Allen take Kendra Saunders and Carter Hall to a remote location to keep them hidden from Vandal Savage while they figure out how to defeat him.',
+ 'duration': 2665,
+ 'series': 'Arrow',
+ 'season_number': 4,
+ 'season': '4',
+ 'episode_number': 8,
+ 'upload_date': '20151203',
+ 'timestamp': 1449122100,
+ },
+ 'params': {
+ # m3u8 download
+ 'skip_download': True,
+ }
+ }, {
+ 'url': 'http://www.cwseed.com/shows/whose-line-is-it-anyway/jeff-davis-4/?play=24282b12-ead2-42f2-95ad-26770c2c6088',
+ 'info_dict': {
+ 'id': '24282b12-ead2-42f2-95ad-26770c2c6088',
+ 'ext': 'mp4',
+ 'title': 'Jeff Davis 4',
+ 'description': 'Jeff Davis is back to make you laugh.',
+ 'duration': 1263,
+ 'series': 'Whose Line Is It Anyway?',
+ 'season_number': 11,
+ 'season': '11',
+ 'episode_number': 20,
+ 'upload_date': '20151006',
+ 'timestamp': 1444107300,
+ },
+ 'params': {
+ # m3u8 download
+ 'skip_download': True,
+ }
+ }]
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ video_data = self._download_json(
+ 'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/132?format=json' % video_id, video_id)
+
+ formats = self._extract_m3u8_formats(
+ video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
+
+ thumbnails = [{
+ 'url': image['uri'],
+ 'width': image.get('width'),
+ 'height': image.get('height'),
+ } for image in video_data['images'].values() if image.get('uri')] if video_data.get('images') else None
+
+ video_metadata = video_data['assetFields']
+
+ subtitles = {
+ 'en': [{
+ 'url': video_metadata['UnicornCcUrl'],
+ }],
+ } if video_metadata.get('UnicornCcUrl') else None
+
+ return {
+ 'id': video_id,
+ 'title': video_metadata['title'],
+ 'description': video_metadata.get('description'),
+ 'duration': int_or_none(video_metadata.get('duration')),
+ 'series': video_metadata.get('seriesName'),
+ 'season_number': int_or_none(video_metadata.get('seasonNumber')),
+ 'season': video_metadata.get('seasonName'),
+ 'episode_number': int_or_none(video_metadata.get('episodeNumber')),
+ 'timestamp': parse_iso8601(video_data.get('startTime')),
+ 'thumbnails': thumbnails,
+ 'formats': formats,
+ 'subtitles': subtitles,
+ }
class DailymotionIE(DailymotionBaseInfoExtractor):
- _VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(embed|#)/)?video/(?P<id>[^/?_]+)'
+ _VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:embed|swf|#)/)?video/(?P<id>[^/?_]+)'
IE_NAME = 'dailymotion'
_FORMATS = [
{
'url': 'http://www.dailymotion.com/video/x20su5f_the-power-of-nightmares-1-the-rise-of-the-politics-of-fear-bbc-2004_news',
'only_matching': True,
+ },
+ {
+ 'url': 'http://www.dailymotion.com/swf/video/x3n92nf',
+ 'only_matching': True,
}
]
ext = determine_ext(media_url)
if type_ == 'application/x-mpegURL' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
- media_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+ media_url, video_id, 'mp4', preference=-1,
+ m3u8_id='hls', fatal=False))
elif type_ == 'application/f4m' or ext == 'f4m':
formats.extend(self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
else:
f = {
'url': media_url,
- 'format_id': quality,
+ 'format_id': 'http-%s' % quality,
}
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
if m:
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = 'dailymotion:user'
- _VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
+ _VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv',
import base64
from .common import InfoExtractor
-from ..compat import compat_urllib_parse
+from ..compat import (
+ compat_urllib_parse,
+ compat_str,
+)
from ..utils import (
int_or_none,
parse_iso8601,
entries = []
for video in show['videos']:
+ video_id = compat_str(video['id'])
entries.append(self.url_result(
- 'http://www.dcndigital.ae/media/%s' % video['id'], 'DCNVideo'))
+ 'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo', video_id))
return self.playlist_result(entries, season_id, title)
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class DigitekaIE(InfoExtractor):
+ _VALID_URL = r'''(?x)
+ https?://(?:www\.)?(?:digiteka\.net|ultimedia\.com)/
+ (?:
+ deliver/
+ (?P<embed_type>
+ generic|
+ musique
+ )
+ (?:/[^/]+)*/
+ (?:
+ src|
+ article
+ )|
+ default/index/video
+ (?P<site_type>
+ generic|
+ music
+ )
+ /id
+ )/(?P<id>[\d+a-z]+)'''
+ _TESTS = [{
+ # news
+ 'url': 'https://www.ultimedia.com/default/index/videogeneric/id/s8uk0r',
+ 'md5': '276a0e49de58c7e85d32b057837952a2',
+ 'info_dict': {
+ 'id': 's8uk0r',
+ 'ext': 'mp4',
+ 'title': 'Loi sur la fin de vie: le texte prévoit un renforcement des directives anticipées',
+ 'thumbnail': 're:^https?://.*\.jpg',
+ 'duration': 74,
+ 'upload_date': '20150317',
+ 'timestamp': 1426604939,
+ 'uploader_id': '3fszv',
+ },
+ }, {
+ # music
+ 'url': 'https://www.ultimedia.com/default/index/videomusic/id/xvpfp8',
+ 'md5': '2ea3513813cf230605c7e2ffe7eca61c',
+ 'info_dict': {
+ 'id': 'xvpfp8',
+ 'ext': 'mp4',
+ 'title': 'Two - C\'est La Vie (clip)',
+ 'thumbnail': 're:^https?://.*\.jpg',
+ 'duration': 233,
+ 'upload_date': '20150224',
+ 'timestamp': 1424760500,
+ 'uploader_id': '3rfzk',
+ },
+ }, {
+ 'url': 'https://www.digiteka.net/deliver/generic/iframe/mdtk/01637594/src/lqm3kl/zone/1/showtitle/1/autoplay/yes',
+ 'only_matching': True,
+ }]
+
+ @staticmethod
+ def _extract_url(webpage):
+ mobj = re.search(
+ r'<(?:iframe|script)[^>]+src=["\'](?P<url>(?:https?:)?//(?:www\.)?ultimedia\.com/deliver/(?:generic|musique)(?:/[^/]+)*/(?:src|article)/[\d+a-z]+)',
+ webpage)
+ if mobj:
+ return mobj.group('url')
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ video_type = mobj.group('embed_type') or mobj.group('site_type')
+ if video_type == 'music':
+ video_type = 'musique'
+
+ deliver_info = self._download_json(
+ 'http://www.ultimedia.com/deliver/video?video=%s&topic=%s' % (video_id, video_type),
+ video_id)
+
+ yt_id = deliver_info.get('yt_id')
+ if yt_id:
+ return self.url_result(yt_id, 'Youtube')
+
+ jwconf = deliver_info['jwconf']
+
+ formats = []
+ for source in jwconf['playlist'][0]['sources']:
+ formats.append({
+ 'url': source['file'],
+ 'format_id': source.get('label'),
+ })
+
+ self._sort_formats(formats)
+
+ title = deliver_info['title']
+ thumbnail = jwconf.get('image')
+ duration = int_or_none(deliver_info.get('duration'))
+ timestamp = int_or_none(deliver_info.get('release_time'))
+ uploader_id = deliver_info.get('owner_id')
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'thumbnail': thumbnail,
+ 'duration': duration,
+ 'timestamp': timestamp,
+ 'uploader_id': uploader_id,
+ 'formats': formats,
+ }
from ..utils import (
ExtractorError,
clean_html,
+ int_or_none,
sanitized_Request,
)
class DramaFeverIE(DramaFeverBaseIE):
IE_NAME = 'dramafever'
_VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
- _TEST = {
+ _TESTS = [{
'url': 'http://www.dramafever.com/drama/4512/1/Cooking_with_Shin/',
'info_dict': {
'id': '4512.1',
- 'ext': 'flv',
+ 'ext': 'mp4',
'title': 'Cooking with Shin 4512.1',
'description': 'md5:a8eec7942e1664a6896fcd5e1287bfd0',
+ 'episode': 'Episode 1',
+ 'episode_number': 1,
'thumbnail': 're:^https?://.*\.jpg',
'timestamp': 1404336058,
'upload_date': '20140702',
# m3u8 download
'skip_download': True,
},
- }
+ }, {
+ 'url': 'http://www.dramafever.com/drama/4826/4/Mnet_Asian_Music_Awards_2015/?ap=1',
+ 'info_dict': {
+ 'id': '4826.4',
+ 'ext': 'mp4',
+ 'title': 'Mnet Asian Music Awards 2015 4826.4',
+ 'description': 'md5:3ff2ee8fedaef86e076791c909cf2e91',
+ 'episode': 'Mnet Asian Music Awards 2015 - Part 3',
+ 'episode_number': 4,
+ 'thumbnail': 're:^https?://.*\.jpg',
+ 'timestamp': 1450213200,
+ 'upload_date': '20151215',
+ 'duration': 5602,
+ },
+ 'params': {
+ # m3u8 download
+ 'skip_download': True,
+ },
+ }]
def _real_extract(self, url):
video_id = self._match_id(url).replace('/', '.')
video_id, 'Downloading episode info JSON', fatal=False)
if episode_info:
value = episode_info.get('value')
- if value:
- subfile = value[0].get('subfile') or value[0].get('new_subfile')
- if subfile and subfile != 'http://www.dramafever.com/st/':
- info.setdefault('subtitles', {}).setdefault('English', []).append({
- 'ext': 'srt',
- 'url': subfile,
- })
+ if isinstance(value, list):
+ for v in value:
+ if v.get('type') == 'Episode':
+ subfile = v.get('subfile') or v.get('new_subfile')
+ if subfile and subfile != 'http://www.dramafever.com/st/':
+ info.setdefault('subtitles', {}).setdefault('English', []).append({
+ 'ext': 'srt',
+ 'url': subfile,
+ })
+ episode_number = int_or_none(v.get('number'))
+ episode_fallback = 'Episode'
+ if episode_number:
+ episode_fallback += ' %d' % episode_number
+ info['episode'] = v.get('title') or episode_fallback
+ info['episode_number'] = episode_number
+ break
return info
subtitles_list = asset.get('SubtitlesList')
if isinstance(subtitles_list, list):
LANGS = {
- 'Danish': 'dk',
+ 'Danish': 'da',
}
for subs in subtitles_list:
lang = subs['Language']
login_results, 'login error', default=None, group='error')
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
- self._downloader.report_warning('unable to log in: bad username/password, or exceded login rate limit (~3/min). Check credentials or wait.')
+ self._downloader.report_warning('unable to log in: bad username/password, or exceeded login rate limit (~3/min). Check credentials or wait.')
return
fb_dtsg = self._search_regex(
check_response = self._download_webpage(check_req, None,
note='Confirming login')
if re.search(r'id="checkpointSubmitButton"', check_response) is not None:
- self._downloader.report_warning('Unable to confirm login, you have to login in your brower and authorize the login.')
+ self._downloader.report_warning('Unable to confirm login, you have to login in your browser and authorize the login.')
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self._downloader.report_warning('unable to log in: %s' % error_to_compat_str(err))
return
from .videomore import VideomoreIE
from .googledrive import GoogleDriveIE
from .jwplatform import JWPlatformIE
-from .ultimedia import UltimediaIE
+from .digiteka import DigitekaIE
class GenericIE(InfoExtractor):
'description': 'md5:8145d19d320ff3e52f28401f4c4283b9',
}
},
- # Embeded Ustream video
+ # Embedded Ustream video
{
'url': 'http://www.american.edu/spa/pti/nsa-privacy-janus-2014.cfm',
'md5': '27b99cdb639c9b12a79bca876a073417',
# Look for embedded Dailymotion player
matches = re.findall(
- r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/embed/video/.+?)\1', webpage)
+ r'<(?:embed|iframe)[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
if matches:
return _playlist_from_matches(
matches, lambda m: unescapeHTML(m[1]))
if myvi_url:
return self.url_result(myvi_url)
- # Look for embeded soundcloud player
+ # Look for embedded soundcloud player
mobj = re.search(
r'<iframe\s+(?:[a-zA-Z0-9_-]+="[^"]+"\s+)*src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
webpage)
if mobj is not None:
return self.url_result(unescapeHTML(mobj.group('url')), 'ScreenwaveMedia')
- # Look for Ulltimedia embeds
- ultimedia_url = UltimediaIE._extract_url(webpage)
- if ultimedia_url:
- return self.url_result(self._proto_relative_url(ultimedia_url), 'Ultimedia')
+ # Look for Digiteka embeds
+ digiteka_url = DigitekaIE._extract_url(webpage)
+ if digiteka_url:
+ return self.url_result(self._proto_relative_url(digiteka_url), DigitekaIE.ie_key())
# Look for AdobeTVVideo embeds
mobj = re.search(
+++ /dev/null
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import smuggle_url
-
-
-class HistoryIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?history\.com/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
-
- _TESTS = [{
- 'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
- 'md5': '6fe632d033c92aa10b8d4a9be047a7c5',
- 'info_dict': {
- 'id': 'bLx5Dv5Aka1G',
- 'ext': 'mp4',
- 'title': "Bet You Didn't Know: Valentine's Day",
- 'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
- },
- 'add_ie': ['ThePlatform'],
- }]
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- webpage = self._download_webpage(url, video_id)
-
- video_url = self._search_regex(
- r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
- webpage, 'video url')
-
- return self.url_result(smuggle_url(video_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}}))
cdns = player_config.get('cdns')
servers = []
for cdn in cdns:
+ # Subscribe URLs are not playable
+ if cdn.get('rtmpSubscribe') is True:
+ continue
base_url = cdn.get('netConnectionUrl')
host = re.search('.+\.([^\.]+\.[^\./]+)/.+', base_url).group(1)
if base_url not in servers:
class IPrimaIE(InfoExtractor):
+ _WORKING = False
_VALID_URL = r'https?://play\.iprima\.cz/(?:[^/]+/)*(?P<id>[^?#]+)'
_TESTS = [{
def get_enc_key(self, swf_url, video_id):
# TODO: automatic key extraction
- # last update at 2015-12-18 for Zombie::bite
- enc_key = '8b6b683780897eb8d9a48a02ccc4817d'[::-1]
+ # last update at 2016-01-22 for Zombie::bite
+ enc_key = '6ab6d0280511493ba85594779759d4ed'
return enc_key
def _real_extract(self, url):
from .common import InfoExtractor
from ..utils import (
ExtractorError,
+ int_or_none,
sanitized_Request,
)
'title': 'Иван Васильевич меняет профессию',
'description': 'md5:b924063ea1677c8fe343d8a72ac2195f',
'duration': 5498,
- 'thumbnail': 'http://thumbs.ivi.ru/f20.vcp.digitalaccess.ru/contents/d/1/c3c885163a082c29bceeb7b5a267a6.jpg',
+ 'thumbnail': 're:^https?://.*\.jpg$',
},
'skip': 'Only works from Russia',
},
- # Serial's serie
+ # Serial's series
{
'url': 'http://www.ivi.ru/watch/dvoe_iz_lartsa/9549',
'md5': '221f56b35e3ed815fde2df71032f4b3e',
'info_dict': {
'id': '9549',
'ext': 'mp4',
- 'title': 'Двое из ларца - Серия 1',
+ 'title': 'Двое из ларца - Дело Гольдберга (1 часть)',
+ 'series': 'Двое из ларца',
+ 'season': 'Сезон 1',
+ 'season_number': 1,
+ 'episode': 'Дело Гольдберга (1 часть)',
+ 'episode_number': 1,
'duration': 2655,
- 'thumbnail': 'http://thumbs.ivi.ru/f15.vcp.digitalaccess.ru/contents/8/4/0068dc0677041f3336b7c2baad8fc0.jpg',
+ 'thumbnail': 're:^https?://.*\.jpg$',
},
'skip': 'Only works from Russia',
}
]
# Sorted by quality
- _known_formats = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi', 'MP4-SHQ']
-
- # Sorted by size
- _known_thumbnails = ['Thumb-120x90', 'Thumb-160', 'Thumb-640x480']
-
- def _extract_description(self, html):
- m = re.search(r'<meta name="description" content="(?P<description>[^"]+)"/>', html)
- return m.group('description') if m is not None else None
-
- def _extract_comment_count(self, html):
- m = re.search('(?s)<a href="#" id="view-comments" class="action-button dim gradient">\s*Комментарии:\s*(?P<commentcount>\d+)\s*</a>', html)
- return int(m.group('commentcount')) if m is not None else 0
+ _KNOWN_FORMATS = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi', 'MP4-SHQ']
def _real_extract(self, url):
video_id = self._match_id(url)
- api_url = 'http://api.digitalaccess.ru/api/json/'
-
data = {
'method': 'da.content.get',
'params': [
]
}
- request = sanitized_Request(api_url, json.dumps(data))
-
- video_json_page = self._download_webpage(
+ request = sanitized_Request(
+ 'http://api.digitalaccess.ru/api/json/', json.dumps(data))
+ video_json = self._download_json(
request, video_id, 'Downloading video JSON')
- video_json = json.loads(video_json_page)
if 'error' in video_json:
error = video_json['error']
formats = [{
'url': x['url'],
'format_id': x['content_format'],
- 'preference': self._known_formats.index(x['content_format']),
- } for x in result['files'] if x['content_format'] in self._known_formats]
+ 'preference': self._KNOWN_FORMATS.index(x['content_format']),
+ } for x in result['files'] if x['content_format'] in self._KNOWN_FORMATS]
self._sort_formats(formats)
- if not formats:
- raise ExtractorError('No media links available for %s' % video_id)
-
- duration = result['duration']
- compilation = result['compilation']
title = result['title']
+ duration = int_or_none(result.get('duration'))
+ compilation = result.get('compilation')
+ episode = title if compilation else None
+
title = '%s - %s' % (compilation, title) if compilation is not None else title
- previews = result['preview']
- previews.sort(key=lambda fmt: self._known_thumbnails.index(fmt['content_format']))
- thumbnail = previews[-1]['url'] if len(previews) > 0 else None
+ thumbnails = [{
+ 'url': preview['url'],
+ 'id': preview.get('content_format'),
+ } for preview in result.get('preview', []) if preview.get('url')]
+
+ webpage = self._download_webpage(url, video_id)
+
+ season = self._search_regex(
+ r'<li[^>]+class="season active"[^>]*><a[^>]+>([^<]+)',
+ webpage, 'season', default=None)
+ season_number = int_or_none(self._search_regex(
+ r'<li[^>]+class="season active"[^>]*><a[^>]+data-season(?:-index)?="(\d+)"',
+ webpage, 'season number', default=None))
+
+ episode_number = int_or_none(self._search_regex(
+ r'<meta[^>]+itemprop="episode"[^>]*>\s*<meta[^>]+itemprop="episodeNumber"[^>]+content="(\d+)',
+ webpage, 'episode number', default=None))
- video_page = self._download_webpage(url, video_id, 'Downloading video page')
- description = self._extract_description(video_page)
- comment_count = self._extract_comment_count(video_page)
+ description = self._og_search_description(webpage, default=None) or self._html_search_meta(
+ 'description', webpage, 'description', default=None)
return {
'id': video_id,
'title': title,
- 'thumbnail': thumbnail,
+ 'series': compilation,
+ 'season': season,
+ 'season_number': season_number,
+ 'episode': episode,
+ 'episode_number': episode_number,
+ 'thumbnails': thumbnails,
'description': description,
'duration': duration,
- 'comment_count': comment_count,
'formats': formats,
}
}]
def _extract_entries(self, html, compilation_id):
- return [self.url_result('http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), 'Ivi')
- for serie in re.findall(r'<strong><a href="/watch/%s/(\d+)">(?:[^<]+)</a></strong>' % compilation_id, html)]
+ return [
+ self.url_result(
+ 'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key())
+ for serie in re.findall(
+ r'<a href="/watch/%s/(\d+)"[^>]+data-id="\1"' % compilation_id, html)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
season_id = mobj.group('seasonid')
if season_id is not None: # Season link
- season_page = self._download_webpage(url, compilation_id, 'Downloading season %s web page' % season_id)
+ season_page = self._download_webpage(
+ url, compilation_id, 'Downloading season %s web page' % season_id)
playlist_id = '%s/season%s' % (compilation_id, season_id)
playlist_title = self._html_search_meta('title', season_page, 'title')
entries = self._extract_entries(season_page, compilation_id)
compilation_page = self._download_webpage(url, compilation_id, 'Downloading compilation web page')
playlist_id = compilation_id
playlist_title = self._html_search_meta('title', compilation_page, 'title')
- seasons = re.findall(r'<a href="/watch/%s/season(\d+)">[^<]+</a>' % compilation_id, compilation_page)
- if len(seasons) == 0: # No seasons in this compilation
+ seasons = re.findall(
+ r'<a href="/watch/%s/season(\d+)' % compilation_id, compilation_page)
+ if not seasons: # No seasons in this compilation
entries = self._extract_entries(compilation_page, compilation_id)
else:
entries = []
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+ compat_urllib_parse,
+ compat_urlparse,
+)
+from ..utils import qualities
+
+
+class IvideonIE(InfoExtractor):
+ IE_NAME = 'ivideon'
+ IE_DESC = 'Ivideon TV'
+ _VALID_URL = r'https?://(?:www\.)?ivideon\.com/tv/(?:[^/]+/)*camera/(?P<id>\d+-[\da-f]+)/(?P<camera_id>\d+)'
+ _TESTS = [{
+ 'url': 'https://www.ivideon.com/tv/camera/100-916ca13b5c4ad9f564266424a026386d/0/',
+ 'info_dict': {
+ 'id': '100-916ca13b5c4ad9f564266424a026386d',
+ 'ext': 'flv',
+ 'title': 're:^Касса [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+ 'description': 'Основное предназначение - запись действий кассиров. Плюс общий вид.',
+ 'is_live': True,
+ },
+ 'params': {
+ 'skip_download': True,
+ }
+ }, {
+ 'url': 'https://www.ivideon.com/tv/camera/100-c4ee4cb9ede885cf62dfbe93d7b53783/589824/?lang=ru',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.ivideon.com/tv/map/22.917923/-31.816406/16/camera/100-e7bc16c7d4b5bbd633fd5350b66dfa9a/0',
+ 'only_matching': True,
+ }]
+
+ _QUALITIES = ('low', 'mid', 'hi')
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ server_id, camera_id = mobj.group('id'), mobj.group('camera_id')
+ camera_name, description = None, None
+ camera_url = compat_urlparse.urljoin(
+ url, '/tv/camera/%s/%s/' % (server_id, camera_id))
+
+ webpage = self._download_webpage(camera_url, server_id, fatal=False)
+ if webpage:
+ config_string = self._search_regex(
+ r'var\s+config\s*=\s*({.+?});', webpage, 'config', default=None)
+ if config_string:
+ config = self._parse_json(config_string, server_id, fatal=False)
+ camera_info = config.get('ivTvAppOptions', {}).get('currentCameraInfo')
+ if camera_info:
+ camera_name = camera_info.get('camera_name')
+ description = camera_info.get('misc', {}).get('description')
+ if not camera_name:
+ camera_name = self._html_search_meta(
+ 'name', webpage, 'camera name', default=None) or self._search_regex(
+ r'<h1[^>]+class="b-video-title"[^>]*>([^<]+)', webpage, 'camera name', default=None)
+
+ quality = qualities(self._QUALITIES)
+
+ formats = [{
+ 'url': 'https://streaming.ivideon.com/flv/live?%s' % compat_urllib_parse.urlencode({
+ 'server': server_id,
+ 'camera': camera_id,
+ 'sessionId': 'demo',
+ 'q': quality(format_id),
+ }),
+ 'format_id': format_id,
+ 'ext': 'flv',
+ 'quality': quality(format_id),
+ } for format_id in self._QUALITIES]
+ self._sort_formats(formats)
+
+ return {
+ 'id': server_id,
+ 'title': self._live_title(camera_name or server_id),
+ 'description': description,
+ 'is_live': True,
+ 'formats': formats,
+ }
subs = self._download_json(
'http://www.kanal%splay.se/api/subtitles/%s' % (channel_id, video_id),
video_id, 'Downloading subtitles JSON', fatal=False)
- return {'se': [{'ext': 'srt', 'data': self._fix_subtitles(subs)}]} if subs else {}
+ return {'sv': [{'ext': 'srt', 'data': self._fix_subtitles(subs)}]} if subs else {}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
--- /dev/null
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class LemondeIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:.+?\.)?lemonde\.fr/(?:[^/]+/)*(?P<id>[^/]+)\.html'
+ _TESTS = [{
+ 'url': 'http://www.lemonde.fr/police-justice/video/2016/01/19/comprendre-l-affaire-bygmalion-en-cinq-minutes_4849702_1653578.html',
+ 'md5': '01fb3c92de4c12c573343d63e163d302',
+ 'info_dict': {
+ 'id': 'lqm3kl',
+ 'ext': 'mp4',
+ 'title': "Comprendre l'affaire Bygmalion en 5 minutes",
+ 'thumbnail': 're:^https?://.*\.jpg',
+ 'duration': 320,
+ 'upload_date': '20160119',
+ 'timestamp': 1453194778,
+ 'uploader_id': '3pmkp',
+ },
+ }, {
+ 'url': 'http://redaction.actu.lemonde.fr/societe/video/2016/01/18/calais-debut-des-travaux-de-defrichement-dans-la-jungle_4849233_3224.html',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, display_id)
+
+ digiteka_url = self._proto_relative_url(self._search_regex(
+ r'url\s*:\s*(["\'])(?P<url>(?:https?://)?//(?:www\.)?(?:digiteka\.net|ultimedia\.com)/deliver/.+?)\1',
+ webpage, 'digiteka url', group='url'))
+ return self.url_result(digiteka_url, 'Digiteka')
import datetime
import re
import time
+import base64
from .common import InfoExtractor
from ..compat import (
parse_iso8601,
sanitized_Request,
int_or_none,
+ str_or_none,
encode_data_uri,
+ url_basename,
)
},
'playlist_mincount': 7
}]
+
+
+class LetvCloudIE(InfoExtractor):
+ IE_DESC = '乐视云'
+ _VALID_URL = r'https?://yuntv\.letv\.com/bcloud.html\?.+'
+
+ _TESTS = [{
+ 'url': 'http://yuntv.letv.com/bcloud.html?uu=p7jnfw5hw9&vu=467623dedf',
+ 'md5': '26450599afd64c513bc77030ad15db44',
+ 'info_dict': {
+ 'id': 'p7jnfw5hw9_467623dedf',
+ 'ext': 'mp4',
+ 'title': 'Video p7jnfw5hw9_467623dedf',
+ },
+ }, {
+ 'url': 'http://yuntv.letv.com/bcloud.html?uu=p7jnfw5hw9&vu=ec93197892&pu=2c7cd40209&auto_play=1&gpcflag=1&width=640&height=360',
+ 'info_dict': {
+ 'id': 'p7jnfw5hw9_ec93197892',
+ 'ext': 'mp4',
+ 'title': 'Video p7jnfw5hw9_ec93197892',
+ },
+ }, {
+ 'url': 'http://yuntv.letv.com/bcloud.html?uu=p7jnfw5hw9&vu=187060b6fd',
+ 'info_dict': {
+ 'id': 'p7jnfw5hw9_187060b6fd',
+ 'ext': 'mp4',
+ 'title': 'Video p7jnfw5hw9_187060b6fd',
+ },
+ }]
+
+ def _real_extract(self, url):
+ uu_mobj = re.search('uu=([\w]+)', url)
+ vu_mobj = re.search('vu=([\w]+)', url)
+
+ if not uu_mobj or not vu_mobj:
+ raise ExtractorError('Invalid URL: %s' % url, expected=True)
+
+ uu = uu_mobj.group(1)
+ vu = vu_mobj.group(1)
+ media_id = uu + '_' + vu
+
+ play_json_req = sanitized_Request(
+ 'http://api.letvcloud.com/gpc.php?cf=html5&sign=signxxxxx&ver=2.2&format=json&' +
+ 'uu=' + uu + '&vu=' + vu)
+ play_json = self._download_json(play_json_req, media_id, 'Downloading playJson data')
+
+ if not play_json.get('data'):
+ if play_json.get('message'):
+ raise ExtractorError('Letv cloud said: %s' % play_json['message'], expected=True)
+ elif play_json.get('code'):
+ raise ExtractorError('Letv cloud returned error %d' % play_json['code'], expected=True)
+ else:
+                raise ExtractorError('Letv cloud returned an unknown error')
+
+ def b64decode(s):
+ return base64.b64decode(s.encode('utf-8')).decode('utf-8')
+
+ formats = []
+ for media in play_json['data']['video_info']['media'].values():
+ play_url = media['play_url']
+ url = b64decode(play_url['main_url'])
+ decoded_url = b64decode(url_basename(url))
+ formats.append({
+ 'url': url,
+ 'ext': determine_ext(decoded_url),
+ 'format_id': int_or_none(play_url.get('vtype')),
+ 'format_note': str_or_none(play_url.get('definition')),
+ 'width': int_or_none(play_url.get('vwidth')),
+ 'height': int_or_none(play_url.get('vheight')),
+ })
+ self._sort_formats(formats)
+
+ return {
+ 'id': media_id,
+ 'title': 'Video %s' % media_id,
+ 'formats': formats,
+ }
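The LetvCloud hunk above decodes `main_url` from base64 and then base64-decodes the URL's basename again to recover the real file name (and thus the extension). A minimal standalone sketch of that double-decoding trick, using a made-up payload and a simplified stand-in for youtube-dl's `utils.url_basename`:

```python
import base64
from os.path import splitext
from urllib.parse import urlparse


def b64decode(s):
    # Decode a base64 string into UTF-8 text, as in the extractor above
    return base64.b64decode(s.encode('utf-8')).decode('utf-8')


def url_basename(url):
    # Last path component of a URL (simplified version of utils.url_basename)
    return urlparse(url).path.rstrip('/').rpartition('/')[2]


# Hypothetical play_url payload: the URL is base64-encoded, and its last
# path component is itself a base64-encoded file name
encoded_main_url = base64.b64encode(
    b'http://example.com/video/' + base64.b64encode(b'clip.mp4')).decode()

real_url = b64decode(encoded_main_url)
decoded_name = b64decode(url_basename(real_url))
ext = splitext(decoded_name)[1][1:]  # -> 'mp4'
```

The extractor feeds `decoded_name` to `determine_ext()`; `splitext` here is just the stdlib equivalent for illustration.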
--- /dev/null
+from __future__ import unicode_literals
+
+import re
+
+from .nuevo import NuevoBaseIE
+
+
+class LoveHomePornIE(NuevoBaseIE):
+ _VALID_URL = r'https?://(?:www\.)?lovehomeporn\.com/video/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
+ _TEST = {
+ 'url': 'http://lovehomeporn.com/video/48483/stunning-busty-brunette-girlfriend-sucking-and-riding-a-big-dick#menu',
+ 'info_dict': {
+ 'id': '48483',
+ 'display_id': 'stunning-busty-brunette-girlfriend-sucking-and-riding-a-big-dick',
+ 'ext': 'mp4',
+ 'title': 'Stunning busty brunette girlfriend sucking and riding a big dick',
+ 'age_limit': 18,
+ 'duration': 238.47,
+ },
+ 'params': {
+ 'skip_download': True,
+ }
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ display_id = mobj.group('display_id')
+
+ info = self._extract_nuevo(
+ 'http://lovehomeporn.com/media/nuevo/config.php?key=%s' % video_id,
+ video_id)
+ info.update({
+ 'display_id': display_id,
+ 'age_limit': 18
+ })
+ return info
_VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z]+(?P<id>\d+)(?:_.+?)?\.html'
_TESTS = [{
- # MDR regularily deletes its videos
+ # MDR regularly deletes its videos
'url': 'http://www.mdr.de/fakt/video189002.html',
'only_matching': True,
}, {
class NBCSportsIE(InfoExtractor):
- # Does not include https becuase its certificate is invalid
+ # Does not include https because its certificate is invalid
_VALID_URL = r'http://www\.nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
_TEST = {
compat_str,
compat_itertools_count,
)
-from ..utils import sanitized_Request
+from ..utils import (
+ sanitized_Request,
+ float_or_none,
+)
class NetEaseMusicBaseIE(InfoExtractor):
result = b64encode(m.digest()).decode('ascii')
return result.replace('/', '_').replace('+', '-')
- @classmethod
- def extract_formats(cls, info):
+ def extract_formats(self, info):
formats = []
- for song_format in cls._FORMATS:
+ for song_format in self._FORMATS:
details = info.get(song_format)
if not details:
continue
- formats.append({
- 'url': 'http://m5.music.126.net/%s/%s.%s' %
- (cls._encrypt(details['dfsId']), details['dfsId'],
- details['extension']),
- 'ext': details.get('extension'),
- 'abr': details.get('bitrate', 0) / 1000,
- 'format_id': song_format,
- 'filesize': details.get('size'),
- 'asr': details.get('sr')
- })
+ song_file_path = '/%s/%s.%s' % (
+ self._encrypt(details['dfsId']), details['dfsId'], details['extension'])
+
+            # 203.130.59.9, 124.40.233.182, 115.231.74.139, etc. are reverse proxy-like
+            # mirrors from NetEase's CDN provider that can be used if m5.music.126.net
+            # does not work, especially for users outside of Mainland China
+ # via: https://github.com/JixunMoe/unblock-163/issues/3#issuecomment-163115880
+ for host in ('http://m5.music.126.net', 'http://115.231.74.139/m1.music.126.net',
+ 'http://124.40.233.182/m1.music.126.net', 'http://203.130.59.9/m1.music.126.net'):
+ song_url = host + song_file_path
+ if self._is_valid_url(song_url, info['id'], 'song'):
+ formats.append({
+ 'url': song_url,
+ 'ext': details.get('extension'),
+ 'abr': float_or_none(details.get('bitrate'), scale=1000),
+ 'format_id': song_format,
+ 'filesize': details.get('size'),
+ 'asr': details.get('sr')
+ })
+ break
return formats
@classmethod
response = self._download_webpage(request_url, playlist_title)
response = self._fix_json(response)
if not response.strip():
- self._downloader.report_warning('Got an empty reponse, trying '
+ self._downloader.report_warning('Got an empty response, trying '
'adding the "newvideos" parameter')
response = self._download_webpage(request_url + '&newvideos=true',
playlist_title)
class NowTVIE(NowTVBaseIE):
+ _WORKING = False
_VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/(?:(?:list/[^/]+|jahr/\d{4}/\d{1,2})/)?(?P<id>[^/]+)/(?:player|preview)'
_TESTS = [{
--- /dev/null
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse
+from ..utils import (
+ int_or_none,
+ qualities,
+)
+
+
+class NprIE(InfoExtractor):
+ _VALID_URL = r'http://(?:www\.)?npr\.org/player/v2/mediaPlayer\.html\?.*\bid=(?P<id>\d+)'
+ _TESTS = [{
+ 'url': 'http://www.npr.org/player/v2/mediaPlayer.html?id=449974205',
+ 'info_dict': {
+ 'id': '449974205',
+ 'title': 'New Music From Beach House, Chairlift, CMJ Discoveries And More'
+ },
+ 'playlist_count': 7,
+ }, {
+ 'url': 'http://www.npr.org/player/v2/mediaPlayer.html?action=1&t=1&islist=false&id=446928052&m=446929930&live=1',
+ 'info_dict': {
+ 'id': '446928052',
+ 'title': "Songs We Love: Tigran Hamasyan, 'Your Mercy is Boundless'"
+ },
+ 'playlist': [{
+ 'md5': '12fa60cb2d3ed932f53609d4aeceabf1',
+ 'info_dict': {
+ 'id': '446929930',
+ 'ext': 'mp3',
+ 'title': 'Your Mercy is Boundless (Bazum en Qo gtutyunqd)',
+ 'duration': 402,
+ },
+ }],
+ }]
+
+ def _real_extract(self, url):
+ playlist_id = self._match_id(url)
+
+ config = self._download_json(
+ 'http://api.npr.org/query?%s' % compat_urllib_parse.urlencode({
+ 'id': playlist_id,
+ 'fields': 'titles,audio,show',
+ 'format': 'json',
+ 'apiKey': 'MDAzMzQ2MjAyMDEyMzk4MTU1MDg3ZmM3MQ010',
+ }), playlist_id)
+
+ story = config['list']['story'][0]
+
+ KNOWN_FORMATS = ('threegp', 'mp4', 'mp3')
+ quality = qualities(KNOWN_FORMATS)
+
+ entries = []
+ for audio in story.get('audio', []):
+ title = audio.get('title', {}).get('$text')
+ duration = int_or_none(audio.get('duration', {}).get('$text'))
+ formats = []
+ for format_id, formats_entry in audio.get('format', {}).items():
+ if not formats_entry:
+ continue
+ if isinstance(formats_entry, list):
+ formats_entry = formats_entry[0]
+ format_url = formats_entry.get('$text')
+ if not format_url:
+ continue
+ if format_id in KNOWN_FORMATS:
+ formats.append({
+ 'url': format_url,
+ 'format_id': format_id,
+ 'ext': formats_entry.get('type'),
+ 'quality': quality(format_id),
+ })
+ self._sort_formats(formats)
+ entries.append({
+ 'id': audio['id'],
+ 'title': title,
+ 'duration': duration,
+ 'formats': formats,
+ })
+
+ playlist_title = story.get('title', {}).get('$text')
+ return self.playlist_result(entries, playlist_id, playlist_title)
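The NPR extractor above ranks formats with `qualities(KNOWN_FORMATS)`. A minimal re-implementation of that helper (modelled on youtube-dl's `utils.qualities`; the sample format list mirrors the hunk, the `'flv'` entry is an invented unknown id):

```python
def qualities(quality_ids):
    # Return a ranking function: a format id's position in the preference
    # list is its quality score (higher is better); unknown ids rank lowest.
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


KNOWN_FORMATS = ('threegp', 'mp4', 'mp3')
quality = qualities(KNOWN_FORMATS)

formats = [{'format_id': f, 'quality': quality(f)}
           for f in ('mp3', 'threegp', 'flv')]
# worst-first ordering, as _sort_formats expects
formats.sort(key=lambda f: f['quality'])
```

With these inputs the unknown `'flv'` id sorts first and `'mp3'` (the most preferred known id) sorts last.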
from __future__ import unicode_literals
from .common import InfoExtractor
+from ..compat import compat_urlparse
from ..utils import (
int_or_none,
js_to_json,
webpage = self._download_webpage(url, video_id)
info = self._parse_json(self._search_regex(
- r'(?s)ntv.pageInfo.article =\s(\{.*?\});', webpage, 'info'),
+ r'(?s)ntv\.pageInfo\.article\s*=\s*(\{.*?\});', webpage, 'info'),
video_id, transform_source=js_to_json)
timestamp = int_or_none(info.get('publishedDateAsUnixTimeStamp'))
vdata = self._parse_json(self._search_regex(
webpage, 'player data'),
video_id, transform_source=js_to_json)
duration = parse_duration(vdata.get('duration'))
- formats = [{
- 'format_id': 'flash',
- 'url': 'rtmp://fms.n-tv.de/' + vdata['video'],
- }, {
- 'format_id': 'mobile',
- 'url': 'http://video.n-tv.de' + vdata['videoMp4'],
- 'tbr': 400, # estimation
- }]
- m3u8_url = 'http://video.n-tv.de' + vdata['videoM3u8']
- formats.extend(self._extract_m3u8_formats(
- m3u8_url, video_id, ext='mp4',
- entry_protocol='m3u8_native', preference=0))
+
+ formats = []
+ if vdata.get('video'):
+ formats.append({
+ 'format_id': 'flash',
+ 'url': 'rtmp://fms.n-tv.de/%s' % vdata['video'],
+ })
+ if vdata.get('videoMp4'):
+ formats.append({
+ 'format_id': 'mobile',
+ 'url': compat_urlparse.urljoin('http://video.n-tv.de', vdata['videoMp4']),
+ 'tbr': 400, # estimation
+ })
+ if vdata.get('videoM3u8'):
+ m3u8_url = compat_urlparse.urljoin('http://video.n-tv.de', vdata['videoM3u8'])
+ formats.extend(self._extract_m3u8_formats(
+ m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
+ preference=0, m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return {
--- /dev/null
+# encoding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+from ..utils import (
+ float_or_none,
+ xpath_text
+)
+
+
+class NuevoBaseIE(InfoExtractor):
+ def _extract_nuevo(self, config_url, video_id):
+ config = self._download_xml(
+ config_url, video_id, transform_source=lambda s: s.strip())
+
+ title = xpath_text(config, './title', 'title', fatal=True).strip()
+ video_id = xpath_text(config, './mediaid', default=video_id)
+ thumbnail = xpath_text(config, ['./image', './thumb'])
+ duration = float_or_none(xpath_text(config, './duration'))
+
+ formats = []
+ for element_name, format_id in (('file', 'sd'), ('filehd', 'hd')):
+ video_url = xpath_text(config, element_name)
+ if video_url:
+ formats.append({
+ 'url': video_url,
+ 'format_id': format_id,
+ })
+ self._check_formats(formats, video_id)
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'thumbnail': thumbnail,
+ 'duration': duration,
+ 'formats': formats
+ }
'ext': 'mp4',
'title': 'Vine & YouTube Stars Zach King & King Bach On Their Viral Videos!',
'description': 'md5:ebbc5b1424dd5dba7be7538148287ac1',
- 'duration': 1477,
}
}
webpage = self._download_webpage(url, display_id)
video_data = self._search_regex(
- r'"current"\s*:\s*({[^}]+?})', webpage, 'current video')
+ r'"(?:video|current)"\s*:\s*({[^}]+?})', webpage, 'current video')
m3u8_url = self._search_regex(
- r'"hls_stream"\s*:\s*"([^"]+)', video_data, 'm3u8 url', None)
+ r'hls_stream"?\s*:\s*"([^"]+)', video_data, 'm3u8 url', None)
if m3u8_url:
formats = self._extract_m3u8_formats(
m3u8_url, display_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False)
- # simular to GameSpotIE
+ # similar to GameSpotIE
m3u8_path = compat_urlparse.urlparse(m3u8_url).path
QUALITIES_RE = r'((,[a-z]+\d+)+,?)'
available_qualities = self._search_regex(
return {
'id': self._search_regex(
- r'"video_id"\s*:\s*(\d+)', video_data, 'video id'),
+ r'"id"\s*:\s*(\d+)', video_data, 'video id', default=display_id),
'display_id': display_id,
'title': unescapeHTML(self._og_search_title(webpage)),
'description': get_element_by_attribute(
'class', 'video_txt_decription', webpage),
'thumbnail': self._proto_relative_url(self._search_regex(
r'"thumb"\s*:\s*"([^"]+)', video_data, 'thumbnail', None)),
- 'duration': int(self._search_regex(
- r'"duration"\s*:\s*(\d+)', video_data, 'duration')),
'formats': formats,
}
class ORFFM4IE(InfoExtractor):
IE_NAME = 'orf:fm4'
IE_DESC = 'radio FM4'
- _VALID_URL = r'http://fm4\.orf\.at/7tage/?#(?P<date>[0-9]+)/(?P<show>\w+)'
+ _VALID_URL = r'http://fm4\.orf\.at/(?:7tage/?#|player/)(?P<date>[0-9]+)/(?P<show>\w+)'
+
+ _TEST = {
+ 'url': 'http://fm4.orf.at/player/20160110/IS/',
+ 'md5': '01e736e8f1cef7e13246e880a59ad298',
+ 'info_dict': {
+ 'id': '2016-01-10_2100_tl_54_7DaysSun13_11244',
+ 'ext': 'mp3',
+ 'title': 'Im Sumpf',
+ 'description': 'md5:384c543f866c4e422a55f66a62d669cd',
+ 'duration': 7173,
+ 'timestamp': 1452456073,
+ 'upload_date': '20160110',
+ },
+ }
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
# { a = author, cn = clip_id, lc = end, m = name }
return {
- 'id': clip['clipName'],
+ 'id': clip.get('clipName') or clip['name'],
'title': '%s - %s' % (module['title'], clip['title']),
'duration': int_or_none(clip.get('duration')) or parse_duration(clip.get('formattedDuration')),
'creator': author,
class ProSiebenSat1IE(InfoExtractor):
IE_NAME = 'prosiebensat1'
IE_DESC = 'ProSiebenSat.1 Digital'
- _VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany)\.(?:de|at|ch)|ran\.de|fem\.com)/(?P<id>.+)'
+ _VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany|7tv)\.(?:de|at|ch)|ran\.de|fem\.com)/(?P<id>.+)'
_TESTS = [
{
'url': 'http://www.prosieben.de/tv/circus-halligalli/videos/218-staffel-2-episode-18-jahresrueckblick-ganze-folge',
'info_dict': {
'id': '2104602',
- 'ext': 'mp4',
+ 'ext': 'flv',
'title': 'Episode 18 - Staffel 2',
'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
'upload_date': '20131231',
'url': 'http://www.the-voice-of-germany.de/video/31-andreas-kuemmert-rocket-man-clip',
'info_dict': {
'id': '2572814',
- 'ext': 'mp4',
+ 'ext': 'flv',
'title': 'Andreas Kümmert: Rocket Man',
'description': 'md5:6ddb02b0781c6adf778afea606652e38',
'upload_date': '20131017',
'duration': 469.88,
},
'params': {
- # rtmp download
'skip_download': True,
},
},
'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html',
'info_dict': {
'id': '2156342',
- 'ext': 'mp4',
+ 'ext': 'flv',
'title': 'Kurztrips zum Valentinstag',
- 'description': 'Romantischer Kurztrip zum Valentinstag? Wir verraten, was sich hier wirklich lohnt.',
+ 'description': 'Romantischer Kurztrip zum Valentinstag? Nina Heinemann verrät, was sich hier wirklich lohnt.',
'duration': 307.24,
},
'params': {
- # rtmp download
'skip_download': True,
},
},
},
'playlist_count': 2,
},
+ {
+ 'url': 'http://www.7tv.de/circus-halligalli/615-best-of-circus-halligalli-ganze-folge',
+ 'info_dict': {
+ 'id': '4187506',
+ 'ext': 'flv',
+ 'title': 'Best of Circus HalliGalli',
+ 'description': 'md5:8849752efd90b9772c9db6fdf87fb9e9',
+ 'upload_date': '20151229',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ },
]
_CLIPID_REGEXES = [
r'"clip_id"\s*:\s+"(\d+)"',
r'clipid: "(\d+)"',
r'clip[iI]d=(\d+)',
+ r'clip[iI]d\s*=\s*["\'](\d+)',
r"'itemImageUrl'\s*:\s*'/dynamic/thumbnails/full/\d+/(\d+)",
]
_TITLE_REGEXES = [
r'<!-- start video -->\s*<h1>(.+?)</h1>',
r'<h1 class="att-name">\s*(.+?)</h1>',
r'<header class="module_header">\s*<h2>([^<]+)</h2>\s*</header>',
+ r'<h2 class="video-title" itemprop="name">\s*(.+?)</h2>',
+ r'<div[^>]+id="veeseoTitle"[^>]*>(.+?)</div>',
]
_DESCRIPTION_REGEXES = [
r'<p itemprop="description">\s*(.+?)</p>',
r'<div class="videoDecription">\s*<p><strong>Beschreibung</strong>: (.+?)</p>',
r'<div class="g-plusone" data-size="medium"></div>\s*</div>\s*</header>\s*(.+?)\s*<footer>',
r'<p class="att-description">\s*(.+?)\s*</p>',
+ r'<p class="video-description" itemprop="description">\s*(.+?)</p>',
+ r'<div[^>]+id="veeseoDescription"[^>]*>(.+?)</div>',
]
_UPLOAD_DATE_REGEXES = [
r'<meta property="og:published_time" content="(.+?)">',
from __future__ import unicode_literals
from .common import InfoExtractor
-
from ..utils import (
float_or_none,
+ parse_iso8601,
+ unescapeHTML,
)
class RteIE(InfoExtractor):
+ IE_NAME = 'rte'
+ IE_DESC = 'Raidió Teilifís Éireann TV'
_VALID_URL = r'https?://(?:www\.)?rte\.ie/player/[^/]{2,3}/show/[^/]+/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.rte.ie/player/ie/show/iwitness-862/10478715/',
'info_dict': {
'id': '10478715',
- 'ext': 'mp4',
+ 'ext': 'flv',
'title': 'Watch iWitness online',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'iWitness : The spirit of Ireland, one voice and one minute at a time.',
# f4m_url = server + relative_url
f4m_url = json_string['shows'][0]['media:group'][0]['rte:server'] + json_string['shows'][0]['media:group'][0]['url']
f4m_formats = self._extract_f4m_formats(f4m_url, video_id)
- f4m_formats = [{
- 'format_id': f['format_id'],
- 'url': f['url'],
- 'ext': 'mp4',
- 'width': f['width'],
- 'height': f['height'],
- } for f in f4m_formats]
return {
'id': video_id,
'thumbnail': thumbnail,
'duration': duration,
}
+
+
+class RteRadioIE(InfoExtractor):
+ IE_NAME = 'rte:radio'
+ IE_DESC = 'Raidió Teilifís Éireann radio'
+ # Radioplayer URLs have the specifier #!rii=<channel_id>:<id>:<playable_item_id>:<date>:
+ # where the IDs are int/empty, the date is DD-MM-YYYY, and the specifier may be truncated.
+ # An <id> uniquely defines an individual recording, and is the only part we require.
+ _VALID_URL = r'https?://(?:www\.)?rte\.ie/radio/utils/radioplayer/rteradioweb\.html#!rii=(?:[0-9]*)(?:%3A|:)(?P<id>[0-9]+)'
+
+ _TEST = {
+ 'url': 'http://www.rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=16:10507902:2414:27-12-2015:',
+ 'info_dict': {
+ 'id': '10507902',
+ 'ext': 'mp4',
+ 'title': 'Gloria',
+ 'thumbnail': 're:^https?://.*\.jpg$',
+ 'description': 'md5:9ce124a7fb41559ec68f06387cabddf0',
+ 'timestamp': 1451203200,
+ 'upload_date': '20151227',
+ 'duration': 7230.0,
+ },
+ 'params': {
+ 'skip_download': 'f4m fails with --test atm'
+ }
+ }
+
+ def _real_extract(self, url):
+ item_id = self._match_id(url)
+
+ json_string = self._download_json(
+ 'http://www.rte.ie/rteavgen/getplaylist/?type=web&format=json&id=' + item_id,
+ item_id)
+
+ # NB the string values in the JSON are stored using XML escaping(!)
+ show = json_string['shows'][0]
+ title = unescapeHTML(show['title'])
+ description = unescapeHTML(show.get('description'))
+ thumbnail = show.get('thumbnail')
+ duration = float_or_none(show.get('duration'), 1000)
+ timestamp = parse_iso8601(show.get('published'))
+
+ mg = show['media:group'][0]
+
+ formats = []
+
+ if mg.get('url') and not mg['url'].startswith('rtmpe:'):
+ formats.append({'url': mg['url']})
+
+ if mg.get('hls_server') and mg.get('hls_url'):
+ formats.extend(self._extract_m3u8_formats(
+ mg['hls_server'] + mg['hls_url'], item_id, 'mp4',
+ entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
+
+ if mg.get('hds_server') and mg.get('hds_url'):
+ formats.extend(self._extract_f4m_formats(
+ mg['hds_server'] + mg['hds_url'], item_id,
+ f4m_id='hds', fatal=False))
+
+ self._sort_formats(formats)
+
+ return {
+ 'id': item_id,
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'timestamp': timestamp,
+ 'duration': duration,
+ 'formats': formats,
+ }
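The RteRadioIE hunk above collects whichever delivery methods the `media:group` entry advertises, skipping encrypted RTMP. A simplified sketch of that fallback chain (the `format_id` labels and the sample `mg` dict are illustrative; the real extractor expands HLS/HDS manifests via `_extract_m3u8_formats` and `_extract_f4m_formats` instead of appending the manifest URL directly):

```python
def collect_formats(mg):
    # Gather candidate formats from a media:group entry, mirroring the
    # progressive -> HLS -> HDS checks in the extractor above
    formats = []
    if mg.get('url') and not mg['url'].startswith('rtmpe:'):
        formats.append({'url': mg['url'], 'format_id': 'progressive'})
    if mg.get('hls_server') and mg.get('hls_url'):
        formats.append({'url': mg['hls_server'] + mg['hls_url'],
                        'format_id': 'hls'})
    if mg.get('hds_server') and mg.get('hds_url'):
        formats.append({'url': mg['hds_server'] + mg['hds_url'],
                        'format_id': 'hds'})
    return formats


# Sample entry: the progressive URL is encrypted RTMP, so only HLS survives
mg = {
    'url': 'rtmpe://example.com/enc',
    'hls_server': 'http://example.com',
    'hls_url': '/live.m3u8',
}
formats = collect_formats(mg)
```

Each branch is independent, so a missing or unusable delivery method simply contributes nothing rather than aborting extraction.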
--- /dev/null
+from __future__ import unicode_literals
+
+from .nuevo import NuevoBaseIE
+
+
+class RulePornIE(NuevoBaseIE):
+ _VALID_URL = r'https?://(?:www\.)?ruleporn\.com/(?:[^/?#&]+/)*(?P<id>[^/?#&]+)'
+ _TEST = {
+ 'url': 'http://ruleporn.com/brunette-nympho-chick-takes-her-boyfriend-in-every-angle/',
+ 'md5': '86861ebc624a1097c7c10eaf06d7d505',
+ 'info_dict': {
+ 'id': '48212',
+ 'display_id': 'brunette-nympho-chick-takes-her-boyfriend-in-every-angle',
+ 'ext': 'mp4',
+ 'title': 'Brunette Nympho Chick Takes Her Boyfriend In Every Angle',
+ 'description': 'md5:6d28be231b981fff1981deaaa03a04d5',
+ 'age_limit': 18,
+ 'duration': 635.1,
+ }
+ }
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, display_id)
+
+ video_id = self._search_regex(
+ r'lovehomeporn\.com/embed/(\d+)', webpage, 'video id')
+
+ title = self._search_regex(
+ r'<h2[^>]+title=(["\'])(?P<url>.+?)\1',
+ webpage, 'title', group='url')
+ description = self._html_search_meta('description', webpage)
+
+ info = self._extract_nuevo(
+ 'http://lovehomeporn.com/media/nuevo/econfig.php?key=%s&rp=true' % video_id,
+ video_id)
+ info.update({
+ 'display_id': display_id,
+ 'title': title,
+ 'description': description,
+ 'age_limit': 18
+ })
+ return info
'https://shahid.mbc.net/arContent/getPlayerContent-param-.id-%s.type-%s.html'
% (video_id, api_vars['type']), video_id, 'Downloading player JSON')
+ if player.get('drm'):
+ raise ExtractorError('This video is DRM protected.', expected=True)
+
formats = self._extract_m3u8_formats(player['url'], video_id, 'mp4')
video = self._download_json(
resource = mobj.group('rsrc') or 'all'
base_url = self._BASE_URL_MAP[resource] % user['id']
- next_href = None
+ COMMON_QUERY = {
+ 'limit': 50,
+ 'client_id': self._CLIENT_ID,
+ 'linked_partitioning': '1',
+ }
+
+ query = COMMON_QUERY.copy()
+ query['offset'] = 0
+
+ next_href = base_url + '?' + compat_urllib_parse.urlencode(query)
entries = []
for i in itertools.count():
- if not next_href:
- data = compat_urllib_parse.urlencode({
- 'offset': i * 50,
- 'limit': 50,
- 'client_id': self._CLIENT_ID,
- 'linked_partitioning': '1',
- 'representation': 'speedy',
- })
- next_href = base_url + '?' + data
-
response = self._download_json(
next_href, uploader, 'Downloading track page %s' % (i + 1))
collection = response['collection']
-
if not collection:
- self.to_screen('%s: End page received' % uploader)
break
def resolve_permalink_url(candidates):
if permalink_url:
entries.append(self.url_result(permalink_url))
- if 'next_href' in response:
- next_href = response['next_href']
- if not next_href:
- break
- else:
- next_href = None
+ next_href = response.get('next_href')
+ if not next_href:
+ break
+
+ parsed_next_href = compat_urlparse.urlparse(response['next_href'])
+ qs = compat_urlparse.parse_qs(parsed_next_href.query)
+ qs.update(COMMON_QUERY)
+ next_href = compat_urlparse.urlunparse(
+ parsed_next_href._replace(query=compat_urllib_parse.urlencode(qs, True)))
return {
'_type': 'playlist',
})
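The SoundCloud change above replaces ad-hoc offset bookkeeping with a walk over the API's `next_href` links, re-applying a common query on every hop. A minimal standalone sketch of that pagination pattern, using a hypothetical `fake_fetch` in place of the real JSON download:

```python
from urllib.parse import urlencode, urlparse, parse_qs, urlunparse

# Query parameters re-applied on every page request (values are illustrative).
COMMON_QUERY = {'limit': 2, 'linked_partitioning': '1'}


def collect_all(fetch, base_url):
    """Walk pages via next_href, merging COMMON_QUERY back into each hop."""
    query = dict(COMMON_QUERY, offset=0)
    next_href = base_url + '?' + urlencode(query)
    items = []
    while next_href:
        page = fetch(next_href)
        items.extend(page['collection'])
        next_href = page.get('next_href')
        if next_href:
            parsed = urlparse(next_href)
            qs = parse_qs(parsed.query)
            qs.update(COMMON_QUERY)
            next_href = urlunparse(
                parsed._replace(query=urlencode(qs, True)))
    return items


# Hypothetical two-page API, keyed by offset.
PAGES = {
    0: {'collection': ['a', 'b'],
        'next_href': 'http://api.example/tracks?offset=2'},
    2: {'collection': ['c'], 'next_href': None},
}


def fake_fetch(url):
    offset = int(parse_qs(urlparse(url).query).get('offset', ['0'])[0])
    return PAGES[offset]


print(collect_all(fake_fetch, 'http://api.example/tracks'))  # ['a', 'b', 'c']
```

Merging `COMMON_QUERY` into each `next_href` mirrors the diff's intent: the server-provided link may drop parameters such as `limit`, so they are restored before the next request.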
self._sort_formats(formats)
+ subtitles = {}
+ subtitle_references = video_info.get('subtitleReferences')
+ if isinstance(subtitle_references, list):
+ for sr in subtitle_references:
+ subtitle_url = sr.get('url')
+ if subtitle_url:
+ subtitles.setdefault('sv', []).append({'url': subtitle_url})
+
duration = video_info.get('materialLength')
age_limit = 18 if video_info.get('inappropriateForChildren') else 0
'id': video_id,
'title': title,
'formats': formats,
+ 'subtitles': subtitles,
'thumbnail': thumbnail,
'duration': duration,
'age_limit': age_limit,
class SVTPlayIE(SVTBaseIE):
IE_DESC = 'SVT Play and Öppet arkiv'
_VALID_URL = r'https?://(?:www\.)?(?P<host>svtplay|oppetarkiv)\.se/video/(?P<id>[0-9]+)'
- _TESTS = [{
- 'url': 'http://www.svtplay.se/video/2609989/sm-veckan/sm-veckan-rally-final-sasong-1-sm-veckan-rally-final',
- 'md5': 'ade3def0643fa1c40587a422f98edfd9',
- 'info_dict': {
- 'id': '2609989',
- 'ext': 'flv',
- 'title': 'SM veckan vinter, Örebro - Rally, final',
- 'duration': 4500,
- 'thumbnail': 're:^https?://.*[\.-]jpg$',
- 'age_limit': 0,
- },
- }, {
- 'url': 'http://www.oppetarkiv.se/video/1058509/rederiet-sasong-1-avsnitt-1-av-318',
- 'md5': 'c3101a17ce9634f4c1f9800f0746c187',
+ _TEST = {
+ 'url': 'http://www.svtplay.se/video/5996901/flygplan-till-haile-selassie/flygplan-till-haile-selassie-2',
+ 'md5': '2b6704fe4a28801e1a098bbf3c5ac611',
'info_dict': {
- 'id': '1058509',
- 'ext': 'flv',
- 'title': 'Farlig kryssning',
- 'duration': 2566,
+ 'id': '5996901',
+ 'ext': 'mp4',
+ 'title': 'Flygplan till Haile Selassie',
+ 'duration': 3527,
'thumbnail': 're:^https?://.*[\.-]jpg$',
'age_limit': 0,
+ 'subtitles': {
+ 'sv': [{
+ 'ext': 'wsrt',
+ }]
+ },
},
- 'skip': 'Only works from Sweden',
- }]
+ }
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
class TestURLIE(InfoExtractor):
- """ Allows adressing of the test cases as test:yout.*be_1 """
+ """ Allows addressing of the test cases as test:yout.*be_1 """
IE_DESC = False # Do not list
_VALID_URL = r'test(?:url)?:(?P<id>(?P<extractor>.+?)(?:_(?P<num>[0-9]+))?)$'
class ThePlatformIE(ThePlatformBaseIE):
_VALID_URL = r'''(?x)
(?:https?://(?:link|player)\.theplatform\.com/[sp]/(?P<provider_id>[^/]+)/
- (?:(?P<media>(?:[^/]+/)+select/media/)|(?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/))?
+ (?:(?P<media>(?:(?:[^/]+/)+select/)?media/)|(?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/))?
|theplatform:)(?P<id>[^/\?&]+)'''
_TESTS = [{
--- /dev/null
+# encoding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .nuevo import NuevoBaseIE
+
+
+class TrollvidsIE(NuevoBaseIE):
+ _VALID_URL = r'http://(?:www\.)?trollvids\.com/video/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
+ IE_NAME = 'trollvids'
+ _TEST = {
+ 'url': 'http://trollvids.com/video/2349002/%E3%80%90MMD-R-18%E3%80%91%E3%82%AC%E3%83%BC%E3%83%AB%E3%83%95%E3%83%AC%E3%83%B3%E3%83%89-carrymeoff',
+ 'md5': '1d53866b2c514b23ed69e4352fdc9839',
+ 'info_dict': {
+ 'id': '2349002',
+ 'ext': 'mp4',
+ 'title': '【MMD R-18】ガールフレンド carry_me_off',
+ 'age_limit': 18,
+ 'duration': 216.78,
+ },
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ display_id = mobj.group('display_id')
+
+ info = self._extract_nuevo(
+ 'http://trollvids.com/nuevo/player/config.php?v=%s' % video_id,
+ video_id)
+ info.update({
+ 'display_id': display_id,
+ 'age_limit': 18
+ })
+ return info
from __future__ import unicode_literals
-from .common import InfoExtractor
-from ..utils import xpath_text
+from .nuevo import NuevoBaseIE
-class TruTubeIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?trutube\.tv/(?:video/|nuevo/player/embed\.php\?v=)(?P<id>[0-9]+)'
+class TruTubeIE(NuevoBaseIE):
+ _VALID_URL = r'https?://(?:www\.)?trutube\.tv/(?:video/|nuevo/player/embed\.php\?v=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://trutube.tv/video/14880/Ramses-II-Proven-To-Be-A-Red-Headed-Caucasoid-',
'md5': 'c5b6e301b0a2040b074746cbeaa26ca1',
def _real_extract(self, url):
video_id = self._match_id(url)
-
- config = self._download_xml(
+ return self._extract_nuevo(
'https://trutube.tv/nuevo/player/config.php?v=%s' % video_id,
- video_id, transform_source=lambda s: s.strip())
-
- # filehd is always 404
- video_url = xpath_text(config, './file', 'video URL', fatal=True)
- title = xpath_text(config, './title', 'title').strip()
- thumbnail = xpath_text(config, './image', ' thumbnail')
-
- return {
- 'id': video_id,
- 'url': video_url,
- 'title': title,
- 'thumbnail': thumbnail,
- }
+ video_id)
from __future__ import unicode_literals
-import json
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlparse
+from ..compat import compat_str
from ..utils import (
int_or_none,
sanitized_Request,
class Tube8IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tube8\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/(?P<id>\d+)'
- _TESTS = [
- {
- 'url': 'http://www.tube8.com/teen/kasia-music-video/229795/',
- 'md5': '44bf12b98313827dd52d35b8706a4ea0',
- 'info_dict': {
- 'id': '229795',
- 'display_id': 'kasia-music-video',
- 'ext': 'mp4',
- 'description': 'hot teen Kasia grinding',
- 'uploader': 'unknown',
- 'title': 'Kasia music video',
- 'age_limit': 18,
- }
- },
- {
- 'url': 'http://www.tube8.com/shemale/teen/blonde-cd-gets-kidnapped-by-two-blacks-and-punished-for-being-a-slutty-girl/19569151/',
- 'only_matching': True,
- },
- ]
+ _TESTS = [{
+ 'url': 'http://www.tube8.com/teen/kasia-music-video/229795/',
+ 'md5': '65e20c48e6abff62ed0c3965fff13a39',
+ 'info_dict': {
+ 'id': '229795',
+ 'display_id': 'kasia-music-video',
+ 'ext': 'mp4',
+ 'description': 'hot teen Kasia grinding',
+ 'uploader': 'unknown',
+ 'title': 'Kasia music video',
+ 'age_limit': 18,
+ 'duration': 230,
+ }
+ }, {
+ 'url': 'http://www.tube8.com/shemale/teen/blonde-cd-gets-kidnapped-by-two-blacks-and-punished-for-being-a-slutty-girl/19569151/',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, display_id)
- flashvars = json.loads(self._html_search_regex(
- r'flashvars\s*=\s*({.+?});\r?\n', webpage, 'flashvars'))
+ flashvars = self._parse_json(
+ self._search_regex(
+ r'flashvars\s*=\s*({.+?});\r?\n', webpage, 'flashvars'),
+ video_id)
- video_url = flashvars['video_url']
- if flashvars.get('encrypted') is True:
- video_url = aes_decrypt_text(video_url, flashvars['video_title'], 32).decode('utf-8')
- path = compat_urllib_parse_urlparse(video_url).path
- format_id = '-'.join(path.split('/')[4].split('_')[:2])
+ formats = []
+ for key, video_url in flashvars.items():
+ if not isinstance(video_url, compat_str) or not video_url.startswith('http'):
+ continue
+ height = self._search_regex(
+ r'quality_(\d+)[pP]', key, 'height', default=None)
+ if not height:
+ continue
+ if flashvars.get('encrypted') is True:
+ video_url = aes_decrypt_text(
+ video_url, flashvars['video_title'], 32).decode('utf-8')
+ formats.append({
+ 'url': video_url,
+ 'format_id': '%sp' % height,
+ 'height': int(height),
+ })
+ self._sort_formats(formats)
thumbnail = flashvars.get('image_url')
uploader = self._html_search_regex(
r'<span class="username">\s*(.+?)\s*<',
webpage, 'uploader', fatal=False)
+ duration = int_or_none(flashvars.get('video_duration'))
- like_count = int_or_none(self._html_search_regex(
+ like_count = int_or_none(self._search_regex(
r'rupVar\s*=\s*"(\d+)"', webpage, 'like count', fatal=False))
- dislike_count = int_or_none(self._html_search_regex(
+ dislike_count = int_or_none(self._search_regex(
r'rdownVar\s*=\s*"(\d+)"', webpage, 'dislike count', fatal=False))
- view_count = self._html_search_regex(
- r'<strong>Views: </strong>([\d,\.]+)\s*</li>', webpage, 'view count', fatal=False)
- if view_count:
- view_count = str_to_int(view_count)
- comment_count = self._html_search_regex(
- r'<span id="allCommentsCount">(\d+)</span>', webpage, 'comment count', fatal=False)
- if comment_count:
- comment_count = str_to_int(comment_count)
+ view_count = str_to_int(self._search_regex(
+ r'<strong>Views: </strong>([\d,\.]+)\s*</li>',
+ webpage, 'view count', fatal=False))
+ comment_count = str_to_int(self._search_regex(
+ r'<span id="allCommentsCount">(\d+)</span>',
+ webpage, 'comment count', fatal=False))
return {
'id': video_id,
'display_id': display_id,
- 'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
- 'format_id': format_id,
+ 'duration': duration,
'view_count': view_count,
'like_count': like_count,
'dislike_count': dislike_count,
'comment_count': comment_count,
'age_limit': 18,
+ 'formats': formats,
}
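The Tube8 rewrite above derives one format per `quality_<height>p` key in the flashvars instead of a single hard-coded URL. A self-contained sketch of that key-scanning step, with a hypothetical `flashvars` dict standing in for the page data:

```python
import re

# Hypothetical flashvars payload; real pages may also carry encrypted URLs.
flashvars = {
    'quality_480p': 'http://cdn.example/v_480.mp4',
    'quality_720p': 'http://cdn.example/v_720.mp4',
    'video_title': 'clip',
}

formats = []
for key, video_url in flashvars.items():
    # Skip non-URL values such as titles and flags.
    if not isinstance(video_url, str) or not video_url.startswith('http'):
        continue
    m = re.search(r'quality_(\d+)[pP]', key)
    if not m:
        continue
    height = int(m.group(1))
    formats.append({
        'url': video_url,
        'format_id': '%dp' % height,
        'height': height,
    })

# Stand-in for _sort_formats(): worst quality first, best last.
formats.sort(key=lambda f: f['height'])
print([f['format_id'] for f in formats])  # ['480p', '720p']
```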
from .common import InfoExtractor
from ..compat import compat_str
+from ..utils import (
+ int_or_none,
+ float_or_none,
+ unescapeHTML,
+)
class TudouIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?tudou\.com/(?:listplay|programs(?:/view)?|albumplay)/([^/]+/)*(?P<id>[^/?#]+?)(?:\.html)?/?(?:$|[?#])'
+ IE_NAME = 'tudou'
+ _VALID_URL = r'https?://(?:www\.)?tudou\.com/(?:(?:programs|wlplay)/view|(?:listplay|albumplay)/[\w-]{11})/(?P<id>[\w-]{11})'
_TESTS = [{
'url': 'http://www.tudou.com/listplay/zzdE77v6Mmo/2xN2duXMxmw.html',
'md5': '140a49ed444bd22f93330985d8475fcb',
'ext': 'f4v',
'title': '卡马乔国足开大脚长传冲吊集锦',
'thumbnail': 're:^https?://.*\.jpg$',
+ 'timestamp': 1372113489000,
+ 'description': '卡马乔卡家军,开大脚先进战术不完全集锦!',
+ 'duration': 289.04,
+ 'view_count': int,
+ 'filesize': int,
}
}, {
'url': 'http://www.tudou.com/programs/view/ajX3gyhL0pc/',
'ext': 'f4v',
'title': 'La Sylphide-Bolshoi-Ekaterina Krysanova & Vyacheslav Lopatin 2012',
'thumbnail': 're:^https?://.*\.jpg$',
+ 'timestamp': 1349207518000,
+ 'description': 'md5:294612423894260f2dcd5c6c04fe248b',
+ 'duration': 5478.33,
+ 'view_count': int,
+ 'filesize': int,
}
- }, {
- 'url': 'http://www.tudou.com/albumplay/cJAHGih4yYg.html',
- 'only_matching': True,
}]
_PLAYER_URL = 'http://js.tudouui.com/bin/lingtong/PortalPlayer_177.swf'
def _real_extract(self, url):
video_id = self._match_id(url)
- webpage = self._download_webpage(url, video_id)
+ item_data = self._download_json(
+ 'http://www.tudou.com/tvp/getItemInfo.action?ic=%s' % video_id, video_id)
- youku_vcode = self._search_regex(
- r'vcode\s*:\s*[\'"]([^\'"]*)[\'"]', webpage, 'youku vcode', default=None)
+ youku_vcode = item_data.get('vcode')
if youku_vcode:
return self.url_result('youku:' + youku_vcode, ie='Youku')
- title = self._search_regex(
- r',kw\s*:\s*[\'"]([^\'"]+)[\'"]', webpage, 'title')
- thumbnail_url = self._search_regex(
- r',pic\s*:\s*[\'"]([^\'"]+)[\'"]', webpage, 'thumbnail URL', fatal=False)
-
- player_url = self._search_regex(
- r'playerUrl\s*:\s*[\'"]([^\'"]+\.swf)[\'"]',
- webpage, 'player URL', default=self._PLAYER_URL)
+ title = unescapeHTML(item_data['kw'])
+ description = item_data.get('desc')
+ thumbnail_url = item_data.get('pic')
+ view_count = int_or_none(item_data.get('playTimes'))
+ timestamp = int_or_none(item_data.get('pt'))
- segments = self._parse_json(self._search_regex(
- r'segs: \'([^\']+)\'', webpage, 'segments'), video_id)
+ segments = self._parse_json(item_data['itemSegs'], video_id)
# It looks like the keys are the arguments that have to be passed as
# the hd field in the request url, we pick the highest one
# Also, filter out non-numeric qualities (see issue #3643).
'ext': ext,
'title': title,
'thumbnail': thumbnail_url,
+ 'description': description,
+ 'view_count': view_count,
+ 'timestamp': timestamp,
+ 'duration': float_or_none(part.get('seconds'), 1000),
+ 'filesize': int_or_none(part.get('size')),
'http_headers': {
- 'Referer': player_url,
+ 'Referer': self._PLAYER_URL,
},
}
result.append(part_info)
'id': video_id,
'title': title,
}
+
+
+class TudouPlaylistIE(InfoExtractor):
+ IE_NAME = 'tudou:playlist'
+ _VALID_URL = r'https?://(?:www\.)?tudou\.com/listplay/(?P<id>[\w-]{11})\.html'
+ _TESTS = [{
+ 'url': 'http://www.tudou.com/listplay/zzdE77v6Mmo.html',
+ 'info_dict': {
+ 'id': 'zzdE77v6Mmo',
+ },
+ 'playlist_mincount': 209,
+ }]
+
+ def _real_extract(self, url):
+ playlist_id = self._match_id(url)
+ playlist_data = self._download_json(
+ 'http://www.tudou.com/tvp/plist.action?lcode=%s' % playlist_id, playlist_id)
+ entries = [self.url_result(
+ 'http://www.tudou.com/programs/view/%s' % item['icode'],
+ 'Tudou', item['icode'],
+ item['kw']) for item in playlist_data['items']]
+ return self.playlist_result(entries, playlist_id)
+
+
+class TudouAlbumIE(InfoExtractor):
+ IE_NAME = 'tudou:album'
+ _VALID_URL = r'https?://(?:www\.)?tudou\.com/album(?:cover|play)/(?P<id>[\w-]{11})'
+ _TESTS = [{
+ 'url': 'http://www.tudou.com/albumplay/v5qckFJvNJg.html',
+ 'info_dict': {
+ 'id': 'v5qckFJvNJg',
+ },
+ 'playlist_mincount': 45,
+ }]
+
+ def _real_extract(self, url):
+ album_id = self._match_id(url)
+ album_data = self._download_json(
+ 'http://www.tudou.com/tvp/alist.action?acode=%s' % album_id, album_id)
+ entries = [self.url_result(
+ 'http://www.tudou.com/programs/view/%s' % item['icode'],
+ 'Tudou', item['icode'],
+ item['kw']) for item in album_data['items']]
+ return self.playlist_result(entries, album_id)
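The Tudou comment about picking the quality key ("the keys are the arguments that have to be passed as the hd field") boils down to choosing the highest numeric key while dropping non-numeric ones. A minimal illustration with a hypothetical `itemSegs` mapping:

```python
# Hypothetical parsed itemSegs: keys are quality ids, values are segment lists.
segments = {
    '2': [{'k': 'low-segment'}],
    '5': [{'k': 'high-segment'}],
    'jas': [{'k': 'bogus'}],  # non-numeric quality, filtered out (issue #3643)
}

# Keep only numeric qualities and pick the highest by numeric value.
quality = max((q for q in segments if q.isdigit()), key=int)
parts = segments[quality]
print(quality)  # '5'
```

Comparing with `key=int` matters: a plain string sort would rank `'9'` above `'10'`.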
info = self._download_json(
'http://www.tv4play.se/player/assets/%s.json' % video_id, video_id, 'Downloading video info JSON')
- # If is_geo_restricted is true, it doesn't neceserally mean we can't download it
+ # If is_geo_restricted is true, it doesn't necessarily mean we can't download it
if info['is_geo_restricted']:
self.report_warning('This content might not be available in your country due to licensing restrictions.')
if info['requires_subscription']:
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import remove_end
+
+
+class TwentyMinutenIE(InfoExtractor):
+ IE_NAME = '20min'
+ _VALID_URL = r'https?://(?:www\.)?20min\.ch/(?:videotv/*\?.*\bvid=(?P<id>\d+)|(?:[^/]+/)*(?P<display_id>[^/#?]+))'
+ _TESTS = [{
+ # regular video
+ 'url': 'http://www.20min.ch/videotv/?vid=469148&cid=2',
+ 'md5': 'b52d6bc6ea6398e6a38f12cfd418149c',
+ 'info_dict': {
+ 'id': '469148',
+ 'ext': 'flv',
+ 'title': '85 000 Franken für 15 perfekte Minuten',
+ 'description': 'Was die Besucher vom Silvesterzauber erwarten können. (Video: Alice Grosjean/Murat Temel)',
+ 'thumbnail': 'http://thumbnails.20min-tv.ch/server063/469148/frame-72-469148.jpg'
+ }
+ }, {
+ # news article with video
+ 'url': 'http://www.20min.ch/schweiz/news/story/-Wir-muessen-mutig-nach-vorne-schauen--22050469',
+ 'md5': 'cd4cbb99b94130cff423e967cd275e5e',
+ 'info_dict': {
+ 'id': '469408',
+ 'display_id': '-Wir-muessen-mutig-nach-vorne-schauen--22050469',
+ 'ext': 'flv',
+ 'title': '«Wir müssen mutig nach vorne schauen»',
+ 'description': 'Kein Land sei innovativer als die Schweiz, sagte Johann Schneider-Ammann in seiner Neujahrsansprache. Das Land müsse aber seine Hausaufgaben machen.',
+ 'thumbnail': 'http://www.20min.ch/images/content/2/2/0/22050469/10/teaserbreit.jpg'
+ }
+ }, {
+ 'url': 'http://www.20min.ch/videotv/?cid=44&vid=468738',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.20min.ch/ro/sortir/cinema/story/Grandir-au-bahut--c-est-dur-18927411',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ display_id = mobj.group('display_id') or video_id
+
+ webpage = self._download_webpage(url, display_id)
+
+ title = self._html_search_regex(
+ r'<h1>.*?<span>(.+?)</span></h1>',
+ webpage, 'title', default=None)
+ if not title:
+ title = remove_end(re.sub(
+ r'^20 [Mm]inuten.*? -', '', self._og_search_title(webpage)), ' - News')
+
+ if not video_id:
+ video_id = self._search_regex(
+ r'"file\d?"\s*,\s*\"(\d+)', webpage, 'video id')
+
+ description = self._html_search_meta(
+ 'description', webpage, 'description')
+ thumbnail = self._og_search_thumbnail(webpage)
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'url': 'http://speed.20min-tv.ch/%sm.flv' % video_id,
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ }
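The new 20min extractor falls back to cleaning `og:title` when the `<h1>` heading is absent: it strips the leading site prefix and a trailing `' - News'` suffix. A sketch of that fallback, reimplementing `remove_end` inline on a sample string:

```python
import re


def clean_title(og_title):
    """Strip the '20 Minuten' prefix and ' - News' suffix from og:title."""
    title = re.sub(r'^20 [Mm]inuten.*? -', '', og_title)
    if title.endswith(' - News'):
        title = title[:-len(' - News')]
    return title.strip()


print(clean_title('20 Minuten - «Wir müssen mutig nach vorne schauen» - News'))
```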
float_or_none,
int_or_none,
sanitized_Request,
+ unescapeHTML,
)
_VALID_URL = r'https?://www\.udemy\.com/(?:[^#]+#/lecture/|lecture/view/?\?lectureId=)(?P<id>\d+)'
_LOGIN_URL = 'https://www.udemy.com/join/login-popup/?displayType=ajax&showSkipButton=1'
_ORIGIN_URL = 'https://www.udemy.com'
- _SUCCESSFULLY_ENROLLED = '>You have enrolled in this course!<'
- _ALREADY_ENROLLED = '>You are already taking this course.<'
_NETRC_MACHINE = 'udemy'
_TESTS = [{
}]
def _enroll_course(self, webpage, course_id):
- enroll_url = self._search_regex(
+ checkout_url = unescapeHTML(self._search_regex(
+ r'href=(["\'])(?P<url>https?://(?:www\.)?udemy\.com/payment/checkout/.+?)\1',
+ webpage, 'checkout url', group='url', default=None))
+ if checkout_url:
+ raise ExtractorError(
+ 'Course %s is not free. You have to pay for it before you can download. '
+ 'Use this URL to confirm purchase: %s' % (course_id, checkout_url), expected=True)
+
+ enroll_url = unescapeHTML(self._search_regex(
r'href=(["\'])(?P<url>https?://(?:www\.)?udemy\.com/course/subscribe/.+?)\1',
- webpage, 'enroll url', group='url',
- default='https://www.udemy.com/course/subscribe/?courseId=%s' % course_id)
- webpage = self._download_webpage(enroll_url, course_id, 'Enrolling in the course')
- if self._SUCCESSFULLY_ENROLLED in webpage:
- self.to_screen('%s: Successfully enrolled in' % course_id)
- elif self._ALREADY_ENROLLED in webpage:
- self.to_screen('%s: Already enrolled in' % course_id)
+ webpage, 'enroll url', group='url', default=None))
+ if enroll_url:
+ webpage = self._download_webpage(enroll_url, course_id, 'Enrolling in the course')
+ if '>You have enrolled in' in webpage:
+ self.to_screen('%s: Successfully enrolled in the course' % course_id)
def _download_lecture(self, course_id, lecture_id):
return self._download_json(
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import int_or_none
-
-
-class UltimediaIE(InfoExtractor):
- _VALID_URL = r'''(?x)
- https?://(?:www\.)?ultimedia\.com/
- (?:
- deliver/
- (?P<embed_type>
- generic|
- musique
- )
- (?:/[^/]+)*/
- (?:
- src|
- article
- )|
- default/index/video
- (?P<site_type>
- generic|
- music
- )
- /id
- )/(?P<id>[\d+a-z]+)'''
- _TESTS = [{
- # news
- 'url': 'https://www.ultimedia.com/default/index/videogeneric/id/s8uk0r',
- 'md5': '276a0e49de58c7e85d32b057837952a2',
- 'info_dict': {
- 'id': 's8uk0r',
- 'ext': 'mp4',
- 'title': 'Loi sur la fin de vie: le texte prévoit un renforcement des directives anticipées',
- 'thumbnail': 're:^https?://.*\.jpg',
- 'duration': 74,
- 'upload_date': '20150317',
- 'timestamp': 1426604939,
- 'uploader_id': '3fszv',
- },
- }, {
- # music
- 'url': 'https://www.ultimedia.com/default/index/videomusic/id/xvpfp8',
- 'md5': '2ea3513813cf230605c7e2ffe7eca61c',
- 'info_dict': {
- 'id': 'xvpfp8',
- 'ext': 'mp4',
- 'title': 'Two - C\'est La Vie (clip)',
- 'thumbnail': 're:^https?://.*\.jpg',
- 'duration': 233,
- 'upload_date': '20150224',
- 'timestamp': 1424760500,
- 'uploader_id': '3rfzk',
- },
- }]
-
- @staticmethod
- def _extract_url(webpage):
- mobj = re.search(
- r'<(?:iframe|script)[^>]+src=["\'](?P<url>(?:https?:)?//(?:www\.)?ultimedia\.com/deliver/(?:generic|musique)(?:/[^/]+)*/(?:src|article)/[\d+a-z]+)',
- webpage)
- if mobj:
- return mobj.group('url')
-
- def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id')
- video_type = mobj.group('embed_type') or mobj.group('site_type')
- if video_type == 'music':
- video_type = 'musique'
-
- deliver_info = self._download_json(
- 'http://www.ultimedia.com/deliver/video?video=%s&topic=%s' % (video_id, video_type),
- video_id)
-
- yt_id = deliver_info.get('yt_id')
- if yt_id:
- return self.url_result(yt_id, 'Youtube')
-
- jwconf = deliver_info['jwconf']
-
- formats = []
- for source in jwconf['playlist'][0]['sources']:
- formats.append({
- 'url': source['file'],
- 'format_id': source.get('label'),
- })
-
- self._sort_formats(formats)
-
- title = deliver_info['title']
- thumbnail = jwconf.get('image')
- duration = int_or_none(deliver_info.get('duration'))
- timestamp = int_or_none(deliver_info.get('release_time'))
- uploader_id = deliver_info.get('owner_id')
-
- return {
- 'id': video_id,
- 'title': title,
- 'thumbnail': thumbnail,
- 'duration': duration,
- 'timestamp': timestamp,
- 'uploader_id': uploader_id,
- 'formats': formats,
- }
webpage = self._download_webpage(url, video_id)
- files = set(re.findall(r'file\s*:\s*"([^"]+)"', webpage))
+ files = set(re.findall(r'file\s*:\s*"(/[^"]+)"', webpage))
quality = qualities(['SD', 'HD'])
formats = []
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
- # some sites use this embed format (see: http://github.com/rg3/youtube-dl/issues/2990)
+ # some sites use this embed format (see: https://github.com/rg3/youtube-dl/issues/2990)
if m.group('type') == 'embed/recorded':
video_id = m.group('id')
desktop_url = 'http://www.ustream.tv/recorded/' + video_id
class VideoMegaIE(InfoExtractor):
+ _WORKING = False
_VALID_URL = r'(?:videomega:|https?://(?:www\.)?videomega\.tv/(?:(?:view|iframe|cdn)\.php)?\?ref=)(?P<id>[A-Za-z0-9]+)'
_TESTS = [{
'url': 'http://videomega.tv/cdn.php?ref=AOSQBJYKIDDIKYJBQSOA',
'skip_download': True,
},
}, {
- # season single serie with og:video:iframe
+ # season single series with og:video:iframe
'url': 'http://videomore.ru/poslednii_ment/1_sezon/14_seriya',
'only_matching': True,
}, {
class VideoTtIE(InfoExtractor):
+ _WORKING = False
ID_NAME = 'video.tt'
IE_DESC = 'video.tt - Your True Tube'
_VALID_URL = r'http://(?:www\.)?video\.tt/(?:(?:video|embed)/|watch_video\.php\?v=)(?P<id>[\da-zA-Z]{9})'
self._sort_formats(formats)
- synopsis = info.get('Synopsis', {})
+ synopsis = info.get('Synopsis') or {}
# Prefer title outside synopsis since it's less messy
title = (info.get('Title') or synopsis['Title']).strip()
- description = synopsis.get('Detailed') or info.get('Synopsis', {}).get('Short')
+ description = synopsis.get('Detailed') or (info.get('Synopsis') or {}).get('Short')
duration = int_or_none(info.get('Duration'))
timestamp = parse_iso8601(info.get('ReleaseDate'))
compat_urlparse,
)
from ..utils import (
+ determine_ext,
encode_dict,
ExtractorError,
InAdvancePagedList,
'url': 'https://vimeo.com/groups/travelhd/videos/22439234',
'only_matching': True,
},
+ {
+ # source file returns 403: Forbidden
+ 'url': 'https://vimeo.com/7809605',
+ 'only_matching': True,
+ },
]
@staticmethod
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1', webpage)
if mobj:
player_url = unescapeHTML(mobj.group('url'))
- surl = smuggle_url(player_url, {'Referer': url})
+ surl = smuggle_url(player_url, {'http_headers': {'Referer': url}})
return surl
# Look for embedded (swf embed) Vimeo player
mobj = re.search(
self._login()
def _real_extract(self, url):
- url, data = unsmuggle_url(url)
+ url, data = unsmuggle_url(url, {})
headers = std_headers
- if data is not None:
+ if 'http_headers' in data:
headers = headers.copy()
- headers.update(data)
+ headers.update(data['http_headers'])
if 'Referer' not in headers:
headers['Referer'] = url
raise ExtractorError('The author has restricted the access to this video, try with the "--referer" option')
if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None:
- if data and '_video_password_verified' in data:
+ if '_video_password_verified' in data:
raise ExtractorError('video password verification failed!')
self._verify_video_password(url, video_id, webpage)
return self._real_extract(
if config.get('view') == 4:
config = self._verify_player_video_password(url, video_id)
+ if '>You rented this title.<' in webpage:
+ feature_id = config.get('video', {}).get('vod', {}).get('feature_id')
+ if feature_id and not data.get('force_feature_id', False):
+ return self.url_result(smuggle_url(
+ 'https://player.vimeo.com/player/%s' % feature_id,
+ {'force_feature_id': True}), 'Vimeo')
+
# Extract title
video_title = config["video"]["title"]
download_data = self._download_json(download_request, video_id, fatal=False)
if download_data:
source_file = download_data.get('source_file')
- if source_file and not source_file.get('is_cold') and not source_file.get('is_defrosting'):
- formats.append({
- 'url': source_file['download_url'],
- 'ext': source_file['extension'].lower(),
- 'width': int_or_none(source_file.get('width')),
- 'height': int_or_none(source_file.get('height')),
- 'filesize': parse_filesize(source_file.get('size')),
- 'format_id': source_file.get('public_name', 'Original'),
- 'preference': 1,
- })
+ if isinstance(source_file, dict):
+ download_url = source_file.get('download_url')
+ if download_url and not source_file.get('is_cold') and not source_file.get('is_defrosting'):
+ source_name = source_file.get('public_name', 'Original')
+ if self._is_valid_url(download_url, video_id, '%s video' % source_name):
+ ext = source_file.get('extension', determine_ext(download_url)).lower()
+ formats.append({
+ 'url': download_url,
+ 'ext': ext,
+ 'width': int_or_none(source_file.get('width')),
+ 'height': int_or_none(source_file.get('height')),
+ 'filesize': parse_filesize(source_file.get('size')),
+ 'format_id': source_name,
+ 'preference': 1,
+ })
config_files = config['video'].get('files') or config['request'].get('files', {})
for f in config_files.get('progressive', []):
video_url = f.get('url')
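The Vimeo change above moves the smuggled `Referer` under an `'http_headers'` key, so `_real_extract` only merges headers when that key is present. The smuggle/unsmuggle convention itself is simple: extra data rides along inside the URL. A simplified stand-in for youtube-dl's `smuggle_url`/`unsmuggle_url` helpers (the real ones use a different marker and quoting):

```python
import json

MARKER = '#__data='  # hypothetical marker, for illustration only


def smuggle_url(url, data):
    """Append a JSON payload to a URL fragment."""
    return url + MARKER + json.dumps(data)


def unsmuggle_url(smug_url, default=None):
    """Split a smuggled URL back into (url, data); data defaults if absent."""
    if MARKER not in smug_url:
        return smug_url, default
    url, _, payload = smug_url.partition(MARKER)
    return url, json.loads(payload)


url, data = unsmuggle_url(smuggle_url(
    'https://player.example/video/1',
    {'http_headers': {'Referer': 'https://a.example'}}), {})
print(data['http_headers']['Referer'])  # https://a.example
```

Passing `{}` as the default (as the diff does with `unsmuggle_url(url, {})`) lets later code use `data.get(...)` without a `None` check.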
from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
+ NO_DEFAULT,
sanitized_Request,
)
class VodlockerIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?vodlocker\.com/(?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:\..*?)?'
+ _VALID_URL = r'https?://(?:www\.)?vodlocker\.(?:com|city)/(?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:\..*?)?'
_TESTS = [{
'url': 'http://vodlocker.com/e8wvyzz4sl42',
webpage = self._download_webpage(
req, video_id, 'Downloading video page')
+ def extract_file_url(html, default=NO_DEFAULT):
+ return self._search_regex(
+ r'file:\s*"(http[^\"]+)",', html, 'file url', default=default)
+
+ video_url = extract_file_url(webpage, default=None)
+
+ if not video_url:
+ embed_url = self._search_regex(
+ r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?vodlocker\.(?:com|city)/embed-.+?)\1',
+ webpage, 'embed url', group='url')
+ embed_webpage = self._download_webpage(
+ embed_url, video_id, 'Downloading embed webpage')
+ video_url = extract_file_url(embed_webpage)
+ thumbnail_webpage = embed_webpage
+ else:
+ thumbnail_webpage = webpage
+
title = self._search_regex(
r'id="file_title".*?>\s*(.*?)\s*<(?:br|span)', webpage, 'title')
thumbnail = self._search_regex(
- r'image:\s*"(http[^\"]+)",', webpage, 'thumbnail')
- url = self._search_regex(
- r'file:\s*"(http[^\"]+)",', webpage, 'file url')
+ r'image:\s*"(http[^\"]+)",', thumbnail_webpage, 'thumbnail', fatal=False)
formats = [{
'format_id': 'sd',
- 'url': url,
+ 'url': video_url,
}]
return {
class VRTIE(InfoExtractor):
- _VALID_URL = r'https?://(?:deredactie|sporza|cobra)\.be/cm/(?:[^/]+/)+(?P<id>[^/]+)/*'
+ _VALID_URL = r'https?://(?:deredactie|sporza|cobra(?:\.canvas)?)\.be/cm/(?:[^/]+/)+(?P<id>[^/]+)/*'
_TESTS = [
# deredactie.be
{
'duration': 661,
}
},
+ {
+ 'url': 'http://cobra.canvas.be/cm/cobra/videozone/rubriek/film-videozone/1.2377055',
+ 'only_matching': True,
+ }
]
def _real_extract(self, url):
if mobj:
formats.extend(self._extract_m3u8_formats(
'%s/%s' % (mobj.group('server'), mobj.group('path')),
- video_id, 'mp4'))
+ video_id, 'mp4', m3u8_id='hls'))
mobj = re.search(r'data-video-src="(?P<src>[^"]+)"', webpage)
if mobj:
formats.extend(self._extract_f4m_formats(
- '%s/manifest.f4m' % mobj.group('src'), video_id))
+ '%s/manifest.f4m' % mobj.group('src'), video_id, f4m_id='hds'))
self._sort_formats(formats)
title = self._og_search_title(webpage)
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class WeiqiTVIE(InfoExtractor):
+ IE_DESC = 'WQTV'
+ _VALID_URL = r'http://www\.weiqitv\.com/index/video_play\?videoId=(?P<id>[A-Za-z0-9]+)'
+
+ _TESTS = [{
+ 'url': 'http://www.weiqitv.com/index/video_play?videoId=53c744f09874f0e76a8b46f3',
+ 'md5': '26450599afd64c513bc77030ad15db44',
+ 'info_dict': {
+ 'id': '53c744f09874f0e76a8b46f3',
+ 'ext': 'mp4',
+ 'title': '2013年度盘点',
+ },
+ }, {
+ 'url': 'http://www.weiqitv.com/index/video_play?videoId=567379a2d4c36cca518b4569',
+ 'info_dict': {
+ 'id': '567379a2d4c36cca518b4569',
+ 'ext': 'mp4',
+ 'title': '民国围棋史',
+ },
+ }, {
+ 'url': 'http://www.weiqitv.com/index/video_play?videoId=5430220a9874f088658b4567',
+ 'info_dict': {
+ 'id': '5430220a9874f088658b4567',
+ 'ext': 'mp4',
+ 'title': '二路托过的手段和运用',
+ },
+ }]
+
+ def _real_extract(self, url):
+ media_id = self._match_id(url)
+ page = self._download_webpage(url, media_id)
+
+ info_json_str = self._search_regex(
+ r'var\s+video\s*=\s*(.+});', page, 'info json str')
+ info_json = self._parse_json(info_json_str, media_id)
+
+ letvcloud_url = self._search_regex(
+ r'var\s+letvurl\s*=\s*"([^"]+)', page, 'letvcloud url')
+
+ return {
+ '_type': 'url_transparent',
+ 'ie_key': 'LetvCloud',
+ 'url': letvcloud_url,
+ 'title': info_json['name'],
+ 'id': media_id,
+ }
from .common import InfoExtractor
from ..utils import (
- unified_strdate,
- str_to_int,
+ float_or_none,
int_or_none,
- parse_duration,
+ unified_strdate,
)
'title': 'FemaleAgent Shy beauty takes the bait',
'upload_date': '20121014',
'uploader': 'Ruseful2011',
- 'duration': 893,
+ 'duration': 893.52,
'age_limit': 18,
}
},
'title': 'Britney Spears Sexy Booty',
'upload_date': '20130914',
'uploader': 'jojo747400',
- 'duration': 200,
+ 'duration': 200.48,
'age_limit': 18,
}
},
webpage = self._download_webpage(mrss_url, video_id)
title = self._html_search_regex(
- [r'<title>(?P<title>.+?)(?:, (?:[^,]+? )?Porn: xHamster| - xHamster\.com)</title>',
- r'<h1>([^<]+)</h1>'], webpage, 'title')
+ [r'<h1[^>]*>([^<]+)</h1>',
+ r'<meta[^>]+itemprop=".*?caption.*?"[^>]+content="(.+?)"',
+ r'<title[^>]*>(.+?)(?:,\s*[^,]*?\s*Porn\s*[^,]*?:\s*xHamster[^<]*| - xHamster\.com)</title>'],
+ webpage, 'title')
# Only a few videos have an description
mobj = re.search(r'<span>Description: </span>([^<]+)', webpage)
description = mobj.group(1) if mobj else None
- upload_date = self._html_search_regex(r'hint=\'(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} [A-Z]{3,4}\'',
- webpage, 'upload date', fatal=False)
- if upload_date:
- upload_date = unified_strdate(upload_date)
+ upload_date = unified_strdate(self._search_regex(
+ r'hint=["\'](\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} [A-Z]{3,4}',
+ webpage, 'upload date', fatal=False))
uploader = self._html_search_regex(
- r"<a href='[^']+xhamster\.com/user/[^>]+>(?P<uploader>[^<]+)",
+ r'<span[^>]+itemprop=["\']author[^>]+><a[^>]+href=["\'].+?xhamster\.com/user/[^>]+>(?P<uploader>.+?)</a>',
webpage, 'uploader', default='anonymous')
thumbnail = self._search_regex(
r'''<video[^>]+poster=(?P<q>["'])(?P<thumbnail>.+?)(?P=q)[^>]*>'''],
webpage, 'thumbnail', fatal=False, group='thumbnail')
- duration = parse_duration(self._html_search_regex(r'<span>Runtime:</span> (\d+:\d+)</div>',
- webpage, 'duration', fatal=False))
+ duration = float_or_none(self._search_regex(
+ r'(["\'])duration\1\s*:\s*(["\'])(?P<duration>.+?)\2',
+ webpage, 'duration', fatal=False, group='duration'))
- view_count = self._html_search_regex(r'<span>Views:</span> ([^<]+)</div>', webpage, 'view count', fatal=False)
- if view_count:
- view_count = str_to_int(view_count)
+ view_count = int_or_none(self._search_regex(
+ r'content=["\']User(?:View|Play)s:(\d+)',
+ webpage, 'view count', fatal=False))
mobj = re.search(r"hint='(?P<likecount>\d+) Likes / (?P<dislikecount>\d+) Dislikes'", webpage)
(like_count, dislike_count) = (mobj.group('likecount'), mobj.group('dislikecount')) if mobj else (None, None)
from __future__ import unicode_literals
+import itertools
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
+ int_or_none,
parse_duration,
sanitized_Request,
str_to_int,
class XTubeIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?(?P<url>xtube\.com/watch\.php\?v=(?P<id>[^/?&#]+))'
+ _VALID_URL = r'(?:xtube:|https?://(?:www\.)?xtube\.com/watch\.php\?.*\bv=)(?P<id>[^/?&#]+)'
_TEST = {
'url': 'http://www.xtube.com/watch.php?v=kVTUy_G222_',
'md5': '092fbdd3cbe292c920ef6fc6a8a9cdab',
def _real_extract(self, url):
video_id = self._match_id(url)
- req = sanitized_Request(url)
+ req = sanitized_Request('http://www.xtube.com/watch.php?v=%s' % video_id)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
class XTubeUserIE(InfoExtractor):
IE_DESC = 'XTube user profile'
- _VALID_URL = r'https?://(?:www\.)?xtube\.com/community/profile\.php\?(.*?)user=(?P<username>[^&#]+)(?:$|[&#])'
+ _VALID_URL = r'https?://(?:www\.)?xtube\.com/profile/(?P<id>[^/]+-\d+)'
_TEST = {
- 'url': 'http://www.xtube.com/community/profile.php?user=greenshowers',
+ 'url': 'http://www.xtube.com/profile/greenshowers-4056496',
'info_dict': {
- 'id': 'greenshowers',
+ 'id': 'greenshowers-4056496',
'age_limit': 18,
},
'playlist_mincount': 155,
}
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- username = mobj.group('username')
-
- profile_page = self._download_webpage(
- url, username, note='Retrieving profile page')
-
- video_count = int(self._search_regex(
- r'<strong>%s\'s Videos \(([0-9]+)\)</strong>' % username, profile_page,
- 'video count'))
-
- PAGE_SIZE = 25
- urls = []
- page_count = (video_count + PAGE_SIZE + 1) // PAGE_SIZE
- for n in range(1, page_count + 1):
- lpage_url = 'http://www.xtube.com/user_videos.php?page=%d&u=%s' % (n, username)
- lpage = self._download_webpage(
- lpage_url, username,
- note='Downloading page %d/%d' % (n, page_count))
- urls.extend(
- re.findall(r'addthis:url="([^"]+)"', lpage))
-
- return {
- '_type': 'playlist',
- 'id': username,
- 'age_limit': 18,
- 'entries': [{
- '_type': 'url',
- 'url': eurl,
- 'ie_key': 'XTube',
- } for eurl in urls]
- }
+ user_id = self._match_id(url)
+
+ entries = []
+ for pagenum in itertools.count(1):
+ request = sanitized_Request(
+ 'http://www.xtube.com/profile/%s/videos/%d' % (user_id, pagenum),
+ headers={
+ 'Cookie': 'popunder=4',
+ 'X-Requested-With': 'XMLHttpRequest',
+ 'Referer': url,
+ })
+
+ page = self._download_json(
+ request, user_id, 'Downloading videos JSON page %d' % pagenum)
+
+ html = page.get('html')
+ if not html:
+ break
+
+ for _, video_id in re.findall(r'data-plid=(["\'])(.+?)\1', html):
+ entries.append(self.url_result('xtube:%s' % video_id, XTubeIE.ie_key()))
+
+ page_count = int_or_none(page.get('pageCount'))
+ if not page_count or pagenum == page_count:
+ break
+
+ playlist = self.playlist_result(entries, user_id)
+ playlist['age_limit'] = 18
+ return playlist
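The rewritten `XTubeUserIE` pages through the profile's JSON endpoint with `itertools.count`, stopping when a page returns no `html` or the advertised `pageCount` is reached. A sketch of the same loop shape with a stubbed fetcher in place of `_download_json` (the page data here is hypothetical):

```python
import itertools
import re

# Stubbed JSON pages standing in for the profile videos endpoint (hypothetical data).
PAGES = {
    1: {'html': 'data-plid="a1" data-plid="a2"', 'pageCount': 2},
    2: {'html': 'data-plid="a3"', 'pageCount': 2},
}

def fetch(pagenum):
    return PAGES.get(pagenum, {})

entries = []
for pagenum in itertools.count(1):
    page = fetch(pagenum)
    html = page.get('html')
    if not html:
        break  # empty page: no more videos
    # findall returns (quote, id) tuples because of the backreferenced quote group
    for _, video_id in re.findall(r'data-plid=(["\'])(.+?)\1', html):
        entries.append(video_id)
    page_count = page.get('pageCount')
    if not page_count or pagenum == page_count:
        break  # reached the advertised last page

print(entries)  # -> ['a1', 'a2', 'a3']
```

Checking both stop conditions avoids an extra request for a page that is known to be the last one.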
r'root\.App\.Cache\.context\.videoCache\.curVideo = \{"([^"]+)"',
r'"first_videoid"\s*:\s*"([^"]+)"',
r'%s[^}]*"ccm_id"\s*:\s*"([^"]+)"' % re.escape(page_id),
+            r'<article[^>]+data-uuid=["\']([^"\']+)',
+ r'yahoo://article/view\?.*\buuid=([^&"\']+)',
]
video_id = self._search_regex(
CONTENT_ID_REGEXES, webpage, 'content ID')
get_element_by_attribute,
get_element_by_id,
int_or_none,
+ mimetype2ext,
orderedSet,
parse_duration,
remove_quotes,
},
'params': {
'skip_download': 'requires avconv',
- }
+ },
+ 'skip': 'This live event has ended.',
},
# Extraction from multiple DASH manifests (https://github.com/rg3/youtube-dl/pull/6097)
{
},
{
# Title with JS-like syntax "};" (see https://github.com/rg3/youtube-dl/issues/7468)
+ # Also tests cut-off URL expansion in video description (see
+ # https://github.com/rg3/youtube-dl/issues/1892,
+ # https://github.com/rg3/youtube-dl/issues/8164)
'url': 'https://www.youtube.com/watch?v=lsguqyKfVQg',
'info_dict': {
'id': 'lsguqyKfVQg',
try:
args = player_config['args']
caption_url = args['ttsurl']
+ if not caption_url:
+ self._downloader.report_warning(err_msg)
+ return {}
timestamp = args['timestamp']
# We get the available subtitles
list_params = compat_urllib_parse.urlencode({
full_info.update(f)
codecs = r.attrib.get('codecs')
if codecs:
- if full_info.get('acodec') == 'none' and 'vcodec' not in full_info:
+ if full_info.get('acodec') == 'none':
full_info['vcodec'] = codecs
- elif full_info.get('vcodec') == 'none' and 'acodec' not in full_info:
+ elif full_info.get('vcodec') == 'none':
full_info['acodec'] = codecs
formats.append(full_info)
else:
video_description = re.sub(r'''(?x)
<a\s+
(?:[a-zA-Z-]+="[^"]+"\s+)*?
- title="([^"]+)"\s+
+ (?:title|href)="([^"]+)"\s+
(?:[a-zA-Z-]+="[^"]+"\s+)*?
- class="yt-uix-redirect-link"\s*>
- [^<]+
+ class="(?:yt-uix-redirect-link|yt-uix-sessionlink[^"]*)"[^>]*>
+ [^<]+\.{3}\s*
</a>
''', r'\1', video_description)
video_description = clean_html(video_description)
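The widened `re.sub` above replaces a cut-off redirect anchor (inner text ending in "...") with the full URL from its `title` or `href` attribute. A sketch against a hypothetical description fragment showing the substitution:

```python
import re

# Hypothetical description HTML with a cut-off session link; real markup may vary.
video_description = ('Watch <a href="https://example.com/full-page" '
                     'class="yt-uix-sessionlink" data-sessionlink="x">'
                     'https://example.com/fu...</a> now')

# Same pattern as above: capture the attribute URL, require the "..." suffix.
video_description = re.sub(r'''(?x)
    <a\s+
        (?:[a-zA-Z-]+="[^"]+"\s+)*?
        (?:title|href)="([^"]+)"\s+
        (?:[a-zA-Z-]+="[^"]+"\s+)*?
        class="(?:yt-uix-redirect-link|yt-uix-sessionlink[^"]*)"[^>]*>
    [^<]+\.{3}\s*
    </a>
''', r'\1', video_description)

print(video_description)  # -> Watch https://example.com/full-page now
```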
if 'ratebypass' not in url:
url += '&ratebypass=yes'
+ dct = {
+ 'format_id': format_id,
+ 'url': url,
+ 'player_url': player_url,
+ }
+ if format_id in self._formats:
+ dct.update(self._formats[format_id])
+
# Some itags are not included in DASH manifest thus corresponding formats will
# lack metadata (see https://github.com/rg3/youtube-dl/pull/5993).
# Trying to extract metadata from url_encoded_fmt_stream_map entry.
mobj = re.search(r'^(?P<width>\d+)[xX](?P<height>\d+)$', url_data.get('size', [''])[0])
width, height = (int(mobj.group('width')), int(mobj.group('height'))) if mobj else (None, None)
- dct = {
- 'format_id': format_id,
- 'url': url,
- 'player_url': player_url,
+
+ more_fields = {
'filesize': int_or_none(url_data.get('clen', [None])[0]),
'tbr': float_or_none(url_data.get('bitrate', [None])[0], 1000),
'width': width,
'fps': int_or_none(url_data.get('fps', [None])[0]),
'format_note': url_data.get('quality_label', [None])[0] or url_data.get('quality', [None])[0],
}
+ for key, value in more_fields.items():
+ if value:
+ dct[key] = value
type_ = url_data.get('type', [None])[0]
if type_:
type_split = type_.split(';')
kind_ext = type_split[0].split('/')
if len(kind_ext) == 2:
- kind, ext = kind_ext
- dct['ext'] = ext
+ kind, _ = kind_ext
+ dct['ext'] = mimetype2ext(type_split[0])
if kind in ('audio', 'video'):
codecs = None
for mobj in re.finditer(
if codecs:
codecs = codecs.split(',')
if len(codecs) == 2:
- acodec, vcodec = codecs[0], codecs[1]
+ acodec, vcodec = codecs[1], codecs[0]
else:
acodec, vcodec = (codecs[0], 'none') if kind == 'audio' else ('none', codecs[0])
dct.update({
'acodec': acodec,
'vcodec': vcodec,
})
- if format_id in self._formats:
- dct.update(self._formats[format_id])
formats.append(dct)
elif video_info.get('hlsvp'):
manifest_url = video_info['hlsvp'][0]
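The `acodec, vcodec = codecs[1], codecs[0]` swap above reflects the order in the `type` field's `codecs` parameter: for a video MIME type the video codec is listed first, the audio codec second. A sketch with a simplified parser (the sample value and the regex here are illustrative, not the extractor's exact ones):

```python
import re

# Example MIME type value; vcodec comes first, acodec second (hypothetical sample).
type_ = 'video/mp4; codecs="avc1.64001F, mp4a.40.2"'

mobj = re.search(r'codecs=["\']?(?P<val>[^"\']+)', type_)
codecs = [c.strip() for c in mobj.group('val').split(',')]
# Two entries: video codec listed first, so assign in that order.
acodec, vcodec = codecs[1], codecs[0]
print(vcodec, acodec)  # -> avc1.64001F mp4a.40.2
```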
for a_format in formats:
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
else:
+ unavailable_message = self._html_search_regex(
+ r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
+ video_webpage, 'unavailable message', default=None)
+ if unavailable_message:
+ raise ExtractorError(unavailable_message, expected=True)
raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
# Look for the DASH manifest
--- /dev/null
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ determine_ext,
+ str_to_int,
+)
+
+
+class ZippCastIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?zippcast\.com/(?:video/|videoview\.php\?.*\bvplay=)(?P<id>[0-9a-zA-Z]+)'
+ _TESTS = [{
+ # m3u8, hq direct link
+ 'url': 'http://www.zippcast.com/video/c9cfd5c7e44dbc29c81',
+ 'md5': '5ea0263b5606866c4d6cda0fc5e8c6b6',
+ 'info_dict': {
+ 'id': 'c9cfd5c7e44dbc29c81',
+ 'ext': 'mp4',
+ 'title': '[Vinesauce] Vinny - Digital Space Traveler',
+ 'description': 'Muted on youtube, but now uploaded in it\'s original form.',
+ 'thumbnail': 're:^https?://.*\.jpg$',
+ 'uploader': 'vinesauce',
+ 'view_count': int,
+ 'categories': ['Entertainment'],
+ 'tags': list,
+ },
+ }, {
+ # f4m, lq ipod direct link
+ 'url': 'http://www.zippcast.com/video/b79c0a233e9c6581775',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.zippcast.com/videoview.php?vplay=c9cfd5c7e44dbc29c81&auto=no',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+
+ webpage = self._download_webpage(
+ 'http://www.zippcast.com/video/%s' % video_id, video_id)
+
+ formats = []
+ video_url = self._search_regex(
+ r'<source[^>]+src=(["\'])(?P<url>.+?)\1', webpage,
+ 'video url', default=None, group='url')
+ if video_url:
+ formats.append({
+ 'url': video_url,
+ 'format_id': 'http',
+ 'preference': 0, # direct link is almost always of worse quality
+ })
+ src_url = self._search_regex(
+ r'src\s*:\s*(?:escape\()?(["\'])(?P<url>http://.+?)\1',
+ webpage, 'src', default=None, group='url')
+ ext = determine_ext(src_url)
+ if ext == 'm3u8':
+ formats.extend(self._extract_m3u8_formats(
+ src_url, video_id, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False))
+ elif ext == 'f4m':
+ formats.extend(self._extract_f4m_formats(
+ src_url, video_id, f4m_id='hds', fatal=False))
+ self._sort_formats(formats)
+
+ title = self._og_search_title(webpage)
+ description = self._og_search_description(webpage) or self._html_search_meta(
+ 'description', webpage)
+ uploader = self._search_regex(
+ r'<a[^>]+href="https?://[^/]+/profile/[^>]+>([^<]+)</a>',
+ webpage, 'uploader', fatal=False)
+ thumbnail = self._og_search_thumbnail(webpage)
+ view_count = str_to_int(self._search_regex(
+ r'>([\d,.]+) views!', webpage, 'view count', fatal=False))
+
+ categories = re.findall(
+ r'<a[^>]+href="https?://[^/]+/categories/[^"]+">([^<]+),?<',
+ webpage)
+ tags = re.findall(
+ r'<a[^>]+href="https?://[^/]+/search/tags/[^"]+">([^<]+),?<',
+ webpage)
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'uploader': uploader,
+ 'view_count': view_count,
+ 'categories': categories,
+ 'tags': tags,
+ 'formats': formats,
+ }
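The view-count extraction above runs the matched text through `str_to_int`, which tolerates thousands separators. A minimal sketch of that helper's idea (simplified from `utils.str_to_int`, shown here only to illustrate the separator stripping):

```python
import re

def str_to_int(int_str):
    # Drop common thousands separators before converting, e.g. '1,234,567'.
    if int_str is None:
        return None
    int_str = re.sub(r'[,\.\+]', '', int_str)
    return int(int_str)

print(str_to_int('1,234,567'))  # -> 1234567
```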
'--sub-lang', '--sub-langs', '--srt-lang',
action='callback', dest='subtitleslangs', metavar='LANGS', type='str',
default=[], callback=_comma_separated_values_options_callback,
- help='Languages of the subtitles to download (optional) separated by commas, use IETF language tags like \'en,pt\'')
+ help='Languages of the subtitles to download (optional) separated by commas, use --list-subs for available language tags')
downloader = optparse.OptionGroup(parser, 'Download Options')
downloader.add_option(
elif mname in _builtin_classes:
res = _builtin_classes[mname]
else:
- # Assume unitialized
+ # Assume uninitialized
# TODO warn here
res = undefined
stack.append(res)
def rsa_verify(message, signature, key):
- from struct import pack
from hashlib import sha256
-
assert isinstance(message, bytes)
- block_size = 0
- n = key[0]
- while n:
- block_size += 1
- n >>= 8
- signature = pow(int(signature, 16), key[1], key[0])
- raw_bytes = []
- while signature:
- raw_bytes.insert(0, pack("B", signature & 0xFF))
- signature >>= 8
- signature = (block_size - len(raw_bytes)) * b'\x00' + b''.join(raw_bytes)
- if signature[0:2] != b'\x00\x01':
- return False
- signature = signature[2:]
- if b'\x00' not in signature:
- return False
- signature = signature[signature.index(b'\x00') + 1:]
- if not signature.startswith(b'\x30\x31\x30\x0D\x06\x09\x60\x86\x48\x01\x65\x03\x04\x02\x01\x05\x00\x04\x20'):
- return False
- signature = signature[19:]
- if signature != sha256(message).digest():
+ byte_size = (len(bin(key[0])) - 2 + 8 - 1) // 8
+ signature = ('%x' % pow(int(signature, 16), key[1], key[0])).encode()
+ signature = (byte_size * 2 - len(signature)) * b'0' + signature
+ asn1 = b'3031300d060960864801650304020105000420'
+ asn1 += sha256(message).hexdigest().encode()
+ if byte_size < len(asn1) // 2 + 11:
return False
- return True
+ expected = b'0001' + (byte_size - len(asn1) // 2 - 3) * b'ff' + b'00' + asn1
+ return expected == signature
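The rewritten `rsa_verify` compares hex strings instead of assembling byte buffers: the modulus size in bytes comes from its bit length, and the expected block is `00 01 ff…ff 00` followed by the SHA-256 ASN.1 prefix and digest (EMSA-PKCS1-v1_5). A sketch of just that size arithmetic and block layout, using a toy modulus rather than a real key, and without verifying an actual signature:

```python
from hashlib import sha256

# Toy 2048-bit modulus (not a real key) just to exercise the size arithmetic.
n = 1 << 2047
byte_size = (len(bin(n)) - 2 + 8 - 1) // 8  # bit length, rounded up to whole bytes
message = b'hello'

# EMSA-PKCS1-v1_5 layout in hex: 0001, ff padding, 00, ASN.1 SHA-256 prefix, digest.
asn1 = b'3031300d060960864801650304020105000420'
asn1 += sha256(message).hexdigest().encode()
expected = b'0001' + (byte_size - len(asn1) // 2 - 3) * b'ff' + b'00' + asn1

print(byte_size, len(expected) == byte_size * 2)  # -> 256 True
```

Two hex characters per byte, so the `byte_size < len(asn1) // 2 + 11` guard above rejects moduli too small to hold the mandatory padding.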
def update_self(to_screen, verbose, opener):
'January', 'February', 'March', 'April', 'May', 'June',
'July', 'August', 'September', 'October', 'November', 'December']
+KNOWN_EXTENSIONS = (
+ 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'aac',
+ 'flv', 'f4v', 'f4a', 'f4b',
+ 'webm', 'ogg', 'ogv', 'oga', 'ogx', 'spx', 'opus',
+ 'mkv', 'mka', 'mk3d',
+ 'avi', 'divx',
+ 'mov',
+ 'asf', 'wmv', 'wma',
+ '3gp', '3g2',
+ 'mp3',
+ 'flac',
+ 'ape',
+ 'wav',
+ 'f4f', 'f4m', 'm3u8', 'smil')
+
def preferredencoding():
"""Get preferred encoding.
guess = url.partition('?')[0].rpartition('.')[2]
if re.match(r'^[A-Za-z0-9]+$', guess):
return guess
- elif guess.rstrip('/') in (
- 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'aac',
- 'flv', 'f4v', 'f4a', 'f4b',
- 'webm', 'ogg', 'ogv', 'oga', 'ogx', 'spx', 'opus',
- 'mkv', 'mka', 'mk3d',
- 'avi', 'divx',
- 'mov',
- 'asf', 'wmv', 'wma',
- '3gp', '3g2',
- 'mp3',
- 'flac',
- 'ape',
- 'wav',
- 'f4f', 'f4m', 'm3u8', 'smil'):
+    # Try to extract ext from URLs like http://example.com/foo/bar.mp4/?download
+ elif guess.rstrip('/') in KNOWN_EXTENSIONS:
return guess.rstrip('/')
else:
return default_ext
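The guess logic above first accepts a clean alphanumeric extension, then falls back to the known-extension list for URLs where the path segment leaves a trailing slash. A sketch of the same flow with a trimmed extension tuple (subset of `KNOWN_EXTENSIONS`, for illustration only):

```python
import re

KNOWN_EXTENSIONS = ('mp4', 'm4a', 'webm', 'mp3', 'flv')  # trimmed subset

def guess_ext(url, default_ext='unknown_video'):
    guess = url.partition('?')[0].rpartition('.')[2]
    if re.match(r'^[A-Za-z0-9]+$', guess):
        return guess
    # URLs like http://example.com/foo/bar.mp4/?download leave a trailing slash
    elif guess.rstrip('/') in KNOWN_EXTENSIONS:
        return guess.rstrip('/')
    return default_ext

print(guess_ext('http://example.com/foo/bar.mp4/?download'))  # -> mp4
```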
if sign == '-':
time = -time
unit = match.group('unit')
- # A bad aproximation?
+ # A bad approximation?
if unit == 'month':
unit = 'day'
time *= 30
if s is None:
return None
- # The lower-case forms are of course incorrect and inofficial,
+ # The lower-case forms are of course incorrect and unofficial,
# but we support those too
_UNIT_TABLE = {
'B': 1,
_, _, res = mt.rpartition('/')
return {
- 'x-ms-wmv': 'wmv',
- 'x-mp4-fragmented': 'mp4',
+ '3gpp': '3gp',
'ttml+xml': 'ttml',
+ 'x-flv': 'flv',
+ 'x-mp4-fragmented': 'mp4',
+ 'x-ms-wmv': 'wmv',
}.get(res, res)
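The hunk above keeps `mimetype2ext`'s shape: take the subtype after the last `/`, then map the special cases, falling through to the subtype itself. A self-contained sketch mirroring that table:

```python
def mimetype2ext(mt):
    # Subtype after the last '/', e.g. 'video/x-flv' -> 'x-flv'
    _, _, res = mt.rpartition('/')
    return {
        '3gpp': '3gp',
        'ttml+xml': 'ttml',
        'x-flv': 'flv',
        'x-mp4-fragmented': 'mp4',
        'x-ms-wmv': 'wmv',
    }.get(res, res)

print(mimetype2ext('video/x-flv'))  # -> flv
print(mimetype2ext('audio/mp4'))    # -> mp4
```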
from __future__ import unicode_literals
-__version__ = '2016.01.01'
+__version__ = '2016.01.23'