[extractor/common] Improved support for HTML5 subtitles
author Yen Chi Hsuan <yan12125@gmail.com>
Sat, 24 Sep 2016 06:20:42 +0000 (14:20 +0800)
committer Yen Chi Hsuan <yan12125@gmail.com>
Sat, 24 Sep 2016 06:20:42 +0000 (14:20 +0800)
Ref: #10625

Strictly speaking, <track>s with kind=captions are not subtitles. [1]
openload misuses this attribute, and other sites will probably do the
same, so handle it in common.py.

Also allow extracting information from <video> or <audio> tags that
contain only subtitles, which is the case for openload.

[1] https://www.w3.org/TR/html5/embedded-content-0.html#attr-track-kind
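
The two behaviours described above can be sketched in isolation. This is a
minimal standalone illustration, not the code in common.py: the names
tag_attributes and collect_media_entries are hypothetical, tag_attributes
is a simplified stand-in for the extract_attributes helper seen in the
diff, and reading the language from srclang alone is an assumption.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AttrParser(HTMLParser):
    """Collects the attributes of the first start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        if not self.attrs:
            self.attrs = dict(attrs)


def tag_attributes(tag_html):
    # Hypothetical, simplified stand-in for extract_attributes().
    parser = _AttrParser()
    parser.feed(tag_html)
    parser.close()
    return parser.attrs


def collect_media_entries(html, base_url):
    # Hypothetical helper: one entry per <video>/<audio> element.
    entries = []
    for _tag, media_content in re.findall(
            r'(?s)<(video|audio)[^>]*>(.*?)</\1>', html):
        media_info = {'formats': [], 'subtitles': {}}
        for source_tag in re.findall(r'<source[^>]+>', media_content):
            src = tag_attributes(source_tag).get('src')
            if src:
                media_info['formats'].append({'url': urljoin(base_url, src)})
        for track_tag in re.findall(r'<track[^>]+>', media_content):
            attrs = tag_attributes(track_tag)
            kind = attrs.get('kind')
            # Accept kind=captions as well, since some sites misuse it.
            if not kind or kind in ('subtitles', 'captions'):
                src = attrs.get('src')
                if not src:
                    continue
                lang = attrs.get('srclang')  # assumption: srclang only
                media_info['subtitles'].setdefault(lang, []).append({
                    'url': urljoin(base_url, src),
                })
        # Keep the entry even if it carries subtitles but no formats.
        if media_info['formats'] or media_info['subtitles']:
            entries.append(media_info)
    return entries


sample = ('<video>'
          '<track kind="captions" src="/subs/en.vtt" srclang="en">'
          '<track kind="chapters" src="/chapters.vtt" srclang="en">'
          '</video>')
entries = collect_media_entries(sample, 'https://example.com/embed/abc')
```

With the sample markup, the captions track is kept, the chapters track is
skipped, and the entry is returned although its formats list is empty.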

ChangeLog
youtube_dl/extractor/common.py

index a1c4df4793cfbcd8da0b23ba165b372e3808ce38..ebe4ff0e86e987b1c56f8f4c68259466dce7ee95 100644 (file)
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+version <unreleased>
+
+Core
++ Improved support for HTML5 subtitles
+
+
 version 2016.09.24
 
 Core
index 9c8991542d02f46c8c228e120afd921c344b182b..5cb4479ec5271256cef27836ecbe1cdc5b8df3f3 100644 (file)
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -1828,7 +1828,7 @@ class InfoExtractor(object):
                 for track_tag in re.findall(r'<track[^>]+>', media_content):
                     track_attributes = extract_attributes(track_tag)
                     kind = track_attributes.get('kind')
-                    if not kind or kind == 'subtitles':
+                    if not kind or kind in ('subtitles', 'captions'):
                         src = track_attributes.get('src')
                         if not src:
                             continue
@@ -1836,7 +1836,7 @@ class InfoExtractor(object):
                         media_info['subtitles'].setdefault(lang, []).append({
                             'url': absolute_url(src),
                         })
-            if media_info['formats']:
+            if media_info['formats'] or media_info['subtitles']:
                 entries.append(media_info)
         return entries