title: Video title, unescaped.
ext: Video filename extension.
+ Instead of url and ext, formats can also specified.
+
The following fields are optional:
format: The video format, defaults to ext (used for --get-format)
view_count: How many users have watched the video on the platform.
urlhandle: [internal] The urlHandle to be used to download the file,
like returned by urllib.request.urlopen
-
- The fields should all be Unicode strings.
+ age_limit: Age restriction for the video, as an integer (years)
+ formats: A list of dictionaries for each format available, it must
+ be ordered from worst to best quality. Potential fields:
+ * url Mandatory. The URL of the video file
+ * ext Will be calculated from url if missing
+ * format A human-readable description of the format
+ ("mp4 container with h264/opus").
+ Calculated from width and height if missing.
+ * format_id A short description of the format
+ ("mp4_h264_opus" or "19")
+ * width Width of the video, if known
+ * height Height of the video, if known
+
+ Unless mentioned otherwise, the fields should be Unicode strings.
Subclasses of this one should re-define the _real_initialize() and
_real_extract() methods and define a _VALID_URL regexp.
if m:
encoding = m.group(1)
else:
- m = re.search(br'<meta[^>]+charset="?([^"]+)[ /">]',
+ m = re.search(br'<meta[^>]+charset=[\'"]?([^\'")]+)[ /\'">]',
webpage_bytes[:1024])
if m:
encoding = m.group(1).decode('ascii')
self._og_regex('video')],
html, name, **kargs)
+ def _rta_search(self, html):
+ # See http://www.rtalabel.org/index.php?content=howtofaq#single
+ if re.search(r'(?ix)<meta\s+name="rating"\s+'
+ r' content="RTA-5042-1996-1400-1577-RTA"',
+ html):
+ return 18
+ return 0
+
+
class SearchInfoExtractor(InfoExtractor):
"""
Base class for paged search queries extractors.