Fix Brightcove detection when another Flash object is on the page
author Joey Adams <joeyadams3.14159@gmail.com>
Sat, 12 Oct 2013 01:52:30 +0000 (21:52 -0400)
committer Joey Adams <joeyadams3.14159@gmail.com>
Sat, 12 Oct 2013 01:52:33 +0000 (21:52 -0400)
The regex used non-greedy matching, but it still failed on input like this:

    <object class="..."> ... class="BrightcoveExperience"

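It captured two objects and the intervening HTML, because a non-greedy match
still starts at the first "<object" and expands until it reaches
BrightcoveExperience.  This commit fixes that by not allowing a ">" to appear
between "<object" and BrightcoveExperience, so the class attribute must belong
to the same tag.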
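
The difference can be reproduced outside youtube-dl with a small sketch.  The
two patterns are copied from the diff in this commit; the PAGE string is a
made-up stand-in for the real page (the element ids, class names, and param
values are assumptions for illustration only):

```python
import re

# Pattern before this commit: ".+?" and ".*?" may cross tag boundaries.
OLD = r'<object.+?class=([\'"]).*?BrightcoveExperience.*?\1.+?</object>'
# Pattern after this commit: "[^>]" keeps the class attribute inside
# the same opening <object ...> tag.
NEW = r'<object[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1.+?</object>'

# Hypothetical page: an unrelated Flash object, then a Brightcove one.
PAGE = (
    '<object class="flash-banner">'
    '<param name="movie" value="banner.swf"></object>'
    '<p>intervening HTML</p>'
    '<object id="player" class="BrightcoveExperience">'
    '<param name="playerID" value="1"></object>'
)

old_match = re.search(OLD, PAGE, re.DOTALL).group()
new_match = re.search(NEW, PAGE, re.DOTALL).group()

# The old pattern anchors at the first "<object" and swallows both objects.
print(old_match.count('<object'))
# The new pattern rejects the first object and matches only the second.
print(new_match.count('<object'))
```

With this input the old pattern returns the whole of PAGE, while the new
pattern returns only the Brightcove object.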

Video in question: http://www.harpercollinschildrens.com/feature/petethecat/

youtube_dl/extractor/generic.py

index 7060c6f9258c28c9dcb18681c62882f52715edf9..d48c84f8d575111dd0459056efd33e7338b1a1df 100644
@@ -121,7 +121,7 @@ class GenericIE(InfoExtractor):
 
         self.report_extraction(video_id)
         # Look for BrightCove:
-        m_brightcove = re.search(r'<object.+?class=([\'"]).*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL)
+        m_brightcove = re.search(r'<object[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL)
         if m_brightcove is not None:
             self.to_screen(u'Brightcove video detected.')
             bc_url = BrightcoveIE._build_brighcove_url(m_brightcove.group())