[extractor/common] Improve HTML5 entries extraction and add some real-world tests