merged unescapeHTML branch; removed lxml dependency