How do I write a regex rule in Scrapy to extract URLs?
I want to crawl pages related to Disney on the Bloomberg website. The URLs
follow this pattern:
"http://bloomberg.com/news/2013-07-08/disney-welcometohomepageofdisney"
So I have written the rule below for it:
rules = [
    Rule(SgmlLinkExtractor(allow=('/news/*/disney*',)), follow=True),
]
But the rule above doesn't work as I want: the crawled pages I get are
not related to Disney. Please help me fix this rule.
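One likely cause: Scrapy's `allow` argument takes regular expressions, not shell-style wildcards, and the link extractor keeps a link when the pattern is found anywhere in its URL (search semantics). Read as a regex, `/news/*/disney*` means "/news", then zero or more slashes, then "/disne", then zero or more "y" characters. That does not match the example URL at all, because a date segment sits between "/news/" and "/disney". The sketch below uses only Python's `re` module to compare the original pattern with a stricter one. The `date_slug` pattern and the `matches` helper are hypothetical, and the pattern assumes every Disney article URL has a YYYY-MM-DD segment followed by a slug starting with "disney".

```python
import re

# Example URLs: the first is from the question; the second is a made-up
# non-Disney article used only for comparison.
url = "http://bloomberg.com/news/2013-07-08/disney-welcometohomepageofdisney"
other = "http://bloomberg.com/news/2013-07-08/apple-earnings"

# The original pattern, interpreted as a regex rather than a glob:
# "/*" means "zero or more slashes" and "y*" means "zero or more y's".
glob_like = r'/news/*/disney*'

# Hypothetical stricter pattern: a YYYY-MM-DD date segment, then a slug
# that starts with "disney".
date_slug = r'/news/\d{4}-\d{2}-\d{2}/disney'

def matches(pattern, u):
    # Hypothetical helper mirroring search semantics: the pattern may
    # match anywhere in the URL, not only at the start.
    return re.search(pattern, u) is not None

print(matches(glob_like, url))    # False: no date-free "/news/disney" path
print(matches(date_slug, url))    # True
print(matches(date_slug, other))  # False
```

If this reasoning holds, the rule would use the new pattern, e.g. `allow=(r'/news/\d{4}-\d{2}-\d{2}/disney',)`. Note that a rule with `follow=True` and no `callback` only follows matching links; pages you want to scrape still need a callback on some rule.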