Nginx Anti-Crawler Rules

Because rogue crawlers such as Bytespider identify themselves in the User-Agent header, we can use Nginx rules to block them and return a 403 error directly. Edit the corresponding site configuration file (note: these directives go inside the `server` block):

```nginx
# Block common scraping frameworks (~* is a case-insensitive match)
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
    return 403;
}

# Block known bad User-Agents (~ is case-sensitive); "^$" also blocks empty UAs
if ($http_user_agent ~ "WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0.1|YandexBot|FlightDeckReports|Linguee Bot|^$") {
    return 403;
}

# Block any request method other than GET, HEAD, or POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}
```

Testing the result

Use curl to verify (`-I` sends a HEAD request, `-A` sets the User-Agent). Both of these should return 403:

```shell
curl -I -A 'jaunty' https://www.imzeng.com/   # 'jaunty' is on the UA blocklist
curl -I -A '' https://www.imzeng.com/         # empty UA matches ^$
```

TAI泰 · 2020-03-30T23:50:38+08:00 · Code
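To see how the three rules classify a request without touching a live server, here is a minimal Python sketch that mirrors their logic. The `decide` helper and its return values are hypothetical illustrations, not part of Nginx; the patterns are copied from the config above.

```python
import re

# Mirrors the first rule: ~* is a case-insensitive regex match in Nginx.
SCRAPER_RE = re.compile(r"Scrapy|Curl|HttpClient", re.IGNORECASE)

# Mirrors the second rule: ~ is case-sensitive; "^$" catches empty User-Agents.
BAD_UA_RE = re.compile(
    r"WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|"
    r"Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|"
    r"Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|"
    r"Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|"
    r"HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0\.1|YandexBot|"
    r"FlightDeckReports|Linguee Bot|^$"
)

# Mirrors the third rule: only these request methods are allowed through.
ALLOWED_METHODS = {"GET", "HEAD", "POST"}

def decide(user_agent: str, method: str = "GET") -> int:
    """Return the HTTP status the three rules above would produce."""
    if SCRAPER_RE.search(user_agent):
        return 403
    if BAD_UA_RE.search(user_agent):
        return 403
    if method not in ALLOWED_METHODS:
        return 403
    return 200

print(decide("jaunty-browser"))          # → 403, 'jaunty' is on the blocklist
print(decide(""))                        # → 403, empty UA matches ^$
print(decide("Mozilla/5.0"))             # → 200, a normal browser passes
print(decide("Mozilla/5.0", "DELETE"))   # → 403, method not in the allowlist
```

This also makes the case-sensitivity distinction concrete: `scrapy` is still blocked (first rule uses `~*`), while a UA containing only lowercase `yandexbot` would slip past the second, case-sensitive rule.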