严浩
2025-01-13 17:21:40 +08:00
parent 456f56e40d
commit 320df7e2a5
5 changed files with 186 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,28 @@
# ScrapeGraphAI
ScrapeGraphAI is an AI tool for web crawling and data scraping.
https://github.com/ScrapeGraphAI/Scrapegraph-ai/blob/main/docs/chinese.md
## Reference
https://www.aivi.fyi/aiagents/introduce-ScrapeGraphAI+LangChain+LangGraph#可用管道
## Dependencies
```
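# Core library and the browser binaries Playwright needs
pip install scrapegraphai
playwright install
# DuckDuckGo search client
pip install -U duckduckgo-search
# Optional extras (quoted so the shell does not expand the brackets)
pip install 'scrapegraphai[other-language-models]'
pip install 'scrapegraphai[more-semantic-options]'
pip install 'scrapegraphai[more-browser-options]'
# Pull a local model with Ollama, then confirm it is listed
ollama pull mistral-nemo
ollama list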
```
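Once the dependencies above are installed, a first scrape against the local `mistral-nemo` model might look roughly like the sketch below. It assumes the `SmartScraperGraph` class from `scrapegraphai.graphs`, an Ollama server on its default address `http://localhost:11434`, and the `ollama/<model>` naming convention; the helper names `build_config` and `scrape` are made up for illustration, and the exact config keys may differ between library versions.

```python
from typing import Any


def build_config(model: str = "ollama/mistral-nemo",
                 base_url: str = "http://localhost:11434") -> dict[str, Any]:
    """Return a graph config pointing at a local Ollama model (assumed keys)."""
    return {
        "llm": {
            "model": model,        # assumed "provider/model" naming
            "temperature": 0,      # keep extraction output as stable as possible
            "format": "json",      # ask Ollama for JSON output
            "base_url": base_url,  # default Ollama endpoint (assumption)
        },
        "verbose": False,
        "headless": True,
    }


def scrape(prompt: str, url: str) -> Any:
    """Run a single-page scrape; hypothetical wrapper around SmartScraperGraph."""
    # Imported lazily so build_config() can be used without the library installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=url, config=build_config())
    return graph.run()
```

Calling `scrape("List the article titles on this page.", "https://example.com")` should return the parsed result, provided Ollama is running and the model has been pulled.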
## Tips
- Comment
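- Calling the API of a small-parameter model is much cheaper than calling gpt-4o.
- Playwright + plugins can solve some CAPTCHAs. Adding an LLM on top makes them mostly a non-issue.
- This repo is essentially a traditional scraper wrapped in an AI shell: the data-parsing step uses AI in place of the old hard-coded rules. Anti-scraping measures can only be handled with IP proxies (residential IP providers work best) plus Playwright, or with ChromeDriver & Selenium attached to a running Chrome process.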
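The two anti-scraping routes from the last tip (a proxy with Playwright, or attaching to an already running Chrome) could be sketched as follows. This is a minimal illustration, not the repo's own code: `proxy_settings` and `fetch_html` are hypothetical helpers, the proxy address is a placeholder, and the attach path assumes Chrome was started with `--remote-debugging-port=9222`.

```python
from typing import Optional


def proxy_settings(server: str,
                   username: Optional[str] = None,
                   password: Optional[str] = None) -> dict:
    """Build the proxy dict that Playwright's launch() accepts."""
    settings = {"server": server}
    if username is not None:
        settings["username"] = username
    if password is not None:
        settings["password"] = password
    return settings


def fetch_html(url: str,
               proxy: Optional[dict] = None,
               cdp_url: Optional[str] = None) -> str:
    """Fetch a page through a proxy, or through an attached Chrome if cdp_url is set."""
    # Imported lazily so proxy_settings() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        if cdp_url:
            # Attach to an existing Chrome process, e.g. "http://localhost:9222".
            browser = p.chromium.connect_over_cdp(cdp_url)
            context = browser.contexts[0] if browser.contexts else browser.new_context()
            page = context.new_page()
        else:
            # Launch a fresh browser that routes traffic through the proxy.
            browser = p.chromium.launch(headless=True, proxy=proxy)
            page = browser.new_page()
        page.goto(url)
        html = page.content()
        page.close()
        if not cdp_url:
            browser.close()  # only close browsers this function launched
        return html
```

For example, `fetch_html(url, proxy=proxy_settings("http://proxy.example:8080", "user", "pass"))` uses the proxy route, while `fetch_html(url, cdp_url="http://localhost:9222")` reuses the running Chrome and its existing session.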