ScrapeGraphAI-experiments/README.md
2025-01-16 15:57:50 +08:00

# ScrapeGraphAI
ScrapeGraphAI is an AI tool for web crawling and data scraping.
- https://github.com/ScrapeGraphAI/Scrapegraph-ai/blob/main/docs/chinese.md
- https://github.com/ScrapeGraphAI/ScrapegraphLib-Examples
- https://github.com/ScrapeGraphAI/ScrapegraphLib-Examples/blob/main/extras/authenticated_playwright.py
## Reference
https://www.aivi.fyi/aiagents/introduce-ScrapeGraphAI+LangChain+LangGraph
## Dependencies
```
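# Core library, plus the browsers that Playwright drives
pip install scrapegraphai
playwright install
# DuckDuckGo search client
pip install --upgrade duckduckgo-search
# Optional extras (quote the whole requirement so the shell leaves the brackets alone)
pip install 'scrapegraphai[other-language-models]'
pip install 'scrapegraphai[more-semantic-options]'
pip install 'scrapegraphai[more-browser-options]'
# Pull a local model with Ollama and confirm it is listed
ollama pull mistral-nemo
ollama list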
```
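
With the dependencies above installed, a first scrape against the local `mistral-nemo` model might look like the sketch below. It assumes Ollama is serving on its default address `http://localhost:11434`. The helper names `build_graph_config` and `scrape` are made up for this example, and the config keys follow the pattern used in the ScrapeGraphAI examples, so check them against the version you installed.

```python
def build_graph_config(model="ollama/mistral-nemo", base_url="http://localhost:11434"):
    """Return a ScrapeGraphAI config dict for a model served by a local Ollama.

    The default base_url is an assumption (Ollama's usual local address).
    """
    return {
        "llm": {
            "model": model,
            "temperature": 0,      # keep extraction output as deterministic as possible
            "format": "json",      # ask Ollama for JSON output
            "base_url": base_url,
        },
        "verbose": False,
        "headless": True,          # run the browser without a visible window
    }


def scrape(url, prompt):
    """Run a single-page extraction and return the result of graph.run()."""
    # Imported lazily so build_graph_config() works without the library installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=url, config=build_graph_config())
    return graph.run()
```

Calling `scrape("https://example.com", "List the page title and main heading.")` needs a running Ollama instance and the browsers installed by `playwright install`.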
## Tips
- Comment
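- Calling the API of a small-parameter model costs far less than calling gpt-4o.
- Playwright plus plugins can get past some CAPTCHAs. Add an LLM on top and they are mostly no longer a problem.
- This repo is essentially a traditional crawler wrapped in an AI shell: the data-parsing step is done by AI instead of the old hard-coded rules. Anti-scraping measures can only be handled with IP proxies (residential IP providers work best) plus Playwright, or with ChromeDriver and Selenium attached to a running Chrome process.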
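
The two anti-scraping approaches in the last tip can be sketched with Playwright's sync API, as below. This is a rough illustration rather than the repo's own code. The function names are invented for the example, the proxy address must come from your own provider, and the attach variant assumes Chrome was started with `--remote-debugging-port=9222`.

```python
def proxy_settings(server, username=None, password=None):
    """Build the proxy dict accepted by Playwright's launch(proxy=...)."""
    settings = {"server": server}
    if username is not None:
        settings["username"] = username
    if password is not None:
        settings["password"] = password
    return settings


def fetch_html_via_proxy(url, proxy):
    """Launch a fresh Chromium that routes its traffic through the given proxy."""
    # Imported lazily so proxy_settings() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, proxy=proxy)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            return page.content()
        finally:
            browser.close()


def fetch_html_from_running_chrome(url, cdp_endpoint="http://127.0.0.1:9222"):
    """Attach to an already running Chrome over CDP and reuse its profile.

    The endpoint is an assumption: it matches a Chrome started with
    --remote-debugging-port=9222 on the same machine.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_endpoint)
        # Reuse the existing context so cookies and logins carry over.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        try:
            page.goto(url, wait_until="domcontentloaded")
            return page.content()
        finally:
            page.close()  # close only our tab, not the user's browser
```

For example, `fetch_html_via_proxy("https://example.com", proxy_settings("http://proxy.example.com:8000", "user", "secret"))` would send the request through a placeholder proxy. With Selenium, the equivalent attach is done by setting the `debuggerAddress` experimental option on `ChromeOptions`.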