Search resource list
delphi_searchengine
- Searches over 200 internet search engines. It launches the user's default browser and shows the results. This source uses TLinkLabel by Vitaly Zayko on a few of the tabs. It is not needed by the search engine itself; however, it is included in
heritrix-2.0.0-src
- Heritrix: Internet Archive Web Crawler. The archive-crawler project is building a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
strigi.tar
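- Strigi is an efficient search engine framework. It can index your hard drive quickly without slowing down your system, which makes it a fast, lightweight desktop search system. It can also index many file formats. -Strigi is a daemon which uses a very fast and efficient crawler that can index data on your hard drive. Indexing operations are performed without hammering your system, this ma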
4pm
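- This article uses Lucene and Heritrix to build a web search application. Lucene is a Java-based full-text information retrieval library, currently an open-source project in the Apache Jakarta family. Lucene is powerful, but however powerful a search engine tool may be, it needs something behind it to support it: a web crawler. Web crawlers are also called spiders, web robots, or bots; the names matter little. What matters is recognizing that crawlers are what give search engines their rich supply of content. Heritrix is a pure Java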
heritrix_developer_manual
- Official Heritrix developer documentation, from crawler.archive.org/articles. It provides an introduction to developing with the basic classes.