Resource search results
A simple self-written web spider
- This source code is simple and easy to follow, which makes it a handy reference for Java beginners. The file type, file size, and crawl depth can all be configured; see the download notes for details.
crawler
- A system that analyzes and downloads web pages by topic and can proactively download the detail pages.
Lucene+Nutch
- The book first describes how to configure the development platform, then covers Lucene and Nutch development in detail.
Search_Engine
- Describes the system architecture of a search engine, explaining three components in detail (the web robot, the indexing engine, and the web server), and illustrates them by implementing a news search engine.
nutchbook
- Nutch is an open-source search engine implemented in Java. It provides all the tools needed to run your own search engine. This electronic document introduces it in detail.
SearchEngine
- dySE is a small open-source search engine written in Java. It is divided into three modules: a crawler module, a preprocessing module, and a search module. The material explains in detail how the following features are implemented: multithreaded page crawling, main-content extraction, text extraction, word segmentation, index building, and page snapshots.
demo
- Implements a web crawler in Java. The content is detailed and includes several interfaces reserved for future features.