About:
Heritrix is the Internet Archive's extensible, Web-scale,
archival-quality Web crawler.
Release focus: Major feature enhancements
Changes:
This release offers improved remote control and
monitoring via JMX, a crawl-checkpointing
facility, experimental support for Bloom-filter
already-included testing, partitioning of a crawl
across multiple independent crawlers, and
per-host/domain/queue-grouping collection quotas.
Performance and stability in large crawls were
improved. 39 requested enhancements were included
and 96 reported bugs were fixed. You will again
need to adjust your existing order files to make
them work with the new release.
Author:
Michael Stack
Homepage:
http://crawler.archive.org/
Tar/GZ:
http://sourceforge.net/[..]iles.php?group_id=73833&package_id=73980
Zip:
http://sourceforge.net/[..]iles.php?group_id=73833&package_id=73980
Changelog:
http://crawler.archive.org/articles/releasenotes.html
CVS tree (cvsweb):
http://archive-crawler.cvs.sourceforge.net/
Mailing list archive:
http://groups.yahoo.com/group/archive-crawler/
Dependencies:
No dependencies filed