Mirrors are extremely useful when used to their full potential -- but this rarely happens. There is nothing wrong with mirrors themselves, only with the way we use them. I want average users, who don't (and shouldn't need to) know many technical details, to be able to make the best use of mirrors automatically.

Copyright notice: All reader-contributed material on freshmeat.net is the property and responsibility of its author; for reprint rights, please contact the author directly.

As fiber to the home (15-30 megabit speeds) and cable/DSL (1-6 megabit speeds) become more common, some servers are having trouble maxing out a user's download pipe. One way to increase performance is to download from multiple resources at once. This is mainly useful for large files.

Mirrors are confusing to an inexperienced Web user. The Fedora Project
has 110 mirror sites in North America alone.
In this case, not all mirrors carry all files. Some might not have all large ISOs (the Fedora Core 4 DVD image is around 2.5 gigabytes), or might only carry a subset of files (some kernel.org mirrors only have .tar.gz or .bz2 files, some have both). Or they might just be out of sync. That means you have to navigate through them to find out if they really have the file you need. This is basically a usability problem. With some downloads, complications arise from users needing to select their operating system, language, and location. I hope to make things easier. Mirrors are great. We need to keep using them, but we need a better, more automatic way to use them.

Peer-to-Peer (P2P) in general and BitTorrent specifically are amazing. They let individuals share their bandwidth and distribute files that would otherwise cost too much to distribute through traditional server-to-client downloads.

But... P2P and regular hyperlinks are not that reliable. A hyperlink is one link to a file. If that file is gone or moved, or the server is temporarily down, that's it. 404 Error. You can search by filename, but there is no unique identifier to find that file again on the Web. P2P sharing is ephemeral. Most files are not available constantly or for the long term. I'm sure everyone has found a .torrent they really want, but that no one is sharing any more. BitTorrent downloads will not complete if there are no seeds at 100%. A torrent download will sit at 99.9% forever until a 100% seed (someone with the full file) starts sharing. There is no fallback plan.

I have been working on a file format called MetaLink that bundles the various methods (P2P/HTTP/FTP) of downloading files in order to improve usability, performance, reliability, and efficiency over a single P2P method or a regular hyperlink. One of the main goals is to make the download process simpler for the end user. I hope this format will be found useful by Free and Open Source software projects.

Performance is increased because you download from multiple resources at the same time. Reliability is greater because there are multiple avenues or alternate locations to get a file. Hyperlinks have a single point of failure. Metalinks do not; all resources have to go down at the same time for a file to be unavailable. And it is more efficient because it spreads the downloads more evenly across multiple resources (P2P or Web/FTP servers) by multi-threading (a.k.a. segmenting or accelerating) downloads. That means that a different portion of each file is downloaded from each of several servers.

The minimum requirement for Metalink to be integrated into a program is that it already supports segmented downloads. Clients should also have a way to check MD5 and SHA-1 sums. And if a client has BitTorrent and other P2P methods (ed2k links, magnet links, Gnutella) built in, even better. The perfect client will be able to share and access files across many P2P networks. A few clients are implementing MetaLink right now and should be available shortly.

Here is an example MetaLink for OpenOffice.org 2.0.1 with links for a BitTorrent .torrent, magnet, ed2k, FTP, and HTTP. A really useful MetaLink will include combinations for different operating systems and languages.

<?xml version="1.0" encoding="UTF-8"?>
<metalink version="2.0" xmlns="http://www.m3talink.org/"
  origin="http://www.openoffice.org/mmm/OpenOffice.org-2.0.1.metalink"
  type="static" pubdate="2005-12-21-22:07:22"
  refreshdate="2005-12-23-03:24:18">
  <files>
    <file name="OOo_2.0.1_LinuxIntel_install.tar.gz">
      <identity>OpenOffice.org</identity>
      <version>2.0.1</version>
      <description>OpenOffice.org 2.0.1 - free office suite</description>
      <tags>OpenOffice.org, office suite, OpenDocument, open source</tags>
      <language>en-US</language>
      <os>Linux-x86</os>
      <size>109237237</size>
      <verification>
        <md5>e0d123e5f316bef78bfdf5a008837577</md5>
      </verification>
      <publisher>
        <name>OpenOffice.org</name>
        <url>http://www.openoffice.org/</url>
      </publisher>
      <license>
        <name>LGPL</name>
        <url>http://www.gnu.org/copyleft/lesser.html</url>
      </license>
      <copyright>Copyright 2000-2005 Sun Microsystems Inc.</copyright>
      <resources>
        <magnet>
          <url>magnet:?xt=urn:sha1:TWTEVOAO2IIEV67QT2ZITTXHXEUR4EXD&amp;xt=urn:kzhash:07b7760f1c05440c779479b50dd9dd5d96708cf47b7cef1181058119637ff20ab7d38af0&amp;xt=urn:tree:tiger:VKFOQ3RETGBCLWOJAMX53EQR4OWNV7CUEOAVY6Q&amp;xt=urn:ed2k:8966658d3b75ff12e1260371ad257098&amp;xl=109237237&amp;dn=OpenOffice.org_2.0.1_LinuxIntel_install.tar.gz&amp;xs=http://ftp.snt.utwente.nl/pub/software/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
          <preference>90</preference>
        </magnet>
        <ed2k>
          <url>ed2k://|file|OpenOffice.org_2.0.1_LinuxIntel_install.tar.gz|109237237|8966658D3B75FF12E1260371AD257098|h=3JVTR3O2DYGSBYCDCHKBOBXL2IJ6A3H3|s=http://ftp.snt.utwente.nl/pub/software/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz|/</url>
          <preference>90</preference>
        </ed2k>
        <bittorrent>
          <torrent>
            <url>http://borft.student.utwente.nl:6969/file?info_hash=%53%13%06%4e%30%c4%1e%e2%6f%e2%b0%24%8f%1b%e7%1e%97%ae%ec%ca</url>
          </torrent>
          <preference>100</preference>
        </bittorrent>
        <http>
          <url>http://mirrors.isc.org/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
          <location>US</location>
          <preference>80</preference>
        </http>
        <ftp>
          <url>ftp://ftp.ussg.iu.edu/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
          <location>US</location>
          <preference>20</preference>
        </ftp>
        <http>
          <url>http://mirrors.ibiblio.org/pub/mirrors/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
          <location>US</location>
          <preference>20</preference>
        </http>
        <ftp>
          <url>ftp://openofficeorg.secsup.org/pub/software/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
          <location>US</location>
          <preference>40</preference>
        </ftp>
      </resources>
    </file>
  </files>
</metalink>
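
As a rough illustration of how a client might read a file like the one above, here is a minimal Python sketch. It assumes the element layout shown in the example (a `resources` element holding `http`, `ftp`, `bittorrent`, `magnet`, and `ed2k` children, each with a `preference`), and it assumes that a higher `preference` means the resource should be tried first. The function names `local_name` and `list_resources` and the abbreviated sample document are made up for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the layout of the example above (only two mirrors kept).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<metalink version="2.0" xmlns="http://www.m3talink.org/">
  <files>
    <file name="OOo_2.0.1_LinuxIntel_install.tar.gz">
      <size>109237237</size>
      <verification>
        <md5>e0d123e5f316bef78bfdf5a008837577</md5>
      </verification>
      <resources>
        <http>
          <url>http://mirrors.isc.org/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
          <location>US</location>
          <preference>80</preference>
        </http>
        <ftp>
          <url>ftp://ftp.ussg.iu.edu/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz</url>
          <location>US</location>
          <preference>20</preference>
        </ftp>
      </resources>
    </file>
  </files>
</metalink>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def list_resources(xml_text):
    """Return (preference, kind, url) tuples, highest preference first."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        if local_name(elem.tag) != "resources":
            continue
        for res in elem:
            url, pref = None, 0
            # iter() also reaches the <url> nested inside <bittorrent><torrent>.
            for child in res.iter():
                name = local_name(child.tag)
                if name == "url" and child.text:
                    url = child.text.strip()
                elif name == "preference" and child.text:
                    pref = int(child.text.strip())
            if url:
                found.append((pref, local_name(res.tag), url))
    # Assumption: a higher preference value means "try this one first".
    return sorted(found, key=lambda item: item[0], reverse=True)


for pref, kind, url in list_resources(SAMPLE):
    print(pref, kind, url)
```

Matching on local tag names keeps the sketch independent of the exact namespace URI, which may well change as the format evolves.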
The goal is simplicity. A user will click this one .metalink, and the client will download the file in segments from P2P and mirrors. After the download is complete, the checksums will be compared to verify that the files are identical.

So, to sum up, the benefits over traditional methods are the ones described above: better usability, performance, reliability, and efficiency.
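
The segment-and-verify steps can be sketched in Python as follows. This is only an illustration: the helper names `plan_segments` and `md5_matches`, the 1 MiB segment size, and the round-robin assignment of segments to mirrors are assumptions of this example, not something the format prescribes. A real client would fetch each range over the network and would probably weight mirrors by preference rather than treating them equally.

```python
import hashlib


def plan_segments(size, mirrors, segment_size=1024 * 1024):
    """Split `size` bytes into ranges and hand them out round-robin.

    Returns (first_byte, last_byte, mirror) tuples; byte offsets are
    inclusive, as in an HTTP "Range: bytes=first-last" header.
    """
    if size < 0 or segment_size <= 0 or not mirrors:
        raise ValueError("need a valid size, a positive segment size, and a mirror")
    plan = []
    start = 0
    while start < size:
        end = min(start + segment_size, size) - 1
        mirror = mirrors[len(plan) % len(mirrors)]
        plan.append((start, end, mirror))
        start = end + 1
    return plan


def md5_matches(path, expected_hex, chunk_size=64 * 1024):
    """Hash the finished file in chunks and compare it with the published MD5."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Two of the mirrors and the file size from the example above.
mirrors = [
    "http://mirrors.isc.org/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz",
    "ftp://ftp.ussg.iu.edu/pub/openoffice/stable/2.0.1/OOo_2.0.1_LinuxIntel_install.tar.gz",
]
plan = plan_segments(109237237, mirrors)
print(len(plan), "segments; first:", plan[0][:2], "last:", plan[-1][:2])
```

Once all segments have been written to disk, a call such as `md5_matches("OOo_2.0.1_LinuxIntel_install.tar.gz", "e0d123e5f316bef78bfdf5a008837577")` would compare the result against the checksum published in the metalink.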
I'd be interested in any comments you have.

Author's bio: Anthony Bryan usually sits on his lazy bum all day, but this time he's done something. Luckily, that something doesn't involve physical movement, but it may allow him to get a new chair sometime in the next five years. Probably...
[»]
Setting the Preference Parameter On The Server?
I understand that the metalink configuration provides a 'preference'
parameter for each link that determines how likely the client should be to
select that particular link. I assume that this parameter would not be
static, but rather would be dynamically set by the web server providing
the metalink. But how would the server know how to set this? It seems
that you may lose some of the intuitive "I live near X, so I will
choose the server near X" functionality you get with regular mirror
hyperlinks. Your thoughts on this?
[»]
Metalink tools
Bram Neijt has released Metalink tools, which are extremely useful for making metalinks: they generate many different checksums and import mirror lists.
[»]
BSD/Linux Distributions using Metalink
DesktopBSD, BLAG Linux, StartCom Linux, Berry Linux, Ubuntu Christian Edition
[»]
Thank you
Great advice, thank you!
[»]
New and updated Metalink clients
wxDownload Fast is a download manager for Mac, Unix, and Windows that supports Metalink. aria2 is a Unix command line download utility that supports BitTorrent and Metalink. Version 0.7.0 offers updated Metalink support. BLAG offers their Linux distribution ISO for download with Metalink.
[»]
Re: New and updated Metalink clients
Speed Download (Mac) now supports Metalinks. It looks and works great; check it out.
[»]
FlashGot support for Metalink
FlashGot 0.5.9.995 (Firefox extension) now supports an earlier version of Metalink with GetRight. FlashGot could be modified so that Metalink works with any of the other cross-platform download managers it supports.
[»]
GetRight 6
GetRight 6 (final version) is now out. It supports metalinks and works with Wine on Linux. I'd still love to see a command line metalink client for Unix.
[»]
Re: Updated metalinks for various files
Metalink @ Packages Resources provides updated Metalinks for the Linux Kernel, OpenOffice.org, & Fedora with more Open Source projects on the way (KDE, Debian, Ubuntu, Mandriva). Software and (GPL'd) source code for generating Metalinks is also available there.
[»]
aria2 - Unix client
aria2 is a command line client for Unix that supports Metalink (HTTP/FTP) and BitTorrent.
[»]
OpenOffice.org uses metalinks
OpenOffice.org uses metalinks.
Clients:
[»]
Update
We have a site up for the project at http://www.metalinker.org/. If you are on Windows, you can try some of the samples on the Metalink site with GetRight 6 Beta. The next version (.5.9.994?) of FlashGot (cross-platform Firefox extension) should also support it. A few other clients are adding native support as well.
[»]
critics and salesmen
When a critic attempts to sell their own solution, it taints the
critique.
[»]
SMTM?
Oh, and where's the price tag?
[»]
simba
I agree with you. Most of the mirrors are not transparent. You don't even know what is excluded from a mirror. You don't know when it was last updated, what the mirror size is, or (worse) what was transferred on the last update. What about some RSS feeds? Do you think they are useful? If you do, take a look at the RoEduNet Iasi Online Archive. The guys from RoEduNet Iasi are using simba to manage their mirrors, and as you can see, almost all the information related to a mirror is available online ;)
[»]
Bandwidth management
The easiest way to pick a mirror according to resources would be to use
bing or pchar to determine the available bandwidth between client and each
server, then go for the one with the greatest available bandwidth.
[»]
Could be done with BitTorrent alone
Instead of mixing HTTP, FTP and Torrents, one could just use Torrents to
get the listed benefits: Torrents let you address multiple trackers, so
there is no single point of failure at that point. Instead of having 5
HTTP or FTP Mirrors, you can deploy 5 "always on" seeds for your
data on different hosts. That way, everyone has the chance to always reach
a 100% seed. I don't see why HTTP and FTP should be added to the mix; they
just make things more complicated IMHO.
[»]
Re: Could be done with BitTorrent alone
[»]
XML Structure
For my own education, I'm writing a metalink parser/generator in PHP. I'm
going to make a database of metalinks for all the packages in the Fedora
YUM repository as a test, and I've run into a couple things...
[»]
Round Robin DNS + Virtual Hosting ( + optional BGP Virtual IP Routing)
It's possible to provide mirrors transparently through a combination of
methods. The easiest is round robin DNS with web/ftp virtual hosting.
This is basically how the Debian archives scale.
[»]
Re: Round Robin DNS + Virtual Hosting ( + optional BGP Virtual IP Routing)
This is unsuitable for long-lived connections, because routing changes can suddenly direct a user to a different server in the middle of a download. It's fine for DNS though.
[»]
Re: Round Robin DNS + Virtual Hosting ( + optional BGP Virtual IP Routing)
[»]
Re: Round Robin DNS + Virtual Hosting ( + optional BGP Virtual IP Routing)
[»]
Good idea, but implementation raises question marks
I think the idea behind this is plausible, but I wonder if all the
assumptions are correct. These are my questions/reservations etc:
[»]
Which clients are implementing the standard?
First, you mentioned clients that are implementing this new standard. Which
ones?