Comments
Handles large archives?
by Peter Abrahamsen - Jan 5th 2004 13:19:20
I'm curious as to whether this tool has the same architectural limitation
as rsync, that it must build a complete archive listing before it begins to
transfer files. I have a system with about 6-7 million files, and even
though the vast majority of them are --exclude'd out, and even though the
system has 2GB of RAM, rsync runs out of memory. Does this program work the
same way?
Re: Handles large archives?
by abo - Jan 9th 2004 00:59:25
> I'm curious as to whether this tool has
> the same architectural limitation as
> rsync, that it must build a complete
> archive listing before it begins to
> transfer files.
...
pysync only implements the delta calculation and patch application; it
does not include any directory walk or network transport code. So pysync
doesn't have those limitations, because it doesn't include that kind of
functionality. Pysync could be used to implement something that does what
you want without those limitations.
Have a look at librsync, rdiff-backup, unison, etc. for other possible
alternatives that might be closer to what you want.
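As a rough illustration of a walk that avoids building a complete listing first, here is a minimal sketch using only the Python standard library. It is not part of pysync; the name iter_files and the plain fnmatch-style exclude patterns are assumptions made for the example. Because it is a generator, each file can be handed to the delta/transfer step as soon as it is found, and excluded directories are never descended into.

```python
import fnmatch
import os


def iter_files(root, excludes=()):
    """Yield file paths under root one at a time, skipping excluded names.

    Hypothetical helper for illustration; patterns are matched against
    individual file and directory names, not full paths.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never enters them.
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch.fnmatch(d, pat) for pat in excludes)]
        for name in filenames:
            if not any(fnmatch.fnmatch(name, pat) for pat in excludes):
                yield os.path.join(dirpath, name)
```

A caller would loop over iter_files(root, ["cache", "*.tmp"]) and process each path immediately, so memory use depends on the largest single directory rather than on the whole tree.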
Re: Handles large archives?
by Luke Kenneth Casson Leighton - Jun 6th 2005 15:39:39
search on google.com for "python rsync". almost right at the top is
someone implementing rsync in python, but he hasn't got round to doing the
bits that this guy has.
combine the two projects and you have a _complete_ implementation of rsync
in python.
i aim to investigate this project because i want to be able to offer
different files "merged" into one single repository, depending on who
connects to the rsync server :)
i.e. i can back up several machines, but the config files will be
different.... cool, huh? :)
Re: Handles large archives?
by abo - Jun 6th 2005 19:50:54
> search on google.com for "python rsync".
> almost right at the top is someone
> implementing rsync in python, but he
> hasn't got round to doing the bits that
> this guy has.
I think you are referring to rsync.py. I just looked at it. It doesn't
implement the rsync algorithm or any network transport. It only reproduces
the walk/filter/copy functionality of rsync.
Another interesting project is zsync. This implements an inverse
rsync algorithm and uses a normal http server for network transport. It has
no walk/filter/copy functionality.
For those who want to add walk/filter functionality to pysync, I have bits
and pieces that might be useful:
efnmatch.py - rsync style extended fnmatch.
dirscan.py - rsync style include/exclude pattern directory scanning.
ddiffutils.py - efficient directory comparison walk generators.
I haven't yet tied these together into a useful combination, but I should
some day :-)
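For readers unfamiliar with rsync style include/exclude filtering, the core idea is that rules are checked in order and the first matching rule decides; a path that matches no rule is included. The sketch below shows only that idea. It is not the code from efnmatch.py or dirscan.py, the name is_included and the ("+"/"-", pattern) rule format are made up for the example, and it uses the standard fnmatch module, whose "*" also matches "/" (unlike rsync's own patterns).

```python
import fnmatch


def is_included(path, rules):
    """Return True if path survives an ordered list of (action, pattern) rules.

    action is "+" (include) or "-" (exclude); the first matching rule wins.
    """
    for action, pattern in rules:
        if fnmatch.fnmatchcase(path, pattern):
            return action == "+"
    return True  # nothing matched: included by default


# Keep config files even though everything else under etc/ is excluded.
rules = [("+", "*.conf"), ("-", "etc/*")]
```

With these rules, "etc/app.conf" is kept because the include rule is listed first, while "etc/passwd" is dropped and "home/readme" is kept by default.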
Version 2.24 release
by abo - Oct 17th 2003 09:37:22
Version 2.24 has been released to update pysync for the new librsync 0.9.6.
Also includes some minor tweaks, including psyco support which gives a 33%
speedup.
If anyone wants windows binaries, let me know and I'll build them.
Version 2.16
by abo - Jun 24th 2002 19:19:17
Version 2.16 was a quick release to include the rollsum extension module. I
am currently working on improving librsync and wanted to release pysync
with the work I had completed on it thus far before leaving it for a little
while.
It does not include inverse delta support yet, and the librsync extension
is unchanged. This means the librsync incremental API and memory leak
problems are still present. After I finish improving librsync I will
address these problems.
Windows installer also available.
by abo - May 3rd 2002 06:17:16
Now that the release is out, I see they didn't like me putting the Windows
.exe installer as an "OS X" package :-).
For those who want it, a Windows installer for Python 2.1 is available at
the ftp site.
For those who are intensely curious, there is a development diary in the
Software Working File, which is also publicly visible.
Release of version 2.7
by abo - May 2nd 2002 13:57:10
This release is a major milestone, including both an md4 sum extension
module and a swig librsync extension.
Note that the API has changed a little for pysync to bring it more in line
with rdiff. Both pysync.py and librsync.py can be used as drop-in
replacements for rdiff, with the exception that they use "rdelta" instead
of "delta" as an option. This is to distinguish it from the pysync.py
alternative of "xdelta". Note that the file parameters have changed order!
The librsync wrapper supports the higher level file API, but the low-level
API is currently faulty. I hope to have this fixed in the next release.
The other major change is the use of distutils to build releases. This
allows me to produce RPMs and Windows installers. The source distribution
comes as a unix tar.bz2, or a Windows zip. Because Windows does not usually
have support for autoconf and swig, the zip includes a pre-configured and
swigged librsync. The tar.bz2 does not include librsync, so you will need
to get it and the patch from the Sourceforge rproxy project.
Coming soon: librsync wrapper sponsored by Accellion.
by abo - Apr 22nd 2002 01:05:20
I am currently working on making a Python extension for librsync (part of
the rproxy project on sourceforge) to add to pysync. This work has been
sponsored by Accellion.
This should be finished before the end of this week (2002-04-26). Those
interested in tracking this development can do so in the pysync Software
Working File.
Adding Reverse Deltas
by abo - Jan 30th 2002 19:35:47
It looks like I will be adding reverse-delta support to this soon, as I
have a need for it. This will allow client-side delta calculation, reducing
the load on a server.
I have some neat ideas about how this could be implemented simply using
inheritance from the forward rdelta class. Any interested
comments/encouragement will spur me on to implement this sooner :-).
Now stable.
by abo - Sep 24th 2001 21:00:33
I've just changed the status of this from alpha to production/stable,
because it basically is.
I haven't really used it enough to be 100% confident there are no bugs,
but its objective of being a python demonstration of the algo has been
met.
There are some further things that could be done with it... using md4sums
instead of md5sums, restructuring to allow reverse-patching, adding a
python interface to librsync, and simplifying it more. These are things I'm
unlikely to do myself soon, but I'm wide open for patches, suggestions,
whatever.
The new zlib like API features
by abo - Mar 1st 2001 19:19:04
Release 1.2 introduced the new zlib-like API, allowing incremental
calculation of deltas and incremental application of patches. The comments
at the top of pysync.py explain it all:
# Low level API signature calculation
sig=calcsig(oldfile)
# Low level API rsync style incremental delta calc from sig and newdata
delta=rdeltaobj(sig)
# or for xdelta style incremental delta calc from oldfile and newdata
# delta=xdeltaobj(oldfile)
incdelta=delta.calcdelta(newdata)
:
incdelta=delta.flush()
# Low level API applying incremental delta to oldfile to get newdata
patch=patchobj(oldfile)
newdata=patch.calcpatch(incdelta)
:
The rdeltaobj.flush() method supports R_SYNC_FLUSH and R_FINISH flush
modes that behave the same as their zlib equivalents. Next on the TODO list
is incremental signature calculation, and further cleanups. Eventually I
plan to create an md4sum module and move the rolling checksum stuff into C
code.
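For anyone who hasn't used the zlib object API that this is modelled on, here is a small runnable example of the pattern using the standard zlib module itself (not pysync). Data is fed in chunk by chunk, Z_SYNC_FLUSH forces out everything consumed so far so the receiver can decode up to that point, and Z_FINISH ends the stream. The chunk contents are made up for the example.

```python
import zlib

chunks = [b"first chunk of data, ", b"second chunk of data, ", b"third chunk"]

comp = zlib.compressobj()
decomp = zlib.decompressobj()
received = b""

for chunk in chunks:
    out = comp.compress(chunk)             # may buffer internally and return b""
    out += comp.flush(zlib.Z_SYNC_FLUSH)   # force out everything consumed so far
    received += decomp.decompress(out)     # receiver can decode up to this point

# Z_FINISH terminates the stream; the compressor cannot be used afterwards.
received += decomp.decompress(comp.flush(zlib.Z_FINISH))
received += decomp.flush()

assert received == b"".join(chunks)
```

The rdeltaobj/patchobj pair plays the same roles as compressobj/decompressobj here, with calcdelta() and calcpatch() in place of compress() and decompress().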
The performance has been marginally hurt by this new API. Interestingly,
the python profiler shows that most of the time is wasted performing
string-copies when taking slices from input buffers, not actually doing the
rsync. This suggests that significant performance increases might be
achievable by re-arranging things a bit, rather than moving python code
into C.
I have also added a pysync-test.py script for thorough formal testing of
pysync. It generates/reuses random test files that make pysync really work
hard, verifying that it behaves as it should.
Incidentally, release 1.2 also fixed a rather embarrassing bug in release
0.9's adler32.py that corrupted the rolling checksums, resulting in heaps
of missed matches. This caused seriously bad performance and very large
deltas.
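To show why a corrupted rolling checksum leads to missed matches, here is a sketch of an rsync style weak checksum together with the property it must satisfy: rolling the window forward by one byte has to give exactly the same value as recomputing the sum from scratch. This follows the formula from the rsync technical report and is not the code from adler32.py; the names weak_sum and roll are made up for the example.

```python
M = 1 << 16  # both halves of the checksum are kept modulo 2**16


def weak_sum(block):
    """Compute the (a, b) pair for a whole block from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


# Self-check: the rolled value must equal a fresh computation at every offset.
data = bytes(range(256)) * 2
n = 16
a, b = weak_sum(data[:n])
for i in range(len(data) - n):
    a, b = roll(a, b, data[i], data[i + n], n)
    assert (a, b) == weak_sum(data[i + 1:i + 1 + n])
```

If roll() drifts away from weak_sum(), the checksums computed while scanning the new file stop matching the ones stored in the signature, so identical blocks go unnoticed and get sent as literal data.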
Re: The new zlib like API features
by damien morton - Mar 5th 2001 15:43:24
>
> The performance has been marginally
> hurt by this new API. Interestingly, the
> python profiler shows that most of the
> time is wasted performing string-copies
> when taking slices from input buffers,
> not actually doing the rsync. This
> suggests that significant performance
> increases might be achievable by
> re-arranging things a bit, rather than
> moving python code into C.
I don't know if this can help, but you can create read-only buffers which
are views into other buffers.
>>> a = buffer("the quick brown fox jumped over the lazy dog")
>>> a
<read-only buffer for 007D8908, ptr 007D891C, size 44 at 007DD720>
>>> buffer(a, 5, 10)
<read-only buffer for 007D8908, ptr 007D8921, size 10 at 007DF298>
This can save some copying.
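Note that buffer() is a Python 2 builtin and was removed in Python 3, where memoryview plays the same role. A minimal equivalent of the session above on Python 3, as a sketch:

```python
data = b"the quick brown fox jumped over the lazy dog"

view = memoryview(data)
window = view[5:15]   # a 10-byte view into data; no bytes are copied
copied = data[5:15]   # slicing bytes directly allocates a new object

# The view still refers to the original object and sees the same bytes.
assert window.obj is data
assert bytes(window) == copied
```

Slices of a memoryview are themselves memoryviews, so a scanning loop can move a window along an input buffer without allocating a new string at each step.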
Re: The new zlib like API features
by abo - Mar 5th 2001 17:59:29
> I don't know if this can help, but you
> can create read-only buffers which are
> views into other buffers.
Yes, I discovered buffer() by accident soon after I released 1.2 (I was
trying to decipher extended slices in 2.0 and found a reference to it in
the docs). It does make a big difference, and I've already started
experimenting with it. It also opens up a few implementation options that
were closed before. I'll probably have a new, slightly faster and simpler
version out soon.
Re: The new zlib like API features
by abo - Mar 13th 2001 20:22:07
The new version 1.7 now takes advantage of buffer(), plus a few other
simplifications and optimisations. This version is approximately 33% faster
than version 1.2.
I have also properly implemented the xdelta style delta calculation using
a fairly neat inheritance from the rdelta class. This gives optimal deltas
but requires direct access to the original oldfile.
Where to from here?...
by abo - Dec 17th 2000 19:41:24
There have been a few downloads already so I figure at least some people
have looked at this. I'd like some feedback on where to take it from
here...
Since this is such a small piece of code, there is nothing like a
supporting webpage or development site. I'm tossing up whether to create a
SourceForge project for it, or just post it as a code-snippet. It's so
small I hardly feel it's worth it, but I guess bug tracking etc. might be
useful. In the meantime, email me with suggestions.
I've already started working on cleaning up the API to be more like zlib,
along the lines of rproxy's libhsync. I'll be releasing the new version
soon. This should make it easier to use for real applications and provide a
better reference API.
Re: Where to from here?...
by abo - Mar 1st 2001 19:26:33
Well, as people may have noticed, I've decided that for a project as small
as pysync, the best solution is to make the freshmeat entry the official
homepage.
Please post comments, suggestions, bug reports, etc. here or email them
directly to me. If this starts to grow too big, I'll then consider
something like sourceforge.
Very simple, can do things rsync and xdelta can't.
by abo - Dec 7th 2000 19:17:21
I hate to blow my own trumpet, but I thought I'd add a bit more info not
really applicable for the description.
This is _really_ simple... it is only about 300 lines, and half of
those are comments containing descriptions and observations. It should be
dead easy for anyone to read, understand, and modify.
It also implements a breakdown of the rsync algorithm that you can't
easily get from any of the current C based implementations, which means you
can use it for things you can't use rsync or xdelta for. Its "Usage" says
it all:
Usage:
pysync sig oldfile sigfile
... generates signature file sigfile from oldfile
pysync delta newfile sigfile diffile
... generates delta file diffile for newfile from sigfile
pysync apply oldfile diffile newfile
... applies delta file diffile to oldfile to generate newfile
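To make the three steps concrete, here is a toy in-memory version of sig/delta/apply. It is only a sketch of the idea and not pysync's code or file formats: the function names, the tiny block size and the list-of-tuples delta are made up for the example, and for simplicity it hashes every offset from scratch with md5 instead of using a rolling checksum to find candidates cheaply.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow


def make_sig(old):
    """Map the digest of each full BLOCK-sized block of old to its offset."""
    sig = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        sig.setdefault(hashlib.md5(old[off:off + BLOCK]).digest(), off)
    return sig


def make_delta(new, sig):
    """Encode new as ("copy", offset) and ("data", bytes) instructions."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        off = sig.get(hashlib.md5(new[pos:pos + BLOCK]).digest())
        if off is not None and pos + BLOCK <= len(new):
            if literal:  # flush pending literal bytes before the copy
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", off))
            pos += BLOCK
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old, delta):
    """Rebuild new from old plus the delta instructions."""
    out = bytearray()
    for op, arg in delta:
        out += old[arg:arg + BLOCK] if op == "copy" else arg
    return bytes(out)


old = b"abcdefghijkl"
new = b"abcdXXefghijkl"
delta = make_delta(new, make_sig(old))
assert apply_delta(old, delta) == new
```

Note that make_delta() only needs the signature, not the old data itself, which is what lets the three steps run on different machines.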