Hi,
I just wanted to share my experiences of using Nirvanix to upload/download lots of small files.
We're developing a data backup application. Because of its delta compression system, we commonly need to upload lots of small files (~10 KB).
I have found that we get quite poor performance with small files on Nirvanix, even though we're using an efficient transport mechanism (i.e. multipart form posting).
Now, some overhead is to be expected: there is an inherent cost with lots of small files versus, say, a single large file of the same total size. However, my gut feeling is that even allowing for this overhead, Nirvanix seems unexpectedly slow.
Firstly, I'd like to separate my experiences into three separate issues: HTTP upload, HTTP download and the web service calling.
Web service calling
The problem here is simply that you need to make several web service calls for each file uploaded. This is just an inherent problem with web services over HTTP. The average round trip for us is around 50ms, so if we upload lots of files sequentially and we need to make at least one WS call for each, we find our average upload speed just plummets - we're spending a lot of time waiting for the HTTP response.
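To put rough numbers on that, here's a back-of-envelope sketch in Python. The 50ms round trip and ~10 KB file size are the figures above; everything else is simplified: it assumes the calls are strictly sequential and ignores the time to transfer the data itself, so it gives an upper bound rather than a measurement.

```python
# Back-of-envelope ceiling on sequential upload speed when every file
# costs at least one blocking web service round trip.
ROUND_TRIP_S = 0.050   # ~50 ms average round trip (figure from this post)
FILE_SIZE_KB = 10      # typical small file (~10 KB)


def max_files_per_second(round_trip_s, calls_per_file):
    """Upper bound on files/second if the calls are made one after another."""
    return 1.0 / (round_trip_s * calls_per_file)


def max_throughput_kb_s(round_trip_s, calls_per_file, file_size_kb):
    """Upper bound on KB/s, ignoring the data transfer time entirely."""
    return max_files_per_second(round_trip_s, calls_per_file) * file_size_kb


if __name__ == "__main__":
    # calls_per_file = 1 is the best case; in practice we need several.
    for calls in (1, 2, 3):
        rate = max_throughput_kb_s(ROUND_TRIP_S, calls, FILE_SIZE_KB)
        print(f"{calls} call(s) per file: at most {rate:.0f} KB/s")
```

So even with one call per file and an infinitely fast connection, the round trips alone cap us at about 20 files (roughly 200 KB) per second, and each extra call per file divides that again.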
What would be great would be to have "aggregate" web services. So for example, the ability to call SetMetadata with multiple files/metadata in one go.
HTTP Uploading
We're using multipart form posting. We post several files in one go (up to 50) to try and eliminate the overhead of many HTTP calls. This means there is a single HTTP post which streams all the data for many files all in one go.
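For anyone unfamiliar with the technique, here's a minimal sketch in Python of packing several files into one multipart/form-data body. It is an illustration, not our production code: the field names (file0, file1, ...) and the file names are made up, it does no escaping of awkward file names, and it leaves out the HTTP call and any Nirvanix-specific parameters.

```python
import uuid
from typing import Dict, Tuple


def build_multipart(files: Dict[str, bytes]) -> Tuple[bytes, str]:
    """Encode several files as a single multipart/form-data body.

    Returns (body, content_type). The form field names are hypothetical.
    """
    # A random boundary makes a collision with the file data very unlikely.
    boundary = uuid.uuid4().hex
    parts = []
    for index, (filename, data) in enumerate(files.items()):
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file{index}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n"
            "\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    # The closing delimiter carries a trailing "--".
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    # 50 dummy files of 10 KB each, mirroring the batch size described above.
    batch = {f"delta_{i:03d}.bin": b"x" * 10_240 for i in range(50)}
    body, content_type = build_multipart(batch)
    print(content_type)
    print(f"{len(batch)} files in one body of {len(body)} bytes")
```

The point is that the whole batch becomes one request body, so the per-request HTTP overhead is paid once rather than 50 times.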
When I watch the progress in Wireshark (a packet sniffer), all seems to go well: the initial POST occurs and then I see all the files' data pushed out in one go. During this part, the upload bandwidth of my broadband connection is maxed out - i.e. it's uploading nice and fast.
However, the problem is that there is a huge pause (about 20 secs for 50 files) between the last data packet being sent and the HTTP OK being returned. I'm certain that this delay is at the Nirvanix server end because I can see in Wireshark that all the data has been sent out on the wire.
I then compared this to uploading a single file of the same total size, and predictably the pause wasn't there. Now of course I totally appreciate that a certain amount of overhead is to be expected as you increase the number of files, but the delays I'm seeing seem too high (around 500ms per file - not trivial if you're uploading 1000s of files).
So the bottom line is that uploading lots of small files means the *overall* upload speed is pitiful. I get a burst of really good speed while the data is posted, but then a huge pause.
This feels like it shouldn't be the case: the total amount of data I was uploading wasn't very large (less than a megabyte), and the number of files wasn't very large either (50 files). It seems like there is some per-file overhead at the server end, not related to the actual data size.
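Here's the rough arithmetic behind that, as a small Python sketch. The 50 files and ~20 second pause are from my test; the ~10 KB per file is just the typical size I mentioned at the start, so the totals are approximate. It deliberately pretends the data transfer takes zero time, to show what the pause alone does to the overall rate.

```python
# Figures from the test described above; FILE_SIZE_KB is the typical
# small-file size, so the results are approximate.
FILES_PER_POST = 50
PAUSE_S = 20.0
FILE_SIZE_KB = 10


def pause_per_file(pause_s, file_count):
    """Average server-side pause attributable to each file in the post."""
    return pause_s / file_count


def throughput_ceiling_kb_s(pause_s, file_count, file_size_kb):
    """Best possible overall rate if the transfer itself took zero time."""
    return (file_count * file_size_kb) / pause_s


if __name__ == "__main__":
    per_file = pause_per_file(PAUSE_S, FILES_PER_POST)
    ceiling = throughput_ceiling_kb_s(PAUSE_S, FILES_PER_POST, FILE_SIZE_KB)
    print(f"pause per file: {per_file * 1000:.0f} ms")
    print(f"overall ceiling: {ceiling:.0f} KB/s")
    print(f"pause for 1000 files: {1000 * per_file / 60:.1f} min")
```

That works out to about 400 ms of pause per file and an overall ceiling of about 25 KB/s no matter how fast the connection is. If the pause scales linearly (an assumption, I haven't tested batches that big), 1000 files would spend between six and seven minutes just waiting.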
Some good news is that this overhead seems to disappear if your file sizes are slightly larger. Also, I guess for most people this just won't be an issue (depending on the size of the files they want to upload).
I've got a spreadsheet of some experiments if anyone at Nirvanix wants to take a look.
HTTP downloading
This is also not so performant: we have to make a single web service call for each file, and then a single HTTP GET call for each file. I guess the problem here is simply the overhead in the HTTP requests. What might be nice, again, is some kind of aggregate call which returns all the data in one go. Not sure how you'd do this with HTTP; I don't know if there is a download equivalent of the multipart form POST? Maybe a special call which returns the files' data zipped up?
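To make the zipped idea a bit more concrete, here's a sketch in Python of what the client side might look like. As far as I know no such call exists, so the "server" here is faked with a local function that zips some dummy files; only the unpacking half is what a client would actually run.

```python
import io
import zipfile
from typing import Dict


def unpack_zipped_response(payload: bytes) -> Dict[str, bytes]:
    """Split one zipped HTTP response body back into individual files."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def fake_zipped_response(files: Dict[str, bytes]) -> bytes:
    """Stand-in for the server side of a hypothetical aggregate download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


if __name__ == "__main__":
    # 50 dummy files of 10 KB each, similar to a typical batch of ours.
    originals = {f"delta_{i:03d}.bin": bytes([i]) * 10_240 for i in range(50)}
    payload = fake_zipped_response(originals)
    restored = unpack_zipped_response(payload)
    assert restored == originals
    print(f"{len(restored)} files recovered from one {len(payload)}-byte response")
```

One request and one response would replace 50 web service calls plus 50 GETs, at the cost of the server having to build the archive first.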
This one seems harder to solve.
Well anyway, I just wanted to share my experiences and see if any of
the Nirvanix staff have any comments. I'm using node2 (Europe I think)
if that helps (though I don't think it is a temporary congestion issue).
Regards,
John