Channel: Active questions tagged config - Stack Overflow

Camel file component throughput when readLock=changed


I'm using Apache Camel to transfer files from an input directory to a message broker. The files are uploaded to that directory via SFTP. To avoid consuming incomplete files that are still in transit, I've set readLock=changed and readLockCheckInterval=3000.
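Roughly speaking, readLock=changed treats a file as complete once it stops changing between two checks. The Java sketch below illustrates that idea under simplifying assumptions: the class and method names are hypothetical, only the file size and last-modified time are compared, and it is not Camel's actual read-lock code, which has further options. It shows two consequences: the check interval has to be longer than the longest pause in the writer (2 seconds in the test further down), and every file costs at least one full interval, even if it was complete long ago.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical, simplified illustration of a "changed"-style read lock.
// This is NOT Camel's implementation; it only demonstrates the core idea.
public final class ChangedReadLockSketch {

    private ChangedReadLockSketch() {}

    // Returns {size in bytes, last-modified time in ms} for the file.
    private static long[] snapshot(Path file) throws IOException {
        return new long[] {
            Files.size(file),
            Files.getLastModifiedTime(file).toMillis()
        };
    }

    // Observes the file twice, one check interval apart, and reports whether
    // neither its size nor its modification time changed in between.
    // This always blocks for a full interval, even for a finished file.
    public static boolean isStable(Path file, Duration checkInterval)
            throws IOException, InterruptedException {
        long[] before = snapshot(file);
        Thread.sleep(checkInterval.toMillis());
        long[] after = snapshot(file);
        return before[0] == after[0] && before[1] == after[1];
    }

    // Repeats the stability check until it succeeds or the timeout elapses.
    // The check runs at least once, even if the timeout is very short.
    public static boolean awaitStable(Path file, Duration checkInterval, Duration timeout)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        do {
            if (isStable(file, checkInterval)) {
                return true;
            }
        } while (System.nanoTime() < deadline);
        return false;
    }
}
```

A caller checking files one by one with awaitStable(path, Duration.ofSeconds(3), ...) blocks for at least three seconds per file, so the total waiting time grows linearly with the number of files.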

As an example, this is how one of my tests looks:

<route>
  <from uri="file:inbox?readLock=changed&amp;readLockCheckInterval=3000"/>
  <log message="copying ${file:name}"/>
  <to uri="file:outbox"/>
</route>

I test this with (echo line 1; sleep 2; echo line 2) > inbox/test, and the file is copied faithfully when readLockCheckInterval=3000. However, this doesn't scale, because the file component waits three seconds before processing each file, and it handles the files one after another. So when I test with

for n in $(seq 1 100); do (echo line 1; sleep 2; echo line 2) > inbox/$n & done

it takes Camel five minutes to move the files from inbox to outbox.
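The five minutes match a simple estimate, assuming the consumer takes the N = 100 files strictly one at a time and spends about one check interval t_check = 3 s on each. All writes finish after roughly 2 s, because the loop starts them in the background, so the writing itself barely contributes:

```latex
T \approx N \cdot t_{\mathrm{check}} = 100 \times 3\,\mathrm{s} = 300\,\mathrm{s} = 5\,\mathrm{min}
```

By the same estimate, a throughput of about one file per second would bring this batch down to roughly 100 s.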

I've read the chapter on parallel processing in the Camel in Action book, but its examples focus on parallelizing the processing of lines within a single consumed file. I couldn't find a way to parallelize the file consumer itself.

A throughput of around one file per second would be fine for my use case. I just don't like the idea of having to risk incomplete data to achieve it. The readLock=changed setting seems like a hack anyway, but we can't tell the customer to copy the file first and then move it into place, so there doesn't seem to be another option.

How can I improve throughput without sacrificing integrity in the face of network delays?

