move/copy files very slow when overwriting

wills__aperio
Posts: 4
Joined: Mon Mar 13, 2017 11:27 pm
Location: Sausalito, CA

move/copy files very slow when overwriting

Postby wills__aperio » Thu Jun 29, 2017 7:18 pm

hi,

I have a graph that uses FlatFileWriter to generate 3500 text files in ./data-tmp/subfolder. At the very end of the graph, I have a MoveFiles component that uses ./data-tmp/subfolder/* as input to move the files to a remote location (using sftp). The overwrite flag is set to always.

When the remote location is empty, this process takes about 30-45 seconds. However, when it is not empty, throughput drops to about 1 file/sec, so the whole move takes almost an hour. I tried with CopyFiles and it has the exact same issue.

Is there a way to increase the throughput without adding components to check/wipe the directory first?

Thanks in advance!

- will

bartonv
Posts: 32
Joined: Wed May 03, 2017 12:10 pm

Re: move/copy files very slow when overwriting

Postby bartonv » Fri Jun 30, 2017 3:27 pm

Hello Will,
How the MoveFiles component moves files to a remote location via SFTP depends heavily on the SFTP utility and the CloverETL version that you are using. Could you please send us more details so that we can suggest the most suitable resolution?

    1. Which SFTP tool are you using?
    2. Do you connect with a username and password, or with a certificate?
    3. Which CloverETL version are you using (Designer and/or Server)?
    4. What is the approximate size of a single input file?
Regards,
---
Vladimir Barton
CloverCARE Support
CloverETL | Rapid Data Integration

Visit us online at http://www.cloveretl.com
How to speed up communication with CloverCARE support

wills__aperio
Posts: 4
Joined: Mon Mar 13, 2017 11:27 pm
Location: Sausalito, CA

Re: move/copy files very slow when overwriting

Postby wills__aperio » Mon Jul 03, 2017 8:44 pm

hello,

  • We simply specify the remote location as an SFTP URL in the Move/CopyFiles component ( sftp://username:password@server/path/to/files ). I'm not aware of Clover using a third-party utility to perform this.
  • Username:password (see above)
  • Designer v4.4.0.011 / Server v4.4.0.11
  • There are about 3500 files, ranging between 3 KB and 300 KB, with the average being about 25 KB

thank you!

bartonv
Posts: 32
Joined: Wed May 03, 2017 12:10 pm

Re: move/copy files very slow when overwriting

Postby bartonv » Mon Jul 10, 2017 12:17 pm

Hi wills__aperio,
thank you for the details. After a close inspection, it appears that Clover lists the entire content of the target directory every time a file is about to be overwritten. This behavior does not seem to be correct, so I have logged the following issue in JIRA for our developers to review and fix: https://bug.javlin.eu/browse/CLO-11223.
One workaround, as you rightly said, is to clean up the target directory before the MoveFiles component runs. You can either clean up the entire target directory or delete only those files that are about to be overwritten. The latter can be achieved by running the ListFiles component together with the DeleteFiles component in an earlier phase than the MoveFiles component.
Another option is to use custom Java code that zips the files before the move, transfers the single zipped file via SFTP, and unzips it in the target directory.
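As a rough illustration of the zip step in that last option, here is a minimal sketch using only the standard java.util.zip classes. The class name ZipBeforeTransfer, the method zipDirectory, and the paths in the usage note are hypothetical and not part of CloverETL; the SFTP transfer and the unzip on the target side are not shown, and this is not necessarily how it would be wired into a graph.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not a CloverETL API: packs a flat folder into one archive
// so that only a single file has to be moved over SFTP.
public class ZipBeforeTransfer {

    /**
     * Adds every regular file directly inside sourceDir to zipFile.
     * zipFile should live outside sourceDir so it does not include itself.
     * Returns the number of files added.
     */
    public static int zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        int count = 0;
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out);
             DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories; the folder is assumed to be flat
                }
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip); // stream the file content into the current entry
                zip.closeEntry();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: ZipBeforeTransfer <sourceDir> <zipFile>");
            return;
        }
        int n = zipDirectory(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Zipped " + n + " files into " + args[1]);
    }
}
```

For the folder from the original post this could be run as `java ZipBeforeTransfer ./data-tmp/subfolder ./data-tmp/subfolder.zip`, after which only subfolder.zip needs to be transferred.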
Regards,
---
Vladimir Barton
CloverCARE Support