File Operations and JobFlow patterns

melshami
Posts: 10
Joined: Tue Dec 23, 2014 4:39 pm

Postby melshami » Tue Dec 30, 2014 12:30 pm

This is my third day evaluating CloverETL Server, and I am looking for help implementing some basic file operation tasks using FTP and other remote file protocols.

The basic scenario is:

1. List files using an input pattern filter or expression.
2. Process/copy every file over to the target remote location.
3. Rename each output file using expressions, e.g. appending a datetime to the filename.
4. Move the original remote file to a done directory.

The JobFlow design patterns in the user guide seem to suggest a similar pattern here: http://doc.cloveretl.com/documentation/ ... rence.html

However, I am not sure how to achieve the steps above, in particular renaming output files using expressions. Camel has short semantics to achieve this (http://camel.apache.org/file2.html); is there something similar in CloverETL?

Regards,
M. Elshami

kubosj
Posts: 372
Joined: Thu Jan 12, 2012 9:10 am

Postby kubosj » Wed Dec 31, 2014 10:07 am

Hi M. Elshami,

What you need can be done using three components in CloverETL. Please see the attached JobFlow. It works like this:

  1. ListFiles lists files using a pattern, in my case *.csv.
  2. In the output mapping of ListFiles, I map the URL of each matched file to custom metadata which contains only the "from" and "to" fields.
  3. In Reformat, I fill in the "to" field. In my case it just replaces the folder csvs with csvs2 in the URL, but you can apply any logic you want: point to another server, change the filename, add a timestamp to the name, ...
  4. So the output metadata holds both the original URL of the file and the desired new URL.
  5. Now we can use CopyFiles or MoveFiles with a proper input mapping to copy/move the files.
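
As a rough sketch of step 3, a Reformat transformation that appends a timestamp to the file name could look like the following. It assumes the custom metadata has string fields named "from" and "to", that the files end in .csv, and that the timestamp format shown is acceptable; all of these are illustrative choices, not taken from the attached JobFlow.

```ctl
//#CTL2
// Sketch only: assumes metadata with string fields "from" and "to".
function integer transform() {
    // Keep the original URL so CopyFiles/MoveFiles knows the source.
    $out.0.from = $in.0.from;

    // Timestamp such as 20150102_103000 (format chosen for illustration).
    string stamp = date2str(today(), "yyyyMMdd_HHmmss");

    // replace() takes a regular expression: insert the stamp before ".csv".
    $out.0.to = replace($in.0.from, "\\.csv$", "_" + stamp + ".csv");

    return ALL;
}
```

Changing the target folder or server works the same way: apply another replace() to the URL before assigning it to the "to" field.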

If you need to process the files on the Server, then the process would be:
  1. Use the steps above to copy the files to local Server storage.
  2. Process the files locally.
  3. Use the steps above to copy the files to the destination directory.

Alternatively, use the URL formats described in Supported File URL Formats for Readers and Supported File URL Formats for Writers for direct file access from Readers/Writers. You would load the remote file for processing and save the results to a remote file, so no local copy would be necessary.

You can use JobFlow to simplify the processing logic, for example:
  • Prepare a master JobFlow which only lists the files and then executes a sub JobFlow for each of them (passing the file URL as a parameter).
  • In the sub JobFlow, download the file from its original location to a local file, execute the processing graph, and move/copy the result to the destination server/folder.
  • In the processing graph, you just process the local file and save the result into a local file.

I hope this helps.
Attachment: ftp_copy.jbf
Jaroslav Kubos
CloverCARE Support
CloverETL | Rapid Data Integration

Visit us online at http://www.cloveretl.com

Postby melshami » Mon Jan 05, 2015 4:08 pm

Thanks a lot Jaroslav, this was helpful.

I am very new to CloverETL and not sure how the binding works in the Reformat component. It looks like you've assigned out.0.from and out.0.to in the ListFiles component?

function integer transform() {
    $out.0.from = $in.0.from;
    $out.0.to = $in.0.from.replace("DailyTrans", "DailyTrans_20150102.csv" );
    printLog(info, "out.0.from: " + $out.0.from);
    return ALL;
}

Also, how do I specify a more dynamic input pattern? Instead of plain wildcards, I would like to specify the input pattern based on the date, e.g. DailyTrans_<date>*.csv

Regards,
Mohamed

slechtaj
Posts: 192
Joined: Wed Aug 15, 2012 8:18 am

Postby slechtaj » Tue Jan 06, 2015 4:34 pm

Hi Mohamed,

As you can see, Clover components have input and/or output ports. To work with the data coming through these ports in CTL, you use the following values:
$in.0 - Represents the first input port. Zero is the index of the port, and the word "in" tells Clover that it is an input port. Port indexes start at 0 (0 is the first port, 1 is the second port, etc.).
$out.0 - Similarly to the previous example, this stands for the first (index 0) output port ("out").

Regarding the pattern you would like to use, there are two ways:
  • For simple patterns (like the one you have) you can still use wildcards, such as DailyTrans_????-??-??*.csv, which can handle names like DailyTrans_2012-11-27_Monaco.csv etc.
  • For more complicated patterns, you can first list all files in a folder (using ListFiles) and then use the ExtFilter component to filter out unwanted records by comparing them against a regular expression.
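
For the second option, a minimal sketch of an ExtFilter filter expression is shown below. It assumes the edge coming from ListFiles has a string field called "name" holding the file name (an assumption; adjust it to whatever field your output mapping fills) and that the date is written as yyyy-MM-dd, as in the wildcard example.

```ctl
//#CTL2
// Keep only records whose file name looks like DailyTrans_2012-11-27_Monaco.csv.
// "~=" requires the whole string to match the regular expression.
$in.0.name ~= "DailyTrans_\\d{4}-\\d{2}-\\d{2}.*\\.csv"
```

Records for which the expression is false are not sent to the first output port, so only the matching files reach CopyFiles or MoveFiles.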

Hope this helps.
Jan
Jan Slechta
CloverCARE Support

Postby melshami » Wed Jan 14, 2015 3:41 pm

Thanks a lot Jan

I made another attempt to create a basic ListFiles -> CopyFiles flow.

I can't figure out how the URL metadata is propagated. I thought it was recognised automatically, but looking at your example it seems that I have to define both an input mapping and an output mapping.

I've been trying the CTL2 below as the CopyFiles input mapping:

// Transforms input record into output record.
function integer transform() {
    $out.0.sourceURL = $in.0.URL;

    return ALL;
}

I am getting the following error:

Caused by: java.lang.IllegalArgumentException: Copy source is empty
at org.jetel.component.fileoperation.FileManager.copy(FileManager.java:271)

I've attached the jobflow example.

Regards,
Mohamed
Attachment: local-files-copy.jbf

Postby slechtaj » Fri Jan 16, 2015 12:47 pm

Hi Mohamed,

As you can see, you are getting the "Copy source is empty" message, which means the URL string is empty. If you enable debugging on the edge between ListFiles and CopyFiles, you can view the data that goes through it. In your case it is only an empty record. The record does not contain any data because you haven't defined an Output mapping in the ListFiles component. I've prepared a short example for you (it copies all files from data-in to data-out).

local-files-copy.jbf
(5.39 KiB) Downloaded 63 times
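
Roughly, the two mappings involved look like the sketch below. It assumes the edge between ListFiles and CopyFiles carries a string field named URL, and that in the ListFiles Output mapping the result record is exposed as $in.1; please confirm both in the mapping editor, since this is written from memory rather than copied from the attached jobflow.

```ctl
//#CTL2
// ListFiles, Output mapping: copy the URL of each listed file onto the edge.
// Assumption: $in.1 is the ListFiles result record; check the mapping dialog.
function integer transform() {
    $out.0.URL = $in.1.URL;
    return ALL;
}
```

```ctl
//#CTL2
// CopyFiles, Input mapping: feed the URL from the edge into the component.
// The target URL is assumed to be set directly in the component attributes.
function integer transform() {
    $out.0.sourceURL = $in.0.URL;
    return ALL;
}
```

Without the first mapping the edge only carries empty records, which is exactly what produces "Copy source is empty" in CopyFiles.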
Jan Slechta

Postby melshami » Wed Jan 21, 2015 5:10 pm

Thanks Jan,

I got it now.

pintail
Posts: 30
Joined: Wed Aug 27, 2014 6:58 pm

Postby pintail » Tue Mar 17, 2015 4:06 am

Hi - do you have a jobflow example that runs a series of graphs in sequence, where each graph runs only if the previous one executed successfully? I'm looking to automate a series of graphs that take a long time to run, so I'm trying to break them up into smaller parts so the memory can flush itself out, and to join small partitions of data to speed things up a little. I can't find a good example of how to use a jobflow to call further graphs once one has finished without error.

Thanks for any help!

Postby slechtaj » Tue Mar 17, 2015 3:16 pm

Hi pintail,

CloverETL Server comes with a set of examples in which you may find the answers to your questions. You might want to start with the jobflows in the JobflowExamples sandbox.

Hope this helps.
Jan Slechta

Postby pintail » Mon Mar 23, 2015 9:35 pm

It does, thanks - I didn't even think to look in the example sandboxes... it completely slipped my mind. Thanks!