Partition Component

Support/help with CloverETL implementation problems

pintail
Posts: 30
Joined: Wed Aug 27, 2014 6:58 pm

Post by pintail » Thu Jun 15, 2017 2:02 am

Hi, can the Partition component be used to split data across three output ports based on a key value? My incoming data has a field called TypeID (values 1, 2, or 3). I want records with value 1 to go to port 0, records with value 2 to port 1, and records with value 3 to port 2. All I did was set the partition key to the TypeID metadata column. Is there something else that needs to be done? Doing only that seems to assign records to the output ports randomly.

Anyone have a simple example graph for this?

thanks.

bartonv
Posts: 5
Joined: Wed May 03, 2017 12:10 pm

Re: Partition Component

Post by bartonv » Thu Jun 15, 2017 10:06 am

Hello pintail,
as you rightly said, the Partition component does assign the filtered records to the output ports randomly when using the Partition key as the only property definition. This is, in fact, the easiest way to use this component to fork data into multiple output ports. If you need to send a specific TypeID to a specific output port, you can use a CTL partition definition (the Partition property of the component). In the situation you described, the definition could be as simple as this:

function integer getOutputPort() {
    // Route each record by TypeId: 1 -> port 0, 2 -> port 1, anything else -> port 2.
    if ($in.0.TypeId == 1) {
        return 0;
    } else if ($in.0.TypeId == 2) {
        return 1;
    } else {
        return 2;
    }
}


Regards,
---
Vladimir Barton
CloverCARE Support
CloverETL | Rapid Data Integration

Visit us online at http://www.cloveretl.com

pintail
Posts: 30
Joined: Wed Aug 27, 2014 6:58 pm

Re: Partition Component

Post by pintail » Thu Jun 15, 2017 3:48 pm

that worked great. thanks!

dpavlis
Posts: 180
Joined: Sat Mar 10, 2007 8:12 pm

Re: Partition Component

Post by dpavlis » Fri Jun 16, 2017 10:43 am

Just a small note on this statement:
"Partition component does assign the filtered records to the output ports randomly when using the Partition key"

That is not really true. In fact, when a partition key is defined, the Partition component calculates a hash of that key (a 32-bit number) and, based on that hash value, sends the record out through the particular port that represents the "bucket" the value belongs to, very much like a hash table works.

Why is this important? Simple: the same key value always gets sent out through the same output port, which means you are essentially grouping records with the same partition key value. That may become important in certain cases. However, this does not guarantee that, for example, the value "A" would be sent out through the first port and "B" through the second. It only guarantees that all "A"s would be sent through the same port.
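
To make the bucketing idea concrete, here is a minimal Java sketch. It is not CloverETL's actual code: the class name HashPartitionSketch, the helper portFor, and the choice of Java's hashCode() reduced with Math.floorMod are all assumptions made purely for illustration. It only demonstrates the property described above: equal keys always land on the same port, while which port a given key lands on is not something you choose.

```java
// Illustration only: this is NOT how CloverETL is known to hash keys internally.
final class HashPartitionSketch {

    // Hash the key (a 32-bit int in Java), then reduce it to a port index
    // in the range 0..ports-1. floorMod keeps the result non-negative even
    // when the hash itself is negative.
    static int portFor(Object key, int ports) {
        return Math.floorMod(key.hashCode(), ports);
    }

    public static void main(String[] args) {
        int ports = 3;
        String[] keys = {"A", "B", "A", "C", "B", "A"};
        for (String key : keys) {
            // Every occurrence of the same key prints the same port number.
            System.out.println(key + " -> port " + portFor(key, ports));
        }
    }
}
```

With Java's String.hashCode(), "A" hashes to 65, so with three ports this sketch sends every "A" to port 2, every "B" to port 0 and every "C" to port 1. The real component may well map keys differently, which is why routing a specific value to a specific port needs the CTL function shown earlier in the thread.
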
David Pavlis
CloverCARE Support
CloverETL | Rapid Data Integration

Visit us online at http://www.cloveretl.com