hneff1
Posts: 10
Joined: Fri May 22, 2015 7:53 pm

Normalized Data to Denormalized Number Columns - Use Denormalize Component?

Postby hneff1 » Mon May 01, 2017 10:12 pm

I have a text file that is in a normalized format: ID|AddressLine1|AddressLine2|City|State|Zip|Country
with multiple addresses records per ID

I want to transform the data into a denormalized column structure with the key being the "ID" and the Address columns denormalized into separate numbered repeating group columns:
IID|AddressLine1_1|AddressLine2_1|AddressCity_1|AddressState_1|AddressZip_1|AddressCountry_1|AddressLine1_2|AddressLine2_2|AddressCity_2|AddressState_2|AddressZip_2|AddressCountry_2|AddressLine1_3|AddressLine2_3|AddressCity_3|AddressState_3|AddressZip_3|AddressCountry_3|AddressLine1_4|AddressLine2_4|AddressCity_4|AddressState_4|AddressZip_4|AddressCountry_4|AddressLine1_5|AddressLine2_5|AddressCity_5|AddressState_5|AddressZip_5|AddressCountry_5

I am trying to determine if the "Denormalizer" component is the correct component to use for this. It seems like it should be, but the only examples I can find in your documentation concatenate multiple source values into one output column as opposed to mapping to individual numbered output columns like the example I am providing. I am attaching some sample data for each file structure.

Can you provide some direction?

Thanks,
Heather
Attachments
CoveredRecipientAddress_20150115145705.txt
(678 Bytes) Downloaded 4 times
CoveredRecipientIndividualAddressFinalFormat.txt
(578 Bytes) Downloaded 4 times

svecp
Posts: 25
Joined: Wed Nov 09, 2016 11:51 pm
Location: 2111 Wilson Blvd., Arlington VA 22201
Contact:

Re: Normalized Data to Denormalized Number Columns - Use Denormalize Component?

Postby svecp » Tue May 02, 2017 11:14 pm

Well, this is fairly easily achievable by Denormalizer, as you suggested. If you can rely on number and structure of input records - which won't exceed projected output record's width and will have fields specified in correct order, you should be able to use following code (assuming all fields are strings, except ID).

Code: Select all

<output record's name> accumulator;
integer cnt = 0; // How many records were processed in this group
integer RECORD_OFFSET = 1;   // How many fields should be skipped from reading of incoming record - skip just first field (ID)

function integer append() {
   // Iteration number multiplied by number of fields to transfer (whole record, minus offset)
   integer outputOffset = cnt * (length($in.0) - RECORD_OFFSET);

   // Copy all fields from the record, disregard those at the beginning
   for (integer i=0;i<length($in.0)-RECORD_OFFSET;i++) {
      integer inFieldIdx = i+RECORD_OFFSET;
      integer outFieldIdx = outputOffset+i+RECORD_OFFSET;
      // Copy field of to an offset + current field position, skipping global offset (beginning of a record)
      // If there's anything to copy
      if (!isNull($in.0,inFieldIdx) {
         setStringValue(accumulator,outFieldIdx,getStringValue($in.0,inFieldIdx));
      }
   }
   
   // Increase iteration number
   cnt++;
   return OK;
}

// Copy accumulation record to the output and set ID of a group
function integer transform() {
   $out.0.* = accumulator.*;
   $out.0.ID = $in.0.ID;
   return OK;
}

// Reset counter and accumulation record
function void clean() {
   cnt = 0;
   resetRecord(accumulator);
}


There might be some errors in the code, haven't actually executed it. But the idea is to use CTL reflection and copy fields by order with given offset from "short" to "long" aggregated record.
--
Pavel Švec | CloverETL | Sales Engineer | 2111 Wilson Blvd | Suite 320 | Arlington, VA 22201

hneff1
Posts: 10
Joined: Fri May 22, 2015 7:53 pm

Re: Normalized Data to Denormalized Number Columns - Use Denormalize Component?

Postby hneff1 » Wed May 03, 2017 2:25 pm

Thank you Pavel! That worked perfectly! I just added one check to make sure the number of records in each group did not exceed the number I want to load which is 5.

if ( cnt <= 4 ) {
if (isNull($in.0,inFieldIdx) == false ) {
setStringValue(accumulator,outFieldIdx,getStringValue($in.0,inFieldIdx));
}
}

Thanks for your help and quick turnaround!

Heather


cron