anyeone
Posts: 12
Joined: Thu Feb 09, 2017 4:30 pm

Partitioning JSONWriter output not working & hiding elements from the xml

Postby anyeone » Wed Apr 12, 2017 6:23 pm

I've been trying to get data to partition into multiple files using a JSONWriter and though I thought I had it distilled into a simple case, but it is not working properly in two ways.

1) I have flattened my data into a single input port that has a field called "catalog_id" that is the partition key. For my testing there are 3 data records coming in holding product data, each one has a different catalog_id. Ergo, I would expect to get 3 files where each file has a single product element only inside the catalog element.

What is happening is I am getting 3 files named properly, but each file has ALL products in it regardless of what catalog_id they belonged to. (See attached xml files)

I've attached a screenshot of the input data and my jsonwriter setup and the following is my Mapping:
Screen Shot 2017-04-12 at 12.08.29 PM.png
Screen Shot 2017-04-12 at 12.08.29 PM.png (73.04 KiB) Viewed 81 times

Screen Shot 2017-04-12 at 12.08.06 PM.png
Screen Shot 2017-04-12 at 12.08.06 PM.png (36.98 KiB) Viewed 81 times


Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:clover="http://www.cloveretl.com/ns/xmlmapping" xmlns="http://www.demandware.com/xml/impex/catalog/2006-10-31" catalog-id="$0.catalog_id" clover:inPort="0">
  <category category-id="$0.record_id" clover:inPort="0" clover:writeNullElement="false">
    <custom-attributes clover:writeNullElement="false">
      <custom-attribute attribute-id="sdx_noindex" clover:writeNullElement="false">$0.noindex</custom-attribute>
      <custom-attribute attribute-id="sdx_metatitle_auto" clover:writeNullElement="false">$0.metatitle_auto</custom-attribute>
      <custom-attribute attribute-id="sdx_metatitle_override" clover:writeNullElement="false">$0.metatitle_override</custom-attribute>
      <custom-attribute attribute-id="sdx_metadescription_auto" clover:writeNullElement="false">$0.metadescription_auto</custom-attribute>
      <custom-attribute attribute-id="sdx_metadescription_override" clover:writeNullElement="false">$0.metadescription_override</custom-attribute>
      <custom-attribute attribute-id="sdx_h1_auto" clover:writeNullElement="false">$0.h1_auto</custom-attribute>
      <custom-attribute attribute-id="sdx_h1_override" clover:writeNullElement="false">$0.h1_override</custom-attribute>
      <custom-attribute attribute-id="sdx_canonical_auto" clover:writeNullElement="false">$0.canonical_auto</custom-attribute>
      <custom-attribute attribute-id="sdx_canonical_override" clover:writeNullElement="false">$0.canonical_override</custom-attribute>
      <custom-attribute attribute-id="sdx_schema_org_auto" clover:writeNullElement="false">$0.schemaorg_auto</custom-attribute>
      <custom-attribute attribute-id="sdx_schema_org_override" clover:writeNullElement="false">$0.schemaorg_override</custom-attribute>
      <custom-attribute attribute-id="sdx_opengraph_auto" clover:writeNullElement="false">$0.opengraph_auto</custom-attribute>
      <custom-attribute attribute-id="sdx_opengraph_override" clover:writeNullElement="false">$0.opengraph_override</custom-attribute>
    </custom-attributes>
  </category>
  <product product-id="$0.p_record_id" clover:inPort="0" clover:writeNullElement="false">
    <custom-attributes clover:writeNullElement="false">
      <custom-attribute attribute-id="sdx_noindex" clover:writeNullElement="false">$0.p_noindex</custom-attribute>
      <custom-attribute attribute-id="sdx_metatitle_auto" clover:writeNullElement="false">$0.p_metatitle_auto</custom-attribute>
      <custom-attribute attribute-id="sdx_metatitle_override" clover:writeNullElement="false">$0.p_metatitle_override</custom-attribute>
      <custom-attribute attribute-id="sdx_metadescription_auto" clover:writeNullElement="false">$0.p_metadescription_auto</custom-attribute>
      <custom-attribute attribute-id="sdx_metadescription_override" clover:writeNullElement="false">$0.p_metadescription_override</custom-attribute>
      <custom-attribute attribute-id="sdx_h1_auto" clover:writeNullElement="false">$0.p_h1_auto</custom-attribute>
      <custom-attribute attribute-id="sdx_h1_override" clover:writeNullElement="false">$0.p_h1_override</custom-attribute>
      <custom-attribute attribute-id="sdx_canonical_auto" clover:writeNullElement="false">$0.p_canonical_auto</custom-attribute>
      <custom-attribute attribute-id="sdx_canonical_override" clover:writeNullElement="false">$0.p_canonical_override</custom-attribute>
      <custom-attribute attribute-id="sdx_schema_org_auto" clover:writeNullElement="false">$0.p_schemaorg_auto
      </custom-attribute>
      <custom-attribute attribute-id="sdx_schema_org_override" clover:writeNullElement="false">$0.p_schemaorg_override
      </custom-attribute>
      <custom-attribute attribute-id="sdx_opengraph_auto" clover:writeNullElement="false">$0.p_opengraph_auto
      </custom-attribute>
      <custom-attribute attribute-id="sdx_opengraph_override" clover:writeNullElement="false">
        <![CDATA[[attachment=0]Screen Shot 2017-04-12 at 12.08.06 PM.png[/attachment]$0.p_opengraph_override
      </custom-attribute>
    </custom-attributes>
  </product>
</catalog>


The logs show no useful info as no error is being thrown.

2) Additionally, since I have flattened my category & product data into single records (as I only had the option of using a partition key on the last port added to the JSONWriter and I couldn't see a way to populate the catalog-id attribute in the root element from the partition key anyway) I need to hide the <category> element completely for products (ie. which have only the p_ and catalog_id fields populated) and the <product> element completely for categories (i.e rows that do not have any of the p_ fields populated).

The customers data realistically may have products and categories in different catalogs, and when I had the product data coming in on one port and the category data on another, if there were no categories (the 0 port) then the file was always empty.

In this particular test data, I have only products and no categories so the values of all category mapped fields are null, l but I am still getting the skeleton of the category in the file despite having "writeNullElement=false" on every element. (see xml files and mapping above).

Any assistance you can give with these issues is greatly appreciated!

anyeone
Posts: 12
Joined: Thu Feb 09, 2017 4:30 pm

Re: Partitioning JSONWriter output not working & hiding elements from the xml

Postby anyeone » Thu Apr 13, 2017 5:31 pm

I still don't have the partitioning working, but I found another solution to the two issues (not quite as elegant, I fear) but I thought I would share.

Instead of partitioning, in my main jobflow I use an aggregate to get the list of distinct catalog names that appears in both the category and the product feeds, then combine the two so that I have a single list of catalog names that I will want to produce files for. Then I feed those into another jobflow, which reads the category and product input data into separate feeds, filters out all records that are not for the current catalog name, and looks to see whether there are only categories, only products, or both. Depending on which of those 3 scenarios is relevent, a different XMLWriter is called with the two feeds that either outputs only categories, only products, or both. In this way I don't get any empty records in the files and they all contain only the correct products and/or categories.

I still think partitioning would have been more elegant, but this works.

vazquezrosariop
Posts: 92
Joined: Mon Feb 29, 2016 5:33 pm

Re: Partitioning JSONWriter output not working & hiding elements from the xml

Postby vazquezrosariop » Mon Apr 17, 2017 1:32 pm

Hi,

I would suggest the following:

  • Instead of using a Jobflow, your use case is better suited in a graph
  • If you need to use the ListFiles in the you graph, Right Click on the Palette - Customize - File Operations - disable Hide
  • I would start by removing the ExHashJoin (this will remove the null values in the Catalog)
  • You will have three inputs streams coming into the XMLWriter
    • In the upper stream where you have Catalog you will want to use a Reformat to create two output streams
    • The first will output the catalog_id
    • The second you will output everything (including the catalog_id
  • The stream that includes the catalog_id you will add a Dedup (thus removing duplicates from the root element)
  • In the XMLWriter you will map the first stream (catalog_id) with the root catalog_id
  • The second stream will be mapped to the category elements (make sure Key and Parent Key are mapped to catalog_id)
  • The third stream will be mapped to the product elements (make sure Key and Parent Key are mapped to catalog_id)
---
Pedro Vazquez Rosario
CloverCARE Support
CloverETL | Rapid Data Integration

Visit us online at http://www.cloveretl.com
How to speed up communication with CloverCARE support