PaperVision Capture Code: Find Duplicates in Batch

Scanfree develops PaperVision Capture code and custom functions for end users and service providers.

A few Scanfree clients run very large batches through their PaperVision Capture stations. Some run PaperVision Capture Automated Import Queues (AutoImport1.xml and AutoImport2.xml) that are created nightly; others run large manual imports through the PaperVision Capture Operator Console. Whether the import is automatic or manual, a scanning bureau will often find duplicates in a batch: either duplicated images or, at the document level, duplicate metadata (index) field values.

PaperVision Capture keeps the old “Merge Like Documents” function from the PaperFlow 7.x days. This handy function checks index values and merges documents whose values all match. But what does a scanning bureau manager do when the index fields are not all identical, yet certain values still need to be checked for duplicates? Browsing the batch in the Operator Console is tedious, and those dupes are not always easy to track down.

Scanfree has developed a solution. It is available now for any PaperVision Capture user. Service bureaus, in particular, will benefit from this (we use it in our scanning bureau), but so can end-users running PaperVision Capture scan stations.

The process is simple. The PaperVision Capture admin creates a job as normal in the Capture Administration Console. In the appropriate Operator workstep, insert a Custom Code command and import Scanfree’s PaperVision Capture Duplicates Report code. Then it’s just a matter of changing a few settings.

A setting within the code will tell the function which fields should be checked for duplicate values. It looks like this:

private string[] PVC_FIELDS_TO_SEARCH_FOR_DUPLICATES = { "Field1", "Field2", "Field3" };

Just change the quoted values to the exact field names used in the PaperVision Capture job. Don’t need to check three fields? List only the one (or two) you’re after. A second setting in the code defines the output location for the report written by the custom code:

private const string REPORT_ROOT_FOLDER = @"C:\Reports\";

In this example we use a folder named Reports on the local C drive, but it could just as well be any accessible network location.
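The post says only that the report file is time-stamp named; it doesn’t give the naming format. As a rough illustration of how the folder setting could combine with a timestamp, here is a plain C# sketch. It is not Scanfree’s code: `BuildReportPath` and the `Duplicates_yyyyMMdd_HHmmss.csv` pattern are hypothetical, and no PaperVision Capture API is used.

```csharp
using System;
using System.IO;

// Same setting as in the post; a UNC path to a network share would also work.
const string REPORT_ROOT_FOLDER = @"C:\Reports\";

// Hypothetical helper: builds a time-stamped report path under the root folder.
// The "Duplicates_yyyyMMdd_HHmmss.csv" pattern is an assumption, not Scanfree's format.
string BuildReportPath(string rootFolder, DateTime timestamp)
{
    string fileName = $"Duplicates_{timestamp:yyyyMMdd_HHmmss}.csv";
    return Path.Combine(rootFolder, fileName);
}

// A fixed timestamp keeps the example reproducible; real code would use DateTime.Now.
string reportPath = BuildReportPath(REPORT_ROOT_FOLDER, new DateTime(2024, 1, 31, 17, 45, 0));
Console.WriteLine(reportPath); // On Windows: C:\Reports\Duplicates_20240131_174500.csv
```

For a network location, the setting could hold a UNC path such as `@"\\fileserver\Reports\"` (a placeholder server name). Calling `Directory.CreateDirectory` on the folder before writing avoids a failure if it doesn’t exist yet.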

And that’s really all there is to it. The scan operator logs in to the PaperVision Capture Operator Console and processes the batch as usual. When ready, they select the Execute Custom Code command within the Operator Console. A message pops up to confirm that the report has been written (or to say that no duplicates exist). Open the report (its file name is time-stamped) and you will see something like this:

duplicatevalue1,3
duplicatevalue2,7
duplicatevalue3,4
etc…

OR

dupeval1a,dupeval1b,3
dupeval2a,dupeval2b,7
dupeval3a,dupeval3b,4

These examples show reports testing one and two index fields, respectively. The number at the end of each row is the number of duplicates found for that value (or combination of values).
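The report format points to the core idea: group the batch’s documents by the values of the configured fields and report every combination that occurs more than once. The plain C# sketch below illustrates that idea; it is not Scanfree’s implementation. The in-memory `documents` list is a hypothetical stand-in for index values that real custom code would read from the batch, `FindDuplicateRows` is a hypothetical helper, and it assumes the trailing number is the count of documents sharing each value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample batch: each document is a map of index field name -> value.
// Real custom code would read these values from the PaperVision Capture batch instead.
var documents = new List<Dictionary<string, string>>
{
    new() { ["Field1"] = "A100", ["Field2"] = "Smith", ["Field3"] = "Invoice" },
    new() { ["Field1"] = "A100", ["Field2"] = "Smith", ["Field3"] = "Receipt" },
    new() { ["Field1"] = "B200", ["Field2"] = "Jones", ["Field3"] = "Invoice" },
    new() { ["Field1"] = "A100", ["Field2"] = "Smith", ["Field3"] = "Invoice" },
    new() { ["Field1"] = "B200", ["Field2"] = "Jones", ["Field3"] = "Receipt" },
    new() { ["Field1"] = "C300", ["Field2"] = "Brown", ["Field3"] = "Invoice" },
};

// Groups documents by the combined values of the chosen fields and returns one
// "value1,value2,...,count" row for every combination seen more than once.
// Matching is exact and case-sensitive; a missing field counts as an empty value.
List<string> FindDuplicateRows(IEnumerable<Dictionary<string, string>> docs, string[] fields)
{
    return docs
        .Select(doc => fields.Select(f => doc.TryGetValue(f, out var v) ? v : "").ToArray())
        // Group on a control-character separator that is unlikely to appear in index
        // data, so values containing commas don't collide with each other.
        .GroupBy(values => string.Join("\u001F", values))
        .Where(group => group.Count() > 1)
        .Select(group => string.Join(",", group.First()) + "," + group.Count())
        .ToList();
}

// Two-field check, matching the second report example.
string[] PVC_FIELDS_TO_SEARCH_FOR_DUPLICATES = { "Field1", "Field2" };
List<string> duplicateRows = FindDuplicateRows(documents, PVC_FIELDS_TO_SEARCH_FOR_DUPLICATES);

if (duplicateRows.Count == 0)
{
    Console.WriteLine("No duplicates found.");
}
else
{
    foreach (var row in duplicateRows)
        Console.WriteLine(row);
}
// Prints:
// A100,Smith,3
// B200,Jones,2
```

Checking `new[] { "Field1" }` on the same sample gives `A100,3` and `B200,2`, the one-field layout. Passing the rows to `File.WriteAllLines` would write a report file in the format shown above. Blank values are grouped like any other, so documents with an empty field would appear as duplicates of each other; skipping blank combinations is a simple refinement if that’s unwanted.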

To learn more about PaperVision Capture, visit our Products section. To learn about Scanfree’s Duplicates Reports, go to our Development pages. (INSERT LINKS IN THIS SECTION TO CORRECT PAGES).

Incidentally, you can now run multiple instances of this report within the same batch and the same workstep. For example, first run a duplicates test on Field1 and Field2 combined, and then a separate test on Field3. In fact, you can now run as many pieces of custom code as you like within a single Operator workstep. We’ll write more about this soon, but you can get a technical overview today by visiting the project page for the PaperVision Capture Functions Buttons (INSERT LINK HERE FOR THIS PROJECT).