Find Duplicates Use-Case

June 23rd, 2012

Use Case

This use case helps scientists find specimen records that belong to a duplicate set, using search conditions likely to identify the members of that set. First, scientists search the specimen records with a specific condition. Then, within the returned data set, they identify duplicate sets by creating annotations, which are sent out to the FPCQC network.

Implementation

We’ve implemented a prototype workflow that encapsulates the “fuzzy match” function of the Filtered-Push SPNCH demo, which is the first step in finding duplicates. The workflow talks to a remote MySQL database and a local Triage service, so you need to start a local Triage service before running it.

Four actors are involved, as shown in the following figure:

1. The CollectionComposer actor imports the input for this workflow, which consists mainly of the search conditions. Currently the search condition selects all records whose date is 1900. You can add other search conditions supported by the Triage service by following the same rules.

2. The SearchSpecimenRecord actor searches the specimen records according to the search conditions by calling the Triage web service.

3. The DisplaySpecimenRecord actor displays the search result.

4. The CollectionDisplay actor displays the trace of the workflow. You can see the data collection and dependency history in this trace.
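
The four steps above can be sketched in plain Python to show how data moves between the actors. This is only an illustration under stated assumptions, not the actual workflow: the function names, the record fields, and the in-memory SAMPLE_RECORDS list are all hypothetical, and the call to the real Triage web service is replaced by a local filter.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for records the Triage service might return.
# Field names ("id", "collector", "year") are assumptions for illustration.
SAMPLE_RECORDS = [
    {"id": "A1", "collector": "Smith", "year": 1900},
    {"id": "B7", "collector": "Smyth", "year": 1900},
    {"id": "C3", "collector": "Jones", "year": 1911},
]


@dataclass
class Trace:
    """Collects one entry per actor so the data dependencies can be reviewed."""
    steps: list = field(default_factory=list)

    def record(self, actor, inputs, outputs):
        self.steps.append({"actor": actor, "inputs": inputs, "outputs": outputs})


def compose_conditions():
    # Step 1: build the search conditions (here: all records dated 1900).
    return {"year": 1900}


def search_specimen_records(conditions, records=SAMPLE_RECORDS):
    # Step 2: a local filter standing in for the Triage web service query.
    return [r for r in records
            if all(r.get(key) == value for key, value in conditions.items())]


def display_specimen_records(records):
    # Step 3: format and print the search result.
    lines = [f"{r['id']}: {r['collector']} ({r['year']})" for r in records]
    print("\n".join(lines))
    return lines


def display_trace(trace):
    # Step 4: print which actor consumed and produced which data.
    for step in trace.steps:
        print(f"{step['actor']}: {step['inputs']} -> {step['outputs']}")


def run_workflow():
    trace = Trace()
    conditions = compose_conditions()
    trace.record("CollectionComposer", None, conditions)

    results = search_specimen_records(conditions)
    ids = [r["id"] for r in results]
    trace.record("SearchSpecimenRecord", conditions, ids)

    lines = display_specimen_records(results)
    trace.record("DisplaySpecimenRecord", ids, lines)
    return results, trace


if __name__ == "__main__":
    _, workflow_trace = run_workflow()
    display_trace(workflow_trace)
```

Adding another search condition in this sketch means adding another key to the dictionary returned by compose_conditions; the real workflow would instead use whatever conditions the Triage service supports.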