For the DVT task, I exported 400 DVT reports with preannotations (conclusions from preliminary …).
But that data is not completely reliable due to a report-ID mismatch issue. We've seen apparently good performance at the visit level. To make sure the pipeline also performs well at the document level, I exported another 400 chest CT reports for him to review. I haven't gotten them back yet.
For the rest of the project, I think there is some work you can help with, but Dr. Johnson and Reed would be the ones to decide where we are heading. If Dr. Johnson hasn't finished reviewing the 400 PE reports, I think you can work on the rest.
If we are focusing on the operational side, then this project is close to the end. Scientifically, however, there is an issue we haven't addressed: the reliability of the annotated data, because we only had one expert annotate it. Since there are cases where different experts may hold different opinions, we usually need multiple experts to annotate the data, so we can estimate what percentage of it leads to such disagreement using inter-annotator agreement (IAA). Then we can estimate NLP performance more reliably. For instance, if the IAA is around 80% but NLP performance appears to be above 95%, we cannot fully trust that performance report. On the contrary, if the IAA is around 95% and NLP performance is clearly lower than that, then we can be sure there is room for NLP to improve.
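If it helps, the IAA idea above is often reported as raw percent agreement alongside a chance-corrected statistic such as Cohen's kappa. Here's a minimal sketch for two annotators; the labels (`"pos"`/`"neg"` report conclusions) and the function names are made up for illustration, not from our pipeline:

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of documents on which both annotators give the same label."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators."""
    n = len(a)
    p_o = percent_agreement(a, b)  # observed agreement
    # Expected agreement by chance, from each annotator's label distribution
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[label] / n) * (cb[label] / n) for label in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical double-annotated conclusions for 10 reports
ann1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
ann2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg"]

print(percent_agreement(ann1, ann2))  # 0.8 -> 80% raw agreement
print(cohens_kappa(ann1, ann2))       # lower than 0.8 after chance correction
```

Raw agreement can look high just because one label dominates, which is common in clinical data; that's why kappa is usually reported next to it when comparing NLP performance against the IAA ceiling.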