Submission
Participants should upload submissions via the grand-challenge.org submission system, together with a short paper introducing the method (at least 4 pages, PDF format). Algorithms can only be submitted as a Docker container (.tar file).
For the 1st phase (validation phase), participants validate on their own, using a validation set that they split from the training dataset. For the 2nd phase (testing phase), each team will be allowed to create 5 submissions between June 16th, 2025, and August 15th, 2025.
Evaluation
Metrics
We will use the following seven evaluation metrics to assess the submitted algorithms:
Fréchet Inception Distance (FID), Contrast-to-Noise Ratio (CNR), generalized Contrast-to-Noise Ratio (gCNR), Kolmogorov-Smirnov test (KS), Dice, Average Surface Distance (ASD), and inference time.
- Denoising performance is evaluated using the Fréchet Inception Distance (FID) between the denoised images of difficult-to-image subjects and the clean images of easy-to-image subjects, as well as the Contrast-to-Noise Ratio (CNR) and generalized Contrast-to-Noise Ratio (gCNR) between specific annotated regions of interest (ROIs). These ROIs are the myocardial tissue of the ventricular septum (region A), which should be preserved, and the noise artifacts within the ventricle (region B), where multipath reflection noise should be removed.
- Cardiac structure preservation is evaluated using the Kolmogorov-Smirnov (KS) test. The KS statistic, defined as the maximum distance between the empirical distributions of two samples, serves as a measure of their statistical similarity. A higher KS statistic in the ventricular septum (region A) and a lower KS statistic in the ventricle (region B) indicate better structure preservation and noise removal.
- Dice and Average Surface Distance (ASD) are used to assess the impact of the denoising method on the performance of computer-aided algorithms. The left ventricle is segmented with 'USFM', a pre-trained universal ultrasound foundation model.
- Inference time is used to evaluate the real-time performance of the denoising method.
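As an illustration of the ROI-based metrics above, the following is a minimal sketch in plain Python. It is not the official evaluation code. The exact formulas used by the organizers are not given on this page, so the definitions here are assumptions: CNR is taken as the absolute mean difference divided by the square root of the summed variances, gCNR as one minus the overlap of the two normalized intensity histograms, and KS as the maximum distance between two empirical CDFs. The function names, the 256-bin histogram, and the [0, 1] intensity range are hypothetical choices made for this example.

```python
import math
from bisect import bisect_right
from statistics import fmean, pvariance


def cnr(region_a, region_b):
    """CNR (assumed definition): |mean_A - mean_B| / sqrt(var_A + var_B)."""
    contrast = abs(fmean(region_a) - fmean(region_b))
    noise = math.sqrt(pvariance(region_a) + pvariance(region_b))
    return math.inf if noise == 0 else contrast / noise


def gcnr(region_a, region_b, bins=256, value_range=(0.0, 1.0)):
    """gCNR: 1 minus the overlap area of the two normalized histograms."""
    lo, hi = value_range

    def histogram(values):
        counts = [0] * bins
        for v in values:
            # Map the value to a bin index and clamp it into [0, bins - 1].
            idx = int((v - lo) / (hi - lo) * bins)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [c / len(values) for c in counts]

    hist_a, hist_b = histogram(region_a), histogram(region_b)
    return 1.0 - sum(min(pa, pb) for pa, pb in zip(hist_a, hist_b))


def ks_statistic(sample_1, sample_2):
    """Two-sample KS statistic: maximum distance between empirical CDFs."""
    s1, s2 = sorted(sample_1), sorted(sample_2)
    # The maximum is attained at one of the observed values.
    return max(
        abs(bisect_right(s1, x) / len(s1) - bisect_right(s2, x) / len(s2))
        for x in s1 + s2
    )


# Toy pixel intensities for region A (septum) and region B (ventricle).
septum = [0.8, 0.7, 0.9, 0.8]
ventricle = [0.1, 0.2, 0.1, 0.2]
print(cnr(septum, ventricle), gcnr(septum, ventricle))
print(ks_statistic(septum, ventricle))
```

Each function takes flat sequences of pixel intensities already extracted from the annotated ROIs. For real images, array libraries would be the more practical choice; for instance, `scipy.stats.ks_2samp` computes the same two-sample KS statistic.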
Ranking methods
For each test case, we calculate 1) FID, 2) CNR, 3) gCNR, 4) KS^A, 5) KS^B, 6) Dice, 7) ASD, and 8) inference time. To balance these aspects, we propose a weighting scheme of 4:3:2:1 for the four evaluation categories (denoising performance, cardiac structure preservation, impact on downstream tasks, and real-time performance, respectively).
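One possible reading of the 4:3:2:1 scheme is sketched below. This page does not state how the individual metrics are combined, so the procedure is an assumption for illustration only: teams are ranked per metric, ranks are averaged within each category, and the category ranks are combined with weights 4, 3, 2, and 1. The metric keys, the `rank_teams` function, and the tie handling are hypothetical. The metric directions follow the Metrics section (higher KS^A and lower KS^B are treated as better).

```python
# Category weights from the 4:3:2:1 scheme.
WEIGHTS = {"denoising": 4, "structure": 3, "downstream": 2, "realtime": 1}

# Grouping of the eight metrics into the four categories.
CATEGORIES = {
    "denoising": ["FID", "CNR", "gCNR"],
    "structure": ["KS_A", "KS_B"],
    "downstream": ["Dice", "ASD"],
    "realtime": ["time"],
}

# Metrics where a larger value is better; all others are lower-is-better.
HIGHER_IS_BETTER = {"CNR", "gCNR", "KS_A", "Dice"}


def rank_teams(scores):
    """Map {team: {metric: value}} to {team: weighted mean rank}.

    Lower output is better. Ties are not handled specially in this sketch:
    tied teams get consecutive ranks in input order.
    """
    teams = list(scores)
    metric_ranks = {}
    for metrics in CATEGORIES.values():
        for metric in metrics:
            ordered = sorted(
                teams,
                key=lambda t: scores[t][metric],
                reverse=metric in HIGHER_IS_BETTER,
            )
            metric_ranks[metric] = {t: i + 1 for i, t in enumerate(ordered)}

    total_weight = sum(WEIGHTS.values())
    final = {}
    for team in teams:
        weighted = 0.0
        for category, weight in WEIGHTS.items():
            ranks = [metric_ranks[m][team] for m in CATEGORIES[category]]
            weighted += weight * sum(ranks) / len(ranks)
        final[team] = weighted / total_weight
    return final


# Toy scores for two hypothetical teams.
example = {
    "team_a": {"FID": 40.0, "CNR": 3.0, "gCNR": 0.9, "KS_A": 0.6,
               "KS_B": 0.1, "Dice": 0.90, "ASD": 1.5, "time": 0.08},
    "team_b": {"FID": 55.0, "CNR": 2.0, "gCNR": 0.8, "KS_A": 0.4,
               "KS_B": 0.3, "Dice": 0.85, "ASD": 2.0, "time": 0.03},
}
print(rank_teams(example))
```

In this toy example, team_a ranks first in every category except inference time, so the heavier weights on the first three categories dominate its final score.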