
Small Multi-Object Tracking for Spotting Birds (SMOT4SB) Challenge 2025

Overview

In conjunction with MVA2025, we host the Small Multi-Object Tracking for Spotting Birds (SMOT4SB) Challenge. This challenge builds upon the MVA2023 Small Object Detection for Spotting Birds (SOD4SB) Challenge [1] by extending the task from static image-based detection to dynamic video-based tracking. The primary goal is to detect and track small birds captured by unmanned aerial vehicles (UAVs) across multiple frames.

 

Unlike conventional small object detection tasks, this challenge introduces motion information to enhance detection accuracy. Inspired by the human visual system, particularly areas such as MT/V5 and the dorsal stream, which specialize in motion perception, this challenge explores whether incorporating temporal information can improve the detection performance of small birds that are otherwise difficult to identify in static images.

Additionally, beyond detection alone, this challenge introduces a small multi-object tracking task, where participants must assign consistent IDs to individual birds across frames. This added complexity aims to advance research in Small Object Recognition and improve machine vision techniques for dynamic and cluttered environments.

 

As with the previous challenge, SMOT4SB aligns with MVA’s mission of bridging academia and industry in the field of machine vision and its applications. The challenge is designed not only to drive fundamental research but also to foster practical applications, such as bird strike prevention, ecological monitoring, and autonomous UAV navigation.

Through this challenge, we aim to stimulate research in small object tracking while encouraging the development of practical solutions for real-world challenges in UAV perception and automated surveillance.

Announcements

Task

The SMOT4SB Challenge extends the SOD problem into a multi-object tracking (MOT) task. Participants must track small birds across multiple frames in video sequences captured by UAVs.

Unlike static image-based SOD, this challenge requires:

To successfully address this challenge, participants must develop MOT models capable of overcoming the following difficulties:

Participants are encouraged to explore approaches such as:

The dataset includes annotated bird tracking sequences, allowing participants to develop and evaluate their methods under realistic UAV conditions.

By participating in SMOT4SB, researchers and engineers will contribute to advancing small object tracking methodologies with applications in autonomous UAV navigation, ecological monitoring, and bird strike prevention systems.

Dataset

The dataset for the SMOT4SB Challenge extends the SOD4SB dataset [1] by incorporating tracking IDs into video sequences. The dataset is structured into two main parts: the Pre-training Data and the Tracking Dataset.

Participants are encouraged to utilize the Pre-training Data to build robust object detection models before fine-tuning them on the Tracking Dataset for multi-frame bird tracking.

Evaluation

In this challenge, we focus on evaluating tracking performance with an emphasis on HOTA (Higher Order Tracking Accuracy) [3], which explicitly considers detection, localization, and association.

However, because this challenge deals with small object tracking, traditional IoU-based evaluation methods commonly used in general tracking face significant challenges. IoU metrics tend to be overly sensitive to localization errors when applied to small objects, leading to unreliable evaluations.

To address this issue, we propose a novel metric called SO-HOTA (Small Object HOTA), inspired by HOTA but specifically designed for small object tracking. For detection and localization evaluation, we adopt Dot Distance (DotD) [4], a measure better suited to small objects than conventional IoU-based approaches. Please refer to “Proposed Evaluation Metric: SO-HOTA (Small Object HOTA)” below for details of this evaluation metric.

The final rankings in this challenge will be determined solely based on SO-HOTA.

Additionally, a challenge report will be published after the competition, and the top-ranked participants will be invited as co-authors. This report will include an extensive analysis incorporating not only SO-HOTA but also traditional MOT evaluation metrics (e.g., MOTA, IDF1), as well as computational speed and complexity evaluations.

Proposed Evaluation Metric: SO-HOTA (Small Object HOTA)

SO-HOTA adapts the HOTA [3] framework to evaluate tracking performance specifically for small objects. Instead of relying on IoU for similarity scoring, SO-HOTA uses Dot Distance (DotD) [4], which compares point-like object representations via their centroids. This is particularly effective for small objects, where IoU-based evaluation often underperforms due to its sensitivity to spatial misalignments.

Mathematical Definition

1. Dot Distance (DotD) [4]

DotD converts the Euclidean distance between the centroids of predicted and ground-truth bounding boxes into a similarity score in \( (0, 1] \), normalized by the average object size. For a predicted bounding box \( A \) and a ground-truth bounding box \( B \), DotD is given by:

\[ \text{DotD}(A, B) = \exp\left(-\frac{d(A, B)}{s}\right) \]

Where:

  • \( d(A, B) \): the Euclidean distance between the centroids \( (x_A, y_A) \) and \( (x_B, y_B) \) of \( A \) and \( B \): \[ d(A, B) = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2} \]
  • \( s \): the average size of all objects in the dataset, where \( M \) is the number of images, \( N_i \) the number of objects in image \( i \), and \( w_{ij} \), \( h_{ij} \) the width and height of object \( j \) in image \( i \): \[ s = \sqrt{\frac{\sum_{i=1}^{M} \sum_{j=1}^{N_i} w_{ij} \cdot h_{ij}}{\sum_{i=1}^{M} N_i}} \]
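The two definitions above can be written in a few lines of NumPy; this is a minimal illustrative sketch (the function names `dotd` and `average_size` are our own, not from the challenge toolkit):

```python
import numpy as np

def dotd(center_a, center_b, s):
    """DotD similarity between a predicted and a ground-truth centroid.

    center_a, center_b: (x, y) centroids; s: dataset-average object size.
    Returns exp(-d/s) in (0, 1]: 1.0 for coincident centroids, decaying with distance.
    """
    d = np.hypot(center_a[0] - center_b[0], center_a[1] - center_b[1])
    return np.exp(-d / s)

def average_size(widths, heights):
    """s = sqrt of the mean of w*h over all annotated objects in the dataset."""
    return np.sqrt(np.mean(np.asarray(widths, dtype=float) * np.asarray(heights, dtype=float)))
```

For example, a prediction whose centroid lies exactly one average object size away from the ground truth scores \( e^{-1} \approx 0.37 \).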
2. Matching Predictions and Ground Truth

A one-to-one matching between ground-truth and predicted points is established using the Hungarian algorithm, maximizing the sum of DotD similarity scores. A match is valid only if \( \text{DotD}(A, B) \geq \alpha \), where \( \alpha \) is a threshold.

3. True Positives, False Positives, and False Negatives

To calculate the performance metrics, we define the following:

  • \( TP \) (True Positives): Valid matches with \( \text{DotD}(A, B) \geq \alpha \).
  • \( FP \) (False Positives): Predicted points not matched to any ground truth.
  • \( FN \) (False Negatives): Ground-truth points not matched to any prediction.

These definitions form the basis for computing detection and association accuracies.
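Steps 2 and 3 can be sketched for a single frame as follows, using SciPy's Hungarian solver to maximize total DotD similarity; this is an illustrative sketch (the function `match_frame` is hypothetical, not part of the official evaluation code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_frame(pred_centers, gt_centers, s, alpha):
    """One-to-one matching of predictions to ground truth on DotD similarity.

    pred_centers, gt_centers: (P, 2) and (G, 2) centroid arrays.
    Returns (tp_pairs, fp_count, fn_count) at threshold alpha.
    """
    pred = np.asarray(pred_centers, dtype=float).reshape(-1, 2)
    gt = np.asarray(gt_centers, dtype=float).reshape(-1, 2)
    if len(pred) == 0 or len(gt) == 0:
        return [], len(pred), len(gt)
    # Pairwise DotD similarity matrix (P x G).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    sim = np.exp(-d / s)
    # Hungarian algorithm; negate because linear_sum_assignment minimizes cost.
    rows, cols = linear_sum_assignment(-sim)
    # Keep only matches above the threshold as true positives.
    tp_pairs = [(int(r), int(c)) for r, c in zip(rows, cols) if sim[r, c] >= alpha]
    fp = len(pred) - len(tp_pairs)   # unmatched predictions
    fn = len(gt) - len(tp_pairs)     # unmatched ground truths
    return tp_pairs, fp, fn
```

A prediction far from every ground-truth bird thus counts as a false positive even if the Hungarian step tentatively paired it, because its DotD falls below \( \alpha \).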

4. SO-HOTA Scoring

Using the definitions of \( TP \), \( FP \), and \( FN \) from the previous section, SO-HOTA integrates detection accuracy (\( \text{DetA} \)) and association accuracy (\( \text{AssA} \)) as follows:

\[ \text{DetA}_\alpha = \frac{|TP|}{|TP| + |FN| + |FP|} \]

\[ \text{AssA}_\alpha = \frac{1}{|TP|} \sum_{c \in TP} \frac{|TPA(c)|}{|TPA(c)| + |FNA(c)| + |FPA(c)|} \]

Where:

  • \( TPA(c) \): True positive associations for track \( c \).
  • \( FNA(c) \): False negative associations for track \( c \).
  • \( FPA(c) \): False positive associations for track \( c \).

The final SO-HOTA score for a given threshold \( \alpha \) is then computed as:

\[ \text{SO-HOTA}_\alpha = \sqrt{\text{DetA}_\alpha \cdot \text{AssA}_\alpha} \]
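The three formulas above reduce to simple arithmetic once the counts are known. A minimal sketch (names are illustrative, and the per-TP association counts \( TPA(c) \), \( FNA(c) \), \( FPA(c) \) are assumed to be given):

```python
import math

def det_a(tp, fn, fp):
    """Detection accuracy: |TP| / (|TP| + |FN| + |FP|)."""
    return tp / (tp + fn + fp)

def ass_a(assoc_counts):
    """Association accuracy: mean over TP matches of TPA / (TPA + FNA + FPA).

    assoc_counts: list of (tpa, fna, fpa) tuples, one per TP match c.
    """
    return sum(tpa / (tpa + fna + fpa) for tpa, fna, fpa in assoc_counts) / len(assoc_counts)

def so_hota_alpha(tp, fn, fp, assoc_counts):
    """SO-HOTA at one threshold: geometric mean of DetA and AssA."""
    return math.sqrt(det_a(tp, fn, fp) * ass_a(assoc_counts))
```

For instance, with 8 TPs, 1 FN, 1 FP (DetA = 0.8) and every TP achieving an association ratio of 0.8, the score is \( \sqrt{0.8 \cdot 0.8} = 0.8 \).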
5. Integration over Thresholds

The final SO-HOTA score is obtained by averaging over a range of thresholds \( \alpha \) from 0.05 to 0.95 in increments of 0.05:

\[ \text{SO-HOTA} = \frac{1}{19} \sum_{\alpha \in \{0.05, 0.10, \dots, 0.95\}} \text{SO-HOTA}_\alpha \]
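The averaging step can be sketched as below; `per_alpha_score` stands in for whatever computes \( \text{SO-HOTA}_\alpha \) at a given threshold (the name is illustrative):

```python
def so_hota(per_alpha_score):
    """Average SO-HOTA_alpha over alpha = 0.05, 0.10, ..., 0.95.

    per_alpha_score: callable mapping a threshold alpha to SO-HOTA_alpha.
    """
    alphas = [0.05 * k for k in range(1, 20)]  # 19 thresholds
    return sum(per_alpha_score(a) for a in alphas) / len(alphas)
```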

Key Advantages of SO-HOTA

  • Robust to Localization Errors: DotD focuses on centroids, making it less sensitive to object size variations and boundary alignment issues.
  • Tailored for Small Objects: Unlike IoU, DotD effectively evaluates performance on small objects with minimal spatial extent.
  • Balanced Evaluation: SO-HOTA inherits HOTA's balanced consideration of detection, localization, and association.

Baseline

The baseline code for this challenge is available at: GitHub Repository (TBA)

Important dates

Event                                  Date (23:59 PST)
Site online                            2025.1.21
Dataset and baseline code release      2025.2.4
Public test server open                2025.2.4
Public test server close               2025.4.26
Code submission deadline               2025.5.10
Preliminary private test results       2025.5.31
Paper submission deadline              2025.6.28
Notification                           2025.6.30
Camera-ready deadline                  2025.7.6

Please note that the schedule is subject to change.

Prizes & Awards

This Challenge offers cash prizes and awards, along with free admission to MVA2025 for award recipients. Additionally, among the top-ranked participants, those whose own technical paper describing their proposed method is accepted through the peer-review process will be granted the right to present their work in the special session of this challenge at MVA2025.

Rank       Prize Money    Award
1st        200,000 JPY    Best Solution Award
2nd        150,000 JPY    Runner-Up Solution Award
3rd        100,000 JPY    Honorable Mention Solution Award
4th – 5th  50,000 JPY     -

Participants also have a chance to win the Best Booster Award, presented to the individual who contributes most actively to discussions on the Discord channel. The evaluation criteria are the quality of discussions and the number of positive reactions received. The winner of this award will also receive free admission to MVA2025.

Registration

If you wish to participate in this challenge, please register on Codabench to receive email notifications about the challenge; the notification email also contains an invitation to join the Discord channel.

👉Codabench page  

Discussion

A dedicated Discord channel will be available for discussions among participants.

Submission

Participants must submit their tracking results for the public test dataset in a zip file, with the results stored in JSON format. For the private test phase, participants must submit their trained models along with their test scripts before the deadline via the Google Form (TBA).

After the preliminary private test results are announced, only the top five ranked participants will be invited to co-author the challenge report. This report will summarize the competition results and analysis, including detailed evaluations beyond the ranking metric (SO-HOTA), such as traditional MOT evaluation metrics (MOTA, IDF1, etc.), computation speed, and computational cost. Participation in this challenge report as a co-author is mandatory for all top-ranked participants.

Additionally, each of the top five ranked participants will be granted the opportunity to submit their own technical paper describing their proposed method. These papers must follow the format and submission guidelines of the MVA main conference. Submitted papers will undergo a peer-review process, and accepted papers will be presented orally in the special session of this challenge. Accepted authors may also choose to present their work as a poster.

Further details regarding the submission process will be communicated directly to the winners.

Challenge organizers

Technical Event Chairs


Norimichi Ukita
Toyota Technological Institute


Yuki Kondo
TOYOTA Motor Corporation

Staff


Riku Kanayama
Toyota Technological Institute


Yuki Yoshida
Toyota Technological Institute

Contributor


Takayuki Yamaguchi
Iwate Agricultural Research Center

Adviser


Masatsugu Kidode
Nara Institute of Science and Technology

Citing SMOT4SB Challenge 2025

If you use the dataset, evaluation metrics, or baseline code from the SMOT4SB Challenge 2025 in your research, please cite the following papers accordingly.

The first citation refers to the challenge report summarizing the dataset, evaluation methodology, and results, which will be published after the competition. The second citation is for the baseline code used in this challenge.

@inproceedings{mva2025_smot4sb_challenge,
  title={{MVA2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results}},
  author={Yuki Kondo and Norimichi Ukita and Riku Kanayama and Yuki Yoshida and Takayuki Yamaguchi and [Challenge winners]},
  booktitle={2025 19th International Conference on Machine Vision and Applications (MVA)},
  note={\url{https://www.mva-org.jp/mva2025/challenge}},
  year={2025}}
Note: This paper is scheduled to be published in July 2025, and the title and other details are subject to change.
@misc{baselinecode_mva2025_smot4sb_challenge,
  title={{Baseline code for SMOT4SB by IIM-TTIJ}},
  author={Riku Kanayama and Yuki Yoshida and Yuki Kondo},
  license={MIT},
  url={\url{https://github.com/IIM-TTIJ/MVA2025-SMOT4SB}},
  year={2025}}

References

[1].
Y. Kondo, N. Ukita, T. Yamaguchi, H.-Y. Hou, M.-Y. Shen, C.-C. Hsu, E.-M. Huang, Y.-C. Huang, Y.-C. Xia, C.-Y. Wang, C.-Y. Lee, D. Huo, M. A. Kastner, T. Liu, Y. Kawanishi, T. Hirayama, T. Komamizu, I. Ide, Y. Shinya, X. Liu, G. Liang, and S. Yasui, "MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results," in Proceedings of the 18th International Conference on Machine Vision and Applications (MVA), 2023. Available: https://www.mva-org.jp/mva2023/challenge
[2].
S. Fujii, K. Akita, and N. Ukita, "Distant Bird Detection for Safe Drone Flight and Its Dataset," in Proceedings of the 17th International Conference on Machine Vision and Applications (MVA), 2021.
[3].
J. Luiten, A. Osep, P. Dendorfer, P. Torr, A. Geiger, L. Leal-Taixé, and B. Leibe, "HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking," International Journal of Computer Vision (IJCV), 2021.
[4].
C. Xu, J. Wang, W. Yang, and L. Yu, "Dot Distance for Tiny Object Detection in Aerial Images," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2021.

Contact

Google form
