Author(s)

Annavaram Chandrakanth Reddy, Syed Reahan, Vyyapuri Dileep, Dr. Raja K

  • Manuscript ID: 120274
  • Volume 2, Issue 4, Apr 2026
  • Pages: 325–334

Subject Area: Computer Science

DOI: https://doi.org/10.5281/zenodo.19557744
Abstract

Automatically determining the denomination of a coin is difficult because it draws on computer vision, deep learning, and assistive technology. The baseline model, ICDRNet, is designed for Indian coin recognition and combines DenseNet feature propagation, depthwise separable convolutions with CBAM attention, and a Dilation Enabled Inverse Bottleneck (DEIB) module. Although it achieves weighted F1-scores above 97% on the IMCD, ICCD, ICDD, and CIDCIC benchmark datasets, it is limited to single-coin images and struggles when coins appear against cluttered backgrounds. It also lacks transformer-based global context modeling, which limits its representational capacity. To address these limitations, we propose a Hybrid CNN-Transformer Multi-Scale Attention Fusion (MSAF) framework that combines YOLO-based coin localization, adaptive dilated pyramid convolutions, and a transformer token fusion layer. The framework targets multi-coin detection, better foreground isolation, and improved classification on mobile assistive technology.

Keywords
computer vision, deep learning, Indian coin recognition, coin denomination detection, convolutional neural networks, multi-scale attention, hybrid CNN transformer model, assistive technology