Author(s)

R Kaviya, Vitus Anto, A Abhishek

  • Manuscript ID: 120291
  • Volume 2, Issue 5, Apr 2026
  • Pages: 129–137

Subject Area: Computer Science

DOI: https://doi.org/10.5281/zenodo.19892975

Abstract

Industrial machine vision has advanced rapidly with the integration of deep learning-based object detection models for automated quality inspection. In steel manufacturing, surface defects such as crazing, inclusions, pitted surfaces, scratches, and rolled-in scale significantly degrade mechanical properties and compromise structural reliability. Traditional inspection methods, including ultrasonic testing and manual visual inspection, suffer from low efficiency, subjective bias, and high operational cost. To address these limitations, modern approaches employ single-stage object detection algorithms such as the YOLO variants, owing to their real-time processing capability and deployment flexibility. The base paper introduces YOLOv8n-GSE, which enhances YOLOv8n with GAM attention, CSP-ABAN modules, GSConv layers, an ESCD detection head, and the PIoUv2 loss function. Experimental validation on the NEU-DET dataset demonstrates improved detection accuracy with reduced computational complexity, making the model suitable for edge deployment. Despite this improvement, the base system remains limited in handling complex industrial backgrounds and highly variable defect scales. The integration of GAM increases parameter sensitivity under varying illumination conditions. Although CSP-ABAN reduces redundancy, it may weaken deep semantic representation for ultra-small defects. The ESCD detection head improves lightweight performance but relies on fixed-scale feature aggregation, which limits its adaptive response to irregular defect shapes. Furthermore, the PIoUv2 loss function, while improving regression precision, depends heavily on hyperparameter tuning and may become unstable when defects are densely clustered. On more complex datasets such as GC10-DET, detection accuracy drops noticeably compared with NEU-DET, highlighting robustness challenges.
To overcome these drawbacks, we propose HARF-Net, a hybrid architecture that integrates Dual-Path Transformer Attention (DPTA) with Adaptive Multi-Scale Residual Fusion (AMRF) and a novel Dynamic Hybrid IoU-Focal Regression (DHIFR) loss. The proposed model combines CNN-based local feature extraction with lightweight transformer-based global contextual modeling to enhance fine-grained defect perception. In contrast to YOLOv8n-GSE and the baseline YOLOv8n, HARF-Net introduces a hybrid regression mechanism that dynamically balances IoU, center-distance, and aspect-ratio penalties. Experimental comparison against YOLOv8n and YOLOv8n-GSE demonstrates higher mAP while maintaining competitive GFLOPs. The hybrid design supports robust detection across multi-resolution datasets and complex industrial environments.
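The abstract does not give the formula for the DHIFR loss, so the following is only a minimal sketch of the general idea of balancing IoU, center-distance, and aspect-ratio penalties for a single pair of boxes. It follows the well-known CIoU-style terms rather than the authors' method; the function name `hybrid_box_loss` and the weights `w_iou`, `w_center`, and `w_aspect` are hypothetical, and the dynamic weighting and focal components are deliberately left out because their definitions are not stated here.

```python
import math


def hybrid_box_loss(pred, target, w_iou=1.0, w_center=1.0, w_aspect=1.0, eps=1e-9):
    """Illustrative weighted sum of three box-regression penalties.

    Boxes are (x1, y1, x2, y2). The weights are hypothetical fixed values;
    this is NOT the DHIFR loss, whose definition is not given in the abstract.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # IoU term: overlap area divided by union area.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Center-distance term: squared distance between box centers,
    # normalized by the squared diagonal of the smallest enclosing box.
    enc_w = max(px2, tx2) - min(px1, tx1)
    enc_h = max(py2, ty2) - min(py1, ty1)
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center = (dx * dx + dy * dy) / (enc_w ** 2 + enc_h ** 2 + eps)

    # Aspect-ratio term: CIoU-style consistency of width/height ratios.
    aspect = (4 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2

    total = w_iou * (1.0 - iou) + w_center * center + w_aspect * aspect
    return {"total": total, "iou": iou, "center": center, "aspect": aspect}


if __name__ == "__main__":
    # A predicted box that is shifted and stretched relative to the target.
    print(hybrid_box_loss((0.0, 0.0, 4.0, 1.0), (1.0, 0.0, 3.0, 2.0)))
```

Returning the individual terms alongside the total makes it easy to see which penalty dominates for a given prediction, which is the quantity a dynamic weighting scheme would presumably act on.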

Keywords
Steel Surface Defect Detection, HARF-Net, Hybrid Attention, CNN-Transformer Fusion, YOLOv8, Multi-Scale Feature Fusion, IoU Regression Loss, Real-Time Industrial Inspection, Edge Computing, Industry 4.0, Automated Quality Control, Deep Learning.