An Intelligent Framework for Visually Impaired People: An Indoor Object Detection-Based Assistive System Using YOLO with Recurrent Neural Networks (Scientific Reports)

Why indoor object detection for VIPs is still hard

Indoor object detection (OD) for visually impaired people (VIPs) has progressed rapidly, yet real-world deployments still hit familiar walls. Many systems struggle to run in real time on portable hardware, which limits usability. Privacy concerns remain under-addressed, undermining user trust. Accuracy often drops under indoor realities like poor lighting, clutter, occlusion, and frequent movement. And while multimodal inputs (such as depth and voice) could help, they are rarely integrated into lightweight, cost-effective devices. Small object detection and robust adaptation across diverse homes, offices, and public spaces are also inconsistently handled.

The research gap is clear: the field needs efficient, privacy-aware, adaptable frameworks that balance accuracy, speed, and hardware constraints—without compromising usability.

Meet IODAS‑IOA: a real‑time assistive stack

The proposed IODAS‑IOA model targets safe, independent navigation for VIPs through a carefully engineered pipeline: image pre-processing, object detection, transfer learning, feature extraction, classification, and automated parameter tuning. The design aim is a smooth blend of accuracy, robustness, and speed—suited to practical, on-device assistive use.

Sharper inputs: Gaussian filtering for cleaner frames

The pipeline begins with Gaussian filtering (GF), a classic pre-processing step that suppresses noise at minimal computational cost. Compared with median filtering (best suited to salt-and-pepper noise) and bilateral filtering (edge-preserving but heavier to compute), GF delivers a smooth, artifact-free blur that handles the Gaussian noise typical of low-light, cluttered indoor scenes. The result is cleaner contours and more reliable features without heavy compute overhead, which suits real-time assistive systems.

By suppressing high-frequency noise, GF improves downstream detection in variable lighting and helps stabilize performance when scenes get complex or crowded.
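To make the step concrete, here is a minimal denoising sketch using OpenCV; the kernel size and sigma are illustrative assumptions, not values reported by the authors.

```python
import cv2

# Denoise a captured frame with a Gaussian filter before detection.
# Kernel size and sigma are illustrative; tune them to the camera's noise level.
frame = cv2.imread("indoor_scene.jpg")
smoothed = cv2.GaussianBlur(frame, ksize=(5, 5), sigmaX=1.0)
```

A larger sigma suppresses more high-frequency noise but also softens fine contours, so real-time assistive pipelines typically keep the kernel small.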

Fast, accurate detection: YOLOv12 with attention

For OD, the system employs YOLOv12, chosen for its strong speed–accuracy trade-off in real-time settings. It enhances small and overlapping object handling—common indoors—while retaining the single-stage efficiency that makes YOLO architectures attractive on constrained hardware.

What’s new here is an attention-centric design:

  • Progressive attention mechanisms help the model focus on relevant regions and ignore noise.
  • Area Attention (AA) segments images into vertical or horizontal regions, preserving a large receptive field while cutting compute—useful for dense scenes and high resolutions.
  • Residual Efficient Layer Aggregation Networks (R‑ELAN) improve training stability and feature aggregation through residual connections.
  • Architectural pruning and the inclusion of Flash Attention streamline memory access and inference speed.

Together, these upgrades balance precision and throughput, making YOLOv12 a strong fit for indoor assistive perception.
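As a usage sketch, a recent ultralytics release can run a YOLOv12 checkpoint in a few lines; the `yolo12n.pt` weight name and confidence threshold are assumptions about a typical install, not the authors' configuration.

```python
from ultralytics import YOLO

# Single-stage detection on a pre-processed frame. "yolo12n.pt" assumes a
# recent ultralytics build that ships YOLOv12 weights; substitute your own
# checkpoint if it differs.
model = YOLO("yolo12n.pt")
results = model.predict("indoor_scene.jpg", conf=0.25)

for box in results[0].boxes:
    cls_id = int(box.cls)                   # predicted class index
    score = float(box.conf)                 # detection confidence
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # corner coordinates
    print(model.names[cls_id], score, (x1, y1, x2, y2))
```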

Deeper semantics: DenseNet161 for feature extraction

IODAS‑IOA uses DenseNet161 to extract rich, discriminative features. DenseNet's hallmark is its dense connectivity: within a block, each layer receives the feature maps of all preceding layers and passes its own output to every subsequent one. This encourages feature reuse, strengthens gradient flow, and often reduces parameter count relative to alternatives like ResNet or VGG without sacrificing representational power.

For indoor scenes packed with similar-looking items, DenseNet161’s hierarchical feature learning and global average pooling help separate fine-grained categories while generalizing across varied layouts and lighting. The result is a solid foundation for reliable classification.
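A minimal feature-extraction sketch with torchvision shows the idea: keep DenseNet161's dense blocks, drop the ImageNet classifier, and pool to a fixed-length descriptor. The 2208-dimensional output is DenseNet161's standard feature width; everything else here is an illustrative assumption.

```python
import torch
from torchvision import models

# Frozen DenseNet161 backbone: dense blocks only, ImageNet head discarded.
weights = models.DenseNet161_Weights.DEFAULT
backbone = models.densenet161(weights=weights).eval()
preprocess = weights.transforms()            # resize/crop/normalize for ImageNet

@torch.no_grad()
def extract_features(image):                 # image: PIL.Image
    x = preprocess(image).unsqueeze(0)       # (1, 3, 224, 224)
    fmap = torch.relu(backbone.features(x))  # (1, 2208, 7, 7) dense-block maps
    pooled = torch.nn.functional.adaptive_avg_pool2d(fmap, 1)
    return torch.flatten(pooled, 1)          # (1, 2208) global-average descriptor
```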

Context that counts: BiGRU with attention for classification

After feature extraction, classification is handled by a bidirectional GRU (BiGRU) augmented with attention. The bidirectional setup captures both past and future context in the feature sequence, improving understanding of temporal and structural dependencies—useful when objects are partially occluded or only momentarily visible.

The attention mechanism highlights the most informative features, boosting discrimination among similar classes and reducing the impact of noise. Compared with vanilla RNNs or one-way GRUs, BiGRU‑AM tends to be more robust in the dynamic, cluttered conditions typical of indoor environments.
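The following PyTorch sketch shows one plausible BiGRU-AM head; hidden size, class count, and sequence length are illustrative choices, not the paper's reported settings.

```python
import torch
import torch.nn as nn

class BiGRUAttention(nn.Module):
    """Minimal bidirectional GRU with additive attention over time steps."""
    def __init__(self, feat_dim, hidden, num_classes):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)           # scores each time step
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                              # x: (batch, seq, feat_dim)
        h, _ = self.gru(x)                             # (batch, seq, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)         # attention weights over time
        context = (w * h).sum(dim=1)                   # weighted summary vector
        return self.head(context)                      # class logits

# e.g. classify a sequence of eight pooled DenseNet161 descriptors
model = BiGRUAttention(feat_dim=2208, hidden=128, num_classes=10)
logits = model(torch.randn(4, 8, 2208))                # (4, 10)
```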

Smarter tuning: Ivy Optimization Algorithm (IOA)

Hyperparameters can make or break performance. To tune them efficiently, the framework uses the Ivy Optimization Algorithm (IOA), inspired by the coordinated growth and diffusion of ivy plants. IOA balances exploration and exploitation, helps avoid local minima, and often converges with fewer iterations than grid search or purely gradient-based methods—cutting computational cost while improving accuracy.

Each “ivy” represents a candidate solution that adapts its direction using information from its neighbors and the current best solution. Iteratively, the population moves toward better configurations, guided by a fitness function centered on classification accuracy. This dynamic search makes IOA well-suited for complex, multi-parameter deep learning stacks.
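A simplified sketch of that loop appears below. It captures the mechanics just described (candidates drifting toward the best solution while diffusing around neighbors), but it is not the published IOA update equations, and all parameters are illustrative.

```python
import random

def ioa_search(fitness, bounds, pop_size=10, iters=30, beta=0.5):
    """Ivy-style population search (simplified): each candidate moves toward
    the current best while also exploring around a random neighbor."""
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(iters):
        for i, ivy in enumerate(pop):
            neighbor = random.choice(pop)
            cand = []
            for v, b, n, (lo, hi) in zip(ivy, best, neighbor, bounds):
                step = beta * random.random() * (b - v)         # pull toward best
                step += (1 - beta) * random.random() * (n - v)  # diffuse via neighbor
                cand.append(min(max(v + step, lo), hi))         # clamp to bounds
            if fitness(cand) > fitness(ivy):                    # greedy replacement
                pop[i] = cand
        best = max(pop, key=fitness)
    return best

# Usage sketch: tune (learning rate, dropout) against validation accuracy.
# best = ioa_search(validation_accuracy, [(1e-4, 1e-1), (0.0, 0.5)])
```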

Putting it all together

  • Pre-processing with GF cleanly denoises frames while protecting edges, stabilizing detection in low light.
  • YOLOv12’s attention-driven, single-stage detector spots small and overlapping objects quickly and accurately.
  • DenseNet161 extracts rich, reusable features that generalize across diverse indoor scenes.
  • BiGRU with attention focuses on the most relevant signals for reliable classification under occlusion and clutter.
  • IOA streamlines hyperparameter tuning, improving performance with fewer compute cycles.

Why it matters

For VIP users, assistive perception is only as helpful as it is fast, accurate, and trustworthy. IODAS‑IOA addresses several long-standing bottlenecks: it is designed for real-time operation, maintains robustness under indoor variability, and targets efficiency on constrained hardware. While broader issues like privacy-by-design and multimodal integration require ongoing attention, this framework marks a compelling step toward practical, dependable indoor navigation and object awareness—bringing the promise of AI assistance closer to everyday independence.
