Urdu OCR

An optical character recognition (OCR) system for Urdu text, with video indexing capabilities.

Developed a custom Urdu OCR model in TensorFlow, achieving 88% accuracy in detecting and recognizing Urdu text in low-resolution news videos. Built a content-based video indexing system (Python, C#) for news archives, reducing search time by 60% across more than 10,000 hours of footage.

Text Detection Pipeline

  • Implemented region-based text detection
  • Developed custom preprocessing techniques for low-resolution video frames
  • Created specialized filters for Urdu text enhancement
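
The page does not describe the preprocessing or detection steps in detail. As a rough illustration only, the sketch below assumes grayscale frames and horizontal caption text that is brighter than its background. It uses nearest-neighbour upscaling plus contrast stretching as the preprocessing step, and a horizontal projection profile as a simple stand-in for region-based detection. The names preprocess and text_bands and their parameters are hypothetical, and the sketch uses plain NumPy rather than OpenCV to stay self-contained.

```python
import numpy as np


def preprocess(frame: np.ndarray, scale: int = 2) -> np.ndarray:
    """Upscale a grayscale frame and stretch its contrast to the full 0-255 range."""
    # Nearest-neighbour upscaling: repeat each pixel `scale` times along both axes.
    up = np.repeat(np.repeat(frame.astype(np.float64), scale, axis=0), scale, axis=1)
    lo, hi = up.min(), up.max()
    if hi == lo:  # flat frame: nothing to stretch, return an all-black image
        return np.zeros_like(up, dtype=np.uint8)
    return ((up - lo) * 255.0 / (hi - lo)).astype(np.uint8)


def text_bands(img: np.ndarray, min_height: int = 2, ink_ratio: float = 0.05) -> list[tuple[int, int]]:
    """Return (start_row, end_row) spans whose rows contain enough bright 'ink' pixels."""
    # Assumption: text is brighter than the mean intensity of the frame.
    binary = img > img.mean()
    row_ink = binary.mean(axis=1)  # fraction of ink pixels in each row
    is_text = row_ink >= ink_ratio
    bands, start = [], None
    for r, flag in enumerate(is_text):
        if flag and start is None:
            start = r  # a candidate text band begins
        elif not flag and start is not None:
            if r - start >= min_height:  # drop bands too thin to be text
                bands.append((start, r))
            start = None
    if start is not None and len(is_text) - start >= min_height:
        bands.append((start, len(is_text)))  # band runs to the bottom edge
    return bands
```

Each returned band is a row range (end exclusive) that a recognizer could then crop and read; a real pipeline would also split bands into columns and filter by texture or stroke features.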

Recognition System

  • Implemented deep learning models for text recognition
  • Developed post-processing algorithms for accuracy improvement
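
The recognition model and its post-processing are not specified here. If the model were a sequence recognizer trained with CTC loss (an assumption; this is a common setup for line-level OCR), a minimal post-processing stage would be greedy CTC decoding followed by snapping each decoded word to a known vocabulary. The sketch below shows both. The names ctc_greedy_decode and snap_to_lexicon, the blank-at-index-0 convention, and the use of Python's difflib for fuzzy matching are illustrative choices, not the project's actual algorithms.

```python
import difflib

import numpy as np

BLANK = 0  # assumption: column 0 of the model output is the CTC blank symbol


def ctc_greedy_decode(probs: np.ndarray, alphabet: str) -> str:
    """Take the argmax label per timestep, merge repeated labels, then drop blanks.

    probs has shape (timesteps, len(alphabet) + 1); label k > 0 maps to alphabet[k - 1].
    """
    best = probs.argmax(axis=1)
    chars = []
    prev = BLANK
    for label in best:
        # A new character is emitted only when the label changes and is not blank,
        # so "a a" collapses to "a" while "a - a" keeps both.
        if label != prev and label != BLANK:
            chars.append(alphabet[label - 1])
        prev = label
    return "".join(chars)


def snap_to_lexicon(word: str, lexicon: list[str], cutoff: float = 0.6) -> str:
    """Replace a decoded word with its closest lexicon entry if one is similar enough."""
    match = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return match[0] if match else word
```

Lexicon snapping trades recall on out-of-vocabulary words (names, new terms) for fewer misspellings, so the cutoff would need tuning against real Urdu vocabulary.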

Video Indexing System

  • Built a content-based video indexing system in C#
  • Implemented keyframe extraction and metadata tagging
  • Created semantic querying capabilities
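
The indexing system itself was built in C#. The Python sketch below only illustrates two of the ideas listed above under simple assumptions: a frame becomes a keyframe when its mean absolute pixel difference from the previous keyframe exceeds a threshold, and recognized text is stored in a keyword inverted index that maps each word to (video_id, timestamp) hits. Plain keyword intersection is much simpler than semantic querying, and keyframes, TextIndex, and the default threshold are hypothetical.

```python
from collections import defaultdict

import numpy as np


def keyframes(frames: list[np.ndarray], threshold: float = 20.0) -> list[int]:
    """Return indices of frames that differ enough from the most recent keyframe.

    Frame 0 is always a keyframe.
    """
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        # Cast to int16 so subtracting uint8 pixels cannot wrap around.
        diff = np.abs(frames[i].astype(np.int16) - frames[keys[-1]].astype(np.int16)).mean()
        if diff > threshold:
            keys.append(i)
    return keys


class TextIndex:
    """Inverted index: word -> set of (video_id, timestamp_seconds) where it appeared."""

    def __init__(self) -> None:
        self._postings: dict[str, set[tuple[str, float]]] = defaultdict(set)

    def add(self, video_id: str, timestamp: float, ocr_text: str) -> None:
        """Record every whitespace-separated word of one keyframe's OCR output."""
        for word in ocr_text.split():
            self._postings[word].add((video_id, timestamp))

    def query(self, text: str) -> list[tuple[str, float]]:
        """Return hits containing every query word, sorted by video id and time."""
        words = text.split()
        if not words:
            return []
        # .get avoids inserting empty entries for unknown words into the defaultdict.
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(hits)
```

In this sketch, a multi-word query returns only the timestamps where all of the words appeared on screen together, so a user can jump directly to those points instead of scrubbing through the footage.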

Results

  • 88% accuracy in Urdu text detection and recognition
  • 60% reduction in search time across 10,000+ hours of footage
  • Published in EURASIP Journal on Image and Video Processing
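
The page does not state how the 88% figure was measured. One common way to score OCR output is character accuracy, defined as 1 minus the character error rate (CER), where CER is the edit distance between the recognized and reference strings divided by the reference length. The sketch below computes it; treat it as an example metric, not necessarily the one behind the reported number. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - CER, clipped at 0, where CER = edit distance / len(reference)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    cer = edit_distance(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)
```

Averaging this per text line, rather than pooling all characters, gives a different number, so any reported accuracy should say which aggregation was used.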

Publication

Detection and recognition of cursive text from video frames

Technologies Used

  • Python
  • C#
  • TensorFlow
  • OpenCV
  • Tesseract OCR
  • Video Processing
  • Deep Learning