Urdu OCR
An optical character recognition (OCR) system for Urdu text, with video indexing capabilities.
Developed a custom Urdu OCR model using TensorFlow, achieving 88% accuracy in detecting and recognizing Urdu text in low-resolution news videos. Built a content-based video indexing system (Python, C#) for news archives, reducing search time by 60% across 10,000+ hours of footage.
Text Detection Pipeline
- Implemented region-based text detection
- Developed custom preprocessing techniques for low-resolution video frames
- Created specialized filters for Urdu text enhancement
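
As an illustration of region-based detection on a binarized frame, here is a minimal sketch in plain Python. It is not the project's actual pipeline: it assumes bright caption text on a darker background, picks a global threshold with Otsu's method, and reports the bounding box of each connected bright region. The function names `otsu_threshold` and `text_regions` and the `min_area` parameter are hypothetical, chosen only for this example.

```python
from collections import deque


def otsu_threshold(gray):
    """Return the Otsu threshold (0-255) for a 2D list of 8-bit grayscale pixels."""
    hist = [0] * 256
    for row in gray:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already background
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def text_regions(gray, min_area=4):
    """Return (x0, y0, x1, y1) boxes of bright 4-connected regions.

    Assumption for this sketch: text is brighter than its background.
    Regions smaller than min_area pixels are treated as noise.
    """
    t = otsu_threshold(gray)
    h, w = len(gray), len(gray[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if gray[y][x] <= t or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected component.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] and gray[ny][nx] > t:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:
                boxes.append((x0, y0, x1, y1))
    return boxes
```

In practice the same idea would more likely be expressed with OpenCV on NumPy arrays. The pure-Python version is used here only to keep the example dependency-free.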
Recognition System
- Implemented deep learning models for text recognition
- Developed post-processing algorithms for accuracy improvement
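
The post-processing algorithms are not described in detail here. One common option, shown below purely as an assumed example, is lexicon-based correction: each recognized word is replaced by the closest dictionary word when the two differ by at most a small edit distance. The names `edit_distance` and `correct_word`, the `max_distance` parameter, and the sample lexicon are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct_word(word, lexicon, max_distance=1):
    """Snap an OCR output word to the nearest lexicon entry, if close enough.

    lexicon is a list of known words; ties go to the earliest entry.
    Words with no entry within max_distance are returned unchanged.
    """
    if not lexicon or word in lexicon:
        return word
    best = min(lexicon, key=lambda entry: edit_distance(word, entry))
    return best if edit_distance(word, best) <= max_distance else word


# Illustrative lexicon: "Pakistan" and "news" in Urdu.
lexicon = ["پاکستان", "خبر"]
# OCR confused the final letter (noon ghunna instead of noon).
corrected = correct_word("پاکستاں", lexicon)
```

Because Python strings are sequences of Unicode code points, the distance is computed per Urdu character rather than per byte, which is what a character-level OCR correction step needs.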
Video Indexing System
- Built content-based video indexing system using C#
- Implemented keyframe extraction and metadata tagging
- Created semantic querying capabilities
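
The indexing system itself was built in C#. The following Python sketch only illustrates the general idea under simplifying assumptions: keyframes are chosen when a frame differs enough from the last kept keyframe, and recognized text is stored in an inverted index keyed by word. `select_keyframes`, `VideoTextIndex`, the `threshold` value, and the video identifiers are hypothetical, and the lookup is plain keyword matching rather than true semantic querying.

```python
from collections import defaultdict


def select_keyframes(frames, threshold=30.0):
    """Return indices of frames that differ enough from the last kept keyframe.

    frames is a list of equal-length flat lists of grayscale values.
    The difference measure is the mean absolute pixel difference.
    """
    if not frames:
        return []
    keep = [0]
    last = frames[0]
    for i, frame in enumerate(frames[1:], 1):
        diff = sum(abs(a - b) for a, b in zip(frame, last)) / len(frame)
        if diff > threshold:
            keep.append(i)
            last = frame
    return keep


class VideoTextIndex:
    """Inverted index from recognized words to (video_id, timestamp) pairs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, video_id, timestamp_s, text):
        """Tag one keyframe with the OCR text recognized in it."""
        for token in text.split():
            self._postings[token].add((video_id, timestamp_s))

    def search(self, query):
        """Return sorted keyframes whose text contains every query word."""
        tokens = query.split()
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)


index = VideoTextIndex()
index.add("news_01", 12.0, "وزیر اعظم کا خطاب")
index.add("news_02", 40.5, "وزیر خزانہ")
results = index.search("وزیر اعظم")  # only news_01 contains both words
```

Looking up a word in the index costs roughly the size of its posting set instead of a scan over every frame, which is the kind of saving a content-based index is meant to provide.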
Results
- 88% accuracy in Urdu text detection/recognition
- 60% reduction in search time across 10,000+ hours of footage
- Published in EURASIP Journal on Image and Video Processing
Publication
Detection and recognition of cursive text from video frames
Technologies Used
- Python
- C#
- TensorFlow
- OpenCV
- Tesseract OCR
- Video Processing
- Deep Learning