Machine Learning · Computer Vision · NLP

Turkish Sign Language
Recognition & Animation

Real-time TİD (Turkish Sign Language) recognition from a webcam using a BiLSTM model (Top-1 accuracy: 76.1%). No GPU is required for inference: landmark extraction runs entirely in the browser, and the model runs on CPU behind a FastAPI server.

Sign language recognition demo

Model Performance

76.1% Top-1 Accuracy
89.3% Top-3 Accuracy
91.9% Top-5 Accuracy

Evaluated on a cross-subject validation split — 31 training signers / 6 validation signers, reflecting real-world generalization.
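Top-k accuracy simply asks whether the true class appears among the model's k highest-scoring predictions. A minimal NumPy sketch (the toy probabilities and labels below are illustrative, not from the actual validation set):

```python
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(-probs, axis=1)[:, :k]       # k best class indices per sample
    hits = (top_k == labels[:, None]).any(axis=1)   # true label inside the top k?
    return float(hits.mean())

# Toy check: 3 samples over 4 classes
probs = np.array([
    [0.1, 0.6, 0.2, 0.1],   # true class 1 -> top-1 hit
    [0.4, 0.3, 0.2, 0.1],   # true class 1 -> only a top-2 hit
    [0.1, 0.2, 0.3, 0.4],   # true class 0 -> missed even at top-3
])
labels = np.array([1, 1, 0])
print(top_k_accuracy(probs, labels, 1))  # 0.333...
print(top_k_accuracy(probs, labels, 3))  # 0.666...
```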

Recognition

MediaPipe Holistic runs entirely in the browser (WASM) and extracts hand + pose landmarks in real time. The 252-dimensional vectors are streamed via WebSocket to a FastAPI backend where the BiLSTM model performs inference.

  • Top-3 predictions with confidence bars
  • Session stats: total predictions, average confidence, latency (ms)
  • Scrollable prediction history
  • 226 sign classes, 16 frames per sample
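The server-side logic boils down to buffering incoming landmark frames until a full 16-frame window exists, then running the model and returning the top-3 classes. A minimal sketch of that buffering (without the FastAPI/WebSocket wiring; `model_fn`, the `LandmarkBuffer` class, and the dummy model below are hypothetical stand-ins):

```python
import numpy as np

WINDOW = 16        # frames per prediction, matching the model input
FEAT_DIM = 252     # hand-landmark features per frame

class LandmarkBuffer:
    """Collects incoming landmark frames and predicts on each full window."""
    def __init__(self, model_fn, labels):
        self.model_fn = model_fn   # callable: (16, 252) array -> class probabilities
        self.labels = labels
        self.frames = []

    def push(self, frame):
        self.frames.append(np.asarray(frame, dtype=np.float32))
        if len(self.frames) < WINDOW:
            return None                               # not enough frames yet
        window = np.stack(self.frames[-WINDOW:])      # newest 16 frames
        probs = self.model_fn(window)
        top3 = np.argsort(-probs)[:3]                 # three best class indices
        return [(self.labels[i], float(probs[i])) for i in top3]

# Deterministic dummy "model" standing in for the BiLSTM, for demonstration only.
def dummy_model(window):
    logits = window.mean(axis=0)[:4]                  # pretend there are 4 classes
    e = np.exp(logits - logits.max())
    return e / e.sum()

buf = LandmarkBuffer(dummy_model, ["merhaba", "evet", "hayir", "tamam"])
for t in range(WINDOW):
    out = buf.push(np.full(FEAT_DIM, t, dtype=np.float32))
print(out)   # top-3 (label, confidence) pairs after the 16th frame
```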

Animation

Type any word or sentence — a stick figure performs each sign sequentially with smooth frame interpolation. Landmark data comes directly from AUTSL videos, not synthesized motion.

  • 184 available signs in animation mode
  • Smooth interpolation between consecutive signs
  • Browsable sign grid with all available words
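Stitching signs together amounts to inserting linearly interpolated bridge frames between the last pose of one sign and the first pose of the next. A sketch under the assumption that each sign is a sequence of flat landmark vectors (function names here are illustrative):

```python
import numpy as np

def interpolate_gap(last_frame, next_frame, steps=8):
    """Linearly blend from the final pose of one sign to the first pose of the next."""
    t = np.linspace(0.0, 1.0, steps + 2)[1:-1]       # interior blend weights only
    return (1 - t)[:, None] * last_frame + t[:, None] * next_frame

def stitch_signs(sequences, steps=8):
    """Concatenate per-sign landmark sequences with interpolated bridges."""
    out = [sequences[0]]
    for nxt in sequences[1:]:
        out.append(interpolate_gap(out[-1][-1], nxt[0], steps))
        out.append(nxt)
    return np.concatenate(out)

a = np.zeros((16, 252), dtype=np.float32)    # sign 1 ends at an all-zeros pose
b = np.ones((16, 252), dtype=np.float32)     # sign 2 starts at an all-ones pose
clip = stitch_signs([a, b], steps=8)
print(clip.shape)   # (40, 252): 16 + 8 bridge frames + 16
```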
Hand landmark diagram

Development Pipeline

1

Landmark Extraction

MediaPipe Holistic (model_complexity=2) processes every AUTSL video. 16 frames are sampled per video → 225 features per frame (21 left-hand + 21 right-hand + 33 pose keypoints × xyz). Color and depth streams are concatenated: 450 features/frame for the Transformer model.
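The bookkeeping behind those numbers: uniform sampling of 16 frame indices, then flattening the three landmark arrays into one vector. A sketch assuming MediaPipe has already produced the (21, 3) hand and (33, 3) pose arrays (the helper names are illustrative):

```python
import numpy as np

def sample_indices(n_frames: int, n_samples: int = 16) -> np.ndarray:
    """Uniformly sample frame indices across a video of n_frames."""
    return np.linspace(0, n_frames - 1, n_samples).round().astype(int)

def frame_features(left_hand, right_hand, pose):
    """Flatten (21,3) + (21,3) + (33,3) landmarks into one 225-dim vector:
    75 keypoints x (x, y, z)."""
    return np.concatenate([np.asarray(a, dtype=np.float32).ravel()
                           for a in (left_hand, right_hand, pose)])

idx = sample_indices(120)                        # e.g. a 120-frame AUTSL clip
feat = frame_features(np.zeros((21, 3)), np.zeros((21, 3)), np.zeros((33, 3)))
rgb_depth = np.concatenate([feat, feat])         # color + depth streams
print(len(idx), feat.shape, rgb_depth.shape)     # 16 (225,) (450,)
```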

2

Transformer Baseline — 226 classes

Transformer Encoder with sinusoidal positional encoding, 4× Multi-Head Attention blocks (8 heads), AdamW optimizer with cosine decay, label smoothing 0.1, data augmentation (Gaussian noise, time masking, scale jitter, horizontal flip). Input: (16, 450).
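The sinusoidal positional encoding follows the standard Vaswani et al. formulation: even feature indices get sines, odd indices get cosines, at wavelengths that grow geometrically with the index. A NumPy sketch for the (16, 450) input shape:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1) frame positions
    i = np.arange(0, d_model, 2)[None, :]        # even feature indices
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                  # sines on even dims
    pe[:, 1::2] = np.cos(angle)                  # cosines on odd dims
    return pe

pe = sinusoidal_positional_encoding(16, 450)     # one encoding per frame
print(pe.shape)   # (16, 450)
```

This encoding is added to the landmark features before the first attention block, so the encoder can distinguish frame order.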

3

BiLSTM Final Model — 184 classes

42 low-accuracy classes removed (val accuracy < 50%). Redesigned to use only hand landmarks (feat_dim=252) to reduce input noise. Bidirectional LSTM(256) → LSTM(128) → Dense(512) → Dense(256) → Dense(184, softmax). Labels remapped with sklearn.LabelEncoder.
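After dropping the 42 weak classes, the surviving class IDs are no longer contiguous, so they must be remapped to 0..183 before training. That is what sklearn's LabelEncoder does; a minimal dependency-free equivalent with toy class IDs:

```python
def remap_labels(labels):
    """Map arbitrary class IDs to contiguous 0-based indices,
    sorted the same way LabelEncoder.fit_transform sorts them."""
    classes = sorted(set(labels))                 # surviving classes, sorted
    index = {c: i for i, c in enumerate(classes)}
    return [index[c] for c in labels], classes

# Toy example: three surviving classes with gaps in their original IDs
y, classes = remap_labels([3, 7, 3, 200, 7])
print(y)        # [0, 1, 0, 2, 1]
print(classes)  # [3, 7, 200]
```

Keeping the `classes` list around lets the server translate predicted indices back to the original sign labels at inference time.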

Dataset — AUTSL

Property              Value
Total classes         226 words
Total videos          ~38,000
Format                RGB + Depth, 512×512, 30 fps
Total signers         43
Train signers         31 (~28k videos)
Validation signers    6 (~4.4k videos)

Architecture Overview

Browser: 📷 Webcam → MediaPipe (WASM) → 16 × 252 features/frame
    ↓ WebSocket
FastAPI Backend: BiLSTM model (256 → 128 → 184 classes)
    ↓
Top-3 predictions + confidence returned to browser

Technologies Used

Backend

Python FastAPI Uvicorn WebSocket Docker

Machine Learning

TensorFlow Keras BiLSTM Transformer Scikit-learn

Computer Vision

MediaPipe OpenCV NumPy Pandas

Frontend

JavaScript MediaPipe WASM

Training

Google Colab T4 GPU Mixed Precision fp16

Dataset

AUTSL (Ankara University Turkish Sign Language dataset)