Japanese speech recognition is more challenging than in many other languages because of the wide variety of kanji and proper noun representations. This benchmark article implements a morphological-analysis-based F1-score evaluation using Sudachi to quantify the kanji and proper-noun recognition accuracy that WER and CER struggle to capture, and re-evaluates nine open-source ASR models.
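As a concrete illustration of the metric, here is a minimal sketch assuming SudachiPy (`sudachipy` plus `sudachidict_core`). Split mode C and the multiset-F1 formulation are assumptions for illustration, not necessarily the article's exact setup.

```python
from collections import Counter

from sudachipy import dictionary, tokenizer

# Split mode C yields the longest units, which keeps proper nouns
# (company names, service names) as single tokens when the dictionary knows them.
_tok = dictionary.Dictionary().create()
_MODE = tokenizer.Tokenizer.SplitMode.C


def morphemes(text: str) -> Counter:
    """Tokenize with Sudachi and return a multiset of surface forms."""
    return Counter(m.surface() for m in _tok.tokenize(text, _MODE))


def morpheme_f1(reference: str, hypothesis: str) -> float:
    """F1 over morpheme multisets: rewards producing the right words
    (kanji spellings, proper nouns), which character-level CER blurs."""
    ref, hyp = morphemes(reference), morphemes(hypothesis)
    overlap = sum((ref & hyp).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A hypothesis that garbles one proper noun but keeps everything else loses exactly one morpheme of overlap, so the score degrades word by word rather than character by character.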
A deep dive into a two-stage pipeline that merges Whisper and Qwen3-ASR transcriptions via LLM diff-merging to generate ASR training data surpassing what either model achieves alone. The pipeline combines Whisper's strength on kanji and proper nouns with Qwen3's lower overall CER, then applies forced alignment to assign word-level timestamps. The downstream chunk-splitting stage uses sentence-end scoring to automatically produce natural 20–30 second chunks, followed by Whisper re-transcription for CER-based quality filtering.
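To make the chunking and filtering stages concrete, here is a minimal sketch under assumed data structures: `Word` carries the timestamps produced by forced alignment, `boundary_score` is a hypothetical stand-in for the article's sentence-end scoring rules, and `cer` is the standard character error rate used by the downstream filter.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds, from forced alignment
    end: float


def boundary_score(word: Word) -> float:
    # Hypothetical scoring: sentence-final punctuation ranks highest,
    # weaker clause markers rank lower; the article's actual rules may differ.
    if word.text.endswith(("。", "？", "！")):
        return 2.0
    if word.text.endswith("、"):
        return 0.5
    return 0.0


def split_chunks(words: list[Word], min_len: float = 20.0, max_len: float = 30.0):
    """Greedily cut at the best-scoring word whose end time falls
    20-30 s after the current chunk start."""
    chunks, i = [], 0
    while i < len(words):
        t0 = words[i].start
        window = [j for j in range(i, len(words))
                  if min_len <= words[j].end - t0 <= max_len]
        if not window:  # no candidate boundary in range: emit the remainder
            chunks.append(words[i:])
            break
        cut = max(window, key=lambda j: boundary_score(words[j]))
        chunks.append(words[i:cut + 1])
        i = cut + 1
    return chunks


def cer(ref: str, hyp: str) -> float:
    """Character error rate = Levenshtein distance / len(ref)."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        cur = [i]
        for j, hc in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                # delete rc
                           cur[j - 1] + 1,             # insert hc
                           prev[j - 1] + (rc != hc)))  # substitute
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

In the filtering step, each chunk's merged text would be compared against its Whisper re-transcription and dropped when `cer` exceeds some threshold; the article's actual cutoff is not reproduced here.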
This article presents a first-hand benchmark evaluating nine ASR models on real audio data containing IT company names, service names, and abbreviations. Unlike general-purpose Japanese speech recognition benchmarks, this evaluation takes a practical perspective, asking whether models can correctly transcribe domain-specific terms like "openclaw" and "ARR."
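A minimal sketch of what such a term-level check might look like, assuming a simple normalized substring hit rate; the `term_recall` helper and NFKC normalization are illustrative, not the article's actual scoring.

```python
import unicodedata


def normalize(text: str) -> str:
    # NFKC folds full-width ASCII (e.g. ＡＲＲ -> ARR); lower-case
    # so Latin-alphabet terms match regardless of casing.
    return unicodedata.normalize("NFKC", text).lower()


def term_recall(terms: list[str], transcript: str) -> float:
    """Fraction of expected domain terms that appear in the transcript."""
    t = normalize(transcript)
    return sum(normalize(term) in t for term in terms) / len(terms)


print(term_recall(["openclaw", "ARR"], "OpenClawのＡＲＲは順調に伸びています"))  # 1.0
```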
I finetuned IBM Granite Speech (`granite-4.0-1b-speech`) on 100 hours of Japanese speech data and reduced CER from 0.37 to 0.14. The official script's Projector+LoRA-only training hits a ceiling on accuracy gains; the key breakthrough was additionally training `lm_head` and the last 8 layers of the language model. The result matches Qwen3-ASR-1.7B (CER 0.14) with only 1B parameters.
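A hedged sketch of that unfreezing step. It matches parameters by name rather than hard-coding the module tree, but the `language_model`/`lm_head` name patterns are assumptions in the HF multimodal style; verify them against `model.named_parameters()` for this checkpoint.

```python
import re

# Assumed HF-style parameter names like "language_model.model.layers.23.self_attn...".
LAYER_IDX = re.compile(r"language_model.*\.layers\.(\d+)\.")


def unfreeze_lm_top(model, num_layers: int = 8) -> None:
    """Unfreeze lm_head plus the last `num_layers` decoder layers of the LM,
    on top of the Projector+LoRA parameters the official script trains."""
    indices = sorted({int(m.group(1))
                      for name, _ in model.named_parameters()
                      if (m := LAYER_IDX.search(name))})
    top = {str(i) for i in indices[-num_layers:]}
    for name, param in model.named_parameters():
        m = LAYER_IDX.search(name)
        if "lm_head" in name or (m and m.group(1) in top):
            param.requires_grad = True


# Usage, after building the model as in the official finetuning script:
# unfreeze_lm_top(model, num_layers=8)
# trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
# print(f"trainable params: {trainable:,}")
```

Deriving the layer indices from the parameter names keeps the snippet working even if the 1B checkpoint has a different layer count than expected.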