May 1, 2026
Best Japanese ASR Models 2026: Qwen3-ASR vs Whisper vs ReazonSpeech Benchmark
We benchmarked 8 Japanese ASR models (Qwen3-ASR, Whisper, ReazonSpeech, and Parakeet) on WER and inference speed. See which wins for accuracy, speed, and broadcast audio.
Neosophie Team

May 1, 2026
diarization
Best Open-Source Speaker Diarization Models 2026: NeMo vs Pyannote vs VibeVoice Benchmarked
VibeVoice tops our 2026 benchmark across 6–14 speaker scenarios. We tested NeMo (SortFormer, MSDD), Pyannote, and VibeVoice on real-world audio, with full DER scores, latency, and a model picker by use case.
Neosophie Team

April 30, 2026
asr, benchmark
Japanese IT-Domain ASR Benchmark 2026: Granite 4.0 vs Qwen3 vs Whisper
We tested 9 ASR models on Japanese IT jargon: SaaS names, abbreviations, and company terms. Benchmarks include Granite 4.0, Qwen3-ASR, Cohere Transcribe, and Whisper variants.
Neosophie Team

April 29, 2026
asr
Released the Highest-Accuracy Japanese ASR Model for Free
Fine-tuned Qwen3-ASR-1.7B for proper noun recognition. Free on Hugging Face. Outperforms Whisper on both CER and proper noun F1. Also available in Sonophie for macOS.
Neosophie Team

April 29, 2026
llm, finetune
Custom LogitsProcessor for HuggingFace Transformers: Fixing JSON Repetition Hallucinations in Qwen
Step-by-step guide to building a custom LogitsProcessor in HuggingFace Transformers. It applies no-repeat-ngram control selectively inside JSON Content fields to suppress repetition hallucinations during fine-tuned Qwen inference.
Neosophie Team

April 29, 2026
tts
Zero-Shot Voice Cloning with OpenVoice: Emotion Control Across 8 Styles from a 14-Second Reference
How to run OpenVoice for zero-shot voice cloning with emotion style control. Hands-on results across 8 emotion styles using a 14-second reference clip, including code with se_extractor.
Neosophie Team

April 29, 2026
oss
whisper.cpp Local Inference on Mac: Offline Transcription with CoreML & Apple Silicon
Offline transcription on Mac using whisper.cpp + CoreML. Supports microphone and web-conference audio, with no data transmission, and is completely free to use.
Neosophie Team

April 27, 2026
diarization, metrics
Speaker Diarization Metrics: DER, JER, Purity & Boundary Error with Python
DER alone won't tell you why your diarization fails. Learn DER, JER, Purity, Coverage, and Boundary Error, with formulas and working Python code using pyannote.
Neosophie Team

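For readers new to the headline metric, here is a minimal sketch of the standard DER definition: missed speech, false alarm, and speaker confusion time divided by total reference speech time. It is plain Python and deliberately does not use pyannote; the function name and the example durations are hypothetical and are not results from the article.

```python
def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_reference: float) -> float:
    """Return DER given error durations and total reference speech (all in seconds)."""
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    # DER = (missed speech + false alarm + speaker confusion) / total reference speech
    return (missed + false_alarm + confusion) / total_reference


# Hypothetical durations, for illustration only.
der = diarization_error_rate(missed=3.0, false_alarm=2.0,
                             confusion=5.0, total_reference=100.0)
print(f"DER = {der:.1%}")
```

Because false alarms count toward the numerator but not the denominator, DER can exceed 100% on noisy audio, which is one reason to read it alongside the other metrics listed above.
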
April 27, 2026
tts
Best Open-Source TTS Models 2026: Chatterbox vs Fish Speech vs Qwen3
Hands-on benchmarks of Chatterbox, Qwen3-TTS, Fish Speech 1.5, CosyVoice 3.0 & IndexTTS2, covering voice quality, latency, zero-shot cloning, emotion control, and license gotchas.
Neosophie Team

April 21, 2026
asr, benchmark
Are WER and CER Not Enough? Re-evaluating Japanese ASR with Noun and Proper Noun F1 Scores
Japanese speech recognition is harder than recognition in many other languages because of the wide variety of kanji and proper noun representations. This original benchmark article implements a morphological-analysis-based F1 score evaluation using Sudachi to quantify the "kanji and proper noun recognition accuracy" that WER and CER struggle to capture, and re-evaluates 9 open-source ASR models.
Neosophie Team

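As a rough illustration of the idea, a noun-level F1 can be computed as a multiset overlap between the nouns in the reference and in the ASR output. The sketch below assumes the nouns have already been extracted by a morphological analyzer such as Sudachi (that step is not shown), and the function name and sample words are hypothetical; the article's actual matching rules may differ.

```python
from collections import Counter


def noun_f1(reference_nouns, hypothesis_nouns):
    """Return (precision, recall, f1) from the multiset overlap of two noun lists."""
    ref = Counter(reference_nouns)
    hyp = Counter(hypothesis_nouns)
    # Multiset intersection: each noun counts at most as often as it occurs in both lists.
    overlap = sum((ref & hyp).values())
    precision = overlap / sum(hyp.values()) if hyp else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical noun lists: the hypothesis confuses the homophones 資料 and 飼料.
precision, recall, f1 = noun_f1(["東京", "会議", "資料"], ["東京", "会議", "飼料"])
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A single homophone error like this changes only one character, so CER barely moves, while the noun-level score drops by a third.
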
April 16, 2026
asr, dataset
Building a High-Quality ASR Training Data Pipeline with Multiple Models: Design Philosophy and Implementation of Whisper × Qwen3 × LLM
A deep-dive into a two-stage pipeline that integrates Whisper and Qwen3-ASR transcriptions via LLM diff-merging to generate ASR training data that surpasses what any single model can achieve. The pipeline combines Whisper's strength in kanji and proper nouns with Qwen3's lower overall CER, then applies forced alignment to assign word-level timestamps. The downstream chunk-splitting stage automates natural 20–30 second chunk generation using sentence-end scoring, followed by Whisper re-transcription for CER-based quality filtering.
Neosophie Team

April 8, 2026
asr, finetune
How I Finetuned IBM Granite Speech 1B on Japanese Audio and Improved CER from 0.37 to 0.14
I finetuned IBM Granite Speech (`granite-4.0-1b-speech`) on 100 hours of Japanese speech data and reduced CER from 0.37 to 0.14. The official script's Projector+LoRA-only training has a ceiling on accuracy gains. The key breakthrough was additionally training `lm_head` and the last 8 layers of the language model. The result matches Qwen3-ASR-1.7B (CER 0.14) with only 1B parameters.
Neosophie Team

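For context on what a CER of 0.14 means, here is a minimal sketch of the usual definition: character-level edit distance divided by the reference length. It is a plain-Python illustration with hypothetical function names and a made-up sentence pair; the article's own evaluation may normalize text differently before scoring.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance over characters (substitutions, insertions, deletions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up pair: one wrong kanji out of seven characters.
score = cer("音声認識の精度", "音声認識の制度")
print(f"CER = {score:.3f}")
```

Read this way, a CER of 0.14 corresponds to roughly one character error per seven reference characters.
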
March 12, 2026
books
5 Books a Working AI Engineer Actually Read for Real-World Accuracy Improvement and Product Development
The reason you can call AI APIs but still can't improve accuracy is a lack of understanding of what's happening inside the model. These 5 books, all O'Reilly titles written by practitioners at the forefront of the field (including Hugging Face engineers and Chip Huyen), are the standard reading list for ML engineers worldwide.
Neosophie Team

March 12, 2026
rag
Replaced Embedding RAG with PageIndex on My Own Site: Here's What Actually Happened
After hitting a wall with hybrid BM25 + Embedding + Rerank search (wrong retrievals, high infra cost, Japanese embedding model limitations), I implemented **PageIndex**, a vector-free RAG architecture, as the AI assistant for my company site. The result: no VectorDB, no embedding model, no Reranker needed. Two-stage LLM-based filtering via a URL tree navigated to the correct pages for queries like *"What products do you have?"* and *"What ASR-related blog posts exist?"*, with zero vector math involved.
Neosophie Team