Baidu, a Chinese search engine and AI tech giant, recently filed a patent in the China National Intellectual Property Administration (CNIPA) for a system that could help humans communicate with animals. The system uses AI to analyze animal sounds, movements, and body signals, and translate them into human language.
The idea of decoding animal communication isn’t new. Apps like MeowTalk and Dog Translator have offered playful attempts to interpret pet sounds and behaviors, reflecting a growing interest in animal-human interaction. In 2002, Takara Co., a Japanese toy company, released the BowLingual device, a handheld translator for dog barks. The device uses a wireless microphone to capture barking, categorizes the sound into emotional states such as happiness or frustration, and displays a matching phrase on a screen.
In 2013, the Nordic Society for Invention and Discovery, a small Scandinavian research lab, announced No More Woof, a wearable EEG device that aims to read a dog’s brainwaves and translate its mental states into human language.
Earlier filings from individual inventors, such as U.S. Patent No. 5,790,033 and U.S. Pat. App. No. 2007/0067161, proposed systems that recognize animal sounds based on audio and behavioral cues.
This initiative places Baidu among other global efforts aiming to decode animal communication using AI. For instance, Project CETI (Cetacean Translation Initiative) and the Earth Species Project are also exploring the use of artificial intelligence to understand non-human languages.
In our previous blog, we explored Meta’s SeamlessM4T, a model designed to translate across 100+ human languages. Baidu, too, is building momentum in this space, with a growing portfolio of patents in the U.S. related to real-time speech translation. As of this writing, the specific publication or application number for Baidu’s animal translation patent has not been disclosed publicly. Instead, we will highlight several of Baidu’s speech translation patents, and how they fit into the broader AI landscape of language and communication technologies.
Baidu’s Patenting Activity in Speech Translation
Baidu’s patent filings grew steadily from 2015 and peaked in 2021. This peak aligns with a broader surge in investment in Natural Language Processing (NLP): according to industry reports, 60% of tech leaders increased their NLP budgets by at least 10% that year. Also in 2021, Baidu demonstrated its AI-powered speech recognition within the Baidu App, achieving 98% accuracy even on long, mixed-language queries.
Key milestones that likely contributed to Baidu’s growth in AI and speech recognition include:
- The launch of PaddlePaddle in 2016, an open-source deep learning platform designed to make AI development more accessible and efficient.
- The introduction of ERNIE (Enhanced Representation through kNowledge IntEgration) in 2019, a pre-training framework that learns language more effectively by combining multiple tasks and gradually building knowledge.
- An upgrade to PaddlePaddle in 2021, adding a graph engine, new NLP models, and PaddleFlow to simplify AI development.
Baidu’s Speech Translation: Top Law Firms
Although a Chinese company, Baidu relies on U.S. law firms to support its U.S. intellectual property filings. Leading the list is Womble Bond Dickinson LLP, followed by Brooks Kushman P.C. and North Weber & Baugh LLP. Other notable firms include Lippes Mathias LLP, Seed IP Law Group, and Knobbe Martens, each contributing to Baidu’s growing patent portfolio in AI and digital technologies.
Baidu’s Speech Translation: Top Tech Areas
Baidu’s U.S. patent filings highlight a strong focus on technologies that support speech translation. Leading is G06F (electrical digital data processing), which reflects Baidu’s heavy investment in the computational infrastructure necessary for real-time AI applications. This is followed by G06N (computing models), which covers the machine learning systems used to train language models.
G10L (speech recognition) also plays a key role in Baidu’s translation tools by converting spoken words into text before they’re translated. This is supported by image and video recognition (G06V) and image data processing (G06T), which help AI better understand context and are useful in applications that combine voice and visuals. Together, these categories reflect Baidu’s goal of combining key AI tools to turn speech into fast, accurate real-time translations and a smooth user experience.
Baidu’s Speech Translation Patents
Direct speech translation cuts recognition errors
Language barriers can make real-time communication difficult, especially when speech recognition systems mishear words and cause wrong translations.
U.S. Patent No. 11,328,133 addresses this issue by skipping the traditional step of converting speech to text before translating. Instead, it extracts key features from the spoken language using Mel-frequency cepstrum analysis, then feeds those features into a pre-trained translation model.
This model, trained on matched speech and text examples from different languages, converts the speech directly into translated text. That text is then used to create and play back a voice response in the second language. By translating speech directly, this method reduces errors and boosts accuracy, making it more useful for travel, multilingual chats, and customer support.
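The feature-extraction front end of such a pipeline can be sketched in plain NumPy. This is an illustrative reimplementation of the standard Mel-frequency cepstral coefficient (MFCC) recipe, not Baidu’s actual code; the frame length, hop size, filter count, and coefficient count below are common defaults chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    """Return one row of cepstral coefficients per 10 ms frame of audio."""
    # Split the waveform into overlapping Hamming-windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 Hz to the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1)) / (2 * n_mels))
    return log_energy @ dct.T
```

In the patented design, per-frame feature vectors like these would be fed directly into the pre-trained speech-to-translated-text model, bypassing a separate speech-recognition step.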
The patent was filed on September 27, 2019, and was granted on May 10, 2022. The listed inventors are Hao Xiong, Zhongjun He, Xiaoguang Hu, Hua Wu, Zhi Li, Zhou Xin, Tian Wu, and Haifeng Wang. Lippes Mathias LLP represented Baidu in the patent filing.
Faster translations without waiting for full sentences
Real-time speech translation often suffers from delays because systems wait for complete sentences marked by punctuation such as periods. As a result, they take too long to respond, causing lag and a poor experience, especially in fast-paced settings like live events, international meetings, or customer support.
U.S. Patent No. 11,132,518 introduces a smarter translation pipeline. It first recognizes speech from a source language and adds the resulting text to previously collected segments. Instead of waiting for a full stop, the system then sends this combined text to a pre-trained discriminant model.
This model decides whether the current input is meaningful and complete enough to translate. If the model gives a positive result, the system immediately translates and outputs the result. If not, it waits for more speech input and repeats the process. This approach uses meaning instead of punctuation to decide when to translate, making real-time translation faster and more efficient.
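The wait-or-translate loop described above can be sketched as follows. The discriminant and translation models are stubbed out with hypothetical placeholders (`is_complete` and `translate`); in the patent, both are pre-trained neural models rather than the toy heuristics used here.

```python
def translate(text: str) -> str:
    # Stub translation model: a real system would run neural MT here.
    return f"[zh] {text}"

def is_complete(text: str) -> bool:
    # Stub discriminant model: treats the buffer as translatable once it
    # ends in a clause-like unit. A real model scores semantic completeness.
    return text.rstrip().endswith(("today", "please", "thanks"))

def streaming_translate(segments):
    """Accumulate recognized segments and translate as soon as the
    discriminant judges the buffered text meaningful and complete."""
    buffer, outputs = "", []
    for seg in segments:
        buffer = (buffer + " " + seg).strip()  # append newest recognized text
        if is_complete(buffer):                # discriminant says "translate now"
            outputs.append(translate(buffer))
            buffer = ""                        # start buffering the next unit
    return outputs
```

The key design point is that the translate/wait decision is driven by the discriminant’s judgment of meaning, not by waiting for sentence-ending punctuation.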
The patent was filed on November 21, 2019, and was granted on September 28, 2021. Chuanqiang Zhang, Tianchi Bi, Hao Xiong, Zhi Li, Zhongjun He, and Haifeng Wang are listed as inventors. Weaver Austin Villeneuve & Sampson LLP represented Baidu in the patent filing.
Context-aware transcription for better summaries
Traditional speech transcription systems excel at converting spoken words into text, but often deliver generic outputs that don’t fully capture the context or intent behind the audio. This is a problem when transcribing things like meetings, lectures, or personal notes, which each need a different style of summary.
U.S. Pat. App. No. 2025/0006187 addresses this challenge by integrating scenario awareness directly into the transcription process. The system first detects the scenario type based on where in the storage app the user initiates the recording. After recognizing the speech, it uses a language model with scenario-specific prompts to create summaries that fit the context, making the transcriptions more relevant and useful by customizing the output for different situations.
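A minimal sketch of the scenario-selection step, assuming a simple mapping from the app entry point to a summarization prompt. The scenario names and prompt strings here are illustrative, not taken from the filing, and the language-model call itself is omitted.

```python
# Hypothetical entry-point-to-prompt mapping; a production system would
# detect the scenario from richer context than a folder name.
SCENARIO_PROMPTS = {
    "meeting_folder": "Summarize as meeting minutes with decisions and action items:",
    "lecture_folder": "Summarize as study notes with key concepts and definitions:",
    "memo_folder":    "Summarize as a short personal reminder:",
}

def build_summary_prompt(entry_point: str, transcript: str) -> str:
    """Pick a scenario-specific prompt from where the user started recording,
    falling back to a generic prompt for unrecognized entry points."""
    prompt = SCENARIO_PROMPTS.get(entry_point, "Summarize the following transcript:")
    return f"{prompt}\n{transcript}"
```

The resulting prompt would then be sent, together with the recognized text, to the language model that produces the scenario-appropriate summary.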
The patent application was filed on September 13, 2024, and was published on January 2, 2025. The inventors are Hongtao Zou and Si Chen, and Seed IP Law Group represented Baidu in the filing.