Spotify is securing a patent for a “karaoke query processing system” that can process sample lines and melodies, in addition to song names.
It has been a little over a year since Spotify incorporated real-time lyrics into its service, and a newly published application with the U.S. Patent and Trademark Office suggests the world’s largest music streaming platform is considering a dedicated sing-along mode.
The patent filing notes that the search for an instrumental can at times be “overwhelming,” given the vast song selection offered in karaoke. Spotify cites the need for a system that can find tracks quickly and without requiring the user to remember song titles. The company says it is possible to find a song’s instrumental version using only a user’s a capella rendition of the parts they know.
A system diagram of Spotify’s karaoke system, in accordance with some embodiments.
Spotify’s karaoke feature may comprise a performance system and a processing system, linked via communication networks. The performance system serves as the karaoke machine and is expected to come equipped with a microphone, a speaker, a display, and a user interface. A person’s computer, smartphone, smart speaker, or similarly capable device may fulfill this role.
Meanwhile, the processing system may come in the form of one or more cloud-connected servers. These would contain Spotify’s catalog of karaoke tracks, which could include millions of instrumentals, depending on licenses provided.
A separation module isolates the instrumentals, automatically extracting the frequencies for a song’s vocal components. The patent application cites the research behind this technology, which may also be applied to a capella queries recorded in loud environments. Separate libraries are established for backing tracks and the isolated vocals, which will serve as “reference tracks” for searches.
To find tracks at low latency, the processing system only looks at portions of reference tracks that users are more likely to remember and use for their search. An annotation module is in charge of marking a single verse and a single chorus for each reference track. It works on the assumption that verses are typically similar in their melody lines, and that the same is true for choruses.
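The annotation step can be pictured with a small sketch in Python. It assumes each reference track already comes with labeled sections, and it keeps the first verse and the first chorus; the filing as described here does not say which verse or chorus is chosen, so that choice, the function name `annotate_track`, and the tuple layout are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical section format: (label, start_seconds, end_seconds).
Section = Tuple[str, float, float]


def annotate_track(sections: List[Section]) -> Dict[str, Tuple[float, float]]:
    """Keep one verse and one chorus per track (here: the first of each).

    Picking the first occurrence is an assumption made for illustration.
    """
    kept: Dict[str, Tuple[float, float]] = {}
    for label, start, end in sections:
        if label in ("verse", "chorus") and label not in kept:
            kept[label] = (start, end)
    return kept


song = [
    ("intro", 0.0, 10.0),
    ("verse", 10.0, 40.0),
    ("chorus", 40.0, 60.0),
    ("verse", 60.0, 90.0),
    ("chorus", 90.0, 110.0),
]
annotated = annotate_track(song)
```

Only the two retained spans would then need to be searched, which is where the latency saving comes from.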
The system may also account for pitch and key differences between annotated reference tracks and submitted a capella samples. A transposition module duplicates the reference tracks and transposes each duplicate into one of the 12 subdivisions of the octave; more than 12 subdivisions may be employed should there be a need for finer precision than half-tones. However, increasing the number of transposed tracks would lead to higher processing latency, according to Spotify.
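As a rough illustration of the transposition idea, the Python sketch below represents a melody as a list of pitches in semitone units (MIDI-style note numbers) and produces one shifted copy per subdivision of the octave. The melody representation and the function name `transpose_variants` are assumptions for this example; the real system would operate on audio reference tracks rather than note lists.

```python
from typing import List


def transpose_variants(melody: List[float], subdivisions: int = 12) -> List[List[float]]:
    """Return one transposed copy of the melody per octave subdivision.

    With 12 subdivisions each copy is shifted by one half-tone more than
    the previous one; with 24 the step shrinks to a quarter-tone.
    """
    step = 12.0 / subdivisions  # shift size in semitones
    return [[pitch + k * step for pitch in melody] for k in range(subdivisions)]


melody = [60, 62, 64]  # C, D, E as MIDI-style note numbers
half_tone_copies = transpose_variants(melody)
quarter_tone_copies = transpose_variants(melody, subdivisions=24)
```

Doubling the subdivisions doubles the number of copies to compare against, which matches the latency trade-off described above.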
Spotify illustrates the system working with 10-second audio clips, split into four 2.5-second segments. A matching module would use each segment in succession until it finds a corresponding item in the reference track library. A threshold is set to maintain the accuracy of search results; matching the same song twice in a row may tell the system that it has found the requested backing track, for instance.
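The segment-by-segment search can be sketched in Python as follows. The clip is cut into fixed-length segments, each segment is handed to a matcher, and the search stops once the same track is returned twice in a row. The matcher itself is left as a caller-supplied function, and the names `split_clip`, `find_track`, and `required_streak` are hypothetical; this is a sketch of the stopping rule described above, not the patented matching method.

```python
from typing import Callable, List, Optional, Sequence


def split_clip(samples: Sequence[float], sample_rate: int,
               segment_seconds: float = 2.5) -> List[Sequence[float]]:
    """Cut a clip into consecutive, non-overlapping full-length segments."""
    seg_len = int(sample_rate * segment_seconds)
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]


def find_track(segments: List[Sequence[float]],
               match_segment: Callable[[Sequence[float]], Optional[str]],
               required_streak: int = 2) -> Optional[str]:
    """Return a track id once it is matched `required_streak` times in a row."""
    last: Optional[str] = None
    streak = 0
    for segment in segments:
        candidate = match_segment(segment)
        if candidate is not None and candidate == last:
            streak += 1
        else:
            streak = 1 if candidate is not None else 0
        last = candidate
        if streak >= required_streak:
            return candidate
    return None


# Toy usage: a 10-second clip at 4 samples per second, with a stub matcher
# that simply replays a fixed list of answers.
clip = list(range(40))
segments = split_clip(clip, sample_rate=4)
answers = iter(["track_a", "track_b", "track_b", "track_c"])
result = find_track(segments, lambda seg: next(answers))
```

Because the loop returns as soon as the threshold is met, later segments are never processed when an early pair of segments already agrees.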
The featured patent application, “Karaoke Query Processing System”, was filed with the USPTO on December 18, 2019 and published thereafter on June 24, 2021. The listed applicant is Spotify AB. The listed inventors are Marco Marchini and Nicola Montecchio.