Browse Podcasts: FAQ
SoundSage lets you browse podcasts by specific topics and themes, in more detail than any other podcast directory. Here's more information on how to use it and how it works, along with answers to other frequently asked questions:
How can I browse topics to find specific podcasts?
Most podcast directories have a list of broad topics, like "Sports" and "Religion," in which you can browse the top podcasts (usually ranked by popularity). However, you typically can't see anything more specific than these top-level themes.
What's more, each podcast producer has to select a category for their show, and they can only choose from a predetermined list, usually the one originally created by Apple Podcasts. For instance, a podcast might list itself under Fiction > Science Fiction, but it can't specify that it is a podcast about Dr. Who.
SoundSage's powerful browsing tool uses topic modeling to identify specific podcast themes, including "hidden" topics that don't appear in the pre-existing category list, as well as sub-themes that bring together specific subsets of podcasts. It also organizes these topics so that you can explore them: start with a top-level category, like "Business," and drill down from there. You'll be surprised at what you find!
What are these topics? How are they created?
SoundSage analyzes the entire podcasting ecosystem and uses semi-supervised machine learning and AI to understand its various topics and themes. From this analysis, we created a hierarchical topic model: a set of topics and themes that are related to one another. Because this topic model is generated automatically by our servers, it represents how the algorithm understands the available podcasts. The more podcasts that exist on a topic, the more likely the algorithm was to identify it.
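To make the idea of a hierarchical topic model more concrete, here is a minimal Python sketch of a topic tree that a browser could drill down through, from a top-level category to increasingly specific subtopics. The `TopicNode` class, its fields, and the example topics are hypothetical illustrations, not SoundSage's internal data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """One topic in a hypothetical browsing hierarchy."""
    title: str
    keywords: list[str] = field(default_factory=list)
    children: list[TopicNode] = field(default_factory=list)

    def add(self, child: TopicNode) -> TopicNode:
        """Attach a subtopic and return it so it can be extended further."""
        self.children.append(child)
        return child

    def find_path(self, title: str) -> list[str] | None:
        """Return the titles from this node down to `title`, or None if absent."""
        if self.title == title:
            return [self.title]
        for child in self.children:
            path = child.find_path(title)
            if path is not None:
                return [self.title] + path
        return None


# Hypothetical topics; not taken from SoundSage's actual model.
root = TopicNode("All podcasts")
business = root.add(TopicNode("Business"))
investing = business.add(TopicNode("Investing", ["stocks", "market", "portfolio"]))
investing.add(TopicNode("Crypto", ["bitcoin", "crypto", "blockchain"]))

print(" > ".join(root.find_path("Crypto")))  # All podcasts > Business > Investing > Crypto
```

Storing each topic's subtopics directly on its node keeps drill-down browsing simple: showing the next level is just a read of `children`.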
For most topics, you can also see a list of top keywords: words that SoundSage has identified as associated with podcasts in that topic. Internally, each topic is a probabilistic model of how frequently these keywords (and many others) appear in each podcast.
Most lower-level topics are titled using a few of their top keywords.
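The idea that a topic is a probability distribution over words, and that its top keywords can double as a title, can be sketched in a few lines of Python. The topic below is invented, and `top_keywords` and `topic_title` are hypothetical helper names; SoundSage's actual titling rules may differ.

```python
from __future__ import annotations


def top_keywords(topic: dict[str, float], n: int = 3) -> list[str]:
    """Return the n most probable words in a topic's word distribution."""
    # Highest probability first; ties broken alphabetically for stable output.
    ranked = sorted(topic.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def topic_title(topic: dict[str, float], n: int = 2) -> str:
    """Build a display title from a topic's top keywords (a hypothetical naming rule)."""
    return " & ".join(word.capitalize() for word in top_keywords(topic, n))


# An invented topic: probabilities over words that sum to 1.
space_topic = {
    "space": 0.30, "astronomy": 0.20, "rocket": 0.15, "planet": 0.12,
    "telescope": 0.10, "launch": 0.08, "orbit": 0.05,
}
assert abs(sum(space_topic.values()) - 1.0) < 1e-9

print(top_keywords(space_topic))  # ['space', 'astronomy', 'rocket']
print(topic_title(space_topic))   # Space & Astronomy
```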
Why did you group together X and Y? That seems a little odd.
Actually, the word you're looking for is uncanny. The algorithm looked at all of the podcasts and made some interesting "choices." For instance, it grouped podcasts on Australia, New Zealand, the UK, and Ireland together. That's pretty cool! Another example is Star Wars and the Marvel Cinematic Universe (MCU), which it put under one category.
So, how did the algorithm figure this out? Computers don't actually "understand" anything; the algorithm finds correlations, not causes. In other words, the algorithm "understood" that these countries (and thus the podcasts that discuss them) are related not because of their shared history in the British Commonwealth, but because of their shared vocabulary, such as the common use of the word "mum." Similarly, Disney owns the intellectual property (IP) for both Star Wars and the MCU, which means that people who podcast on these topics often talk about similar issues, using similar words.
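Topic models are more sophisticated than this, but a simple bag-of-words comparison shows the underlying effect: texts look related when they share vocabulary, whatever the real-world reason. The sketch below swaps in cosine similarity between word counts (a simpler technique than topic modeling) and uses three invented one-line descriptions; the stopword list is also an assumption.

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption, not SoundSage's).
STOPWORDS = {"the", "was", "a", "and", "of", "to"}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count each alphabetic word that isn't a stopword."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0 means no shared words)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented descriptions for illustration only.
aus = bag_of_words("Mum reckons the footy final was brilliant")
irl = bag_of_words("Mum reckons the hurling final was brilliant")
us = bag_of_words("Mom thinks the football final was awesome")

print(round(cosine_similarity(aus, irl), 2))  # 0.8
print(round(cosine_similarity(aus, us), 2))   # 0.2
```

The first two descriptions score as much more similar only because they share words like "mum," which is the same kind of signal a topic model picks up on at a much larger scale.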
Altogether, this probabilistic approach means that the SoundSage algorithm can identify interesting and novel topics that help us better understand podcasting and public discourse.
I can't find a specific topic, or the topic of my own podcast. Why isn't it here?
SoundSage's topic modeling algorithm is mostly unsupervised, which means that we did not use experts (i.e., humans) to tell the computer which topics to select. Instead, the computer looked at all of the podcasts and determined likely themes that it could use to group podcasts together. Because it is a computer, some of the topics were a little strange, or it grouped together two themes that you might not expect to be related. Frequently, though, we found that many podcasts discussed both of them. We also limited the number of topics to something reasonable (about 400 topics in total) and went through the computer-generated list by hand to remove topics that were not particularly useful for exploring the podcast landscape.
We are working to add more topics, but we believe that the computer-generated model does a very good job of identifying the main topics in the podcasting ecosystem.
What is topic modeling? How do you apply it to podcasts?
Topic modeling combines machine learning and probabilistic modeling to take a corpus (a library of texts) and determine the topics within it. For instance, a topic modeling algorithm could take a corpus of newspaper articles and divide it into 10, 20, or 100 topics. Sometimes these are topics you would expect to find; given a newspaper archive, you would not be surprised that "Politics" is a topic. However, topic modeling can also identify "latent" topics: themes that are hidden deep within the corpus. Someone intimately familiar with the texts might be able to identify these topics by hand, but computer-generated topic models let us understand a corpus in depth without reading every text.
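Latent Dirichlet Allocation (LDA) is one widely used topic modeling algorithm. This FAQ doesn't say which algorithms SoundSage used, so the sketch below only illustrates the general technique: it fits LDA with collapsed Gibbs sampling on a tiny invented corpus. The function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the documents are all assumptions chosen for readability, not production settings.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents using collapsed Gibbs sampling.

    Returns (topic_word, doc_topic): one word->probability dict per topic,
    and one list of topic proportions per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic_counts = [[0] * num_topics for _ in docs]  # n[d][k]
    topic_word_counts = [defaultdict(int) for _ in range(num_topics)]  # n[k][w]
    topic_totals = [0] * num_topics  # n[k]

    # Initialize: assign every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][w] += 1
            topic_totals[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][w] -= 1
                topic_totals[k] -= 1
                # p(topic = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][w] + beta)
                    / (topic_totals[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][w] += 1
                topic_totals[k] += 1

    # Turn the final counts into smoothed probability estimates.
    topic_word = [
        {w: (topic_word_counts[t][w] + beta) / (topic_totals[t] + V * beta) for w in vocab}
        for t in range(num_topics)
    ]
    doc_topic = [
        [(c + alpha) / (len(docs[d]) + num_topics * alpha) for c in doc_topic_counts[d]]
        for d in range(len(docs))
    ]
    return topic_word, doc_topic


# A tiny invented corpus: two "space" documents and two "finance" documents.
docs = [
    "rocket launch orbit nasa rocket orbit".split(),
    "orbit planet rocket telescope launch".split(),
    "stocks market investing portfolio stocks".split(),
    "market portfolio dividend stocks investing".split(),
]
topic_word, doc_topic = fit_lda(docs, num_topics=2)
for t, dist in enumerate(topic_word):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"Topic {t}: {top}")
```

On a corpus this small, the output depends on the random seed; with two clearly separated vocabularies like these, the space words and the finance words usually end up in different topics. In practice, a tested library implementation is a better choice than a hand-written sampler.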
Topic modeling is frequently used to explore a large corpus. Data scientists will take a set of documents (for instance, posts from a message board or transcripts from a call center) and use topic modeling to identify key topics automatically, rather than reading all of the texts themselves and trying to work out the topics by hand.
At SoundSage, we used topic modeling on podcasts with two goals in mind. First, we wanted to explore the podcasting ecosystem and understand it better. More importantly, though, we wanted to create a usable tool that would allow listeners to explore podcasts for themselves. To do this, we subjected all of the podcasts to extensive natural language processing and normalization (similar to the work used to generate the podcast similarity index), and then tried a number of topic modeling algorithms, experimenting extensively to produce the best possible topic model: one with usable, understandable topics that people could use to discover new podcasts.
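This FAQ doesn't detail SoundSage's natural language processing pipeline, but the sketch below shows the kind of normalization commonly applied before topic modeling: lowercasing, tokenizing, removing stopwords, and dropping rare words. The stopword list, the `min_count` threshold, the function names, and the sample descriptions are assumptions for illustration.

```python
from __future__ import annotations

import re
from collections import Counter

# A tiny illustrative stopword list; real pipelines use much larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is",
             "we", "this", "about", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphabetic tokens (keeping contractions)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def normalize(documents: list[str], min_count: int = 2) -> list[list[str]]:
    """Tokenize, drop stopwords, then drop words seen fewer than min_count times."""
    tokenized = [[t for t in tokenize(doc) if t not in STOPWORDS] for doc in documents]
    counts = Counter(t for doc in tokenized for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in tokenized]


# Invented episode descriptions.
descriptions = [
    "This week we talk about the Mars rover and rocket launches.",
    "A rocket launch, a Mars mission, and an interview with an astronomer.",
    "We review the latest Star Wars trailer.",
    "Star Wars and the Marvel Cinematic Universe: what's next?",
]
print(normalize(descriptions))
# [['mars', 'rocket'], ['rocket', 'mars'], ['star', 'wars'], ['star', 'wars']]
```

Note that "launch" and "launches" are still treated as different words here; pipelines often add stemming or lemmatization to merge such variants.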