Tag Archives: Lightlda

15 Years of News – Analyzing CNN Transcripts: Topics

High-level time-based visualization of topics in our corpus

High-level visualization of topics in CNN’s corpus

As we saw in the previous article, temporal analysis of individual keywords can be very interesting and uncover interesting trends, but it can be difficult to get an overview of a whole corpus. There’s just too much data.

One solution would be to group similar subjects together. This way, we could scale down a corpus of 500,000 keywords to a handful of topics. Of course, we’ll be losing some definition, but we’ll be gaining a 10,000 feet view of the corpus. In all cases, we can always refer to the individual keywords if we want to take a closer look. Continue Reading