With 15 years of CNN transcripts loaded into a database, I could now run queries to visualize the occurrences of words, such as names, across time. Since I used a text-search database named Elasticsearch, I could use Kibana to chart the keywords. Kibana is a good tool for building dashboards, but it's not really suited to extensive time-series analysis because it lacks an easy way to put several search terms on the same chart. It also doesn't easily show occurrences as a percentage of the corpus for a given time period instead of as absolute counts. This makes Kibana a good tool for a quick look at the data or for debugging an issue with our transcript scraper.
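As a rough illustration of the two things missing from Kibana (several terms on one chart, and percentages instead of raw counts), here is a minimal sketch in plain Python. It assumes the transcripts have already been pulled out of the database as `(period, text)` pairs; the function name `term_share_by_period` and the sample sentences are made up for the example and are not part of the original project.

```python
import re
from collections import Counter, defaultdict


def term_share_by_period(docs, terms):
    """Return {term: {period: percent of documents mentioning the term}}.

    docs  -- iterable of (period, text) pairs, e.g. ("2008-10", "...")
    terms -- list of search terms, matched as whole words, case-insensitive
    """
    totals = Counter()            # number of documents per period
    hits = defaultdict(Counter)   # hits[term][period] = documents mentioning term
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    for period, text in docs:
        totals[period] += 1
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits[term][period] += 1
    return {
        term: {p: 100.0 * hits[term][p] / totals[p] for p in sorted(totals)}
        for term in terms
    }


# Hypothetical sample data, only to show the shape of the result.
sample = [
    ("2008-10", "Obama and McCain met for the final debate."),
    ("2008-10", "Markets fell again today."),
    ("2008-11", "Obama wins the election."),
    ("2008-11", "McCain concedes to Obama."),
]
shares = term_share_by_period(sample, ["Obama", "McCain"])
for term, series in shares.items():
    print(term, series)
```

Each term gets its own series over the same periods, so all of them can be drawn on a single chart, and dividing by the number of documents per period keeps busy news months from dominating.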
A while back, I saw that the Internet Archive hosted an archive of CNN transcripts from 2000 to 2012. The first thing that came to my mind was that this was an amazing corpus to study. It contained the last 12 years of news in textual form, all in one place. I felt that it would be an amazing project to retrieve all the transcripts from 2000 to today, and someone had already gone to the trouble of downloading this corpus.
Unfortunately, the data was basically a dump of the transcript pages from CNN. This isn't a problem for archival purposes, but it makes analysis a bit difficult. For my new project, it meant that I would need to find a way to download all the transcripts from CNN, parse them and load them into a database. To make things even more difficult, the HTML from the early 2000s was more about form than function. In other words, the CNN webmasters (in the 2000s, web designers and developers didn't exist, they were webmasters!) would throw together something that rendered in Internet Explorer or Netscape Navigator and call it a day. There was no effort to organize the layout and content.
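To give an idea of what parsing that kind of markup involves, here is a small sketch using only Python's standard `html.parser`, which tolerates unclosed tags and uppercase tag names. The `TextExtractor` class and the sample HTML are invented for illustration; the real CNN pages and the parser actually used for the project are not shown here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, ignoring <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <SCRIPT> and <script> match.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.chunks.append(text)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up fragment in the style of early-2000s layout markup.
messy = (
    "<TABLE><TR><TD><FONT SIZE=2><B>ANCHOR:</B> Good evening."
    "<P>We begin in Washington.<script>var x = 1;</script></TD></TR></TABLE>"
)
lines = extract_text(messy)
print(lines)
```

This only recovers the text; telling speakers, timestamps and show titles apart would still need page-specific rules on top of it.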
Using the data cleaned and released by Cedric Sam and Thomas de Lorimier (available on Cyberpresse), I geocoded the data and applied a density map function. The map shows interesting financial patterns in Montréal for the Bloc Québécois and for the Liberal Party of Canada. I've chosen those two parties since they have a strong historical influence in Montréal. As we can see on the map, the western part of Montréal is clearly Liberal while the east is more aligned with the Bloc Québécois.
The most interesting clusters are on both sides of Mount Royal. One side, situated in Westmount, contributes noticeably to the Liberal Party (PLC), while the other side, in Outremont, donates more to the Bloc Québécois. To anybody living in Montréal, it's hardly a surprising fact, but I think it's nice to see it on a map.
Less visually striking than my last project, this visualization shows the voting patterns of Canadian Members of Parliament. It uses a Principal Component Analysis (or PCA) transformation to convert the multidimensional voting record of each MP to a 2D (or Cartesian) form.
Each point on the chart represents an MP. The color of each point follows the MP's party affiliation. The points are tightly clustered because of party discipline: in Canada, MPs normally vote in accordance with directions given by their party's leadership.
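For readers curious about the transformation itself, the following is a minimal sketch of a PCA projection to two dimensions with NumPy. It assumes each MP is a row and each vote a column, encoded as 1 for yea and -1 for nay; that encoding, the `pca_2d` helper and the tiny vote matrix are assumptions made for the example, not the data or code behind the chart.

```python
import numpy as np


def pca_2d(votes):
    """Project rows of `votes` onto their first two principal components."""
    X = np.asarray(votes, dtype=float)
    X = X - X.mean(axis=0)                       # center each vote (column)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                          # (n_mps, 2) coordinates


# Hypothetical record: 6 MPs, 5 votes, two parties that mostly vote as blocs.
votes = np.array([
    [ 1,  1, -1, -1,  1],   # party A
    [ 1,  1, -1, -1,  1],   # party A
    [ 1,  1, -1,  1,  1],   # party A, breaks ranks on vote 4
    [-1, -1,  1,  1, -1],   # party B
    [-1, -1,  1,  1, -1],   # party B
    [-1, -1,  1, -1, -1],   # party B, breaks ranks on vote 4
])
coords = pca_2d(votes)
print(coords.round(2))
```

MPs with identical records land on exactly the same point, and the two blocs end up on opposite sides of the first axis, which is the kind of tight clustering that party discipline produces.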