I often use Wikipedia for my work; it’s a data mine. Some time ago, a series of articles noted the decline of Wikipedia. The article by Tom Simonite in the MIT Technology Review is very good, and the academic paper by A. Halfaker, R. Stuart Geiger, J. Morgan and J. Riedl serves as a reference.
But I found something missing: the crossing of the number of articles with the number of contributors, the two major characteristics highlighted in these two references. So I made this little graph to show the scale of the current problem. Wikipedia is a great project, but it is not immortal. This issue needs to be resolved.
A methodological extension of the main graph, The Endless Decline of Wikipedia. The idea is to apply the same method to all versions of Wikipedia and obtain a unique, fluid signature.
I wanted to start a series of notes on Batman v Superman, the two most popular superheroes. I also wanted to use data from the New York Times. The stunning visualizations by Jer Thorp came to mind as the obvious model.
For a first draft, I programmed two visualizations in R: the number of articles (above) and the percentage of articles (below) published by The New York Times that mention “Batman” and “Superman”.
The visualization was made in R with the following packages: scales for color transparency and plotrix for the circular grid. I still have some improvements to make. The data comes from The New York Times Chronicle.
Small comment. Since 1938, Superman has appeared in 9,663 articles and Batman in 6,178 articles published by the New York Times. For the first thirty years, Superman is the reference; the New York Times shows little interest in Batman. In 1966, with the film and the TV series, Batman reaches 322 articles. It does not surpass Superman’s 499 articles, but it establishes Batman as a superhero of interest to the New York Times. Batman is less present in the 1980s, until Tim Burton’s movie. Thereafter, the New York Times’ preference alternates between the two superheroes. Over the past three years, Batman is the clear winner: 439 articles for Batman vs 316 for Superman in 2012.
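To make the year-by-year comparison concrete, here is a minimal Python sketch that finds which hero leads in a given year and by how many articles. It only uses the two years whose counts are quoted in the comment above (1966 and 2012); the `leader` helper is hypothetical and is not part of the R code behind the graphs.

```python
# NYT article counts quoted in the comment above (only the years cited there).
counts = {
    1966: {"Batman": 322, "Superman": 499},
    2012: {"Batman": 439, "Superman": 316},
}


def leader(year_counts):
    """Return the hero with the most articles and the margin over the runner-up."""
    ranked = sorted(year_counts.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_n), (_, second_n) = ranked[0], ranked[1]
    return top, top_n - second_n


for year in sorted(counts):
    hero, margin = leader(counts[year])
    print(f"{year}: {hero} leads by {margin} articles")
```

Running it over the full yearly series would show the alternation between the two heroes after the late 1980s.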
The percentage of articles published by the New York Times in which the words Batman and Superman appear. This visualization complements the previous one.
Small comment. The main difference is the greater weight given to the two superheroes in the period 1966-1980 compared to the last three decades.
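The note does not spell out how the percentage is computed. A plausible reading, assumed here, is the number of articles mentioning a hero in a year divided by the total number of articles the newspaper published that year. The sketch below shows that normalization; the totals used are hypothetical, not real New York Times figures.

```python
def share_of_articles(mentions: int, total: int) -> float:
    """Percentage of a year's articles that mention a term.

    Assumption: `total` is the number of articles published that year.
    This denominator is not taken from the note.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * mentions / total


# Hypothetical yearly total, for illustration only (not a real NYT figure).
print(share_of_articles(439, 100_000))
```

Normalizing this way is what lets the percentage chart differ from the count chart: the same number of mentions weighs more in a year with fewer articles overall.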
I like to use data from Google Ngram Viewer. So, as with the New York Times, it seemed worthwhile to look at the evolution of the two superheroes in books, and its scale, over time. That is now done. The graphics and smoothing were done in R.
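The note does not say which smoothing was used in R. As one common option, here is a minimal Python sketch of a centered moving average over a yearly series; the technique, the window size and the input values are all assumptions for illustration.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks at the edges of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical yearly frequencies, for illustration only.
print(moving_average([1, 2, 6, 2, 1], window=3))
```

Shrinking the window at the edges keeps the smoothed series the same length as the input, at the cost of noisier values at the first and last years.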
Chronicle is a tool similar to Google Ngram Viewer, but for news rather than books, specifically the New York Times. It is an interesting tool, updated daily (as far as I can tell). Before using it, I wanted to know the number of articles published per day by the New York Times since 1851. Here is the result.
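Counting articles per day boils down to grouping publication dates. The sketch below assumes a plain list of publication dates as input (Chronicle’s actual export format is not described here) and uses hypothetical dates; it counts articles per day, then averages them per year over the days that have at least one article.

```python
from collections import Counter
from datetime import date


def articles_per_day(pub_dates):
    """Count how many articles fall on each calendar day."""
    return Counter(pub_dates)


def average_per_year(per_day):
    """Mean articles per publishing day, by year (days with no articles are not counted)."""
    days_by_year = Counter()
    articles_by_year = Counter()
    for day, n in per_day.items():
        days_by_year[day.year] += 1
        articles_by_year[day.year] += n
    return {y: articles_by_year[y] / days_by_year[y] for y in sorted(days_by_year)}


# Hypothetical publication dates, for illustration only.
dates = [date(1851, 9, 20), date(1851, 9, 20), date(1851, 9, 22), date(2012, 1, 2)]
per_day = articles_per_day(dates)
print(average_per_year(per_day))
```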
New professional life, new projects, new tumblr. This is my stream of notes about culture (comic books, movies and TV series), media (old and new, social or not) and sport (mainly football), built with data, statistics and visualizations. I’m Christophe Cariou, I live and work on the island of Nantes (France). I’m a kind of freelance researcher with a Ph.D. in Economics.
Have a fun day.
This tumblr is designed for my tablet. The original theme is The Default Network by Mark Boyce, which uses the scripts Infinite Scroll by Paul Irish and FitVids by Chris Coyier + Paravel. The font is Open Sans by Steve Matteson. The pictograms are Entypo by Daniel Bruce. And this tumblelog is powered by Tumblr, created by David Karp. Thanks.