In March 2014, Neil deGrasse Tyson hosted the premiere of Cosmos: A Spacetime Odyssey, bringing back one of the most influential scientific TV series of all time. Tyson's show follows in the footsteps of Carl Sagan’s Cosmos: A Personal Voyage. It is not a sequel, but an up-to-date version of what Carl Sagan started back in the 1980s.

For those who watched both TV shows, the differences are obvious. But what does the script “data” tell us about Carl Sagan’s show versus Neil deGrasse Tyson’s? We will explore both TV shows and let the results speak for themselves.

What specifically will be analyzed?

The scripts of both shows will be analyzed to find the most frequent words, compare word usage, identify key word differences, and surface anything else of interest in the results.

How was the “data” processed?

A computer program was written to scrape the scripts of both TV shows from another website. After the text was extracted into documents, these documents were processed to extract all the words and exclude the ones that did not matter (i.e., stop words). Also, most of the significant plural words were converted into singular form in order to get a better word count and a more meaningful analysis (e.g., galaxies to galaxy).
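The cleanup step described above can be sketched in a few lines of Python. This is only an illustration, not the program that was actually used: the stop-word list and the plural-to-singular map are tiny, hypothetical stand-ins, and the function name `clean_words` is made up for this example.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to", "we"}

# Hypothetical hand-picked plural -> singular map for the significant words.
SINGULAR = {"galaxies": "galaxy", "stars": "star", "planets": "planet"}


def clean_words(text):
    """Lowercase the text, split it into words, drop stop words, singularize."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [SINGULAR.get(w, w) for w in words if w not in STOP_WORDS]


print(clean_words("The galaxies and the stars are in the Cosmos"))
```

A lookup table was chosen over a rule such as "strip the trailing s" because simple rules mangle words like "cosmos", which fits the statement that only most of the significant plurals were converted.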

A technology called Apache Pig was used to process, manipulate and count the words used in the Cosmos series.
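For readers who do not use Pig, the counting step looks roughly like this in Python. It is a stand-in for illustration, not the original Pig script, and the helper names `word_stats` and `rank_of` are invented here.

```python
from collections import Counter


def word_stats(words):
    """Return (counts, number of unique words, number of total words)."""
    counts = Counter(words)
    return counts, len(counts), sum(counts.values())


def rank_of(counts, word):
    """1-based rank of a word by frequency, or None if it never appears."""
    ranked = [w for w, _ in counts.most_common()]
    return ranked.index(word) + 1 if word in counts else None


# Made-up sample words, only to show the shape of the output.
counts, unique, total = word_stats(["star", "star", "star", "planet", "planet", "cosmos"])
print(unique, total, rank_of(counts, "planet"))
```

The same three numbers (per-word counts, unique words, total words) are what the comparisons in the rest of this post are built on.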

Comparing Neil deGrasse Tyson vs Carl Sagan

During the analysis, we found that Neil deGrasse Tyson used approximately 5,935 unique words out of roughly 59,846 total words. Carl Sagan, on the other hand, used 7,121 unique words out of roughly 77,082 total words.

Carl Sagan used about 17,000 more words during his show, even though both Cosmos series had 13 episodes each.

Assumptions before looking at the data

After watching Cosmos: A Spacetime Odyssey, it was clear that Neil deGrasse Tyson brought a more personal approach to the scientific evidence presented in every episode. So the main focus is to recognize anything that stands out from the word count analysis and was not obvious from watching the shows.

Here are the results of some basic data analysis:

Single word comparison

Comparison of both TV shows by the frequency of each word.

Word Frequency


  • As expected, both shows covered very similar science topics, with words such as stars, world, planet, and so on. They also shared words like time, million, and billion, which measure or describe time or distance.
  • The word “energy” appears 95 times and ranks in the top 15 on Neil deGrasse Tyson’s list. In contrast, Carl Sagan mentions “energy” only 35 times (160th).
  • The words “cosmos” (#10) and “human” (#12) are the only 2 words in the top 15 of Carl Sagan’s list that do not appear in Tyson’s top 15. However, those words are in 20th and 23rd place respectively among Tyson’s most frequently used words.

Word Usage Comparison (Inclusive)

The most statistically significant words that appear at least once in both shows.

***Most significant words go from left to right in the charts.


  • Carl Sagan devotes segments to Kepler, Ptolemy and Eratosthenes during Cosmos: A Personal Voyage. That explains the high frequency of those names.
  • The word “whale” appears 46 times, mainly because Carl Sagan spent a segment of Episode 11, “The Persistence of Memory,” talking about whales.
  • For some reason, Carl Sagan uses the word “information” a lot. Tyson uses a mix of words to describe the same thing.
  • Neil deGrasse Tyson talks more about oil and coal in an effort to raise the issue of natural resources and their effect on climate change.
  • NDT mentions “neutrinos” 21 times, compared to 2 times for Sagan. During his show, Sagan talks about a “brand new field” called neutrino astronomy. NDT probably covers it in more depth because of the advances science has made in the last 35 years.
  • NDT devotes a whole episode, “Sisters of the Sun,” to female scientists and their contributions, which explains why he mentions Cecilia Payne more often than Carl Sagan did.
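This post does not say which statistic was used to decide that a word is "significant". As one plausible way to do it, the sketch below scores each shared word with the two-term log-likelihood measure commonly used to compare word frequencies between two corpora. The choice of statistic and the function names are assumptions made for this example, so the scores will not necessarily match the charts.

```python
import math


def log_likelihood(a, b, total_a, total_b):
    """Two-term log-likelihood score for a word seen a and b times in two corpora."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    for observed, expected in ((a, expected_a), (b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def shared_word_scores(counts_a, counts_b):
    """Score every word that appears at least once in both count dictionaries."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    shared = counts_a.keys() & counts_b.keys()
    scores = {
        w: log_likelihood(counts_a[w], counts_b[w], total_a, total_b) for w in shared
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# "energy": 95 of 59,846 words for Tyson, 35 of 77,082 words for Sagan (from above).
print(log_likelihood(95, 35, 59846, 77082))
```

A word used at the same rate in both shows scores close to zero, while a word that one host clearly favors scores high. Comparing rates rather than raw counts matters here because Sagan's scripts are about 17,000 words longer.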

Word Usage Comparison (Exclusive)

The most statistically significant words used by Sagan and Tyson that do not appear in each other’s show.

***Most significant words go from top to bottom in the charts.


  • Carl Sagan and Neil deGrasse Tyson each devoted more time to specific scientists during their shows. It is not clear why, but it may reflect their personal opinions on who contributed more to science.
  • Carl Sagan spent a segment talking about Holland (the Netherlands). He was delighted by its “enlightenment” and openness to new ideas.
  • Tyson uses the chemical formula for carbon dioxide (CO2), presumably because he assumes the general population is more familiar with that term.
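Finding the exclusive words is a simple filter over the two word-count tables. The sketch below is an illustration only: the function name `exclusive_words` is invented, and the counts in the example are made up rather than taken from the scripts.

```python
def exclusive_words(counts_a, counts_b, top_n=10):
    """Words that appear in counts_a but never in counts_b, most frequent first."""
    only_a = {w: c for w, c in counts_a.items() if counts_b.get(w, 0) == 0}
    # Sort by descending count, then alphabetically so ties come out in a stable order.
    return sorted(only_a.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Made-up counts, only to show the shape of the result.
sagan = {"holland": 12, "star": 50}
tyson = {"co2": 9, "star": 40}
print(exclusive_words(sagan, tyson))
print(exclusive_words(tyson, sagan))
```

Because the other show's count is zero by definition, ranking by raw frequency is a reasonable first pass, although the charts may have been ranked by a different measure.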

Word Sequence Comparison (Inclusive)

The most statistically significant 2-, 3-, or 4-word sequences that appear in both shows.

***Most significant words go from top to bottom in the charts.


  • “Nuclear weapons” appears in Sagan’s speech. We can assume the reason was the Cold War era.
  • Neil deGrasse Tyson mentioned “age of the earth” a total of 14 times. He was probably trying to make it clear for the crowd that believes the earth is 6,000 years old *wink*
  • Neil deGrasse Tyson mentions the name “Carl Sagan” a total of 13 times. Tyson mentioned that Sagan was an inspiration to him and a great contributor to modern science.
  • “Pattern recognition” appears on Tyson’s list. He uses it to describe the brain’s ability to recognize patterns and how that ability translates into advanced algorithms and computer models.
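Word sequences like these can be built by sliding a window of 2, 3, or 4 words over the script. The sketch below is a minimal version under one assumption: since a phrase such as “age of the earth” contains stop words, the sequences are built from the full word list before stop words are dropped. The function names are invented for this example.

```python
from collections import Counter


def ngrams(words, n):
    """All runs of n consecutive words, joined with spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def sequence_counts(words, sizes=(2, 3, 4)):
    """Count every word sequence of the given sizes (2, 3 and 4 by default)."""
    counts = Counter()
    for n in sizes:
        counts.update(ngrams(words, n))
    return counts


words = "the age of the earth".split()
print(ngrams(words, 2))
print(sequence_counts(words)["age of the earth"])
```

The resulting counts can then be compared across the two shows in the same way as the single-word counts.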

Interesting word findings (exclusive)


