Text Mining the CMG Archives

By Richard Gimarc

Pictured below is a word cloud developed from the presentation titles from all CMG conferences dating from the first conference in 1976 through to the most recent in 2017.

High frequency (large) words are in the center of the cloud and low frequency (small) words are around the edges. To convince myself that this representation is correct, I plotted the frequency of the top 30 words in our collection of CMG presentation titles.

The frequency chart confirms what we are seeing in the word cloud:

  • “Performance” is by far the most frequent word, which is why it is colored red and sits in the center of our cloud.
  • Working our way out from the center, we have “system” and “capacity” as the second most frequent (green) words in the cloud.
  • Finally, the set of six brown words follows in frequency order.

This blog describes what we can learn about CMG by “text mining” our archive of CMG conference presentation titles.

Scope and Focus

CMG has an extensive collection of 3,995 conference papers dating back to our first conference in 1976. The chart below shows how our archive has grown over the years.

Our text mining is restricted to examining the titles of presentations. A fuller examination of paper/presentation content is out of scope.

Our goal is to see if we can answer the following three questions:

  • What can we learn by looking at the words used in CMG presentation titles?
  • How has our word choice changed over the years?
  • What does our word cloud tell us about the focus of CMG and its members?

Technique

The results presented in this blog were developed using R and related text mining packages such as tm and wordcloud. We used the following steps to prepare the presentation title text for analysis:

  1. Collect the set of all CMG presentation titles by year.
  2. Use standard text mining techniques to eliminate “stop words,” perform “stemming,” convert character case, and remove extraneous punctuation and spacing. Below is a before/after example:

Before: Capacity Planning with Queueing Network Models: An IMS Case Study

After:  capacity plan queue network model ims case study

  3. Develop word frequency tables.
  4. Use the frequency tables to chart the word cloud.
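The preparation steps above can be sketched in a few lines of code. The analysis in this blog was done in R with tm and wordcloud; what follows is only a rough Python stand-in that uses the standard library. The stop-word list, the crude_stem helper and its suffix rules, and the second and third sample titles are all assumptions made for illustration. They are not the tm package's stop-word list or stemmer, and the resulting counts are not CMG data.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "as", "at", "by", "for", "from",
              "in", "is", "of", "on", "the", "to", "with"}

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    if word.endswith("ing") and len(word) >= 7:
        word = word[:-3]
        # Undo a doubled final consonant: "plann" -> "plan".
        if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiouls":
            word = word[:-1]
    elif (word.endswith("s") and len(word) > 3
          and not word.endswith(("ss", "us", "is"))):
        word = word[:-1]
    return word

def normalize_title(title):
    """Lowercase, drop punctuation and stop words, then stem each word."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(crude_stem(w) for w in words if w not in STOP_WORDS)

def word_frequencies(titles):
    """Build a word -> count table over all normalized titles."""
    counts = Counter()
    for title in titles:
        counts.update(normalize_title(title).split())
    return counts

titles = [
    "Capacity Planning with Queueing Network Models: An IMS Case Study",
    "Performance Modeling of a Network",   # hypothetical title
    "A Case Study in Capacity Planning",   # hypothetical title
]
freq = word_frequencies(titles)
print(normalize_title(titles[0]))
# prints: capacity plan queue network model ims case study
```

The resulting frequency table (freq) is the kind of input a word cloud or a frequency chart is drawn from. A production run would swap in a full stop-word list and a proper stemmer.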

Analysis – Word Choice

Our initial word cloud is based on the most frequent words used in all CMG presentation titles dating back to 1976. The following table shows the top 20 words and their frequency.

These words could easily be used to describe the range of job responsibilities for CMG members. It’s interesting that “performance” is twice as frequent as any other term. In fact, the word “performance” occurs in approximately one out of every three titles. Also, I would wager that most CMG members would use 3 to 4 of the top 20 words in their job description.
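Note that "one out of every three titles" is a different measure from raw word frequency: it counts each title once, no matter how many times the word appears in it. A minimal Python sketch of that calculation is shown here, assuming the titles have already been normalized. The share_of_titles helper and the three sample titles are hypothetical and are not taken from the CMG archive.

```python
def share_of_titles(titles, word):
    """Fraction of titles that contain `word` at least once."""
    if not titles:
        return 0.0
    hits = sum(1 for title in titles if word in title.split())
    return hits / len(titles)

# Hypothetical, already-normalized titles (not CMG data).
titles = [
    "performance analysis system",
    "capacity plan case study",
    "network model queue",
]
share = share_of_titles(titles, "performance")
print(f"{share:.0%} of titles contain 'performance'")
```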

Analysis – Top 10 by Decade

Next, we look at the presentation titles by decade to see how things have changed over time. What similarities do we see over CMG’s 40+ years? Are there any changes in recent years?

The following table shows the 10 most frequent words by decade. Note the following:

  • Consistency: 6 of the top 10 are common across all five decades of CMG (shaded in green)
  • New focus: “plan” and “application” entered the Top 10 in the 80s and 90s, respectively
  • Tutorials: CMG-T tutorial sessions (started in 2008) are now at #4 in our current Top 10
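The by-decade comparison comes down to grouping titles by decade, ranking the words within each group, and intersecting the ranked lists to find the words that never leave the Top 10. The sketch below shows one way to do that in Python. It is an illustration only, not the R code behind the table: the function names are made up, the input is assumed to be (year, normalized title) pairs, and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

def decade_of(year):
    """Map a year to the start of its decade, e.g. 1987 -> 1980."""
    return year // 10 * 10

def top_words_by_decade(records, n=10):
    """records: iterable of (year, normalized_title) pairs.
    Returns {decade: [the n most frequent words in that decade]}."""
    counts = defaultdict(Counter)
    for year, title in records:
        counts[decade_of(year)].update(title.split())
    return {decade: [word for word, _ in counter.most_common(n)]
            for decade, counter in sorted(counts.items())}

def common_to_all(top_by_decade):
    """Words that appear in every decade's top list."""
    word_sets = [set(words) for words in top_by_decade.values()]
    return set.intersection(*word_sets) if word_sets else set()

# Hypothetical, already-normalized titles (not CMG data).
records = [
    (1978, "performance model mainframe"),
    (1979, "performance measurement"),
    (1985, "capacity plan performance"),
    (1988, "performance plan system"),
    (2016, "cloud performance tutorial"),
    (2017, "performance capacity cloud"),
]
top = top_words_by_decade(records, n=2)
shared = common_to_all(top)
print(sorted(shared))
```

With the real archive, n would be 10 and the shared set would correspond to the words shaded in green.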

What have we learned?

The purpose of CMG is stated in our Bylaws (Article 2, #1):

Foster research and development, and the exchange and public dissemination of data pertaining to computer measurement, computer management, and computer performance evaluation, and underlying computer science.

It is encouraging to see quantitatively that the presentation titles at CMG conferences align with the organization’s stated purpose.

CMG has changed over the years, starting with the mainframe, moving into client-server and distributed systems, then the emergence of the web, and now today’s focus on cloud, apps, mobile and IoT. Through all of it, CMG has retained its focus: evaluating and planning for the performance and capacity of today’s applications and environments.