
Viewing a Literature Corpus in R

posted Nov 18, 2015, 2:37 AM by SRamachandran Igib   [ updated Nov 18, 2015, 2:40 AM ]
Suppose we want a visual summary of the papers published in our area last week.
Here is how one can go about it:

#Using PubCrawler, get the abstracts for the papers published last week. Save them in text mode from the PubMed site. You may have to do this several times if many papers were published. Then use the following code.
library(pubmed.mineR) #provides readabs(), word_atomizations() and get_original_term2()
library(wordcloud) #provides wordcloud()
recent_500 = readabs("recent_500.txt") #abstracts file name recent_500.txt
new_recent_500_words = word_atomizations(recent_500) #word frequency table (columns words and Freq) used in the steps below
load("~/Desktop/Diab_research/meshterms_all.RData") #the complete set of MeSH terms is available from the NCBI through a user agreement
new_recent_500_words_MESH = intersect(as.character(new_recent_500_words$words),tolower(meshterms_all)) #Pick out the MeSH terms that occur in these abstracts; the comparison is done in lower case.
new_recent_500_words_MESH_indices = NULL; for(i in seq_along(new_recent_500_words_MESH)){ new_recent_500_words_MESH_indices = c(new_recent_500_words_MESH_indices,which(as.character(new_recent_500_words$words) == new_recent_500_words_MESH[i]))  } #Get the indices of the MeSH terms in the word frequency table. seq_along() replaces the hard-coded count of 1612, which was specific to one data set.
new_recent_500_words_MESH_table = new_recent_500_words[new_recent_500_words_MESH_indices,] #Generate a subtable
original_words = NULL;for (i in seq_along(new_recent_500_words_MESH_table$words)){original_words = c(original_words,unlist(get_original_term2(new_recent_500_words_MESH_table$words[i],recent_500))[1])} #Get the original words as written in the papers
wordcloud(original_words,new_recent_500_words_MESH_table$Freq)#Generate wordcloud with default values of arguments
#Here is the result with diabetes. You can generate your own wordclouds for your own studies.
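
The MeSH filtering step above can also be tried on a small made-up example, without PubMed data or the MeSH file. The sketch below uses only base R. The data frame `word_table` and the vector `mesh_terms` are hypothetical stand-ins for the word frequency table and for `meshterms_all`, and their values are invented for illustration. It assumes the frequency table has the columns `words` and `Freq`, as used in the code above, and that each word appears once in the table.

```r
# Toy stand-in for the word frequency table (assumed columns: words, Freq).
# All values are invented for illustration.
word_table <- data.frame(
  words = c("insulin", "glucose", "study", "obesity", "patients"),
  Freq  = c(12, 9, 30, 4, 25),
  stringsAsFactors = FALSE
)

# Toy stand-in for meshterms_all (hypothetical terms, written in mixed case)
mesh_terms <- c("Insulin", "Obesity", "Glucose", "Neoplasms")

# Keep only the rows whose word matches a MeSH term after lower-casing.
# %in% does the intersect and index lookup in one step.
is_mesh <- word_table$words %in% tolower(mesh_terms)
mesh_table <- word_table[is_mesh, ]

print(mesh_table)
```

When every word occurs only once in the table, this should select the same rows as the intersect-and-loop approach, and `mesh_table$words` and `mesh_table$Freq` can then be passed to wordcloud() in the same way.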