Generate word cloud in R

- April 26, 2016

TEXT (.txt)

This script was tested on R 3.2.5. The code is adapted from https://github.com/gimoya/theBioBucket-Archives/blob/master/R/txtmining_pdf.R. It reads a text file, removes stop words and punctuation, merges words that share a stem, and plots the remaining terms as a word cloud.

Code:

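library(tm)         # corpus handling, stop words, term-document matrix
library(wordcloud)  # plotting
library(Rstem)      # wordStem(); SnowballC also exports a wordStem() that accepts language = "english"

# Path to the input text file (adjust to your system)
filetxt <- "C:\\Users\\310211146\\Documents\\Other\\May_Report.txt"

# Read the file, lower-case it, and strip form feeds and English stop words
txt <- readLines(filetxt)
txt <- tolower(txt)
txt <- removeWords(txt, c("\\f", stopwords()))

corpus <- Corpus(VectorSource(txt))
corpus <- tm_map(corpus, removePunctuation)

# Count how often each term occurs across all lines
tdm <- TermDocumentMatrix(corpus)
m <- as.matrix(tdm)

d <- data.frame(freq = sort(rowSums(m), decreasing = TRUE))
d$stem <- wordStem(row.names(d), language = "english")
d$word <- row.names(d)
d <- d[nchar(row.names(d)) < 20, ]  # drop very long tokens

# Merge words that share a stem: sum their counts and keep the first
# (most frequent) word form as the label
agg_freq <- aggregate(freq ~ stem, data = d, sum)
agg_word <- aggregate(word ~ stem, data = d, function(x) x[1])

d <- cbind(freq = agg_freq[, 2], agg_word)
d <- d[order(d$freq, decreasing = TRUE), ]

wordcloud(d$word, d$freq)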
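
The stem-merging step at the end of the script is the least obvious part, so here is a minimal sketch of it on toy data. The words, counts, and the hand-written stem column are made up for illustration (in the script the stems come from wordStem()), and tapply() is used in place of aggregate() only to keep the example short.

```r
# Toy term counts, sorted by decreasing frequency like `d` in the script.
# The numbers and the hand-written stems are made up for illustration.
d <- data.frame(
  word = c("reports", "sales", "report", "reporting", "sale"),
  freq = c(5, 4, 3, 2, 1),
  stem = c("report", "sale", "report", "report", "sale"),
  stringsAsFactors = FALSE
)

# Sum the counts of all word forms that share a stem
freq_by_stem <- tapply(d$freq, d$stem, sum)

# Use the first form per stem as the label; because d is sorted by
# frequency, that is the most frequent form
label_by_stem <- tapply(d$word, d$stem, function(x) x[1])

merged <- data.frame(
  stem = names(freq_by_stem),
  word = as.vector(label_by_stem),
  freq = as.vector(freq_by_stem),
  stringsAsFactors = FALSE
)
merged <- merged[order(merged$freq, decreasing = TRUE), ]
print(merged)
```

Passing merged$word and merged$freq to wordcloud() would then draw one entry per stem instead of one per word form.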

PDF (.pdf)

The following script is based on the reference mentioned above and on three further references:

http://data.library.virginia.edu/reading-pdf-files-into-r-for-text-mining/

https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf

https://eight2late.wordpress.com/2015/05/27/a-gentle-introduction-to-text-mining-using-r/

Code:

library(tm)
library(wordcloud)  # also loads RColorBrewer, which provides brewer.pal()
library(SnowballC)  # stemmer used by stemDocument()

# PDF reader; "-layout" is passed through to the pdftotext command-line tool
Rpdf <- readPDF(control = list(text = "-layout"))

corpus <- Corpus(
  URISource("C:\\Users\\310211146\\Documents\\PDF\\May_Report.pdf"),
  readerControl = list(reader = Rpdf)
)

# Clean the text: collapse whitespace, lower-case, drop stop words,
# stem, then remove punctuation and numbers
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)

# Term frequencies, most frequent first
tdm <- TermDocumentMatrix(corpus)
m <- as.matrix(tdm)
d <- data.frame(freq = sort(rowSums(m), decreasing = TRUE))

wordcloud(row.names(d), d$freq, colors = brewer.pal(7, "Dark2"))
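
The "-layout" option in this script is a pdftotext flag, so the script relies on tm's xpdf engine, which calls the external pdftotext and pdfinfo programs. A small check like the one below, using only base R, can tell you whether they are on the PATH before you run the script. It is a sketch, not part of the original code; note also that, depending on your tm version, you may need to request the engine explicitly with readPDF(engine = "xpdf", ...).

```r
# Command-line tools that tm's "xpdf" PDF engine calls
tools <- Sys.which(c("pdftotext", "pdfinfo"))

# Sys.which() returns "" for programs it cannot find
missing_tools <- names(tools)[tools == ""]

if (length(missing_tools) > 0) {
  message("Not found on the PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("Found: ", paste(tools, collapse = ", "))
}
```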

This R script builds a word cloud from a text file using the tm and wordcloud packages. It cleans the text by removing common stop words and punctuation, then sizes each word by how often it appears.

Word clouds give a fast, visual sense of the dominant terms in a document. They are good for quick exploration, though a frequency table is clearer for rigorous analysis.
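
As a rough illustration of the frequency-table alternative, the sketch below counts words in a few made-up sentences with base R only. The sample text and the simple split on non-letters are assumptions for the example. In the scripts above, the data frame d already holds the same information, so head(d, 10) would print its ten most frequent terms.

```r
# Made-up sample text standing in for the contents of a document
txt <- c("sales rose in may", "may sales report", "report on sales")

# Split into lower-case words on anything that is not a letter
words <- unlist(strsplit(tolower(txt), "[^a-z]+"))
words <- words[nzchar(words)]

# Count each word and sort by decreasing frequency
freq <- sort(table(words), decreasing = TRUE)

# Keep the five most frequent words as a plain table
top <- head(
  data.frame(
    word = names(freq),
    freq = as.vector(freq),
    stringsAsFactors = FALSE
  ),
  5
)
print(top)
```

Unlike the cloud, the table shows exact counts, which makes close frequencies easy to compare.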
