Words used in Portuguese Wikipedia
This data package contains a dataset of the words used in a random sample of about 15,000 pages from the Portuguese Wikipedia.
It can be installed from GitHub with:

```r
devtools::install_github("dfalbel/ptwikiwords")
```

After installing the package, you can load the dataset using:
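```r
library(ptwikiwords)
data(ptwikiwords)
head(ptwikiwords)
#> # A tibble: 6 × 3
#>    word  count check
#>   <chr>  <int> <lgl>
#> 1    de 210954  TRUE
#> 2     a 109652  TRUE
#> 3     e 100028  TRUE
#> 4     o  87839  TRUE
#> 5    em  67040  TRUE
#> 6    do  59489  TRUE
```

The dataset contains 3 columns:

- `word`: the word (character);
- `count`: the number of times the word occurs in the sampled pages (integer);
- `check`: a logical flag; the word cloud example keeps only the rows where it is `TRUE`.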
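Raw counts are often easier to compare as relative frequencies. The following is a minimal dplyr sketch. It assumes a small stand-in tibble built from the six rows printed by `head(ptwikiwords)`, so the snippet runs without the package installed. The names `words` and `words_freq` are illustrative only. With the package loaded, replace `words` with `ptwikiwords`.

```r
library(dplyr)

# Stand-in for ptwikiwords: the six rows shown by head(ptwikiwords).
# With the package installed, use `ptwikiwords` instead of `words`.
words <- tibble(
  word  = c("de", "a", "e", "o", "em", "do"),
  count = c(210954L, 109652L, 100028L, 87839L, 67040L, 59489L),
  check = TRUE
)

# Share of each word in the total count of the table it is computed on,
# with the most frequent words first
words_freq <- words %>%
  mutate(freq = count / sum(count)) %>%
  arrange(desc(count))

words_freq
```

On the full dataset, `freq` would be each word's share of all word occurrences counted in the sample. On this six-row stand-in, it is only the share within those six words.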
Here is a word cloud of the first 300 words whose `check` flag is `TRUE`:
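```r
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(wordcloud))

# Keep only the rows where `check` is TRUE, then take the first 300 of them
words_filter <- ptwikiwords %>%
  filter(check) %>%
  slice(1:300)

# Each word is drawn with a size scaled by its count
wordcloud(words_filter$word, words_filter$count)
```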