polyglotstats

word frequency lists for different languages

# PolyglotStats

This repository collects lists of the most common words in different
languages, together with the scripts used to compute them.


## Languages

- Croatian


## Data Sources

- Wikipedia


## Open Issues

Some special characters are not filtered out correctly, and some numbers are
listed as words. I expect this can be fixed with some time spent debugging.
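
One possible way to approach the filtering problem is to keep only runs of
Unicode letters when tokenizing. The following is a minimal sketch, not the
repository's actual script: `WORD_RE` and `count_words` are hypothetical names,
and the sample sentence is made up. Note that this approach also splits words
at apostrophes and hyphens, which may or may not be wanted.

```python
import re
from collections import Counter

# Runs of Unicode letters only: \w with digits and the underscore excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(text):
    """Return a Counter of lowercased, letter-only tokens found in `text`."""
    return Counter(WORD_RE.findall(text.lower()))


# Made-up sample containing numbers, punctuation and Croatian diacritics.
sample = "Grad je velik. Godine 1991. grad je imao 42 ulice (čćžšđ)!"
counts = count_words(sample)
print(counts.most_common(3))
```

Numbers such as `1991` and punctuation never reach the counter, while letters
with diacritics are kept as part of the word.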

I also expect that, depending on what you want to achieve with your language
skills, different data sources will lead to very different lists of important
words. An encyclopedia like Wikipedia uses very different vocabulary than, for
example, song lyrics.
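
To get a rough feel for how much two sources differ, one could compare their
top-N word sets. The sketch below uses the Jaccard overlap as one possible
measure; `top_words` and `overlap` are hypothetical helpers, and the two short
strings are placeholders standing in for real corpora.

```python
import re
from collections import Counter


def top_words(text, n):
    """Return the set of the n most frequent letter-only tokens in `text`."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return {word for word, _ in Counter(tokens).most_common(n)}


def overlap(text_a, text_b, n=100):
    """Jaccard overlap (0.0 to 1.0) between the top-n word sets of two texts."""
    a, b = top_words(text_a, n), top_words(text_b, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder inputs; real input would be full texts from each data source.
encyclopedia = "grad je osnovan i grad je danas sjedište županije"
lyrics = "ljubav je sve i ljubav je tu"
print(overlap(encyclopedia, lyrics, n=3))
```

A value near 1.0 means the two sources share most of their frequent words; a
value near 0.0 means the lists have little in common.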