kanji

Perform 漢字検定 (Japan Kanji Aptitude Test) level analysis on Japanese Kanji

https://github.com/fosskers/kanji

Version on this page:3.4.1
LTS Haskell 20.26:3.5.0
Stackage Nightly 2022-11-17:3.5.0
Latest on Hackage:3.5.0

See all snapshots kanji appears in

BSD-3-Clause licensed by Colin Woodbury
Maintained by [email protected]
This version can be pinned in stack with:kanji-3.4.1@sha256:d5752568c1e268b76d64933349a60c073ca8cdc46b1965fef8a5feb2657bbc58,1833

Module documentation for 3.4.1

Kanji

kanji is a Japanese Kanji library and analysation program written in Haskell. Its main function is to tell what Kanji belong to what Level of the Japanese National Kanji Examination (漢字検定).

kanji can be used to:

  • determine what Level individual Kanji belong to
  • determine the average Level (difficulty, in other words) of a group of Kanji
  • apply the above to whole files of Japanese

INSTALLING kanji

First, get the source files from:

https://github.com/fosskers/kanji

kanji is written in Haskell and uses the stack tool. Once stack is installed, move to the source directory and perform:

stack install

USAGE

Assuming you’ve made it so that you can run the executable, the following command-line options are available:

Usage: kanji [-d|--density] [-e|--elementary] [-l|--leveldist] [-s|--splits]
             ((-f|--file ARG) | JAPANESE)

Available options:
  -h,--help                Show this help text
  -d,--density             Find how much of the input is made of Kanji
  -e,--elementary          Find density of Kanji learnt in elementary school
  -l,--leveldist           Find the distribution of Kanji levels
  -s,--splits              Show which Level each Kanji belongs to
  -f,--file ARG            Take input from a file

NOTES ON CLOs

  • All options above can be mixed to include their analysis result in the output JSON.
  • -h will over-ride any other options or arguments, discarding them and printing a help message.

Examples

Single Kanji

$> kanji -s 日
{
    "levelSplit": {
        "Ten": [
            "日"
        ]
    }
}

A Japanese sentence

$> kanji -s これは日本語
{
    "levelSplit": {
        "Nine": ["語"],
        "Ten": ["本", "日"]
    }
}

All options

$> kanji -sled これは日本語。串と糞
{
    "levelSplit": {
        "Nine": ["語"],
        "Ten": ["本", "日"],
        "Unknown": ["糞"],
        "Two": ["串"]
    },
    "elementary": 0.6,
    "density": 0.5,
    "distributions": {
        "Nine": 0.2,
        "Ten": 0.4,
        "Unknown": 0.2,
        "Two": 0.2
    }
}

Changes

3.4.1

  • Drop support for GHC versions that haven’t implemented the Semigroup-Monoid Proposal.

3.4.0.2

  • Massaging bounds to support more GHC versions.

3.4.0

  • Removed adultDen and fixed the level range for highDen.

3.3.0

  • New Unknown level for Kanji above level Two
  • level is now a total function
  • Removed averageLevel, as it wasn’t correct nor useful
  • New densities function with associated type CharCat for getting more interesting character density information

3.2.1

  • ToJSON and FromJSON instances for Kanji derived via Generic instead of being hand-written.

3.2.0

  • averageLevel isn’t total, and so now returns in Maybe

3.1.0.1

  • Performance improvements

3.1.0

  • Overall simplification of library and reduction in boilerplate
    • AsKanji typeclass removed
    • Rank and Level are now just Level
    • More liberal use of Map and Set
  • nanq renamed to kanji and moved inside same project

2.0.0

  • New JSON-only output
  • Revamped, modernized backend