case-insensitive-match

A simplified, faster way to do case-insensitive matching. https://github.com/mikehat/case-insensitive-match

Latest on Hackage:0.1.1.0

This package is not currently in any snapshots. If you're interested in using it, we recommend adding it to Stackage Nightly. Doing so will make builds more reliable, and allow stackage.org to host generated Haddocks.

BSD3 licensed by Michael Hatfield

case-insensitive-match 0.1.1.0

Here is a simplified library for matching and comparing strings in a case-insensitive manner. The only dependencies are base, bytestring and text.

Usage is simple

-- normal string comparison
"href"    /=  "HREF"
"apples"  /=  "oranges"
"Smith"   <   "Jones

-- case-insensitive comparison
"href"    ^== "HREF"
"appples" ^/= "oranges"
"jones"   ^<  "Smith"

-- sorting some data structurue
get_names p = (last_name p,first_name p)
sortBy (caseInsensitiveComparing get_names) people

Benchmarks

The benchmarks are pretty comprehensive, offering comparisons with other algorithms, including the case-insensitive package and simple case folding using the base package. Before simply running the bench-others executable, check the source code or you'll end up with a long series of 360 benchmark tests. You'll want something like bench-others -m glob ByteString/Short/*/*, which runs only 36 benchmarks. The heirarchy is <data-type>/<string-length>/<match-type>/<algorithm>. As usual, performance comparisons depend heavily on use-cases, but for matching shorter strings that are often unequal this algorithm is clearly fastest.

There is also a real-world bench test that compares different algorithms while looking for links in an HTML file with Text.HTML.TagSoup. This bench involves a lot of work other than string comparison, so the differences between algorithms is slim, but usually measurable. Build an run:

$ cabal build bench-tagsoup
...

$ curl -s 'https://hackage.haskell.org/packages/names' > sample/hackage-names.html
$ dist/build/bench-tagsoup/bench-tagsoup < sample/hackage-names.html
...

Testing

It would be quite involved to build a perfectly comprehensive testing module, but the test-basics executable is tests multiple cases against all supported data types.

Sample

Here is a sample:

{-# LANGUAGE OverloadedStrings #-}

module Main ( main ) where

import           Data.List
import           Data.CaseInsensitive
import qualified Data.ByteString.Char8 as BS

main = do
    stdin <- BS.getContents
    let sorted_names = map join_name $ sortBy caseInsensitiveCompare $ map split_name $ BS.lines stdin
    mapM_ BS.putStrLn sorted_names


split_name name = (last,BS.drop 2 first)
    where (last,first) = BS.span (/= ',') name

join_name (last,first) = BS.concat [ last , ", " , first ]

Try it with:

$ cabal build readme-sample
...

$ dist/build/readme-sample/readme-sample < sample/declaration-signers.txt
comments powered byDisqus