Generates citations and bibliography from CSL styles.
| Source | Version |
|---|---|
| Version on this page | 0.4.0.1 |
| LTS Haskell 20.23 | 0.8.1 |
| Stackage Nightly 2023-06-03 | 0.8.1 |
| Latest on Hackage | 0.8.1 |
This library generates citations and bibliography formatted according to a CSL style. Currently version 1.0.2 of the CSL spec is targeted.
This library is a successor to pandoc-citeproc, which was a fork of Andrea Rossato’s citeproc-hs. I always found it difficult to fix bugs in pandoc-citeproc and decided that implementing citeproc from scratch would give me a better basis for understanding. This library has a number of other advantages over pandoc-citeproc:
- it is much faster (as a rough benchmark, running the CSL test suite takes less than 4 seconds with this library, compared to 12 seconds with pandoc-citeproc)
- it interprets CSL more faithfully, passing more of the CSL tests
- it has fewer dependencies (in particular, it does not depend on pandoc)
- it is more flexible, not being tied to pandoc’s types.
Unlike pandoc-citeproc, this library does not provide an executable. It will be used in pandoc itself to provide integrated citation support and bibliography format conversion (so the pandoc-citeproc filter will no longer be necessary).
How to use it
The main point of entry is the function citeproc from the module Citeproc. This takes as arguments:
- a CiteprocOptions structure (which currently just allows you to set whether citations are hyperlinked to the bibliography),
- a Style, which you will want to produce by parsing a CSL style file using parseStyle,
- an optional Lang, which allows you to override the default locale,
- a list of References, which you can produce by decoding a CSL JSON bibliography with aeson,
- a list of Citations (each of which may have multiple citation items).
It yields a Result, which includes a list of formatted citations and a formatted bibliography, as well as any warnings produced in evaluating the style.
The types are parameterized on a type that represents formatted content in your bibliographic fields (e.g. the title). If you want a classic CSL processor, you can use CslJson Text. But you can also use another type, such as pandoc’s Inlines. All you need to do is define an instance of CiteprocOutput for your type.
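The following is a minimal sketch of the whole pipeline. It targets the 0.4 API described above and uses CslJson Text as the output type. It assumes that the module Citeproc re-exports the defaultCiteprocOptions value and the CiteprocError type. The sketch runs parseStyle in the Identity monad with a fetcher that always returns empty text, so dependent styles are not handled. It passes Nothing so the default locale is not overridden. The helper name runCiteproc is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Citeproc
import Citeproc.CslJson (CslJson)
import Data.Functor.Identity (runIdentity)
import Data.Text (Text)

-- | Hypothetical helper: parse a CSL style, then format the given
-- references and citations with CslJson Text as the output type.
runCiteproc
  :: Text                        -- ^ raw XML of a CSL style file
  -> [Reference (CslJson Text)]  -- ^ references, e.g. decoded from CSL JSON with aeson
  -> [Citation (CslJson Text)]   -- ^ citations to format
  -> Either CiteprocError (Result (CslJson Text))
runCiteproc styleXml refs cites = do
  -- The parent fetcher always returns empty text, so dependent
  -- styles are not supported in this sketch.
  style <- runIdentity (parseStyle (\_ -> return mempty) styleXml)
  -- Nothing: do not override the default locale.
  return (citeproc defaultCiteprocOptions style Nothing refs cites)
```

A malformed style yields Left, and a successful run yields the Result with formatted citations, the bibliography and any warnings. Rendering the CslJson Text values is left to the caller.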
The signature of parseStyle may not be self-evident: the first argument is a function that takes a URL and retrieves the text from that URL. This is used to fetch the “independent parent” of a dependent style. You can supply whatever function you like: it can search your local file system or fetch the content via HTTP. If you’re not using dependent styles, you can get by with \_ -> return mempty.
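As an illustration of the local-file-system option, here is a sketch of a parent fetcher that looks up styles in a local directory. The function name localParentFetcher is hypothetical and is not part of citeproc. So is the mapping from a style URL to a file named after the URL's last path segment with a .csl extension. When no file is found, the fetcher returns empty text, like the fallback above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text (Text)
import System.Directory (doesFileExist)
import System.FilePath (takeFileName, (<.>), (</>))

-- | Hypothetical parent fetcher that reads styles from a local directory
-- instead of the network. A URL such as "http://www.zotero.org/styles/apa"
-- is mapped to "<dir>/apa.csl" (an assumed naming convention).
-- If the file is missing, empty text is returned.
localParentFetcher :: FilePath -> Text -> IO Text
localParentFetcher dir url = do
  let name = takeFileName (T.unpack (T.dropWhileEnd (== '/') url))
      path = dir </> name <.> "csl"
  exists <- doesFileExist path
  if exists then TIO.readFile path else return mempty
```

Because parseStyle works in any monad, this IO-based fetcher could be passed directly, as in parseStyle (localParentFetcher "styles") cslText.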
The citeproc executable
If the package is compiled with the executable flag, an executable citeproc will be built. It reads an Inputs object from stdin (or from a file if a filename is provided) and writes a Result object to stdout. This executable can be used to add citation processing to non-Haskell projects.
citeproc --help will summarize usage information. See
the man page for more information.
Known bugs and limitations
Although this library is much more accurate in implementing the CSL spec than pandoc-citeproc was, it still fails some of the tests from the CSL test suite (67/862). However, most of the failures are on minor corner cases, and in many cases the expected behavior goes beyond what is required by the CSL spec. (For example, we intentionally refrain from capitalizing terms in initial position in note styles. It makes more sense for the calling program, e.g. pandoc, to do the capitalization when it puts the citations in notes, since some citations in note styles may already be in notes and in this case their rendering may not require capitalization. It is easy to capitalize reliably, hard to uncapitalize reliably.)
Changes
- Fix bug introduced by the fix to #61 (#74). In certain circumstances, we could get doubled “et al.”.
- Depend on unicode-collation unconditionally (#71). It is necessary even when text-icu is used, because of Text.Collate.Lang.
- Rename tests in extra/ so they fall into categories.
- We now use Lang from unicode-collation rather than defining our own. The type constructor has changed, as has the signature of parseLang.
- Use unicode-collation by default for more accurate sorting.
- text-icu will still be used if the icu flag is set. This may give better performance, at the cost of depending on a large C library.
- Change type of SortKeyValue so it doesn’t embed Lang. [API change] Instead, we now store a language-specific collator in the Eval Context.
- Move compSortKeyValues from Types to Eval.
- Add curly open quote to word splitters in normalizeSortKey.
- Improve date sorting: when generating sort keys, use the format YYYY0000 if there is no month or day, and YYYYMM00 if there is no day.
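Here is a hypothetical helper that builds keys in the format this entry describes; dateSortKey is not a citeproc function. Every key has the same width, so plain string comparison orders dates chronologically. A date with only a year also sorts before any more specific date in that year. The sketch assumes a non-negative year of at most four digits.

```haskell
import Data.Maybe (fromMaybe)

-- | Hypothetical sort key: fixed-width YYYYMMDD, with missing parts as zeros.
-- A day is only used when a month is present, so a date with no month
-- always yields YYYY0000. Assumes 0 <= year <= 9999.
dateSortKey :: Int -> Maybe Int -> Maybe Int -> String
dateSortKey year mMonth mDay = pad 4 year ++ pad 2 month ++ pad 2 day
  where
    month = fromMaybe 0 mMonth
    day   = case mMonth of
              Nothing -> 0
              Just _  -> fromMaybe 0 mDay
    -- Left-pad the decimal representation with zeros to width n.
    pad n x = let s = show x in replicate (n - length s) '0' ++ s
```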
- Special treatment of literal “others” as last name in a list (#61). When we convert bibtex/biblatex bibliographies, the form “and others” yields a last name with nameLiteral = “others”. We detect this and generate a localized “and others” (et al).
- Make abbreviations case-insensitive (#45).
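One common way to get case-insensitive lookup is to case-fold keys both when building the table and when looking them up. The sketch below uses hypothetical names and is not citeproc's implementation. It also omits the per-variable grouping of a real abbreviations file, such as entries under container-title.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import Data.Text (Text)

-- | Hypothetical abbreviation table whose keys are stored case-folded.
newtype Abbreviations = Abbreviations (M.Map Text Text)

-- | Build the table, case-folding every key.
mkAbbreviations :: [(Text, Text)] -> Abbreviations
mkAbbreviations pairs =
  Abbreviations (M.fromList [ (T.toCaseFold k, v) | (k, v) <- pairs ])

-- | Look up a key after case-folding it, so matching ignores case.
lookupAbbreviation :: Text -> Abbreviations -> Maybe Text
lookupAbbreviation key (Abbreviations m) = M.lookup (T.toCaseFold key) m
```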
- In parsing abbreviations JSON, ignore top-level fields besides “default” (#57), e.g. “info” which is used in Zotero’s default abbreviations file.
- Remove check for ASCII in case transform code. Previously we weren’t doing case transform on words containing non-ASCII characters.
- Fix infinite loop in fixPunct (#49). In a few rare cases
- Add a space between “no date” term and disambiguator if the long form is used (#47).
- Improve disambiguation code. Add type signatures, move some functions to the top-level, and make the logic clearer and more efficient.
- Re-render after each stage of ambiguity resolution instead of relying on analysis of names and dates. This is necessary especially for styles like chicago-note-bibliography which use titles in citations. Closes #44. No measurable performance impact.
- Update test suite from upstream.
- Fix author-only citations (#43). We got bad results with some styles when a reference had both an author and a translator.
- Don’t use cite-group delimiter if ANY citation in group has a locator (#38). This seems to be citeproc.js’s behavior and it gives better results for chicago-author-date: we want both [@foo20; @foo21, p. 3] and [@foo20, p. 3; @foo21] to produce a semicolon separator, rather than a comma.
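The rule in this entry can be sketched as a small decision function. The Cite type and groupDelimiter are hypothetical simplifications. The test values use ", " as the cite-group delimiter and "; " as the ordinary layout delimiter, chosen to mirror the chicago-author-date example above.

```haskell
import Data.Maybe (isJust)

-- | Hypothetical simplified cite: a reference id plus an optional locator.
data Cite = Cite { citeId :: String, citeLocator :: Maybe String }

-- | Delimiter to put between cites in a group sharing the same names:
-- the cite-group delimiter only if no cite in the group has a locator,
-- otherwise the ordinary layout delimiter.
groupDelimiter :: String -> String -> [Cite] -> String
groupDelimiter citeGroupDelim layoutDelim cites
  | any (isJust . citeLocator) cites = layoutDelim
  | otherwise                        = citeGroupDelim
```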
- Better handle initialize-with values that end in a nonbreaking space. In this case, citeproc should not add an additional space or strip the nonbreaking space. Closes #37.
- Change makeReferenceMap to return a cleaned-up list of references as well as a reference map. The cleaned-up list removes references with duplicate ids. When there are multiple references with the same id, the last one is included and the others discarded. [API change]
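The "last one wins" cleanup can be sketched generically, assuming references are identified by a key function. dedupeLastWins is a hypothetical name and is not makeReferenceMap itself. The sketch keeps the surviving items in their relative order, which this entry does not specify. It relies on Data.Map.fromList, which retains the last value for a repeated key.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- | Hypothetical sketch: drop items with duplicate ids, keeping the last
-- occurrence of each id, and build a map from id to item.
dedupeLastWins :: Ord k => (r -> k) -> [r] -> ([r], M.Map k r)
dedupeLastWins key rs = (kept, M.fromList [ (key r, r) | r <- rs ])
  where
    -- Walk the list backwards so the first occurrence seen is the last one.
    kept = reverse (go S.empty (reverse rs))
    go _ [] = []
    go seen (x:xs)
      | key x `S.member` seen = go seen xs
      | otherwise             = x : go (S.insert (key x) seen) xs
```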
- FromJSON for Name: make straight quotes curly. Otherwise nothing will do this when we decode JSON to (Reference a) with a /= CslJson Text.
- Remove redundant pragmas and imports (Albert Krewinkel).
- Use custom prelude with GHC 8.6.* and older (Albert Krewinkel). This adds support for GHC 8.0.x.
- CaseTransformState [API change]. This gave bad results with things like parentheses (#27).
- Maybe Lang [API change]. This allows us to do locale-sensitive sorting (though this won’t matter much unless the icu flag is used).
- Maybe Lang parameter on initialize (since capitalization can be locale-dependent).
- Add cabal.project.icu for building with icu lib.
- Add (unexported) Citeproc.Unicode compatibility module. This allows us to use the same functions whether or not the icu flag is used.
- Pay attention to citationNoteNumber in computing position. In calculating whether an item is alone in its citation, we need to take into account citationNoteNumber, since two citations may occur in the same note and they should not be ranked “alone.” See jgm/pandoc#6813, citation-style-language/documentation#121
- Ensure that uncited references are sorted last when it comes to assigning citation numbers (#22).
- Remove “capitalize initial term” feature. This is required by the test suite but not the spec. It makes more sense for us to do this capitalization in the calling program, e.g. pandoc, because some citations in note styles may already be in notes and thus not trigger separate footnotes. If initial terms had been capitalized, we’d need to uncapitalize them, and that is hard to do reliably.
- Treat empty FancyVal as an empty value.
- Derive Functor, Traversable, Foldable for Result [API change].
- Better handling of author-only/suppress-author. Previously all results of “names” elements were treated as authors. But only the first should be (generally this is the author, but it could be the editor of an edited volume with no author). See jgm/pandoc#6765.
- Don’t enclose contents of e:choose in a Formatted element (#19). The e:choose element is “transparent” and the delimiter controlling its formatting should be inserted between the items it returns.
- Fix sorting when no <sort> element is given. The spec says: “In the absence of cs:sort, cites and bibliographic entries appear in the order in which they are cited.” This affects IEEE in particular. See jgm/pandoc#6741.
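Here is a sketch of the ordering this entry describes. It also follows the rule noted above that uncited references come last (#22). Cited ids appear in order of first citation, followed by the uncited ids. The function names are hypothetical. Keeping uncited ids in their input order is an assumption.

```haskell
import qualified Data.Set as S

-- | Hypothetical helper: keep only the first occurrence of each id.
firstOccurrences :: Ord k => [k] -> [k]
firstOccurrences = go S.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `S.member` seen = go seen xs
      | otherwise         = x : go (S.insert x seen) xs

-- | Hypothetical ordering for a style with no cs:sort: cited ids in order
-- of first citation, then uncited ids in their input order.
orderWithoutSort :: Ord k => [k] -> [k] -> [k]
orderWithoutSort citedInOrder allIds = cited ++ uncited
  where
    cited    = firstOccurrences citedInOrder
    citedSet = S.fromList cited
    uncited  = [ k | k <- firstOccurrences allIds, not (k `S.member` citedSet) ]
```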
- sameNames and citation grouping. Previously, if a citation item had a prefix, it would not be grouped with following citations. See jgm/pandoc#6722 for discussion.
- Remove unneeded import.
- citeproc executable: strip BOM before parsing style (#18).
- Initial release.