Managing bibliographic citations

Citation manager

import { CitationDatabaseManager } from '@phfaist/zoodb/citationmanager';

Note: See also CitationCompiler() in the FLM-related modules.

class CitationDatabaseManager()

Manage a database of bibliographic references, as well as a collection of source objects that are capable of fetching bibliographic citation information from various sources.

The constructor should be given a collection of sources in the first argument (sources). The sources should be an object where the keys correspond to a cite_prefix and where the values are citation source instances (e.g., CitationSourceArxiv() or CitationSourceBibliographyFile()).

Possible options:

  • cache_fs - ‘fs’-compatible module object to provide filesystem access for accessing the cache.

  • cache_file - the filesystem path where we should store the citation information cache.

  • cache_entry_default_duration_ms - the default duration of time (in milliseconds) that citation information for an entry should be stored in cache before being re-queried again from the source.

  • default_use_user_agent, default_user_agent - specify defaults for whether to set a custom user agent when sources fetch remote content and, if so, which user agent to use.
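
As an illustration, the options above could be collected in a plain object like the one below. This is only a sketch: the file name, the duration, and the user agent string are made-up values, and the fragment does not show how the object is handed to the constructor.

```javascript
// Hypothetical option values for a CitationDatabaseManager; adjust to your project.
const manager_options = {
    // cache_fs would be an 'fs'-compatible module object (e.g. Node's fs module),
    // supplied by the caller; it is left out of this fragment.
    cache_file: '_citations_cache.json',
    // Keep entries for one week before re-querying the source.
    cache_entry_default_duration_ms: 7 * 24 * 60 * 60 * 1000,
    default_use_user_agent: true,
    default_user_agent: 'my-bibliography-build/0.1',
};
```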

CitationDatabaseManager.get_citation(cite_prefix, cite_key)

Get the citation information object associated with the given cite_prefix and cite_key.

This method is a shorthand for concatenating the citation prefix and key together with a colon and calling get_citation_by_id().

CitationDatabaseManager.get_citation_by_id(id)

Return the citation information associated with the given citation key/id with prefix. The id argument is a string of the form cite_prefix:cite_key.

This method will look up the given id in the citation database and will follow chained citations as necessary.

Returns the JSON/CSL object data associated with the given citation information.

An Error is thrown if the given id is not found in the current database.
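
The relationship between the two lookup methods can be sketched without the library. The functions below are stand-ins with hypothetical names, operating on a plain object rather than on the manager's real database; they only illustrate the cite_prefix:cite_key id format and the error on a missing id.

```javascript
// Stand-in database: a plain object keyed by 'cite_prefix:cite_key'.
const sketch_db = {
    'doi:10.1234/abcdef': { title: 'An example article' },
};

// Build the id string from a prefix/key pair.
function make_citation_id(cite_prefix, cite_key) {
    return `${cite_prefix}:${cite_key}`;
}

// Look up an id, throwing an Error when it is missing.
function lookup_by_id(db, id) {
    if (!Object.prototype.hasOwnProperty.call(db, id)) {
        throw new Error(`Citation not found: ${id}`);
    }
    return db[id];
}

// Counterpart of the get_citation() shorthand in this sketch.
function lookup(db, cite_prefix, cite_key) {
    return lookup_by_id(db, make_citation_id(cite_prefix, cite_key));
}
```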

CitationDatabaseManager.keys()

Return a list of all citation keys available. The return value is an array of strings of the form cite_prefix:cite_key.

CitationDatabaseManager.load_cache()

Load citation information from the cache file. Does nothing if the cache file does not exist. This method is automatically called by initialize().

CitationDatabaseManager.purge_expired()

Remove any citation information entries whose expiration time has passed.
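
A minimal sketch of such a purge, assuming each cached entry carries an absolute expiration timestamp. The field name expires_ms and the function name are hypothetical; the real cache layout may differ.

```javascript
// Remove entries whose (hypothetical) expires_ms timestamp is not in the future.
// Returns the list of ids that were removed.
function purge_expired_sketch(db, now_ms = Date.now()) {
    const removed = [];
    // Object.entries() takes a snapshot, so deleting while looping is safe.
    for (const [id, entry] of Object.entries(db)) {
        if (entry.expires_ms <= now_ms) {
            delete db[id];
            removed.push(id);
        }
    }
    return removed;
}
```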

CitationDatabaseManager.retrieve_citations(citations)

Retrieve a number of citations from the respective sources.

This method handles dispatching the citation pairs to the correct sources and calling the relevant methods on the sources (add_retrieve(), run(), add_retrieve_done(), etc.).

This method returns a promise. Make sure you await this method if you want to make sure that the citation manager’s database is correctly populated.

  • citations is an array of objects of the form {cite_prefix:..., cite_key:...}.
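
The dispatching step can be pictured as grouping the citation pairs by prefix, so that each source only sees its own keys. The helper below is a hypothetical illustration of that grouping, not the manager's actual code.

```javascript
// Group {cite_prefix, cite_key} pairs by prefix, so that each source
// receives only the keys it is responsible for.
function group_by_prefix(citations) {
    const by_prefix = {};
    for (const { cite_prefix, cite_key } of citations) {
        if (!by_prefix[cite_prefix]) {
            by_prefix[cite_prefix] = [];
        }
        by_prefix[cite_prefix].push(cite_key);
    }
    return by_prefix;
}

const grouped = group_by_prefix([
    { cite_prefix: 'arxiv', cite_key: '1234.56789' },
    { cite_prefix: 'doi', cite_key: '10.1234/abcdef' },
    { cite_prefix: 'arxiv', cite_key: '2345.67890' },
]);
// grouped.arxiv holds the two arXiv keys; grouped.doi holds the single DOI.
```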

CitationDatabaseManager.save_cache()

Save the current citation information database to the cache file.

CitationDatabaseManager.store_citation(cite_prefix, cite_key, entry_csl_json, options)

Store citation information for the associated citation prefix and citation key.

This method will update the citation database to associate with the citation prefix/key pair (cite_prefix, cite_key) the citation information provided in entry_csl_json. The citation information in entry_csl_json should be provided as JSON/CSL object data.

To store a “chained” citation, see store_citation_chained().

You may specify some options in the fourth argument:

  • cache_duration_ms - the number of milliseconds which this information may be stored in the citation cache. If it is resource-intensive to query this citation information, or if the information is not likely to change any time soon, consider setting a large value here. Do this especially if you might be worried about hitting rate limits of the API wherever you are fetching the citation information. On the other hand, you might set a shorter cache duration for information that is easily fetched or that might change in the near future.

CitationDatabaseManager.store_citation_chained(cite_prefix, cite_key, new_cite_prefix, new_cite_key, set_properties)

Store a ‘chained citation.’ The pair (cite_prefix, cite_key) is registered to refer to the same citation information as (new_cite_prefix, new_cite_key) with any properties given in set_properties additionally set.

An example of a chained citation would be an arXiv reference to a paper that is published with a DOI. The arxiv citation source object will query the citation ('arxiv', '1234.56789'); if the corresponding entry has a valid DOI, then a chained citation is registered to ('doi', '10.1234/abcdef') with set_properties set to { arxivid: 'arxiv:1234.56789' }. As a consequence, a citation to arXiv:1234.56789 will use the citation information that was retrieved from doi:10.1234/abcdef with the additional property arxivid set.
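
The arXiv-to-DOI example can be worked through with a stand-alone sketch. The record shape { chained: { id, set_properties } } and the function name are assumptions made for illustration; the manager's internal storage format may differ.

```javascript
// Stand-in database. A chained entry is represented here by a hypothetical
// { chained: { id, set_properties } } record.
const chain_db = {
    'doi:10.1234/abcdef': { csl: { title: 'An example article', DOI: '10.1234/abcdef' } },
    'arxiv:1234.56789': {
        chained: {
            id: 'doi:10.1234/abcdef',
            set_properties: { arxivid: 'arxiv:1234.56789' },
        },
    },
};

// Follow chained entries until a plain entry is found, then apply the
// set_properties of each link on the way back.
function resolve_citation(db, id, seen = new Set()) {
    if (seen.has(id)) {
        throw new Error(`Circular citation chain at ${id}`);
    }
    seen.add(id);
    const entry = db[id];
    if (entry === undefined) {
        throw new Error(`Citation not found: ${id}`);
    }
    if (entry.chained) {
        const target = resolve_citation(db, entry.chained.id, seen);
        return { ...target, ...entry.chained.set_properties };
    }
    return { ...entry.csl };
}

const resolved = resolve_citation(chain_db, 'arxiv:1234.56789');
// resolved carries the DOI entry's data plus the arxivid property.
```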

Citation Sources

import {
    CitationSourceArxiv
} from '@phfaist/zoodb/citationmanager/source/arxiv.js';
import {
    CitationSourceDoi
} from '@phfaist/zoodb/citationmanager/source/doi.js';
import {
    CitationSourceManual
} from '@phfaist/zoodb/citationmanager/source/manual.js';
import {
    CitationSourceBibliographyFile
} from '@phfaist/zoodb/citationmanager/source/bibliographyfile.js';

class CitationSourceArxiv()

Fetch bibliographic citation information from the arXiv.

Options:

  • chain_to_doi - If true, then arXiv identifiers that refer to papers which have a DOI, i.e., which have been published in some publication venue, will be “chained” to the corresponding DOI citation. See “chained citations” in the citation manager object. The chained citation will have the cite_prefix set to ‘doi’.

  • override_arxiv_dois, override_arxiv_dois_file - Manually specify DOIs that should be associated with certain arXiv IDs. The override_arxiv_dois option is an object whose keys are arXiv IDs and whose values are the corresponding DOIs. If a given arXiv identifier is found in this object, the DOI specified here overrides the DOI value that was fetched from the arXiv’s API. Instead of specifying override_arxiv_dois, you may set override_arxiv_dois_file to a local path or URL of a JSON or YAML file that contains the mapping of arXiv identifiers to DOIs.

  • See CitationSourceBase() for further options.
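
The override rule amounts to a lookup with a fallback. The following sketch uses a hypothetical helper name and made-up identifiers to show the precedence only.

```javascript
// Choose the DOI to chain to: a manual override, if present, wins over
// the DOI reported by the remote API.
function pick_doi(arxiv_id, fetched_doi, override_arxiv_dois = {}) {
    if (Object.prototype.hasOwnProperty.call(override_arxiv_dois, arxiv_id)) {
        return override_arxiv_dois[arxiv_id];
    }
    return fetched_doi;
}

// Hypothetical override table, keyed by arXiv ID.
const overrides = { '1234.56789': '10.1234/abcdef' };
```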

class CitationSourceDoi()

Fetch bibliographic citation information from a DOI (cf. https://doi.org/).

See CitationSourceBase() for options.

class CitationSourceManual()

A citation “source” which interprets the “ID” directly as the FLM text content of the citation.

See CitationSourceBase() for options.

class CitationSourceBibliographyFile()

A citation source which looks up bibliographic citation information in one or several bibliography files, in JSON-CSL citation format. The files themselves can be JSON or YAML data files.

Options:

  • bibliography_files - A list of JSON/CSL (or YAML/CSL) files in which to look for citations.

  • See CitationSourceBase() for further options.

Citation Source Base Class

// base class, e.g. to write your own citation source implementation
import {
    CitationSourceBase
} from '@phfaist/zoodb/citationmanager/source/base.js';

class CitationSourceBase(override_options, options, default_options)

Base class for a citation source, i.e., an engine that is able to obtain bibliographic citation information based on a citation key. (E.g., the CitationSourceArxiv() queries information from the arXiv to obtain citation information for a given arXiv identifier.)

This class is meant to be subclassed to implement the relevant functions to fetch bibliographic citation information.

Subclasses should reimplement run_retrieve_chunk() to actually fetch bibliographic information from the relevant source. See documentation for that method for more information.

Furthermore, subclasses may reimplement source_initialize_run() and source_finalize_run() to run additional initial and final steps. These callbacks will be invoked at the beginning and at the end of a call to run().

Important options recognized by this base class are the following:

  • source_name - A descriptive name for this citation source. Should normally be set by the subclass in the override options. Mostly for use in debug messages.

  • fsRootFilePath - A filesystem path that should provide a reference root path for any filesystem access.

  • chunk_size, chunk_retrieve_delay_ms - We’ll split the citations to retrieve into chunks, each of size at most chunk_size. We’ll then make sure that the subclass’ run_retrieve_chunk() is called once per chunk. Between two chunk retrievals, we wait until at least chunk_retrieve_delay_ms milliseconds have elapsed before calling run_retrieve_chunk() again.

  • cite_prefix - The citation prefix we have been associated with. An option value provided here will be overridden by set_citation_manager(). Mainly used for tracing errors/debugging.

  • chains_to_sources - An array of names of sources to which this source might “chain”. For instance, the ‘arxiv’ source chains to the ‘doi’ source, because querying the bibliographic information for an arXiv identifier that is published will cause a further DOI lookup to get the information of the corresponding published article.

  • waiting_poll_timeout_ms - When we have queried all the IDs but add_retrieve_done() has not yet been called, we wait this amount of time (in milliseconds) before checking if we have new IDs to retrieve or if we’re done.

  • cache_store_options - Any options to use when calling the manager object’s store_citation() method, for instance, { cache_duration_ms: ... }. Subclasses should remember to pass this option on in calls to await this.citation_manager.store_citation(..., this.cache_store_options).

  • use_user_agent, user_agent - Specify a custom user agent when fetching remote content with fetch_url(). If use_user_agent is false, no custom user agent is set. If it is true, then the user agent user_agent is used. If either use_user_agent or user_agent is null/undefined, then the citation manager’s default_use_user_agent and default_user_agent are used.
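
The chunking described for chunk_size can be sketched as follows. The helper name is hypothetical and the delay between chunks (chunk_retrieve_delay_ms) is left out; this only shows how a list of IDs is cut into pieces of at most chunk_size items.

```javascript
// Split a list of ids into consecutive chunks of at most chunk_size items.
function split_into_chunks(ids, chunk_size) {
    if (!Number.isInteger(chunk_size) || chunk_size < 1) {
        throw new Error('chunk_size must be a positive integer');
    }
    const chunks = [];
    for (let i = 0; i < ids.length; i += chunk_size) {
        chunks.push(ids.slice(i, i + chunk_size));
    }
    return chunks;
}

const chunks = split_into_chunks(['a', 'b', 'c', 'd', 'e'], 2);
// chunks is [['a', 'b'], ['c', 'd'], ['e']]
```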

Subclasses should call the superclass constructor with the following arguments to set the options correctly. Options are merged recursively using lodash/merge.

Arguments:
  • override_options (Object) – Any options that should be set to the given values, regardless of any user options.

  • options (Object) – The options provided by the user.

  • default_options (Object) – Any option defaults that should be set if the user didn’t provide any value.
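
The precedence between the three argument objects can be illustrated with a small recursive merge. The merge function below is a simplified stand-in written for this sketch (plain objects only) and does not reproduce every detail of lodash/merge; the function names and option values are hypothetical.

```javascript
// Minimal recursive merge for plain objects, standing in for lodash/merge.
// Later sources override earlier ones; nested plain objects are merged key by key.
function is_plain_object(value) {
    return value !== null && typeof value === 'object' && !Array.isArray(value);
}

function merge_sketch(target, ...sources) {
    for (const source of sources) {
        for (const [key, value] of Object.entries(source || {})) {
            if (is_plain_object(value) && is_plain_object(target[key])) {
                merge_sketch(target[key], value);
            } else if (is_plain_object(value)) {
                target[key] = merge_sketch({}, value);
            } else {
                target[key] = value;
            }
        }
    }
    return target;
}

// Defaults first, then user options, then overrides.
function resolve_options(override_options, options, default_options) {
    return merge_sketch({}, default_options, options, override_options);
}

const resolved_options = resolve_options(
    { source_name: 'Example source' },
    { source_name: 'ignored', chunk_size: 10 },
    { chunk_size: 50, cache_store_options: { cache_duration_ms: 1000 } }
);
// source_name comes from the overrides, chunk_size from the user options,
// and cache_store_options from the defaults.
```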

CitationSourceBase.add_retrieve(ids)

Add a list of citation IDs to the list of citation keys to retrieve. The IDs do NOT include the citation prefix. This method may be called multiple times, as we become aware of more citation keys to retrieve in this source.

CitationSourceBase.add_retrieve_done()

Indicate that no further IDs to look up will be provided. Once all requested citation IDs are retrieved, the run() function may terminate without waiting for any further IDs to be given to add_retrieve().

CitationSourceBase.fetch_url(url, fetch_options)

Utility method to fetch data from a remote source. Returns a promise that resolves to the requested data.

The url may be any URL or even simply a local path. If a local path is specified, it is interpreted as a path that is relative to the fsRootFilePath folder set in the class options.

Note

Make sure you use await on the return value of this function!

If the server responds with a status code other than 200, then the error is displayed with console.error and an error is thrown. Remote calls are performed using the fetch() method, which is either directly provided by the browser or pulled in via the node-fetch npm package. The content of the remote resource is returned as a string.

The fetch_options are options that are directly specified to fetch(). (For instance: { method: 'post', body: `max_results=${id_list.length}&id_list=${id_list.join(',')}`, headers: ... }.)

If get_response_object is provided as one of the fetch_options, then this option is intercepted by this function and not provided to fetch(). If this option is set to true (default false), then the raw response object returned by await fetch() is returned directly, without attempting to read any data from the response. The response object is returned without any further processing, without even consulting the response status code. It is up to the caller to handle and report any errors.
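
The handling of get_response_object and of the status code can be sketched as below. This is not the library's implementation: the function name is hypothetical, the fetch function is passed in explicitly so that the sketch runs without network access, local paths are not handled, and the console.error reporting is omitted.

```javascript
// Sketch of the described behavior with an injected fetch function.
async function fetch_text_sketch(fetch_fn, url, fetch_options = {}) {
    // Intercept get_response_object; everything else goes to fetch().
    const { get_response_object = false, ...other_options } = fetch_options;
    const response = await fetch_fn(url, other_options);
    if (get_response_object) {
        return response; // raw response, status code not inspected
    }
    if (response.status !== 200) {
        throw new Error(`Fetching ${url} failed with status ${response.status}`);
    }
    return await response.text();
}

// Fake fetch for demonstration: answers 404 except for one known URL.
async function fake_fetch(url, options) {
    const found = (url === 'https://example.com/data.json');
    return {
        status: found ? 200 : 404,
        text: async () => (found ? '{"ok": true}' : 'not found'),
    };
}
```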

CitationSourceBase.run()

Run this citation source. Any citation keys (IDs) queued with add_retrieve() will be retrieved. Furthermore, this async function will regularly pause and check whether new IDs have been queued (by further calls to add_retrieve()), until add_retrieve_done() has been called. Then the async method terminates.

The callback method source_initialize_run() is called at the beginning of this method, and the source_finalize_run() is called at the end.

Keys to be retrieved are organized in chunks of relevant size (see options in the class documentation); each chunk causes a call to run_retrieve_chunk(). The latter method should be reimplemented by the subclass to actually perform the retrieval operation.

The subclass callbacks source_initialize_run(), run_retrieve_chunk(), and source_finalize_run() are all awaited, so they can be defined as async/they can return promises.
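
The life cycle described above can be sketched with a stand-alone class. The class name and its internals are illustrative and not taken from the library; in particular, passing the chunk's IDs to run_retrieve_chunk() as an id_list argument is an assumption, and the delay between chunks is omitted.

```javascript
// Stand-alone sketch of the run() life cycle; not the library's code.
class RunLoopSketch {
    constructor({ chunk_size = 2, waiting_poll_timeout_ms = 5 } = {}) {
        this.chunk_size = chunk_size;
        this.waiting_poll_timeout_ms = waiting_poll_timeout_ms;
        this.pending = [];   // ids waiting to be retrieved
        this.done = false;   // set by add_retrieve_done()
        this.log = [];       // records the order of the callbacks
    }
    add_retrieve(ids) { this.pending.push(...ids); }
    add_retrieve_done() { this.done = true; }

    async source_initialize_run() { this.log.push('init'); }
    async run_retrieve_chunk(id_list) { this.log.push(`chunk:${id_list.join(',')}`); }
    async source_finalize_run() { this.log.push('final'); }

    async run() {
        await this.source_initialize_run();
        while (true) {
            // Drain everything currently pending, one chunk at a time.
            while (this.pending.length > 0) {
                const chunk = this.pending.splice(0, this.chunk_size);
                await this.run_retrieve_chunk(chunk);
            }
            if (this.done) {
                break;
            }
            // Nothing left for now: wait, then check for newly added ids.
            await new Promise((resolve) => setTimeout(resolve, this.waiting_poll_timeout_ms));
        }
        await this.source_finalize_run();
    }
}
```

In this sketch, IDs added while run() is already in progress are still picked up, and the loop only ends once add_retrieve_done() has been called and nothing is pending.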

CitationSourceBase.run_retrieve_chunk()

Reimplement to actually retrieve a chunk of citation keys/IDs.

Consider using the utility method fetch_url() if you need to read data from a local file path or from a remote URL.

Once the citation information for a citation key/ID is retrieved, call await this.citation_manager.store_citation(...). See the corresponding documentation for CitationDatabaseManager(). Use the cite prefix stored in this.cite_prefix. Don’t forget to pass on the options stored in this.cache_store_options!
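
A reimplementation might look roughly like the sketch below. To keep it runnable on its own, it uses a stand-in manager and a class that does not extend CitationSourceBase; both class names, the in-memory table, and the id_list parameter of run_retrieve_chunk() are assumptions made for illustration.

```javascript
// Stand-in for the citation manager: records what store_citation() receives.
class FakeManager {
    constructor() { this.stored = {}; }
    async store_citation(cite_prefix, cite_key, entry_csl_json, options) {
        this.stored[`${cite_prefix}:${cite_key}`] = { entry_csl_json, options };
    }
}

// Illustrative source; in real code this would extend CitationSourceBase.
class StaticTableSource {
    constructor(table, cache_store_options = {}) {
        this.table = table; // hypothetical in-memory citation data
        this.cache_store_options = cache_store_options;
    }
    set_citation_manager(citation_manager, cite_prefix) {
        this.citation_manager = citation_manager;
        this.cite_prefix = cite_prefix;
    }
    // id_list (the ids of the current chunk) is an assumed parameter.
    async run_retrieve_chunk(id_list) {
        for (const id of id_list) {
            const entry_csl_json = this.table[id];
            if (entry_csl_json === undefined) {
                throw new Error(`Unknown citation key: ${id}`);
            }
            // Pass on the prefix and the cache options, as recommended above.
            await this.citation_manager.store_citation(
                this.cite_prefix, id, entry_csl_json, this.cache_store_options
            );
        }
    }
}
```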

CitationSourceBase.set_citation_manager(citation_manager, cite_prefix)

The citation manager object will call this method to identify itself. The source stores a reference to the citation manager and records the citation prefix (cite_prefix) with which it is associated.

CitationSourceBase.source_finalize_run()

Reimplement to perform any required processing at the end of the source run. The default implementation does nothing.

CitationSourceBase.source_initialize_run()

Reimplement to perform any required processing at the beginning of the source run. The default implementation does nothing.