
MARCspec - A common MARC record path language

Carsten Klee (ZDB)

2015-11-30 (version 0.14draftrev4)


1 Introduction

Mapping MARC data to arbitrary formats has become a common task. Such mappings are normally based on a set of definitions of MARC fields and subfields, called a MARC field specification or, for short, a MARC spec.

MARC specs are already implemented in tools like marcspec, catmandu, solrmarc, easyM2R etc., each of them using a different flavour of MARC spec. This document is an attempt to normalize such field specifications.

The specification described here, MARCspec, can help to build reusable MARCspec parsers and validators and facilitates the exchange of mapping definitions.

1.1 Status of this document

The current version of this proposal is a preliminary draft for open discussion. Feedback is welcome!

1.2 Terminology

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

See also Definition of MARC related terms used in this spec.

1.3 What is a MARCspec?

Machine-Readable Cataloguing (MARC) is a document-based key-value exchange format for bibliographic and other library-related data. A MARC record consists of three main sections: the leader, the directory, and the variable fields with the data content. There are two kinds of (variable) fields: (variable) control fields and (variable) data fields. The term fixed field stands for fields whose length does not vary, like the leader and some of the control fields. The content of the fixed fields can be accessed through its character position or range. Only data fields are divided into subfields. Subfields can also be contextualized through indicators. Every data field has an indicator 1 and an indicator 2; both are optional.

A MARCspec is a reference to field data of a MARC record and is very much like XPath for XML. With MARCspec one can reference data on different levels of a MARC record defined through the fields, character positions, subfields and indicators.

The data of the MARC record being referenced may be represented as a set of data having zero or more data elements. MARCspec defines neither the form of this referenced set of data nor the encoding of the referenced data content.

2 Limitations of MARCspec as string

A MARCspec might not fulfil all the requirements for referencing a desired set of data the way XPath does for XML. This is because of the nearly unlimited number of ways to access data in a MARC record, especially when it comes to delimiters based on cataloging rules. Thus a MARCspec concentrates on the basic references and leaves all other data processing to subsequent processing functions of tools that implement MARCspec.

To support other ISO 2709 applications, the MARCspec syntax does not distinguish between types of fields the way the MARC record structure does. A valid MARCspec might therefore violate the MARC record structure. It is left to MARCspec-aware tools whether to check for violations of the MARC record structure or not.

2.1 Basic references

A MARCspec allows the following basic references:

References a MARCspec does allow are:

2.2 Form of MARCspec as string

This section is normative.

The Augmented BNF for Syntax Specifications (ABNF, RFC 5234) is used to define the form of the MARCspec as string.

The complete ABNF for MARCspec is as follows:

alphaupper        = %x41-5A
                    ; A-Z
alphalower        = %x61-7A
                    ; a-z
positiveDigit     = %x31-39
                    ;  "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
positiveInteger   = "0" / positiveDigit [1*DIGIT]
indicator         = alphalower / DIGIT
indicator1        = indicator
indicator2        = indicator
indicators        = "_" (indicator1 / "_") [indicator2 / "_"]
fieldTag          = 3(alphalower / DIGIT / ".") / 3(alphaupper / DIGIT / ".")
position          = positiveInteger / "#"
range             = position "-" position
positionOrRange   = range / position
characterSpec     = "/" positionOrRange
subfieldChar      = %x21-3F / %x5B-7B / %x7D-7E
                    ; ! " # $ % & ' ( ) * + , - . / 0-9 : ; < = > ? [ \ ] ^ _ ` a-z { } ~
subfieldCode      = "$" subfieldChar
subfieldCodeRange = "$" ( (alphalower "-" alphalower) / (DIGIT "-" DIGIT) )
                    ; [a-z]-[a-z] / [0-9]-[0-9]
index             = "[" positionOrRange "]"
fixedField        = fieldTag [index] [characterSpec]
variableField     = fieldTag [index] [indicators]
fieldSpec         = fixedField / variableField
subfieldSpec      = (subfieldCode / subfieldCodeRange) [index] [characterSpec]
comparisonString  = "\" *VCHAR
operator          = "=" / "!=" / "~" / "!~" / "!" / "?"
                    ; equal / unequal / includes / not includes / not exists / exists
abrFieldSpec      = index [ (characterSpec / indicators) ] / characterSpec / indicators
abrSubfieldSpec   = index [characterSpec] / characterSpec
abbreviation      = abrFieldSpec / abrSubfieldSpec
subTerm           = fieldSpec / subfieldSpec / comparisonString / abbreviation
subTermSet        = [ [subTerm] operator ] subTerm
subSpec           = "{" subTermSet *( "|" subTermSet ) "}"
MARCspec          = (variableField *subSpec *(subfieldSpec *subSpec)) / fixedField *subSpec
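To show how the grammar above can be turned into a validator, here is a minimal Python sketch that translates the ABNF rules up to subfieldSpec into regular expressions. It is an illustration under stated assumptions, not a reference implementation: subSpecs are deliberately not handled, and the constant and function names (for example `is_valid_without_subspecs`) are hypothetical.

```python
import re

# Regular-expression fragments mirroring the ABNF rules (subSpecs omitted).
POSITIVE_INTEGER = r"(?:0|[1-9][0-9]*)"
POSITION = rf"(?:{POSITIVE_INTEGER}|#)"
POSITION_OR_RANGE = rf"(?:{POSITION}-{POSITION}|{POSITION})"
INDEX = rf"\[{POSITION_OR_RANGE}\]"
CHARACTER_SPEC = rf"/{POSITION_OR_RANGE}"
INDICATOR = r"[a-z0-9]"
INDICATORS = rf"_(?:{INDICATOR}|_)(?:{INDICATOR}|_)?"
# Three characters, either all lowercase-or-digit-or-dot or all uppercase-or-digit-or-dot.
FIELD_TAG = r"(?:[a-z0-9.]{3}|[A-Z0-9.]{3})"
# subfieldChar = %x21-3F / %x5B-7B / %x7D-7E
SUBFIELD_CHAR = r"[!-?\[-{}-~]"
SUBFIELD_CODE_RANGE = r"\$(?:[a-z]-[a-z]|[0-9]-[0-9])"
SUBFIELD_CODE = rf"\${SUBFIELD_CHAR}"
SUBFIELD_SPEC = (
    rf"(?:{SUBFIELD_CODE_RANGE}|{SUBFIELD_CODE})(?:{INDEX})?(?:{CHARACTER_SPEC})?"
)
FIXED_FIELD = rf"{FIELD_TAG}(?:{INDEX})?(?:{CHARACTER_SPEC})?"
VARIABLE_FIELD = rf"{FIELD_TAG}(?:{INDEX})?(?:{INDICATORS})?"
MARCSPEC_WITHOUT_SUBSPECS = re.compile(
    rf"(?:{VARIABLE_FIELD}(?:{SUBFIELD_SPEC})*|{FIXED_FIELD})"
)


def is_valid_without_subspecs(spec: str) -> bool:
    """Return True if spec matches the grammar, ignoring subSpecs."""
    return MARCSPEC_WITHOUT_SUBSPECS.fullmatch(spec) is not None


print(is_valid_without_subspecs("245$a/#-1"))   # True
print(is_valid_without_subspecs("245_00/0-2"))  # False: indicators plus characterSpec
```

A regex is enough here because this subset of the grammar is regular; the nested, repeatable subSpecs would more naturally be handled by a small recursive-descent parser.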

2.2.1 General form

Every MARCspec consists of either a fixed field spec or a variable field spec. A variable field spec may optionally be followed by one or more subfieldSpecs. Both fieldSpec and subfieldSpec can be contextualized through subSpecs (see section SubSpecs).

fieldSpec = fixedField / variableField
MARCspec  = (variableField *subSpec *(subfieldSpec *subSpec)) / fixedField *subSpec

2.2.2 Reference to field data

A fieldSpec is a reference to the field data of a field. It consists of the three-character field tag, followed optionally by an index and either a characterSpec (fixedField) or indicators (variableField).

The field tag may consist of ASCII numeric characters (decimal integers 0-9) and/or ASCII alphabetic characters (uppercase or lowercase, but not both) or the character “.”. The character “.” is interpreted as a wildcard. For example, “3..” is a reference to the data elements in all fields whose tag begins with “3”.

The special field tag LDR is the field tag for the leader.

alphaupper    = %x41-5A ; A-Z
alphalower    = %x61-7A ; a-z
fieldTag      = 3(alphalower / DIGIT / ".") / 3(alphaupper / DIGIT / ".")
fixedField    = fieldTag [index] [characterSpec]
variableField = fieldTag [index] [indicators]

A fieldSpec without an explicitly given index is always a reference to all repetitions of the referenced field(s) (see MARCspec interpretation for implicit rules). In other words, a fieldSpec without an explicitly given index is an abbreviation of a reference with the starting index 0 and the ending index # (see Reference to repetitions examples).

Reference to field data of the leader.

LDR

Reference to all field data of fields having a field tag starting with 00.

00.

Reference to all field data of fields having a field tag starting with 7.

7..

Reference to data elements of all repetitions of the “100” field.

100
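The wildcard rule for field tags can be sketched as a character-by-character comparison. This Python sketch assumes case-sensitive matching (the spec only forbids mixing cases within one fieldTag), and the function name `tag_matches` and the list of record tags are hypothetical.

```python
def tag_matches(spec_tag: str, record_tag: str) -> bool:
    """Return True if record_tag matches spec_tag, where "." is a wildcard.

    Comparison is case-sensitive, which is an assumption of this sketch.
    """
    if len(spec_tag) != 3 or len(record_tag) != 3:
        return False
    return all(s == "." or s == r for s, r in zip(spec_tag, record_tag))


# Hypothetical tags present in one record ("LDR" standing for the leader).
record_tags = ["LDR", "001", "008", "100", "245", "700", "710"]
print([t for t in record_tags if tag_matches("00.", t)])  # ['001', '008']
print([t for t in record_tags if tag_matches("7..", t)])  # ['700', '710']
```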

2.2.3 Reference to substring

A characterSpec is a reference to a character or a range of characters within a field or subfield. It consists of a position or range prefixed with the character /.

characterSpec = "/" positionOrRange

A positionOrRange is either a position or a range.

A position is either a positiveInteger (0 or an integer without leading zeros) or the character # as a symbol for the last character of the referenced data content.

The range consists of two positions concatenated with the character -.

positiveDigit   = %x31-39
                    ;  "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
positiveInteger = "0" / positiveDigit [1*DIGIT]
position        = positiveInteger / "#"
range           = position "-" position
positionOrRange = range / position

The interpretation of a range depends on where the special position character # appears (see MARCspec interpretation for implicit rules).

Reference to substring of field data in the leader from character position 0 to character position 4 (5 characters).

LDR/0-4

Reference to data in the leader at character position 6 (1 character).

LDR/6

Reference to data in the control field 007 at character position 0 (1 character).

007/0

Reference to all data but the first character in the control field “007”.

007/1-#

Reference to the last character in the control field “007”.

007/#

Reference to the last two characters of the value of the subfield “a” of field “245”.

245$a/#-1
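The examples above follow the range rules listed under MARCspec interpretation. A minimal Python sketch of those rules could look like this; the function names `resolve` and `substring`, the hypothetical leader value, and the choice to clamp an ending position beyond the data are assumptions, not part of the spec.

```python
def resolve(position_or_range: str, length: int) -> range:
    """Return the 0-based positions referenced by a position or range.

    `length` is the number of characters (or repetitions) available.
    Clamping an ending position beyond the data is an assumption.
    """
    last = length - 1
    if "-" not in position_or_range:
        pos = last if position_or_range == "#" else int(position_or_range)
        return range(pos, pos + 1) if 0 <= pos <= last else range(0)
    start, end = position_or_range.split("-", 1)
    if start == "#":
        # Backwards interpretation: "#-n" covers the last n + 1 positions.
        back = 0 if end == "#" else int(end)
        return range(max(last - back, 0), last + 1) if length > 0 else range(0)
    first = int(start)
    stop = last if end == "#" else min(int(end), last)
    # A starting position after the ending position references nothing.
    return range(first, stop + 1) if first <= stop else range(0)


def substring(value: str, position_or_range: str) -> str:
    """Apply a characterSpec (without the leading "/") to a value."""
    positions = resolve(position_or_range, len(value))
    return value[positions.start:positions.stop]


leader = "00000nam a2200000 a 4500"  # hypothetical leader value
print(substring(leader, "0-4"))          # 00000
print(substring(leader, "6"))            # a
print(substring("The raven /", "#-1"))   # " /" (the last two characters)
```

Because `resolve` only needs a length, the same function can also resolve an index over a list of repetitions, e.g. `resolve("#-1", 4)` gives positions 2 and 3.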

2.2.4 Reference to data content

The subfieldSpec is a reference to the data content (value) of a subfield. It consists of either a subfieldCode or a subfieldCodeRange, followed optionally by an index and a characterSpec.

subfieldSpec = (subfieldCode / subfieldCodeRange) [index] [characterSpec]

A subfieldCode is a subfieldChar prefixed by the character $.

A subfieldCodeRange is prefixed by the character $ and restricted to either two lowercase alphabetic or two numeric characters joined by the character -.

A subfieldChar is a lowercase alphabetic character, a numeric character or a special character, i.e. any visible ASCII character except @, the uppercase letters A-Z and |.

subfieldChar      = %x21-3F / %x5B-7B / %x7D-7E
subfieldCode      = "$" subfieldChar
subfieldCodeRange = "$" ( (%x61-7A "-" %x61-7A) / (%x30-39 "-" %x30-39) )

Reference to value of the subfield “a” of field “245”.

245$a

Reference to the value of the subfields “a”, “b” and “c” of field “245”.

245$a$b$c

Same as above, but with the use of a subfield code range.

245$a-c

Reference to the value of the subfields “_” and “$” of field “300”.

300$_$$
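A subfieldCodeRange is shorthand for listing consecutive subfield codes, so “245$a-c” stands for “245$a$b$c”. The expansion can be sketched in Python as follows; the function name is hypothetical, and returning no codes for a reversed range such as “$c-a” is an assumption the spec does not state.

```python
import string
from typing import List


def expand_code_range(code_range: str) -> List[str]:
    """Expand a subfieldCodeRange such as "$a-c" into ["a", "b", "c"]."""
    if len(code_range) != 4 or code_range[0] != "$" or code_range[2] != "-":
        raise ValueError(f"not a subfieldCodeRange: {code_range!r}")
    first, last = code_range[1], code_range[3]
    for alphabet in (string.ascii_lowercase, string.digits):
        if first in alphabet and last in alphabet:
            # Assumption: a reversed range such as "$c-a" expands to no codes.
            return list(alphabet[alphabet.index(first):alphabet.index(last) + 1])
    raise ValueError(f"mixed or unsupported codes in {code_range!r}")


print(expand_code_range("$a-c"))  # ['a', 'b', 'c']
print(expand_code_range("$0-3"))  # ['0', '1', '2', '3']
```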

2.2.5 Reference to repetitions

For repeatable fields and subfields, each repetition can be referenced by its index. An index is a position or range enclosed in the characters [ and ]. The first repetition of a field or subfield is always referenced with the index [0]. The last repetition of a field or subfield is referenced with the index [#].

index = "[" positionOrRange "]"

Reference to the first “300” field.

300[0]

Reference to the second “300” field.

300[1]

Reference to the first, second and third “300” fields.

300[0-2]

Reference to all but the first “300” field.

300[1-#]

Reference to the last “300” field.

300[#]

Reference to the last two “300” fields.

300[#-1]

Reference to the value of the subfield “a” of the first “300” field.

300[0]$a

Reference to the value of the first subfield “a” of each field “300”.

300$a[0]

Reference to the value of the last subfield “a” of each field “300”.

300$a[#]

Reference to the value of the last two repetitions of subfield “a” of each field “300”.

300$a[#-1]
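The difference between a field index (300[0]$a) and a subfield index (300$a[0]) is easiest to see on concrete data. This Python sketch uses a hypothetical in-memory record (a list of (tag, subfields) tuples with placeholder values) and only single indices; representing the position “#” as Python's -1 is an assumption of the sketch, and the names `pick` and `subfield_values` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory record: each field is (tag, [(code, value), ...]).
Field = Tuple[str, List[Tuple[str, str]]]


def pick(items: list, index: Optional[int]) -> list:
    """Return all items if index is None, else a one-element list (or none).

    Python's -1 stands in for the MARCspec position "#" (an assumption).
    """
    if index is None:
        return list(items)
    if -len(items) <= index < len(items):
        return [items[index]]
    return []


def subfield_values(record: List[Field], tag: str, code: str,
                    field_index: Optional[int] = None,
                    subfield_index: Optional[int] = None) -> List[str]:
    """Sketch of tag[field_index]$code[subfield_index] for single indices."""
    fields = pick([f for f in record if f[0] == tag], field_index)
    values: List[str] = []
    for _, subfields in fields:
        codes = [v for c, v in subfields if c == code]
        values.extend(pick(codes, subfield_index))
    return values


record = [
    ("300", [("a", "A1"), ("a", "A2")]),
    ("300", [("a", "B1")]),
]
print(subfield_values(record, "300", "a", field_index=0))      # 300[0]$a: ['A1', 'A2']
print(subfield_values(record, "300", "a", subfield_index=0))   # 300$a[0]: ['A1', 'B1']
print(subfield_values(record, "300", "a", subfield_index=-1))  # 300$a[#]: ['A2', 'B1']
```

Because “300” without an index refers to all repetitions, 300$a[0] yields the first “a” of every “300” field, not only of the first one.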

2.2.6 Reference to contextualized data

2.2.6.1 Indicators

Indicators are prefixed by the character _. There are two indicators: indicator 1 and indicator 2. Both are optional, and each is represented by a lowercase alphabetic or a numeric character. If indicator 1 is not specified, it MUST be replaced by the character _. If indicator 2 is not specified, it may be replaced by the character _ or simply omitted.

indicator  = alphalower / DIGIT
indicator1 = indicator
indicator2 = indicator
indicators = "_" (indicator1 / "_") [indicator2 / "_"]

Reference to data content in the subfield “a” within the context of indicator 1 with the value “1”.

245_1$a

or

245_1_$a

Reference to the value of the subfield “a” within the context of indicator 1 with the value “1” and indicator 2 with the value “0”.

245_10$a

Reference to the value of the subfield “a” within the context of indicator 2 with the value “0”.

245__0$a

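Reference to the value of the subfield “a” of the first four repetitions of field “307” (index range 0-3), within the context of indicator 1 with the value “8”. This will NOT reference the first four “307” fields that have indicator 1 with the value “8”: the index is applied first, and the indicator then filters the selected fields (see Interpretation order).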

307[0-3]_8$a
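Matching the indicators of a field against an indicators spec, with “_” acting as “not specified”, can be sketched in Python as follows. The function name and the indicator pairs are hypothetical, and representing a MARC blank indicator as a space character is an assumption.

```python
def indicators_match(spec: str, ind1: str, ind2: str) -> bool:
    """Check a field's indicators against a spec such as "_1", "_10" or "__0".

    "_" in place of an indicator means "not specified" and matches any value.
    """
    if len(spec) not in (2, 3) or spec[0] != "_":
        raise ValueError(f"not an indicators spec: {spec!r}")
    want1 = spec[1]
    want2 = spec[2] if len(spec) == 3 else "_"
    return want1 in ("_", ind1) and want2 in ("_", ind2)


# Hypothetical (indicator 1, indicator 2) pairs of three fields; blank = " ".
fields = [("1", "0"), ("0", "0"), ("1", "4")]
print([f for f in fields if indicators_match("_1", *f)])   # [('1', '0'), ('1', '4')]
print([f for f in fields if indicators_match("__0", *f)])  # [('1', '0'), ('0', '0')]
```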

2.2.6.2 SubSpecs

A subSpec contextualizes the preceding fieldSpec or subfieldSpec. Every subSpec MUST evaluate to either true or false. If a subSpec is true, the preceding spec is used to reference data. If a subSpec is false, the preceding spec is not used to reference data.

A subSpec is enclosed in the characters { and }. It consists of one or more subTermSets, each made up of a left-hand subTerm, an operator and a right-hand subTerm (the first two may be omitted, see below). Several subTermSets can be chained within a subSpec with the character | (OR). Multiple subSpecs can also follow one another (AND).

subTerm    = fieldSpec / subfieldSpec / comparisonString / abbreviation
subTermSet = [ [subTerm] operator ] subTerm
subSpec    = "{" subTermSet *( "|" subTermSet ) "}"

The operator is one of

  • = (as a symbol for “equal”),
  • != (as a symbol for “unequal”),
  • ~ (as a symbol for “includes”),
  • !~ (as a symbol for “not includes”),
  • ! (as a symbol for “not exists”) or
  • ? (as a symbol for “exists”).

    operator = "=" / "!=" / "~" / "!~" / "!" / "?"

A subTerm is one of

  • fieldSpec
  • subfieldSpec
  • comparisonString.

It is possible to abbreviate a contextualized fieldSpec by only using

  • index or
  • index and characterSpec or
  • index and indicators or
  • characterSpec or
  • indicators

    abrFieldSpec = index [ (characterSpec / indicators) ] / characterSpec / indicators

as a subTerm (see SubSpec abbreviation and [Abbreviation of fieldSpec or subfieldSpec] for examples).

It is possible to abbreviate a contextualized subfieldSpec by only using

  • index or
  • index and characterSpec or
  • characterSpec

    abrSubfieldSpec = index [characterSpec] / characterSpec

as a subTerm (see SubSpec abbreviation and [Abbreviation of fieldSpec or subfieldSpec] for examples).

If the left-hand subTerm is omitted, the spec preceding the subSpec implicitly becomes the left-hand subTerm (see MARCspec interpretation for implicit rules). For subSpecs with an omitted left-hand subTerm, the operator can also be omitted; omitting the operator implies the operator ? (exists).

A comparisonString can be any combination of visible ASCII characters prefixed by the character \. To avoid ambiguity, the following characters MUST be escaped with the character \ inside a comparisonString:

  • $
  • {
  • }
  • !
  • =
  • ~
  • ?
  • |

In a comparisonString, a space character MUST be encoded as the character combination \s.

comparisonString = "\" *VCHAR
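Decoding a comparisonString into the literal text it compares against can be sketched as below. This Python sketch strips the leading \ prefix, turns \s into a space and unescapes the characters listed above; keeping any other backslash unchanged is an assumption, since the spec does not say how other sequences are treated, and the function name is hypothetical.

```python
ESCAPABLE = set("${}!=~?|")


def decode_comparison_string(token: str) -> str:
    """Decode a comparisonString token into the literal text it stands for."""
    if not token.startswith("\\"):
        raise ValueError("a comparisonString must start with a backslash")
    body = token[1:]  # strip the "\" prefix
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt == "s":  # "\s" encodes a space
                out.append(" ")
                i += 2
                continue
            if nxt in ESCAPABLE:  # "\$", "\{", ... encode the bare character
                out.append(nxt)
                i += 2
                continue
        # Any other character (including an unrecognised "\") is kept as-is.
        out.append(body[i])
        i += 1
    return "".join(out)


print(decode_comparison_string(r"\Random\sHouse"))  # Random House
print(decode_comparison_string(r"\t"))              # t (the prefix, not a tab)
print(decode_comparison_string(r"\\$4.95"))         # $4.95
```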

Checking dependencies via string comparison

If Leader/06 = t: Books

Reference to character with position “18” of field “008”, if character with position “06” in Leader equals “t”.

008/18{LDR/6=\t}

Checking dependencies via string comparison alternatives

If field 007/00 = a or t

Reference to subfield “b” of field “245”, if character with position “0” of field 007 equals “a” OR “t”.

245$b{007/0=\a|007/0=\t}

Checking dependencies via string comparison chains

If Leader/06 = a and Leader/07 = a, c, d, or m: Books

Reference to character with position “18” of field “008”, if character with position “06” in Leader equals “a” AND character with position “07” in Leader equals “a”, “c”, “d” OR “m”.

008/18{LDR/6=\a}{LDR/7=\a|LDR/7=\c|LDR/7=\d|LDR/7=\m}

Checking dependencies via string comparison and content comparison

Example data:

100 1#$6880-01$aZilbershtain, Yitshak ben David Yosef.
880 1#$6100-01/(2/r$a, יצחק יוסף בן דוד.

Reference to the data content of subfield “a” of field “880”, if the data content of subfield “6” of field “100” includes the string “-01” (characters 3-5 of subfield “6” of field “880”) and the string “880”.

880$a{100_1$6~$6/3-5}{100_1$6~\880}

Checking existence of subfields

Reference to the data content of subfield “c” of field “020”, if subfield “a” of field “020” exists.

020$c{$a}

Checking (non-)existence of subfields

Reference to the data content of subfield “z” of field “020”, if subfield “a” of field “020” does not exist.

020$z{!$a}

Abbreviation of fieldSpec or subfieldSpec

As stated in MARCspec interpretation, a MARCspec without an explicitly given index is always an abbreviation of n references. The following examples show how such specs are interpreted.

Example Data:

020 ##$a0394170660$qRandom House$c$4.95
020 ##$a0491001304

Reference to data content of subfield “q” of field “020” if subfield “c” exists.

020$q{$c}

same as

020[0-#]$q[0-#]{$c[0-#]}

same as

020[0]$q[0]{?020[0]$c[0]} OR // true
020[1]$q[0]{?020[1]$c[0]} // false

Example Data:

020 ##$a0394170660$qRandom House$qpaperback$c$4.95
020 ##$a0394502884$qRandom House$qhardcover$c$12.50 

Reference to data content of subfield “c” if data content of one repetition of subfield “q” equals the comparison string “paperback”.

020$c{$q=\paperback}

same as

020[0-#]$c[0-#]{$q[0-#]=\paperback}

same as

020[0]$c[0]{020[0]$q[0]=\paperback} OR // false
020[0]$c[0]{020[0]$q[1]=\paperback} OR // true
020[1]$c[0]{020[1]$q[0]=\paperback} OR // false
020[1]$c[0]{020[1]$q[1]=\paperback}    // false
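The expansions above boil down to checking the subSpec separately for each repetition of the field. Here is a minimal Python sketch over the two example data sets, stored as hypothetical lists of (subfield code, value) pairs; the helper names `values` and `select` are hypothetical.

```python
from typing import Callable, List, Tuple

Field = List[Tuple[str, str]]  # hypothetical: (subfield code, value) pairs


def values(field: Field, code: str) -> List[str]:
    """All values of one subfield code within a single field, in order."""
    return [v for c, v in field if c == code]


def select(fields: List[Field], code: str,
           condition: Callable[[Field], bool]) -> List[str]:
    """Values of `code` from every field repetition whose subSpec holds."""
    return [v for field in fields if condition(field) for v in values(field, code)]


# First example data: 020$q{$c}
first = [
    [("a", "0394170660"), ("q", "Random House"), ("c", "$4.95")],
    [("a", "0491001304")],
]
print(select(first, "q", lambda f: len(values(f, "c")) > 0))  # ['Random House']

# Second example data: 020$c{$q=\paperback}
second = [
    [("a", "0394170660"), ("q", "Random House"), ("q", "paperback"), ("c", "$4.95")],
    [("a", "0394502884"), ("q", "Random House"), ("q", "hardcover"), ("c", "$12.50")],
]
print(select(second, "c", lambda f: "paperback" in values(f, "q")))  # ['$4.95']
```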

Reference to the data of the first repetition of field “800”, if the data content of subfield “a” of that field, with indicator 2 having the value “1”, includes the comparisonString “Poe”.

800[0]{800[0]__1$a~\Poe}

An abbreviated subTerm like __1$a in

800[0]{__1$a~\Poe}

is invalid! An abbreviated subTerm MUST be either an abbreviated fieldSpec or an abbreviated subfieldSpec, not a combination of both.

Reference to the data content of subfield “a” of field “245”, if the last character of the preceding spec equals the comparisonString “/”.

245$a{/#=\/}

same as

245$a{245$a/#=\/}

2.3 MARCspec interpretation

Because of the limited expressivity of MARCspec, some rules of implicit interpretation are needed.

  1. A MARCspec without subfield codes or position or range is a reference to all data elements of the field.
  2. A fieldSpec or a subfieldSpec without an explicitly given index is always an abbreviation of a reference with the starting index 0 and the ending index # (see [Abbreviation of fieldSpec or subfieldSpec]).
  3. Omitted indicators in a MARCspec are interpreted as wildcards for variable field indicators in the MARC record.

2.3.1 Interpretation order

  1. For repeatable fields referenced by index and indicators the fields MUST first be referenced by index. Indicators work like a filter on the referenced fields as a second order.
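The index-first rule can be illustrated with the spec 307[0-3]_8 from the Indicators examples. This Python sketch assumes a hypothetical record with five “307” fields whose indicator 1 values are listed below (a blank written as a space), and contrasts the MARCspec order with the opposite order.

```python
# Hypothetical indicator 1 values of five consecutive "307" fields.
ind1_values = ["8", " ", "8", "8", "8"]

# MARCspec order for 307[0-3]_8: the index selects repetitions 0..3 first,
# then the indicator filters the selected fields.
index_first = [i for i in range(len(ind1_values))[0:4] if ind1_values[i] == "8"]

# The opposite order (filter first, then take four) is NOT what 307[0-3]_8 means.
filter_first = [i for i, v in enumerate(ind1_values) if v == "8"][0:4]

print(index_first)   # [0, 2, 3]
print(filter_first)  # [0, 2, 3, 4]
```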

2.3.2 Character position or range and field indices interpretation

  1. The position character # is always a reference to the last character in the data content.
  2. For a character range, if the positive integer used for the character starting position is greater than the positive integer used for the character ending position, the current spec MUST NOT reference any data.
  3. For a character range, if the character # is used for the character starting position, the character indices MUST be interpreted backwards (character ending position 0 for the last character, 1 for the last but one character, 2 for the last but two characters etc.).
  4. The rules above also apply to indices (index).

2.3.3 SubSpec interpretation

  1. For chained subTermSets, if one subTermSet gets validated as true, the preceding spec gets referenced (OR) as long as all other repeated SubSpecs are validated as true.
  2. For repeated subSpecs, if one subSpec gets validated as false, the preceding spec doesn’t get referenced (AND).
  3. For abbreviated fieldSpec or subfieldSpec as a subTerm, the last explicitly given fieldTag is the current fieldTag.
  4. As a shortcut, the left-hand subTerm may be omitted. This implicitly makes the last explicitly given fieldTag plus the last explicitly given characterSpec or subfieldSpec the current (left-hand) subTerm.
  5. If the left hand subTerm is omitted, as a shortcut for the operator ?, the operator can also be omitted.
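Rule 3, together with the examples in SubSpec abbreviation, suggests a simple expansion for abbreviated subTerms: the field tag (and, for a subfieldSpec, the subfield code) is always carried over, an explicit index in the abbreviation replaces the preceding index, and otherwise the preceding index is kept. The following Python sketch encodes this reading; it is an interpretation, not normative, the function name is hypothetical, and it does not reject invalid combinations such as a characterSpec followed by indicators.

```python
def expand_abbreviation(base: str, context_index: str, abbreviated: str) -> str:
    """Expand an abbreviated subTerm (one reading of the abbreviation rules).

    base: field tag, plus the subfield code for a subfieldSpec ("245", "300$a").
    context_index: index of the preceding spec, e.g. "[1]", or "" if none.
    abbreviated: the abbreviated subTerm, e.g. "[0]", "/0-3" or "_01".
    """
    if abbreviated.startswith("["):
        # An explicit index replaces the preceding spec's index.
        return base + abbreviated
    # Otherwise the preceding spec's index (if any) is carried over.
    return base + context_index + abbreviated


print(expand_abbreviation("...", "[1]", "/0-3"))  # ...[1]/0-3
print(expand_abbreviation("...$a", "[0]", "/0"))  # ...$a[0]/0
print(expand_abbreviation("...", "[1]", "[0]"))   # ...[0]
```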

2.3.4 SubSpec abbreviation

The following table shows how SubSpec abbreviations MUST be interpreted.

| corresponding spec type | corresponding spec ends with | abbreviated spec begins with | interpretation | example |
|---|---|---|---|---|
| fieldSpec | index | index | valid fieldSpec with index | ...[2]{[1]} => ...[2]{...[1]} |
| fieldSpec | index | characterSpec | valid fieldSpec with index and characterSpec | ...[1]{/0-3} => ...[1]{...[1]/0-3} |
| fieldSpec | index | indicators | valid fieldSpec with index and indicators | ...[1]{_01} => ...[1]{...[1]_01} |
| fieldSpec | characterSpec | index | valid fieldSpec with index | .../0-7{[0]} => .../0-7{005[0]} |
| fieldSpec | characterSpec | characterSpec | valid fieldSpec with characterSpec | .../0-7{/0=\2} => .../0-7{.../0=\2} |
| fieldSpec | characterSpec | indicators | invalid fieldSpec, since a characterSpec denotes a fixedField, which can’t be used with indicators | .../0-7{_1} => .../0-7{.../0-7_1} |
| fieldSpec | indicators | index | valid fieldSpec with index | ...[1]_1{[0]} => ...[1]_1{...[0]}; ..._1{[1]} => ..._1{...[1]}; ...[1]_1{[0]_0} => ...[1]_1{...[0]_0} |
| fieldSpec | indicators | characterSpec | invalid fieldSpec, since indicators denote a variableField, which can’t be used with a characterSpec | 245_00{/0-2} => 245_00{245_00/0-2} |
| fieldSpec | indicators | indicators | valid fieldSpec with indicators | ..._1{_01} => ..._1{..._01} |
| subfieldSpec | index | index | valid subfieldSpec with index | ...$a[0]{[1]} => ...$a[0]{...$a[1]} |
| subfieldSpec | index | characterSpec | valid subfieldSpec with index and characterSpec | ...$a[0]{/0} => ...$a[0]{...$a[0]/0} |
| subfieldSpec | characterSpec | index | valid subfieldSpec with index | ...$a/1{[1]} => ...$a/1{...$a[1]}; ...$a/1{[1]/1} => ...$a/1{...$a[1]/1} |
| subfieldSpec | characterSpec | characterSpec | valid subfieldSpec with characterSpec | ...$a/1{/0} => ...$a/1{...$a/0} |

2.3.5 SubSpec validation

SubSpecs get validated by the following rules:

A subSpec is true, if

  • with the operator = one of the referenced values of the left hand subTerm is equal to one of the referenced values of the right hand subTerm.
  • with the operator != none of the referenced values of the left hand subTerm is equal to one of the referenced values of the right hand subTerm.
  • with the operator ~ one of the referenced values of the left hand subTerm includes one of the referenced values of the right hand subTerm.
  • with the operator !~ none of the referenced values of the left hand subTerm includes one of the referenced values of the right hand subTerm.
  • with the operator ?, the right-hand subTerm references existing data.
  • with the operator !, the right-hand subTerm references no data.
  • one of the chained subTermSets is validated as true (OR) and all other repeated subSpecs are validated as true.
  • all of the repeated subSpecs are validated as true (AND).

A subSpec is false, if

  • the left hand subTerm does not reference any data (null).
  • with the operator = none of the referenced values of the left hand subTerm is equal to one of the referenced values of the right hand subTerm.
  • with the operator != one of the referenced values of the left hand subTerm is equal to one of the referenced values of the right hand subTerm.
  • with the operator ~ none of the referenced values of the left hand subTerm includes one of the referenced values of the right hand subTerm.
  • with the operator !~ one of the referenced values of the left hand subTerm includes one of the referenced values of the right hand subTerm.
  • with the operator ?, the right-hand subTerm references no data (null).
  • with the operator !, the right-hand subTerm references existing data.
  • all of the chained subTermSets are validated as false (OR).
  • one of the repeated subSpecs is validated as false (AND).
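The validation rules above (and the SubTerm validation table) can be sketched in Python as follows. The referenced values of each subTerm are assumed to be already resolved into lists of strings, with an empty list standing for null; letting the "left-hand subTerm is null" rule take precedence for the four comparison operators is this sketch's reading, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

COMPARISON_OPERATORS = ("=", "!=", "~", "!~")


def evaluate(op: str, left: Sequence[str], right: Sequence[str]) -> bool:
    """Validate one subTermSet; an empty sequence stands for null."""
    if op == "?":
        return len(right) > 0
    if op == "!":
        return len(right) == 0
    if op not in COMPARISON_OPERATORS:
        raise ValueError(f"unknown operator: {op!r}")
    if not left:
        # A left-hand subTerm that references no data makes the subSpec false.
        return False
    equal = any(lv == rv for lv in left for rv in right)
    includes = any(rv in lv for lv in left for rv in right)
    return {"=": equal, "!=": not equal, "~": includes, "!~": not includes}[op]


SubTermSet = Tuple[str, Sequence[str], Sequence[str]]


def subspecs_hold(subspecs: List[List[SubTermSet]]) -> bool:
    """subTermSets chained with | are ORed; repeated subSpecs are ANDed."""
    return all(any(evaluate(*term_set) for term_set in subspec) for subspec in subspecs)


# 008/18{LDR/6=\a}{LDR/7=\a|LDR/7=\c|LDR/7=\d|LDR/7=\m} on a hypothetical leader.
leader = "00000nam a2200000 a 4500"
subspecs = [
    [("=", [leader[6]], ["a"])],
    [("=", [leader[7]], [code]) for code in "acdm"],
]
print(subspecs_hold(subspecs))  # True
```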

2.3.6 SubTerm validation table

| operator | right is null | left equals right | right is subpart of left | left is subpart of right | other |
|---|---|---|---|---|---|
| = | false | true | false | false | false |
| != | true | false | true | true | true |
| ~ | false | true | true | false | false |
| !~ | true | false | false | true | true |
| ? | false | true | true | true | true |
| ! | true | false | false | false | false |

4 References

4.1 Normative references

4.2 Informative references