Mapping MARC data to arbitrary formats has become a common task. Such mappings are normally based on a set of definitions of MARC fields and subfields, called a MARC field specification or, for short, a MARC spec.
MARC specs are already implemented in tools like marcspec, catmandu, solrmarc, easyM2R etc., each of them using a different flavour of MARC spec. This document is an approach to normalize such field specifications.
The specification described here, MARCspec, can help to build reusable MARCspec parsers and validators and facilitates the exchange of mapping definitions.
The current version of this proposal is a preliminary draft for open discussion. Feedback is welcome!
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
See also Definition of MARC related terms used in this spec.
Machine-Readable Cataloguing (MARC) is a document-based key-value exchange format for bibliographic and other library related data. A MARC record consists of three main sections: the leader, the directory, and the variable fields with the data content. There are two kinds of (variable) fields: (variable) control fields and (variable) data fields. The term fixed field stands for fields whose length does not vary, like the leader and some of the control fields. The field content in the fixed fields can be accessed through its character position or range. Only data fields are divided into subfields. Subfields can also be contextualized through indicators. Every data field has an indicator 1 and an indicator 2; both are optional.
A MARCspec is a reference to field data of a MARC record and is very much like XPath for XML. With MARCspec one can reference data on different levels of a MARC record defined through the fields, character positions, subfields and indicators.
The data of the MARC record being referenced may be represented as a set of data with zero or more data elements. MARCspec defines neither the form of this referenced set of data nor the encoding of the referenced data content.
A MARCspec might not meet every requirement for referencing the desired set of data the way XPath does for XML. This is because of the nearly unlimited number of ways to access data in a MARC record, especially when it comes to delimiters based on cataloging rules. Thus a MARCspec has to concentrate on the basic references and leave all other data processing to subsequent data processing functions of the tools that implement MARCspec.
To enable support for other ISO 2709 applications, the MARCspec syntax does not distinguish between types of fields as the MARC record structure does. A valid MARCspec might violate the MARC record structure. It is left to MARCspec-aware tools whether or not to check for violations of the MARC record structure.
A MARCspec allows the following basic references:
References a MARCspec does allow are:
This section is normative.
The Augmented BNF for Syntax Specifications: ABNF (RFC 5234) is used to define the form of the MARCspec as a string.
The complete ABNF for MARCspec is as follows:
alphaupper = %x41-5A
; A-Z
alphalower = %x61-7A
; a-z
positiveDigit = %x31-39
; "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
positiveInteger = "0" / positiveDigit [1*DIGIT]
indicator = alphalower / DIGIT
indicator1 = indicator
indicator2 = indicator
indicators = "_" (indicator1 / "_") [indicator2 / "_"]
fieldTag = 3(alphalower / DIGIT / ".") / 3(alphaupper / DIGIT / ".")
position = positiveInteger / "#"
range = position "-" position
positionOrRange = range / position
characterSpec = "/" positionOrRange
subfieldChar = %x21-3F / %x5B-7B / %x7D-7E
; ! " # $ % & ' ( ) * + , - . / 0-9 : ; < = > ? [ \ ] ^ _ ` a-z { } ~
subfieldCode = "$" subfieldChar
subfieldCodeRange = "$" ( (alphalower "-" alphalower) / (DIGIT "-" DIGIT) )
; [a-z]-[a-z] / [0-9]-[0-9]
index = "[" positionOrRange "]"
fixedField = fieldTag [index] [characterSpec]
variableField = fieldTag [index] [indicators]
fieldSpec = fixedField / variableField
subfieldSpec = (subfieldCode / subfieldCodeRange) [index] [characterSpec]
comparisonString = "\" *VCHAR
operator = "=" / "!=" / "~" / "!~" / "!" / "?"
; equal / unequal / includes / not includes / not exists / exists
abrFieldSpec = index [ (characterSpec / indicators) ] / characterSpec / indicators
abrSubfieldSpec = index [characterSpec] / characterSpec
abbreviation = abrFieldSpec / abrSubfieldSpec
subTerm = fieldSpec / subfieldSpec / comparisonString / abbreviation
subTermSet = [ [subTerm] operator ] subTerm
subSpec = "{" subTermSet *( "|" subTermSet ) "}"
MARCspec = (variableField *subSpec *(subfieldSpec *subSpec)) / fixedField *subSpec
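As an illustration of how part of the grammar above might be turned into a validator, the following minimal sketch translates the `fieldSpec` rules into a Python regular expression. The helper name `is_field_spec` and the constant names are assumptions made for this example; they are not part of MARCspec, and subfieldSpecs and subSpecs are deliberately left out.

```python
import re

# Regular-expression counterparts of the ABNF rules (names mirror the grammar).
POSITION = r"(?:0|[1-9][0-9]*|#)"                   # positiveInteger / "#"
POSITION_OR_RANGE = rf"{POSITION}(?:-{POSITION})?"  # range / position
INDEX = rf"\[{POSITION_OR_RANGE}\]"
CHARACTER_SPEC = rf"/{POSITION_OR_RANGE}"
INDICATORS = r"_[a-z0-9_][a-z0-9_]?"
FIELD_TAG = r"(?:[a-z0-9.]{3}|[A-Z0-9.]{3})"        # upper or lower case, never mixed

# fieldSpec = fixedField / variableField
FIELD_SPEC = re.compile(
    rf"{FIELD_TAG}(?:{INDEX})?(?:{CHARACTER_SPEC}|{INDICATORS})?"
)


def is_field_spec(text: str) -> bool:
    """Hypothetical helper: True if text is a syntactically valid fieldSpec."""
    return FIELD_SPEC.fullmatch(text) is not None


for candidate in ["LDR/0-4", "300[1-#]", "245__0", "Ldr", "300[01]"]:
    print(candidate, is_field_spec(candidate))
```

The last two candidates are rejected: `Ldr` mixes upper and lower case, and `01` is not a valid `positiveInteger`.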
Every MARCspec consists of either a fixed field spec or a variable field spec. A variable field spec can optionally be followed by one or more subfieldSpecs. Both fieldSpec and subfieldSpec can be contextualized through subSpecs (see section SubSpecs).
fieldSpec = fixedField / variableField
MARCspec = (variableField *subSpec *(subfieldSpec *subSpec)) / fixedField *subSpec
A fieldSpec is a reference to field data of a field. It consists of the three character field tag, followed optionally by an index and by either a characterSpec (fixed fields) or indicators (variable fields).
The field tag may consist of ASCII numeric characters (decimal integers 0-9) and/or ASCII alphabetic characters (uppercase or lowercase, but not both) or the character “.”. The character “.” is interpreted as a wildcard. E.g. “3..” is then a reference to the data elements in all fields beginning with “3”.
The special field tag “LDR” is the field tag for the leader.
alphaupper = %x41-5A ; A-Z
alphalower = %x61-7A; a-z
fieldTag = 3(alphalower / DIGIT / ".") / 3(alphaupper / DIGIT / ".")
fixedField = fieldTag [index] [characterSpec]
variableField = fieldTag [index] [indicators]
A fieldSpec without an explicitly given index is always a reference to all repetitions of the referenced field(s) (see MARCspec interpretation for implicit rules). One MUST also conclude that a fieldSpec without an explicitly given index is an abbreviation of a reference with the starting index “0” and the ending index “#” (see Reference to repetitions examples).
Reference to field data of the leader.
LDR
Reference to all field data of fields having a field tag starting with 00.
00.
Reference to all field data of fields having a field tag starting with 7.
7..
Reference to data elements of all repetitions of the “100” field.
100
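The wildcard behaviour of the field tag can be sketched in a few lines of Python. The function name `tag_matches` and the list of sample tags are assumptions for illustration only, and the comparison is assumed to be case-sensitive, which this document does not state explicitly.

```python
def tag_matches(spec_tag: str, field_tag: str) -> bool:
    """Return True if field_tag is covered by spec_tag, where "." matches any character."""
    if len(spec_tag) != 3 or len(field_tag) != 3:
        return False
    return all(s == "." or s == f for s, f in zip(spec_tag, field_tag))


# Hypothetical field tags of one record, in record order.
tags = ["LDR", "001", "008", "100", "245", "700", "710"]

print([t for t in tags if tag_matches("00.", t)])  # ['001', '008']
print([t for t in tags if tag_matches("7..", t)])  # ['700', '710']
print([t for t in tags if tag_matches("100", t)])  # ['100']
```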
A characterSpec is a reference to a character or a range of characters within a field or subfield. It consists of a position or range prefixed with the character “/”.
characterSpec = "/" positionOrRange
A positionOrRange is either a position or a range.
The position is either a non-negative integer (positiveInteger in the ABNF) or the character “#” as a symbol for the last character of the referenced data content.
The range consists of two positions concatenated with the character “-”.
positiveDigit = %x31-39
; "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
positiveInteger = "0" / positiveDigit [1*DIGIT]
position = positiveInteger / "#"
range = position "-" position
positionOrRange = range / position
The interpretation of a range depends on where the special position character “#”, the symbol for the last character of the referenced data content, occurs in it (see MARCspec interpretation for implicit rules).
Reference to substring of field data in the leader from character position 0 to character position 4 (5 characters).
LDR/0-4
Reference to data in the leader at character position 6 (1 character).
LDR/6
Reference to data in the control field 007 at character position 0 (1 character).
007/0
Reference to all data but the first character in the control field “007”.
007/1-#
Reference to the last character in the control field “007”.
007/#
Reference to the last two characters of the value of the subfield “a” of field “245”.
245$a/#-1
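The examples above can be reproduced with a short Python sketch that resolves a position or range against a string. The function names `resolve_range` and `substring` are assumptions for this example, the sample strings are made up, and the handling of `#-#` (treated as the last character only) is a guess, since this document does not cover that case.

```python
def resolve_range(spec: str, length: int):
    """Turn a position or range such as "6", "0-4", "1-#" or "#-1" into
    inclusive zero-based (start, end) offsets for content of the given length."""
    last = length - 1
    start, sep, end = spec.partition("-")
    if not sep:                      # single position
        pos = last if start == "#" else int(start)
        return pos, pos
    if start == "#":                 # counted backwards from the last character
        back = 0 if end == "#" else int(end)
        return max(last - back, 0), last
    return int(start), (last if end == "#" else int(end))


def substring(data: str, spec: str) -> str:
    """Return the characters of data referenced by a position or range."""
    start, end = resolve_range(spec, len(data))
    return data[start:end + 1]


leader = "00714cam a2200205 a 4500"   # made-up leader value
print(substring(leader, "0-4"))       # 00714
print(substring(leader, "6"))         # a
print(substring("abcdef", "1-#"))     # bcdef
print(substring("abcdef", "#"))       # f
print(substring("abcdef", "#-1"))     # ef
```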
The subfieldSpec is a reference to the data content (value) of a subfield. It either consists of a subfieldCode or a subfieldCodeRange followed optionally by an index and a characterSpec.
subfieldSpec = (subfieldCode / subfieldCodeRange) [index] [characterSpec]
A subfieldCode is a subfieldChar prefixed by the character “$”.
A subfieldCodeRange is prefixed by the character “$” and restricted to either two lowercase alphabetic or two numeric characters, concatenated with the character “-”.
A subfieldChar is a lowercase alphabetic character, a numeric character or a special character.
subfieldChar = %x21-3F / %x5B-7B / %x7D-7E
subfieldCode = "$" subfieldChar
subfieldCodeRange = "$" ( (%x61-7A "-" %x61-7A) / (%x30-39 "-" %x30-39) )
Reference to value of the subfield “a” of field “245”.
245$a
Reference to the value of the subfields “a”, “b” and “c” of field “245”.
245$a$b$c
Same as above, but with the use of a subfield code range.
245$a-c
Reference to the value of the subfields “_” and “$” of field “300”.
300$_$$
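To show how subfield codes and subfield code ranges relate, here is a minimal Python sketch that expands a sequence of subfieldSpecs (without index or characterSpec) into the list of subfield codes it names. The function name `subfield_codes` is an assumption for this example, and rejecting a descending range such as `$c-a` is a choice made here, not a rule stated in this document.

```python
import re

# One subfieldCodeRange ($a-c, $0-9) or one subfieldCode ($a, $_, $$).
SUBFIELD = re.compile(
    r"\$(?:([a-z])-([a-z])|([0-9])-([0-9])|([\x21-\x3f\x5b-\x7b\x7d-\x7e]))"
)


def subfield_codes(spec: str) -> list:
    """Expand e.g. "$a$b$c" or "$a-c" into ['a', 'b', 'c']."""
    codes = []
    pos = 0
    while pos < len(spec):
        m = SUBFIELD.match(spec, pos)
        if m is None:
            raise ValueError(f"invalid subfield spec at offset {pos}: {spec!r}")
        first = m.group(1) or m.group(3)
        last = m.group(2) or m.group(4)
        if first is None:                      # plain subfieldCode
            codes.append(m.group(5))
        elif first > last:                     # assumption: descending ranges are errors
            raise ValueError(f"descending range in {spec!r}")
        else:                                  # subfieldCodeRange
            codes.extend(chr(c) for c in range(ord(first), ord(last) + 1))
        pos = m.end()
    return codes


print(subfield_codes("$a$b$c"))  # ['a', 'b', 'c']
print(subfield_codes("$a-c"))    # ['a', 'b', 'c']
print(subfield_codes("$_$$"))    # ['_', '$']
```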
For repeatable fields and subfields each repetition can be referenced by its index. An index is a position or range enclosed in the characters “[” and “]”. The first repetition of a field or a subfield is always referenced with the index “[0]”. The last repetition of a field or a subfield is referenced with the index “[#]”.
index = "[" positionOrRange "]"
Reference to the first “300” field.
300[0]
Reference to the second “300” field.
300[1]
Reference to the first, second and third “300” field.
300[0-2]
Reference to all but the first “300” field.
300[1-#]
Reference to the last “300” field.
300[#]
Reference to the last two “300” fields.
300[#-1]
Reference to value of the subfield “a” of the first “300” field.
300[0]$a
Reference to the value of the first subfield “a” of the field “300”.
300$a[0]
Reference to the value of the last subfield “a” of the field “300”.
300$a[#]
Reference to the value of the last two repetitions of subfield “a” of the field “300”.
300$a[#-1]
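The difference between an index on the field and an index on the subfield can be made concrete with a small Python sketch. The data model (a field as a list of code/value pairs), the helper names `pick` and `values`, and the sample data are all assumptions for illustration. The reading that `300$a[0]` applies to every “300” field follows from the rule that a missing field index stands for all repetitions.

```python
def pick(items, index="0-#"):
    """Select repetitions by an index such as "0", "#", "0-2", "1-#" or "#-1"."""
    last = len(items) - 1
    start, sep, end = index.partition("-")
    if not sep:                                   # single position
        lo = hi = last if start == "#" else int(start)
    elif start == "#":                            # counted back from the last repetition
        lo, hi = last - (0 if end == "#" else int(end)), last
    else:
        lo, hi = int(start), (last if end == "#" else int(end))
    return items[max(lo, 0):hi + 1]


def values(field, code):
    """All values of the subfields with the given code, in field order."""
    return [value for c, value in field if c == code]


# Hypothetical "300" fields of one record; each field is a list of (code, value).
fields_300 = [
    [("a", "1 score"), ("a", "2 parts"), ("c", "30 cm")],
    [("a", "1 CD")],
]

# 300[0]$a: every subfield "a" of the first "300" field.
print([v for f in pick(fields_300, "0") for v in values(f, "a")])
# ['1 score', '2 parts']

# 300$a[0]: the first subfield "a" of every "300" field.
print([v for f in pick(fields_300) for v in pick(values(f, "a"), "0")])
# ['1 score', '1 CD']
```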
Indicators are prefixed by the character “_”. There are two indicators: indicator 1 and indicator 2. Both are optional and are each represented by a lowercase alphabetic or a numeric character. If indicator 1 is not specified, it MUST be replaced by the character “_”. If indicator 2 is not specified, it may be replaced by the character “_” or omitted.
indicator = alphalower / DIGIT
indicator1 = indicator
indicator2 = indicator
indicators = "_" (indicator1 / "_") [indicator2 / "_"]
Reference to data content in the subfield “a” within the context of indicator 1 with the value “1”.
245_1$a
or
245_1_$a
Reference to the value of the subfield “a” within the context of indicator 1 with the value “1” and indicator 2 with the value “0”.
245_10$a
Reference to the value of the subfield “a” within the context of indicator 2 with the value “0”.
245__0$a
Reference to the value of the subfield “a” of the first four repetitions of field “307” within the context of indicator 1 with the value “8”. This does NOT reference the first four “307” fields whose indicator 1 is “8”: the index selects repetitions first, and the indicator condition is then applied to that selection.
307[0-3]_8$a
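The following Python sketch illustrates how an indicators string could be parsed and why the order of index and indicators matters. The function names and the sample indicator values are assumptions for this example; `None` is used here to mean "indicator not specified".

```python
import string

ALLOWED = string.ascii_lowercase + string.digits


def parse_indicators(spec: str):
    """Split "_1", "_1_", "_10" or "__0" into (indicator1, indicator2)."""
    if len(spec) not in (2, 3) or spec[0] != "_":
        raise ValueError(f"not an indicators string: {spec!r}")
    body = spec[1:].ljust(2, "_")          # "_1" is short for "_1_"
    if any(ch != "_" and ch not in ALLOWED for ch in body):
        raise ValueError(f"invalid indicator in {spec!r}")
    ind1, ind2 = (None if ch == "_" else ch for ch in body)
    return ind1, ind2


def indicators_match(field_ind1: str, field_ind2: str, spec: str) -> bool:
    """True if a field's indicator values satisfy the indicators string."""
    ind1, ind2 = parse_indicators(spec)
    return (ind1 is None or ind1 == field_ind1) and (ind2 is None or ind2 == field_ind2)


# Hypothetical (indicator 1, indicator 2) pairs of five "307" fields, in record order.
fields_307 = [("8", " "), (" ", " "), ("8", " "), (" ", " "), ("8", " ")]

# 307[0-3]_8: take repetitions 0 to 3 first, then keep those with indicator 1 = "8".
selected = [i for i, (i1, i2) in enumerate(fields_307[0:4])
            if indicators_match(i1, i2, "_8")]
print(selected)  # [0, 2]
```

The fifth field also has indicator 1 “8”, but it lies outside the index range and is therefore not referenced.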
A subSpec contextualizes the preceding fieldSpec or subfieldSpec. Every subSpec MUST be validated as either true or false. If a subSpec is true, the preceding spec is used to reference data. If a subSpec is false, the preceding spec is not used to reference data.
A subSpec is enclosed in the characters “{” and “}”. A subSpec consists of one or more subTermSets, each made up of a left hand subTerm, an operator and a right hand subTerm. SubTermSets can be chained through the character “|” (OR) within a subSpec. SubSpecs can also be repeated one after another (AND).
subTerm = fieldSpec / subfieldSpec / comparisonString / abbreviation
subTermSet = [ [subTerm] operator ] subTerm
subSpec = "{" subTermSet *( "|" subTermSet ) "}"
The operator is one of
= (as a symbol for “equal”),
!= (as a symbol for “unequal”),
~ (as a symbol for “includes”),
!~ (as a symbol for “not includes”),
! (as a symbol for “not exists”) or
? (as a symbol for “exists”).
operator = "=" / "!=" / "~" / "!~" / "!" / "?"
A subTerm is one of fieldSpec, subfieldSpec, comparisonString or abbreviation.
It is possible to abbreviate a contextualized fieldSpec by using only an index, a characterSpec or indicators (or an index followed by a characterSpec or by indicators) as a subTerm (see SubSpec abbreviation and Abbreviation of fieldSpec or subfieldSpec for examples).
abrFieldSpec = index [ (characterSpec / indicators) ] / characterSpec / indicators
It is possible to abbreviate a contextualized subfieldSpec by using only an index, a characterSpec or an index followed by a characterSpec as a subTerm (see SubSpec abbreviation and Abbreviation of fieldSpec or subfieldSpec for examples).
abrSubfieldSpec = index [characterSpec] / characterSpec
Omitting the left hand subTerm implicitly makes the spec preceding the subSpec the left hand subTerm (see MARCspec interpretation for implicit rules). For subSpecs with an omitted left hand subTerm the operator can also be omitted. Omitting the operator implies the use of the operator “?” (exists).
A comparisonString can be any combination of ASCII characters prefixed by the character “\”. To avoid ambiguity, the following characters in a comparisonString MUST be escaped by the character “\”:
$
{
}
!
=
~
?
|
In a comparisonString a whitespace MUST be encoded as the character combination “\s”.
comparisonString = "\" *VCHAR
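The escaping rules above can be sketched as a pair of Python helpers. The function names are assumptions for this example. Only the space character is treated as whitespace, and text that itself contains a backslash is not handled, because this document does not say how such text should be encoded.

```python
ESCAPED = "${}!=~?|"   # characters that MUST be escaped with a backslash


def encode_comparison_string(text: str) -> str:
    """Build a comparisonString (including the leading backslash) from plain text."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("\\s")          # whitespace is written as \s
        elif ch in ESCAPED:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "\\" + "".join(out)


def decode_comparison_string(spec: str) -> str:
    """Recover the plain text from a comparisonString."""
    if not spec.startswith("\\"):
        raise ValueError(f"comparisonString must start with a backslash: {spec!r}")
    body, out, i = spec[1:], [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append(" " if nxt == "s" else nxt)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


print(encode_comparison_string("paperback"))      # \paperback
print(encode_comparison_string("Random House"))   # \Random\sHouse
print(encode_comparison_string("$4.95"))          # \\$4.95
print(decode_comparison_string("\\Random\\sHouse"))  # Random House
```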
Checking dependencies via string comparison
If Leader/06 = t: Books
Reference to character with position “18” of field “008”, if character with position “06” in Leader equals “t”.
008/18{LDR/6=\t}
Checking dependencies via string comparison alternatives
If Field 007/00 = a or t
Reference to subfield “b” of field “245”, if character with position “0” of field 007 equals “a” OR “t”.
245$b{007/0=\a|007/0=\t}
Checking dependencies via string comparison chains
If Leader/06 = a and Leader/07 = a, c, d, or m: Books
Reference to character with position “18” of field “008”, if character with position “06” in Leader equals “a” AND character with position “07” in Leader equals “a”, “c”, “d” OR “m”.
008/18{LDR/6=\a}{LDR/7=\a|LDR/7=\c|LDR/7=\d|LDR/7=\m}
Checking dependencies via string comparison and content comparison
Example data:
100 1#$6880-01$aZilbershtain, Yitshak ben David Yosef.
880 1#$6100-01/(2/r$a, יצחק יוסף בן דוד.
Reference to data content of subfield “a” of field “880”, if data content of subfield “6” of field “100” includes the string “-01” (characters with index range 3-5 of subfield “6” of field “880”) and the string “880”.
880$a{100_1$6~$6/3-5}{100_1$6~\880}
Checking existence of fields
Reference to data content of subfield “c” of field “020”, if subfield “a” of field “020” exists.
020$c{$a}
Checking (non) existence of fields
Reference to data content of subfield “z” of field “020”, if subfield “a” of field “020” does not exist.
020$z{!$a}
Abbreviation of fieldSpec or subfieldSpec
According to MARCspec interpretation, a MARCspec without an explicitly given index is always an abbreviation of n references. The following examples show how such specs are interpreted.
Example Data:
020 ##$a0394170660$qRandom House$c$4.95
020 ##$a0491001304
Reference to data content of subfield “q” of field “020” if subfield “c” exists.
020$q{$c}
same as
020[0-#]$q[0-#]{$c[0-#]}
same as
020[0]$q[0]{?020[0]$c[0]} OR // true
020[1]$q[0]{?020[1]$c[0]} // false
Example Data:
020 ##$a0394170660$qRandom House$qpaperback$c$4.95
020 ##$a0394502884$qRandom House$qhardcover$c$12.50
Reference to data content of subfield “c” if data content of one repetition of subfield “q” equals the comparison string “paperback”.
020$c{$q=\paperback}
same as
020[0-#]$c[0-#]{$q[0-#]=\paperback}
same as
020[0]$c[0]{020[0]$q[0]=\paperback} OR // false
020[0]$c[0]{020[0]$q[1]=\paperback} OR // true
020[1]$c[0]{020[1]$q[0]=\paperback} OR // false
020[1]$c[0]{020[1]$q[1]=\paperback} // false
Reference to data of the first repetition of field “800”, if the data content of subfield “a” of that field, within the context of indicator 2 with the value “1”, includes the comparisonString “Poe”.
800[0]{800[0]__1$a~\Poe}
An abbreviated subTerm like “__1$a” in
800[0]{__1$a~\Poe}
is invalid! An abbreviated subTerm MUST be exactly one of an abbreviated fieldSpec or an abbreviated subfieldSpec.
Reference to data content of subfield “a” of field “245”, if the last character of the preceding spec equals the comparisonString “/”.
245$a{/#=\/}
same as
245$a{245$a/#=\/}
Because of the limited expressivity of the MARCspec there must be some kind of implicit interpretation.
A fieldSpec or subfieldSpec without an explicitly given index is always an abbreviation of a reference with the starting index “0” and the ending index “#” (see Abbreviation of fieldSpec or subfieldSpec).
The character “#” is always a reference to the last character in the data content.
If the character “#” is used for the character starting position, the character indices MUST be interpreted backwards (like character ending position “0” for the last character, “1” for the last but one character, “2” for the last but two characters etc.).
If the left hand subTerm is omitted and the operator is “?”, the operator can also be omitted.
The following table shows how SubSpec abbreviation MUST be interpreted.
corresponding spec type | corresponding spec ends with | abbreviated spec begins with | interpretation | example |
---|---|---|---|---|
fieldSpec | index | index | valid fieldSpec with index | ...[2]{[1]} => ...[2]{...[1]} |
fieldSpec | index | characterSpec | valid fieldSpec with index and characterSpec | ...[1]{/0-3} => ...[1]{...[1]/0-3} |
fieldSpec | index | indicators | valid fieldSpec with index and indicators | ...[1]{_01} => ...[1]{...[1]_01} |
fieldSpec | characterSpec | index | valid fieldSpec with index | .../0-7{[0]} => .../0-7{...[0]} |
fieldSpec | characterSpec | characterSpec | valid fieldSpec with characterSpec | .../0-7{/0=\2} => .../0-7{.../0=\2} |
fieldSpec | characterSpec | indicators | invalid fieldSpec, since characterSpec denotes a fixedField, which can’t be used with indicators | .../0-7{_1} => .../0-7{.../0-7_1} |
fieldSpec | indicators | index | valid fieldSpec with index | ...[1]_1{[0]} => ...[1]_1{...[0]}; ..._1{[1]} => ..._1{...[1]}; ...[1]_1{[0]_0} => ...[1]_1{...[0]_0} |
fieldSpec | indicators | characterSpec | invalid fieldSpec, since indicators denote a variableField, which can’t be used with characterSpec | 245_00{/0-2} => 245_00{245_00/0-2} |
fieldSpec | indicators | indicators | valid fieldSpec with indicators | ..._1{_01} => ..._1{..._01} |
subfieldSpec | index | index | valid subfieldSpec with index | ...$a[0]{[1]} => ...$a[0]{...$a[1]} |
subfieldSpec | index | characterSpec | valid subfieldSpec with index and characterSpec | ...$a[0]{/0} => ...$a[0]{...$a[0]/0} |
subfieldSpec | characterSpec | index | valid subfieldSpec with index | ...$a/1{[1]} => ...$a/1{...$a[1]}; ...$a/1{[1]/1} => ...$a/1{...$a[1]/1} |
subfieldSpec | characterSpec | characterSpec | valid subfieldSpec with characterSpec | ...$a/1{/0} => ...$a/1{...$a/0} |
SubSpecs get validated by the following rules:
A subSpec is true, if
with the operator “=”, one of the referenced values of the left hand subTerm is equal to one of the referenced values of the right hand subTerm.
with the operator “!=”, none of the referenced values of the left hand subTerm is equal to one of the referenced values of the right hand subTerm.
with the operator “~”, one of the referenced values of the left hand subTerm includes one of the referenced values of the right hand subTerm.
with the operator “!~”, none of the referenced values of the left hand subTerm includes one of the referenced values of the right hand subTerm.
with the operator “?”, data referenced by the right hand subTerm exists.
with the operator “!”, no data referenced by the right hand subTerm exists.
A subSpec is false, if
with the operator “=”, none of the referenced values of the left hand subTerm is equal to one of the referenced values of the right hand subTerm.
with the operator “!=”, one of the referenced values of the left hand subTerm is equal to one of the referenced values of the right hand subTerm.
with the operator “~”, none of the referenced values of the left hand subTerm includes one of the referenced values of the right hand subTerm.
with the operator “!~”, one of the referenced values of the left hand subTerm includes one of the referenced values of the right hand subTerm.
with the operator “?”, no data referenced by the right hand subTerm exists (null).
with the operator “!”, data referenced by the right hand subTerm exists.
operator | right is null | left equals right | right is subpart of left | left is subpart of right | other |
---|---|---|---|---|---|
= | false | true | false | false | false |
!= | true | false | true | true | true |
~ | false | true | true | false | false |
!~ | true | false | false | true | true |
? | false | true | true | true | true |
! | true | false | false | false | false |
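The validation rules and the table above can be expressed as one small Python function. This is a sketch under stated assumptions: the function name `evaluate` is made up for this example, each subTerm is assumed to have already been resolved to a list of string values, and an empty list stands for "no referenced data" (null).

```python
def evaluate(operator: str, left: list, right: list) -> bool:
    """Validate one subTermSet given the values referenced by both subTerms."""
    if operator == "?":
        return len(right) > 0
    if operator == "!":
        return len(right) == 0
    if operator == "=":
        return any(l == r for l in left for r in right)
    if operator == "!=":
        return not any(l == r for l in left for r in right)
    if operator == "~":
        return any(r in l for l in left for r in right)
    if operator == "!~":
        return not any(r in l for l in left for r in right)
    raise ValueError(f"unknown operator: {operator!r}")


# One repetition of subfield "q" equals "paperback", so the subSpec is true.
print(evaluate("=", ["Random House", "paperback"], ["paperback"]))  # True
# "880-01" includes "-01".
print(evaluate("~", ["880-01"], ["-01"]))                           # True
# No subfield "c" was found, so the existence check fails.
print(evaluate("?", [], []))                                        # False
```

A full subSpec would combine several such results: subTermSets joined by “|” with `any(...)`, and consecutive subSpecs with `all(...)`.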