Parser submodule

class marcxml_parser.parser.MARCXMLParser(xml=None, resort=True)[source]

Bases: object

This class parses everything between the <root> elements. It checks whether a root element is present, so please pass it the full XML.

controlfields is a simple dictionary whose keys are field identifiers (three-character strings). The values are always strings.

datafields is a little more complicated: it is a dictionary of arrays of dictionaries, and each of those dictionaries contains arrays of MARCSubrecord objects plus two special parameters (the indicators, ind1 and ind2 in the example below).

It sounds horrible, but it is not that hard to understand:

.datafields = {
    "011": [{"ind1": " ", "ind2": " "}],  # array of 0 or more dicts
    "012": [
        {
            "a": ["a) subsection value"],
            "b": ["b) subsection value"],
            "ind1": " ",
            "ind2": " "
        },
        {
            "a": [
                "multiple values in a) subsections are possible!",
                "another value in a) subsection"
            ],
            "c": [
                "subsection identificator is always one character long"
            ],
            "ind1": " ",
            "ind2": " "
        }
    ]
}
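
The layout above can be walked with ordinary nested loops. The following is a minimal sketch, assuming plain strings in place of MARCSubrecord objects; the helper name iter_subfield_values and the sample values are illustrative and not part of marcxml_parser.

```python
# Plain-dict stand-in for the documented .datafields layout
# (real values would be MARCSubrecord objects, not bare strings).
datafields = {
    "011": [{"ind1": " ", "ind2": " "}],
    "012": [
        {
            "a": ["first value", "second value"],
            "c": ["third value"],
            "ind1": " ",
            "ind2": "1",
        },
    ],
}

# The two special parameters stored next to the subfield lists.
INDICATOR_KEYS = {"ind1", "ind2"}


def iter_subfield_values(fields):
    """Yield (field_id, subfield_id, value) for every stored value."""
    for field_id, occurrences in fields.items():
        for occurrence in occurrences:        # one dict per field occurrence
            for key, values in occurrence.items():
                if key in INDICATOR_KEYS:     # skip the indicator entries
                    continue
                for value in values:
                    yield field_id, key, value


flattened = list(iter_subfield_values(datafields))
```

Field "011" contributes nothing to flattened, because its only occurrence holds indicators and no subfields.
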
leader

string – Leader of MARC XML document.

oai_marc

bool – True if the document is an OAI MARC document, False otherwise.

controlfields

dict – Controlfields stored in a dict.

datafields

dict of arrays of dict of arrays of strings – Datafields stored in nested dicts/arrays.

Constructor.

Parameters:
  • xml (str/file, default None) – XML to be parsed. May be file-like object.
  • resort (bool, default True) – Sort the output alphabetically?
add_ctl_field(name, value)[source]

Add a new control field value under name into the control field dictionary controlfields.

add_data_field(name, i1, i2, subfields_dict)[source]

Add a new datafield into datafields and take care of OAI MARC differences.

Parameters:
  • name (str) – Name of datafield.
  • i1 (char) – Value of i1/ind1 parameter.
  • i2 (char) – Value of i2/ind2 parameter.
  • subfields_dict (dict) – Dictionary mapping each subfield identifier to a list of values.

subfields_dict is expected to be in this format:

{
    "field_id": ["subfield data",],
    ...
    "z": ["X0456b"]
}

Warning

For your own good, use an OrderedDict for subfields_dict, or leave the constructor’s resort parameter set to True (the default).

Warning

field_id can be only one character long!
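
The two warnings above can be illustrated with a short sketch. It builds subfields_dict as an OrderedDict and runs a pre-flight check; check_subfields is a hypothetical helper written for this example (not part of marcxml_parser), and the field name and values are placeholders.

```python
from collections import OrderedDict

# Insert subfields in the order they should appear in the record.
subfields = OrderedDict()
subfields["a"] = ["Main title"]
subfields["b"] = ["Subtitle", "Alternative subtitle"]
subfields["z"] = ["X0456b"]


def check_subfields(subfields_dict):
    """Hypothetical pre-flight check mirroring the documented constraints."""
    for field_id, values in subfields_dict.items():
        if len(field_id) != 1:
            raise ValueError("field_id must be one character: %r" % field_id)
        if not isinstance(values, list):
            raise TypeError("values for %r must be a list" % field_id)
    return subfields_dict


checked = check_subfields(subfields)

# With the library installed and `parser` being a MARCXMLParser instance
# (an assumption of this sketch), the call would then look like:
# parser.add_data_field("245", " ", " ", checked)
```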

get_i_name(num, is_oai=None)[source]

This method is used mainly internally, but it can be handy if you work with the raw MARC XML object instead of using the getters.

Parameters:
  • num (int) – Which indicator you need (1/2).
  • is_oai (bool/None) – If None, the value of the oai_marc property is used.
Returns:

Current name of the requested indicator parameter (i1/ind1 or i2/ind2), based on the oai_marc property.

Return type:

str
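
As a rough illustration of the naming rule, here is a standalone sketch. It assumes that OAI MARC documents use the short names i1/i2 and other documents use ind1/ind2; the function indicator_name is written for this example and is not the library's own code.

```python
def indicator_name(num, is_oai):
    """Return the attribute name for indicator `num` (1 or 2)."""
    if num not in (1, 2):
        raise ValueError("num must be 1 or 2")
    # Assumed mapping: OAI MARC -> "i1"/"i2", otherwise "ind1"/"ind2".
    prefix = "i" if is_oai else "ind"
    return prefix + str(num)


oai_first = indicator_name(1, is_oai=True)
plain_second = indicator_name(2, is_oai=False)
```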

i1_name

Property getter / alias for self.get_i_name(1).

i2_name

Property getter / alias for self.get_i_name(2).

get_ctl_field(controlfield, alt=None)[source]

Wrapper method over the controlfields dictionary.

Parameters:
  • controlfield (str) – Name of the controlfield.
  • alt (object, default None) – Value returned when the controlfield couldn’t be found.
Returns:

Record from the given controlfield.

Return type:

str
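
The behaviour described above resembles a dictionary lookup with a fallback. The sketch below shows that idea on a plain dict; get_ctl_field_sketch and the placeholder values are assumptions made for this example, not the library's implementation.

```python
# Placeholder control fields; real records carry library-specific values.
controlfields = {
    "001": "record-id-placeholder",
    "003": "agency-placeholder",
}


def get_ctl_field_sketch(fields, controlfield, alt=None):
    """Return the stored value, or `alt` when the field is missing."""
    return fields.get(controlfield, alt)


found = get_ctl_field_sketch(controlfields, "001")
missing = get_ctl_field_sketch(controlfields, "008", alt="")
```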

getDataRecords(datafield, subfield, throw_exceptions=True)[source]

Deprecated: use get_subfields() instead.

get_subfields(datafield, subfield, i1=None, i2=None, exception=False)[source]

Return content of given subfield in datafield.

Parameters:
  • datafield (str) – Section name (for example “001”, “100”, “700”).
  • subfield (str) – Subfield name (for example “a”, “1”, etc.).
  • i1 (str, default None) – Optional i1/ind1 parameter value, which will be used for search.
  • i2 (str, default None) – Optional i2/ind2 parameter value, which will be used for search.
  • exception (bool) – If True, KeyError is raised when the method couldn’t find the given datafield / subfield. If False, an empty array [] is returned.
Returns:

List of MARCSubrecord objects.

Return type:

list

Raises:

KeyError – If the subfield or datafield couldn’t be found.

Note

MARCSubrecord is practically the same thing as a string, but it additionally defines the MARCSubrecord.i1() and MARCSubrecord.i2() methods.

You may need these indicator values, because the meaning of MARC XML data sometimes depends on the i/ind parameters (names of authors, for example).
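
To make the lookup and the indicator-aware strings more concrete, here is a self-contained sketch. SubrecordSketch and get_subfields_sketch are hypothetical stand-ins written for this example; they only imitate the documented behaviour (a str with i1()/i2() accessors, optional indicator filtering, and the exception switch) and are not the library's code. The author names are placeholders.

```python
class SubrecordSketch(str):
    """Stand-in for MARCSubrecord: a str that remembers its indicators."""

    def __new__(cls, value, i1, i2):
        obj = super().__new__(cls, value)
        obj._i1 = i1
        obj._i2 = i2
        return obj

    def i1(self):
        return self._i1

    def i2(self):
        return self._i2


def get_subfields_sketch(datafields, datafield, subfield,
                         i1=None, i2=None, exception=False):
    """Collect subfield values, optionally filtered by indicators."""
    occurrences = [
        occ for occ in datafields.get(datafield, []) if subfield in occ
    ]
    if not occurrences:
        if exception:
            raise KeyError("%s/%s not found" % (datafield, subfield))
        return []
    found = []
    for occurrence in occurrences:
        for value in occurrence[subfield]:
            if i1 is not None and value.i1() != i1:
                continue
            if i2 is not None and value.i2() != i2:
                continue
            found.append(value)
    return found


datafields = {
    "100": [
        {"a": [SubrecordSketch("Author, First", "1", " ")],
         "ind1": "1", "ind2": " "},
        {"a": [SubrecordSketch("Author, Second", "0", " ")],
         "ind1": "0", "ind2": " "},
    ]
}

all_names = get_subfields_sketch(datafields, "100", "a")
filtered = get_subfields_sketch(datafields, "100", "a", i1="1")
```

Because the stand-in subclasses str, the returned values compare equal to ordinary strings while still exposing their indicators through i1() and i2().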