Subject Cataloging by Norwegian Cataloging Agencies

Abstract
This article reviews the subject assignment practices of the two main Norwegian cataloging agencies serving the public library domain, Biblioteksentralen and Bokbasen. It analyzes the bibliographic records of 47,235 publications that were published between 2012 and 2019 and cataloged by both agencies. In addition to descriptive statistics of these practices, we apply the Panofsky/Shatford model, previously used in the analysis of artworks and images, to distinguish aspects of these practices associated with levels of meaning. We find that Biblioteksentralen tends to use more abstract terms in its descriptions, while Bokbasen tends to use more general terms.


Introduction
Currently (2023), Norwegian public libraries obtain bibliographic records predominantly from two sources: Bokbasen 1 (Den norske bokdatabasen) and Biblioteksentralen 2 (Bibbi-data). As part of a research project, 3 we carried out a partial comparison of these two agencies, and during the study we observed a difference in the way the agencies assigned subject terms to the records they prepare and distribute. The purpose of this paper is to analyze the assignment of subject terms by these agencies, and their respective vocabularies, as manifested in the bibliographic records.
The research questions are:
• How do the indexing practices and the underlying vocabularies of the agencies differ across domains and time?
• How do the subject terms align with the Panofsky/Shatford categories?
To answer the first question, we used a quantitative method: calculating the relative frequencies of subject terms in subsets of record pairs. To answer the second question, we carried out a qualitative study using the Panofsky/Shatford categories.
To enable this analysis, we downloaded bibliographic records created by the agencies over an eight-year period (2012-2019). We compared the subject terms assigned to parallel publications, that is, publications that have been cataloged by both agencies, identified by common ISBNs.

Subject indexing
Subject indexing is the practice of describing literature with subject terms taken from controlled vocabularies. 4 Such vocabularies can take different forms: alphabetic-subject languages and classification languages. 5 In this paper, we study two alphabetic-subject languages: a thesaurus, and a subject authority list whose terms are combined according to a set of syntax rules.
Controlled vocabularies aid users in performing subject searches. They are often employed in situations where high recall is paramount. 6 Vocabularies that have been studied include the Library of Congress Subject Headings (LCSH), 7 the Australian Education Index (AEI), 8 and Medical Subject Headings (MeSH). 9 The automatic assignment of subject terms has also been a focus of research, most notably for MeSH terms. 10 A controlled subject vocabulary includes terms from three sources: first, the vocabulary of the literature it is intended to describe; second, the terms that real users (and librarians) use when searching; and finally, terms that have a structural function, for example, grouping a set of more specific terms. In the literature, these three sources of terms are referred to as literary warrant, 11 use warrant, and structural warrant, 12 respectively.
Our two vocabularies share similar literary and use warrants. But because their structures differ (one is a thesaurus and the other a synthetic language), their structural warrants differ. In the subject vocabulary of Biblioteksentralen, compound subjects are pre-coordinated. The pre-coordinated subject headings are created according to Hjortsaeter 13 and share similarities with the Sears List of Subject Headings, a controlled vocabulary with subject headings for small and medium-sized libraries, mainly in the USA. 14 Bokbasen assigns post-coordinated terms from its thesaurus when indexing documents. It also supplements the thesaurus with educational terms from the Udir 15 dictionary. 16 When it comes to how terms are formulated, both vocabularies follow the same rules given in Hjortsaeter. 17 Most subject terms are nouns or noun phrases. The terms should describe the subject of the document as a whole, neither broader nor narrower.

Categorizing subject terms
In this study, we use the Panofsky/Shatford model to categorize subject terms. The model has been used for categorizing the subject indexing of many visual collections. 18 Panofsky identified three levels of meaning in Renaissance art: the pre-iconographical description, the iconographical analysis, and the iconological interpretation. 19 Panofsky's model, as interpreted by Markey, 20 Shatford, 21 and others, has been influential in the development of systems for subject access to images. 22 Shatford 23 extended and revised Panofsky's model. She categorized the subjects of pictures as Generic of, Specific of, and Abstract. Shatford also added four facets: who, what, where, and when. These correspond to Ranganathan's fundamental categories (Personality, Matter, Energy, Space, and Time), although Shatford reduced Ranganathan's five categories to four. 24 This resulted in a 3 × 4 matrix for the classification of image descriptions (see Table 8).
The Panofsky/Shatford model we use corresponds to the categories of subject headings presented in the rules given by Hjortsaeter, 25 where the syntax rules are based on categories such as units, actions, space, and time. Owing to this correspondence, we believe the model can be meaningful for categorizing subject terms that are primarily formulated to describe the aboutness of books. The inclusion of four facets makes the model interesting to apply to books, as both the facets of a thesaurus and the syntactic rules of a synthetic language use categories originating from Ranganathan's fundamental categories. 26 The term "facet" is widely used when dealing with subject descriptions. 27 In our categorization, we use only the four facets already identified in the Panofsky/Shatford model. The model's distinction between specifics, generics, and abstracts (levels of meaning) gives it the potential to reveal additional differences between the two agencies' indexing practices and underlying vocabularies, as well as potential gaps in subject access for Norwegian media in general.

Brief history of the agencies
Historically, there has been no common subject vocabulary in Norway. Biblioteksentralen's subject headings list, used by the majority of Norwegian public libraries, has been a de facto standard in public and school library catalogs. 28 The list originated in the late 1950s, was first published in 1963, and consists of pre-coordinated strings.
Biblioteksentralen is owned by municipalities and county municipalities in Norway.They offer books, metadata, and other services to libraries.
Bokbasen was established in its initial form in 1984 by Forlagssentralen. 29 In 2007, it was separated from Forlagssentralen as a company of its own, and it is now owned by a number of Norway's leading publishing groups. Bokbasen provides metadata and digital services to virtually all Norwegian publishers and book retailers, and to some libraries. In the 1980s, Bokbasen started to develop a hierarchical thesaurus of controlled subject terms, maintained by its cataloging department.
Both agencies provide bibliographic records for practically all publications published in Norway.Terms from their controlled vocabularies are applied to these records.
Before 2016, each public library decided whether to purchase centrally cataloged records and from where. Most libraries used Biblioteksentralen as their record vendor, some used records from Bokbasen, and a minority did not purchase records at all. In 2016, the National Library, acting as a directorate under the Ministry of Culture, changed the distribution of bibliographic records in Norway 30 and entered into a cooperation with Bokbasen for the purchase of centrally cataloged records of books published by Norwegian publishers. 31 However, Biblioteksentralen continued to deliver records as well, and many libraries continued to use it as a record supplier.

Datasets
The project uses three datasets:
• bibliographic records created by Biblioteksentralen and Bokbasen for the same publications, published between 2012 and 2019 inclusive;
• Biblioteksentralen's vocabulary;
• Bokbasen's vocabulary.

The bibliographic records
The 2017-2019 records of both agencies were available online through REST services, which allowed us to search for records more precisely and enabled a more exhaustive download of records for media published in a given year. This is also the case for the earlier Biblioteksentralen records, from 2012 to 2016, but not for the Bokbasen records from this period. For these, we were granted access to the Bokbasen API, which does not have a similar search facility. This meant that all records changed or registered since January 1, 2012, had to be downloaded and then filtered by the applicable publication years, which may have resulted in some missing records for this period.
For the entire period, 2012-2019, we downloaded a total of 185,804 records: 79,717 from Bokbasen and 106,087 from Biblioteksentralen. We identified 51,075 parallel publications (matched by ISBN). Of these, 47,235 have subject terms (MARC fields 600, 610, 611, 630, 640, 650, 651, 653, and 656) assigned by at least one of the agencies; these make up our subset, presented in Table 1.
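The ISBN matching and the subject-field filter described above can be sketched in Python as follows. This is a minimal sketch, not the project's actual pipeline: the record layout (a dict with the hypothetical keys `isbns` and `fields`, where each field is a `(tag, subfields)` tuple) and all function names are assumptions, and ISBN-10/ISBN-13 conversion is deliberately left out.

```python
# Minimal sketch: pair records from the two agencies by a shared ISBN and keep
# the pairs where at least one record carries a subject field.
# The record layout {"isbns": [...], "fields": [(tag, subfields), ...]} is a
# hypothetical illustration, not the agencies' or the authors' schema.
SUBJECT_TAGS = {"600", "610", "611", "630", "640", "650", "651", "653", "656"}


def normalize_isbn(isbn: str) -> str:
    """Remove hyphens and blanks so that differently formatted ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def has_subject_terms(record: dict) -> bool:
    """True if the record has at least one field with a subject tag."""
    return any(tag in SUBJECT_TAGS for tag, _subfields in record["fields"])


def parallel_pairs(bokbasen: list, biblioteksentralen: list) -> list:
    """Return (Bokbasen record, Biblioteksentralen record) pairs sharing an ISBN."""
    index = {}
    for rec in biblioteksentralen:
        for isbn in rec["isbns"]:
            index.setdefault(normalize_isbn(isbn), rec)
    pairs = []
    for rec in bokbasen:
        for isbn in rec["isbns"]:
            match = index.get(normalize_isbn(isbn))
            if match is not None:
                pairs.append((rec, match))
                break  # pair each Bokbasen record at most once
    return pairs


def analysis_subset(pairs: list) -> list:
    """Keep only the pairs where at least one agency assigned subject terms."""
    return [(a, b) for a, b in pairs if has_subject_terms(a) or has_subject_terms(b)]
```

Indexing one agency's records in a dictionary keeps the matching roughly linear in the number of records, which matters at the scale of a full download.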

The vocabularies
Biblioteksentralen's list consists of pre-coordinated strings, whereas Bokbasen's thesaurus is hierarchical and contains five main categories: topic, form, genre, time, and place. In addition to its own vocabulary, Bokbasen makes extensive use of a Norwegian-English dictionary of basic education maintained by Udir (Utdanningsdirektoratet, the Norwegian Directorate for Education and Training) for cataloging education-related textbooks. We do not study this dictionary as a vocabulary, but as it is part of Bokbasen's indexing policy, we study the usage of Udir terms in the downloaded bibliographic records.
Since the vocabularies are the source of the terminology for subject terms, we obtained downloads of the vocabularies used by the agencies. Each vocabulary covers both official Norwegian written languages, Bokmål and Nynorsk; we consider only the Bokmål part.
For fields that are normally assigned from name authority files (such as personal names), both agencies had their own proprietary name authority files before 2017. Since 2017, these have been used as the basis for contributions to the common authority file held by the National Library. 32

Technical layout of the imported data
The bibliographic records were modeled in a relational database structure that facilitates detailed scrutiny and comparison of records. Both the Biblioteksentralen subject headings 33 and the Bokbasen thesaurus 34 were supplied to us as RDF (Resource Description Framework) files conforming to the SKOS (Simple Knowledge Organization System) ontology, and they are available via the Skosmos system developed by the National Library of Finland. 35 The Udir dictionary is available for download as an XML file. After download, the files were adapted to our database model and imported into our database for further use.
Biblioteksentralen's vocabulary consists of strings. When a subject heading includes subdivisions (terms), these are delimited in the string by a hyphen with a blank on each side (" - "). Sometimes a qualifier is appended at the end to state a discipline of the subject; the qualifier is delimited by a colon with a blank on each side (" : "). An example is the string Farlig gods - Norge - Transport : lov og rett (Dangerous goods - Norway - Transportation : legislation) (see Figure 1). To facilitate the analyses, the terms were extracted from the vocabulary and stored in the database, each term pointing to the string it is part of (the strings were also stored in the database as separate entities). Thus, we study neither the syntax nor the strings as wholes, only their components (terms), and each of these terms is compared separately. In this example, the member Lov og rett (legislation) (subfield $0) is omitted from the comparison, as in Bokbasen's records it typically goes into the Genre denotation, which is not part of our analysis. 36 The Bokbasen thesaurus is hierarchical, and complex subjects, such as Philosophy, have one or more subordinate levels (see Figures 2 and 3). We do not include the hierarchy as such in this study, but subordinate terms are modeled as see-references for the Panofsky/Shatford analysis (Section 5).
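The term extraction can be sketched as follows, assuming the delimiters are exactly " - " and " : " as described above, so that a hyphen without blanks (as in a compound place name) is left intact. The function names are illustrative and the sketch is not the project's actual import code; in line with the comparison described above, it drops the qualifier.

```python
from typing import Optional

# Delimiters as described for Biblioteksentralen's pre-coordinated headings:
# " - " between subdivisions (terms), " : " before an optional qualifier.
TERM_SEP = " - "
QUALIFIER_SEP = " : "


def split_heading(heading: str) -> tuple[list[str], Optional[str]]:
    """Split a pre-coordinated heading into its terms and its qualifier (if any)."""
    main, sep, qualifier = heading.partition(QUALIFIER_SEP)
    terms = [t.strip() for t in main.split(TERM_SEP) if t.strip()]
    return terms, (qualifier.strip() if sep else None)


def index_terms(headings: list[str]) -> dict[str, set[str]]:
    """Map each term to the set of headings it is part of; qualifiers are dropped."""
    term_to_headings: dict[str, set[str]] = {}
    for heading in headings:
        terms, _qualifier = split_heading(heading)
        for term in terms:
            term_to_headings.setdefault(term, set()).add(heading)
    return term_to_headings
```

For the example heading above, `split_heading` yields the terms Farlig gods, Norge, and Transport, with lov og rett as the qualifier.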

See-references
Whereas the Biblioteksentralen records employ see-type reference fields explicitly (using field tag 950), the Bokbasen records lack these fields. The reason may be that Bokbasen terms are drawn from a thesaurus (see Section 3.2.2). Nearly half of the preferred terms in the Bokbasen thesaurus have alternative labels, which serve as see-references for the terms they are associated with. One example is the term Moderne filosofi (modern philosophy, see Figures 2 and 3), which has among its alternative labels Positivisme (positivism) and Postmodernisme (post-modernism). Bokbasen seems to assume that subscribing libraries, having access to the thesaurus, can use it to provide see-references. For these reasons, terms from the see-references were not used in the statistical occurrence analysis and comparisons, but we do include them in the Panofsky/Shatford analysis. To facilitate that, we remodeled the Bokbasen bibliographic records by automatically adding a 950 entry for each see-reference (alternative label) of every existing 650 field (general subject term) in a record. This process sometimes resulted in records featuring tens of 950 entries.
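The remodeling step can be sketched as below. This is a minimal illustration under stated assumptions: the `alt_labels` mapping (preferred term to alternative labels, as one might build from the thesaurus's SKOS alternative labels), the record layout, and placing the label in subfield `a` of the added 950 field are all hypothetical choices, not necessarily how the project stored them.

```python
import copy


def add_see_references(record: dict, alt_labels: dict) -> dict:
    """Return a copy of `record` with one 950 entry per alternative label
    of each term in its 650 fields (hypothetical record layout)."""
    remodeled = copy.deepcopy(record)
    for tag, subfields in record["fields"]:
        if tag != "650":
            continue
        for term in subfields.get("a", []):
            for alt in alt_labels.get(term, []):
                # Assumption: the see-reference label goes into subfield $a.
                remodeled["fields"].append(("950", {"a": [alt]}))
    return remodeled
```

Working on a copy keeps the original records available unchanged for the statistical analysis, where see-references are excluded.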

Statistics of vocabulary usage
Table 2 shows the usage of the vocabularies (unique terms) in the subject term fields of our bibliographic dataset. As indicated in Section 3.2.2, only the Bokmål versions are counted. 37

Statistical analysis of subject term occurrences in bibliographic dataset
In this section, we statistically describe occurrences of subject terms in our records.We start by comparing occurrences between the two agencies in the entire dataset and proceed to compare subdivisions of the material.

Types and principles of comparison
We analyzed occurrences of terms in the bibliographic records as a whole as well as in subsets of them, based on:
• year of publication (chronological);
• domain of publication, represented by the first digit of the Dewey classification code, i.e., the main class (where the records from both vendors share the same main class).
Two comparison principles were used:
• term-wise: aggregating the terms across subsets of record pairs for each agency into term sets and comparing the sets;
• record-wise: comparing the sets of terms occurring in each record pair, and counting the record pairs where the term sets are equal, where they intersect, and where they are disjoint (see examples in Table 3).
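The two comparison principles can be made concrete with set operations. The sketch below assumes each record pair has already been reduced to two Python sets of terms (Bokbasen's first, Biblioteksentralen's second); the function names are illustrative, not taken from the project.

```python
from collections import Counter


def compare_record_pair(terms_a: set, terms_b: set) -> str:
    """Record-wise: classify a pair as 'equal', 'intersecting', or 'disjoint'."""
    if terms_a == terms_b:
        return "equal"
    if terms_a & terms_b:
        return "intersecting"
    return "disjoint"


def recordwise(pairs: list) -> Counter:
    """Count record pairs per comparison outcome."""
    return Counter(compare_record_pair(a, b) for a, b in pairs)


def termwise(pairs: list) -> dict:
    """Term-wise: pool each agency's terms over all pairs and compare the two sets."""
    set_a = set().union(*(a for a, _ in pairs))
    set_b = set().union(*(b for _, b in pairs))
    return {
        "common": len(set_a & set_b),
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
    }
```

Note that a pair where only one agency assigned terms counts as disjoint; a pair with two empty sets would count as equal, but such pairs are excluded from the subset by construction.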
For these analyses, we extracted subfields $a, $x, and $z from the subject fields (MARC fields 600, 610, 611, 630, 640, 650, 651, 653, and 656). 38,39 Fields such as 600, 610, 611, and 630 are mostly updated from authority files, and the agencies' authority files, though originally proprietary, have been converging in recent years, including post-editing of older records. 40 We therefore do not expect name forms to differ, and by including these fields in our analysis, we in effect compare the agencies' interpretation of whether the work has the named person, organization, etc. as a subject.
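The subfield extraction can be sketched as follows, again assuming a hypothetical record layout in which each field is a `(tag, subfields)` tuple and `subfields` maps a subfield code to a list of values. Only whitespace is trimmed; any further normalization (for example, of trailing punctuation) would be an additional assumption.

```python
SUBJECT_TAGS = {"600", "610", "611", "630", "640", "650", "651", "653", "656"}
EXTRACTED_CODES = ("a", "x", "z")


def subject_terms(record: dict) -> set:
    """Collect the $a, $x, and $z values of all subject fields in a record."""
    terms = set()
    for tag, subfields in record["fields"]:
        if tag not in SUBJECT_TAGS:
            continue
        for code in EXTRACTED_CODES:
            for value in subfields.get(code, []):
                value = value.strip()
                if value:
                    terms.add(value)
    return terms
```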
The agencies use different vocabularies, and although there are subject indexing rules controlling permissible word forms, 41 different forms (inflections, prefixes, suffixes, etc.) of the same word do account for some of the differences. 42 Early ideas of harmonizing word forms against Ordvev (the Norwegian version of the WordNet lexical resource) 43 or applying lemmatization were not pursued, because we assumed this would introduce its own noise into the analysis, offsetting any benefits. Moreover, in the analysis of subject terms using the Panofsky/Shatford categories (Section 5), we compare different grammatical forms of words and count them in different categories. Thus, lemmatization would benefit neither that analysis nor the association between the two analyses.

Common and different terms in the entire set
In Figure 4(a), we show the intersection and differences of unique terms across all the records in our dataset. Figure 4(b) shows the proportion of record pairs (y ∈ [0.0, 1.0]) that share a given number of common terms (x ∈ {0, 1, 2, 3, 4, 5, 6+}). Almost half of the record pairs share no common terms, whereas very few share four terms or more.
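The distribution plotted in Figure 4(b) can be computed as below: the number of shared terms per pair is capped so that all pairs sharing six or more terms fall into one "6+" bin. This is a sketch assuming the pairs are given as a list of term-set tuples; the function name is illustrative, and an empty list is not handled.

```python
from collections import Counter


def shared_term_distribution(pairs: list, cap: int = 6) -> dict:
    """Proportion of record pairs sharing 0, 1, ..., cap-1, or at least cap terms."""
    counts = Counter(min(len(a & b), cap) for a, b in pairs)
    total = len(pairs)
    labels = [str(k) for k in range(cap)] + [f"{cap}+"]
    return {label: counts.get(k, 0) / total for k, label in enumerate(labels)}
```

Applying the same function to the subset of pairs for each publication year gives the per-year curves of the kind shown in Figure 5.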

Comparing subject term assignment over time
In Table 4, we show the intersection and differences of unique terms across all record pairs for each year from 2012 onward. The percentage columns on the right show a marked increase in the share of common subject terms after 2016. Table 5 lists the number and percentage of record pairs for which the terms used are equal, intersecting, or disjoint. This measure, too, shows assignment practices converging. In Figure 5, we repeat the analysis of Figure 4(b) for subsets representing the year of publication, showing the proportion of record pairs that share a given number (x) of identical subject terms. For 2017-2019, we see a decrease in the proportion of record pairs having no term in common (x = 0), and a visible increase in the proportion of pairs sharing two subject terms. Both analyses indicate that the agencies' subject assignment practices became more similar toward the end of the period.

Comparing subject term assignment across domains represented by Dewey main classes
Unlike years of publication, Dewey classes do not represent a linear development along an obvious dimension. To examine how the class of a book affects the assignment of subject terms, we grouped each agency's records by the first digit of their Dewey classification code (the main class) and counted the occurrences of unique terms in each group (Table 6). We also counted the usage of terms across record pairs within these classification groups (Table 7).
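The grouping by main class can be sketched as follows. It assumes, hypothetically, that each record exposes its Dewey number as a string under a `dewey` key, and it puts pairs whose records lack a class or disagree on the main class into a single "other" group; the project may have handled such pairs differently.

```python
from collections import defaultdict
from typing import Optional


def dewey_main_class(code: str) -> Optional[str]:
    """Return the main class (e.g. '500') of a Dewey number, or None if absent."""
    code = code.strip()
    if not code or not code[0].isdigit():
        return None
    return code[0] + "00"


def group_by_main_class(pairs: list) -> dict:
    """Group record pairs by Dewey main class when both records share it."""
    groups = defaultdict(list)
    for rec_a, rec_b in pairs:
        cls_a = dewey_main_class(rec_a.get("dewey", ""))
        cls_b = dewey_main_class(rec_b.get("dewey", ""))
        key = cls_a if cls_a is not None and cls_a == cls_b else "other"
        groups[key].append((rec_a, rec_b))
    return dict(groups)
```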
For the 900-999 classes, History and Geography, the share of disjoint record pairs is relatively small, which can be explained by the extensive use of geographical names. The share of common unique terms is also higher here, but not as markedly as the record-pair similarity. This can be explained by the lack of lemmatization discussed in Section 4.1. Likewise, the high share of equal term sets for books classified as natural sciences (500-599) may indicate that assignment practices (selection from the vocabulary) are more similar when the subjects of the books are better defined.
In Figures 6-8, we show, for different subsets of the material (not classified, classified, and classified 3XX, 44 respectively), occurrences and co-occurrences of unique main terms in the subsets ((a) sub-figures), as well as the proportion of parallel records sharing one, two, three, etc. terms ((b) sub-figures).
We have not analyzed these figures in detail, but they do show notable variation between the subsets.

Summary of data presentation
There are indications that the practices of subject assignment were more similar in 2017-2019 than in previous years, probably due to the change in the distribution of bibliographic records by the National Library of Norway. The National Library's cooperation with Bokbasen from 2016, under which Bokbasen delivered data to potentially more public libraries from January 2017, appears to have changed Bokbasen's indexing practice. The cooperation demanded changes from Bokbasen, but it is also possible that Biblioteksentralen, at risk of losing customers, changed its records as well.

An analysis of subject terms using the Panofsky/Shatford categories
To compare subject term assignment by the two agencies, we categorized the subject terms of 490 randomly chosen nonfiction books published in 2019 into the Panofsky/Shatford categories described in Section 2.2. We chose to analyze a sample of the most recently published nonfiction books in our dataset to get an up-to-date view of the indexing practice. By selecting a single year, we also hoped to find records from a stable indexing practice not influenced by policy changes. In addition, as our statistical analysis above indicates, the practices in 2019 were the most comparable. Four researchers annotated the subject terms of the selected record pairs. The annotation was carried out in an Excel spreadsheet with columns for titles, authors, and terms, and separate columns for the annotations (see excerpt in Figure 9). Bayerl et al. 45 provide an overview of the factors that influence intercoder agreement in manual annotations of this nature; the following description is structured around those factors to clarify the circumstances under which the terms were annotated. Our annotation process focused solely on subject terms, and the potential subject matter was extensive, covering any topic that might be discussed in a nonfiction book. All annotators were metadata experts who work with library metadata on a daily basis; however, none of us is an expert in every subject that could be discussed in the published books. The annotators are fluent in Norwegian, and all subject terms were written in Norwegian. Of the four annotators, one annotated 130 books, two annotated 250 books, and the remaining one annotated 270 books. Each book, or record pair, was annotated by two researchers. The annotators had an initial training period working with the Panofsky/Shatford categories and annotating a random sample of subject terms. Divergent opinions were discussed, and a list of examples from the random sample of books was compiled to serve as a reference for the annotators when in doubt.
The annotation process involved twelve categories, some of which are geared toward visual culture objects and occurred infrequently in the material. Among the remaining categories, choosing the right one was challenging, and the more categories there are, the harder it is to achieve agreement between annotators.
Figure 9: Excerpt of the annotation spreadsheet, with categories (see Table 8) assigned to each of the terms. The pink frames encircle the annotations, where, e.g., g1 corresponds to "Generics/Who." The Biblioteksentralen section of the same book is hidden to save space.
If we look at the broad categories, we find substantial differences between the agencies.
Bokbasen uses generic subject terms relatively more often than Biblioteksentralen (54% vs. 41%; see Table 10). In the Specifics and Abstracts categories, the pattern is reversed: Biblioteksentralen has the higher share of subject terms (Biblioteksentralen vs. Bokbasen, Specifics: 31% vs. 26%; Abstracts: 28% vs. 20%; see Table 10).
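Shares such as those in Table 10 are obtained by collapsing the 3 × 4 Panofsky/Shatford matrix over its four facets. The sketch below assumes, hypothetically, that each annotated term is recorded as a `(level, facet)` tuple using the level and facet names from this article; it rejects labels outside the matrix and is illustrated without the study's data.

```python
from collections import Counter

LEVELS = ("Generics", "Specifics", "Abstracts")
FACETS = ("Who", "What", "Where", "When")


def cell_counts(annotations) -> dict:
    """Count annotated terms in each of the 12 (level, facet) cells."""
    counts = Counter(annotations)
    valid = {(lv, fc) for lv in LEVELS for fc in FACETS}
    unknown = set(counts) - valid
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return {(lv, fc): counts.get((lv, fc), 0) for lv in LEVELS for fc in FACETS}


def level_shares(annotations) -> dict:
    """Percentage of annotated terms per level, summed over the four facets."""
    cells = cell_counts(annotations)
    total = sum(cells.values())
    return {
        lv: 100 * sum(cells[(lv, fc)] for fc in FACETS) / total
        for lv in LEVELS
    }
```

Summing the same cell counts over levels instead of facets gives the facet distribution compared in Table 11.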
When comparing the facets, the subject terms from Biblioteksentralen and Bokbasen are quite similar; the differences in all categories are smaller than three percentage points (see Table 11).
Biographies may be used to illustrate the differences between the agencies. Are they about the person only, or also about a subject? The answer depends on the specific book, but it can also be a result of the subject analysis. Of the 490 books in our sample, 35 have metadata indicating that they are biographies. One example is the autobiography Min historie (My story), by and about the cross-country skier Petter Northug. Biblioteksentralen uses only his name to describe the subject, while Bokbasen also uses the terms Langrenn (Cross-country skiing) and Idrettsutøvere (Athletes). While we disagreed on whether Langrenn (Cross-country skiing) is a generic or an abstract term in our categorization, Idrettsutøvere (Athletes) is undoubtedly a generic term. Thus, this is one of the books where Bokbasen applied a generic term while Biblioteksentralen did not. Bokbasen includes subject terms that explain the role of the persons described in a biography, such as Idrettsutøvere (Athletes) in the example above. This may be a useful subject term, but it can also be seen as a violation of the rule that subject terms should describe only the specific subject of the book. Min historie is not about athletes in general, but about one specific athlete, Petter Northug. Thus, according to the rule of specificity, 46 this term would be too broad.
In Table 12, we show the category distribution for biographies only. Bokbasen has a larger share of subject terms categorized as generic and abstract than Biblioteksentralen. Biblioteksentralen has also applied more specific than generic terms, while Bokbasen shows the opposite pattern: more generic than specific terms. This confirms our impression that Min historie (My story) is a typical example of how Bokbasen and Biblioteksentralen differ when it comes to biographies. The facet distribution for biographies (Table 13) resembles that for the whole material (Table 11), with larger differences for the Who and What facets.
Biblioteksentralen has a larger share of subject terms categorized as Abstracts-What, whereas Bokbasen has more subject terms categorized as Generics-What (see Table 9). These numbers are uncertain because distinguishing Generics-What from Abstracts-What is difficult. On the other hand, all subject terms applied to a given book, by both Biblioteksentralen and Bokbasen, were always categorized by the same person. Thus, for each book, the distinction between Abstracts-What and Generics-What was drawn by the same annotator for both agencies' terms. All that said, Biblioteksentralen tends to use more abstract versions of words when assigning subject headings. The reasons may lie in the practices and traditions of the agencies, which could be investigated further with qualitative methods.

Specific subject terms
Biblioteksentralen tends to apply more subject terms categorized as Specifics-Who, Specifics-What, and Specifics-Where, that is, individually named persons, groups, things, events, actions, and geographical locations. Many of the Specifics-Who terms are names of persons, and here we have not detected any difference between the agencies. Most biographies have a personal name applied as a subject by both vendors. For books that are not clearly biographies but include substantial biographical information, we find no systematic pattern: sometimes one agency includes a personal name as a subject, sometimes the other does, and sometimes neither or both do. Altogether, however, Bokbasen applies more subject terms to biographies than Biblioteksentralen, as it does with other books as well.
Bokbasen rarely uses names of laws as subjects, even when a specific law is the topic of the book, and laws are also rare as related terms. Instead, Bokbasen uses words describing what the law is about, such as criminality or kindergartens. Biblioteksentralen uses the names of laws and thus does not always include words describing what the law is about. The same applies to books about certain other named entities, such as Grotten (a state-owned residence lent to distinguished artists for the rest of their lives), Apollo 11, Apex Legends (video game), or Olsenbanden (film).

Specifics-When
Bokbasen tends to have more terms naming specific time periods. It also has more standardized subject terms for time and uses them regardless of the time period covered by the topic of a book. Examples are 1500-tallet (the 1500s) and 2000-2009, which designate a century and a decade, respectively. Biblioteksentralen also has established time periods as subject terms, but they are not as systematic. Thus, it seems that time needs to be a more explicit part of the topic for Biblioteksentralen to apply a time-related subject heading.

Generic subject terms, Generics-Who and Generics-What
Bokbasen uses more subject terms categorized as Generics-Who and Generics-What than Biblioteksentralen. One reason may be its tendency to apply broader index terms. One example is the book Informerte borgere? (Informed citizens?), to which Biblioteksentralen applied one term: Borgerdeltagelse (citizen participation). Bokbasen applied three different terms: Medier, Demokrati, and Sosiologi (Media, Democracy, and Sociology, respectively). Together, these terms encircle the topic of the book but do not directly express the specific topic. Biblioteksentralen, on the other hand, matches the term to the scope of the book. The differences in the number of Generics-Who and Generics-What terms here are a result of Bokbasen's general tendency to apply broader terms, rather than of which categories the terms belong to. Bokbasen's higher number of Generics-Who and Generics-What terms also stems from its tendency to apply more terms to biographies.

Abstract subject terms, Abstracts-What
Biblioteksentralen has more subject terms categorized as Abstracts-What than Bokbasen. So far, we have not identified systematic differences between the agencies that account for such a large difference. It often seems to be simply a matter of different wording, where Biblioteksentralen ends up with Abstracts-What terms more often than Bokbasen. This corresponds to the fact that Bokbasen has more terms categorized as Generics-Who and Generics-What. Many subjects can be named with words that are either Generics-Who (bakverk/baked goods, sykkel/bicycle), Generics-What (baking/baking, sykling/biking), or Abstracts-What (bakerfag/bakery as a domain, sykkelfaget/bicycles as a trade). In such cases, both Bokbasen and Biblioteksentralen use only one of the words, and we have not observed a systematic pattern for when either uses which word category. Altogether, however, Biblioteksentralen tends to choose Abstracts-What terms more often than Bokbasen.
For the remaining categories, such as Generics-Where, Generics-When, and Abstracts-Where, our categorization reveals only minor differences between the agencies.

Udir terms
Bokbasen uses a combination of terms from its own thesaurus and Udir terms, mainly for books intended for use in schools. If we leave out the Udir terms, the distribution of Panofsky/Shatford categories changes slightly. The changes affect three categories, Abstracts-What, Generics-Who, and Generics-What, all of which include Udir terms. This is consistent with the Udir vocabulary containing terms that name school subjects, such as physics or Norwegian.
The Udir terms also raise the question of what can be a subject. Some of the terms that Bokbasen applies express the intended use of the book more than its aboutness. One example is the book Kjemien stemmer, to which Biblioteksentralen simply applied the term Kjemi (Chemistry). Bokbasen, on the other hand, applied five terms: Studiespesialisering, 47 Realfag vg3 (Sciences, 3rd year of high school), Kjemi 2 (Chemistry 2), VG3 (3rd year of high school), and Grunnbøker (basic-level textbooks). None of the terms expresses the aboutness directly; instead, they all express aspects of the intended use of the book. The term Kjemi 2 (Chemistry 2) does include the word Kjemi (Chemistry), which expresses aboutness, although the formulation strictly refers to the level of chemistry knowledge students are expected to reach in their second year of studying chemistry. As a result, the aboutness of the book, chemistry, is searchable, but only indirectly expressed in the subject term.
Bokbasen presumably uses the Udir terms because it considers them useful, especially for school libraries, and they probably are. But many of them do not express a book's aboutness. As there is no place elsewhere in the record for intended use or relation to a discipline, Bokbasen has included those aspects as subject terms.
We do not know how Bokbasens' subject terms would be if they did not use the Udir terms at all.But the combination of the thesaurus and the Udir terms constitutes which terms Bokbasen's catalogers can use when they apply subject terms.Without Udir terms Bokbasen would probably apply fewer Abstract-What, Generic-Who, and Generic-What terms.But they could also have found a way to include such terms in their own thesaurus.

Discussion and conclusion
In the statistical comparison, we found that records from Bokbasen and Biblioteksentralen became more similar after 2016. The two agencies have more subject terms in common in the years 2017-2019 than in the preceding years. This coincides with the change in policy by the National Library of Norway in 2016. The imposed change in the distribution of bibliographic records appears to have had a harmonizing effect on the subject description practices of the two agencies (as prescribed by the tender mentioned in Preminger et al.48).
Examining the subject terms themselves, we found many similarities between the agencies; both largely follow the Norwegian rules for subject term assignment. Some of their practices differ, however. Sometimes the agencies simply choose different words for their subject descriptions, such as synonyms with similar meanings. In other cases, the difference may reflect a slightly different subject analysis of the same book.
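The year-wise comparison rests on classifying each parallel record pair (matched by ISBN) as having equal, intersecting, or disjoint term sets, as in Tables 3, 5, and 7. A minimal sketch, assuming terms are compared as exact strings; the function names and the sample pairs are hypothetical, and the study's own term normalization may differ.

```python
from collections import Counter, defaultdict


def pair_relation(terms_a, terms_b):
    """Classify two subject-term sets as 'equal', 'intersecting', or 'disjoint'."""
    a, b = set(terms_a), set(terms_b)
    if a == b:
        return "equal"
    if a & b:
        return "intersecting"
    return "disjoint"


def relation_rates_by_year(pairs):
    """pairs: iterable of (year, bokbasen_terms, biblioteksentralen_terms).

    Returns {year: {relation: share of that year's record pairs}}.
    """
    counts = defaultdict(Counter)
    for year, bb_terms, bs_terms in pairs:
        counts[year][pair_relation(bb_terms, bs_terms)] += 1
    rates = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        rates[year] = {rel: n / total for rel, n in counts[year].items()}
    return rates


# Hypothetical parallel record pairs; not data from the study.
PAIRS = [
    (2015, {"Fisk"}, {"Torsk"}),
    (2015, {"Pedagogikk"}, {"Undervisning", "Naturfag"}),
    (2018, {"Kjemi", "Kjemi 2"}, {"Kjemi"}),
    (2018, {"Torsk"}, {"Torsk"}),
]

for year, rates in relation_rates_by_year(PAIRS).items():
    print(year, rates)
```

A rising share of "equal" and "intersecting" pairs in later years is the kind of signal that, in the real data, indicates converging practices.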
When looking at the Panofsky/Shatford categorization, some of the differences between the agencies are more interesting. Bokbasen tends to apply more terms that we have categorized as generic, and Biblioteksentralen tends to apply more abstract terms. One example is the book Dybdelæring i naturfag, where Biblioteksentralen uses the term Undervisning (Teaching), while Bokbasen uses Pedagogikk (Pedagogy). We can see this tendency in the number of terms categorized as abstract (Abstracts-What) and generic (Generics-Who and Generics-What). When looking at the books themselves, it also seems that Biblioteksentralen's many abstract (Abstracts-What) terms result from a tendency to choose the abstract version of a concept more often. Bokbasen's relatively larger share of generic terms (Generics-Who and Generics-What) may result from the same mechanism, namely a tendency to choose the more concrete version of a concept.
Our analysis also shows, however, that Bokbasen quite often applies terms that violate the rule of applying the most specific term possible. This is visible in Bokbasen's relatively smaller number of terms categorized as specific (Specifics-Who, Specifics-What, Specifics-Where), but also within categories. One example of the latter is the book Supertorsken, where Biblioteksentralen has the term Torsk (Codfish) and Bokbasen the term Fisk (Fish), both categorized as Generics-Who.
We have argued that Biblioteksentralen and Bokbasen share similar literary and use warrants, and we have observed many similarities. Some of the differences, however, may result from differences in use warrant between the two agencies. Bokbasen's subject terms may be influenced by its slightly different view of the users of its data, with an emphasis on subject descriptions aimed at school libraries. Biblioteksentralen, on the other hand, has a longer tradition as a vendor for public libraries.
Overly general subject terms have consequences for precision and recall in searching. Users searching for a specific topic may get zero hits even though the collection contains a book on that topic, because the book is indexed only under a broader term; to find it, they must search with that more general term. Conversely, users who search with a general term may find what they are looking for along with books on neighboring topics, but if the collection is large, the hit list may be too long to look through. The usefulness of specific terms thus depends on how users search and on the size of the collection.
Before the advent of universal bibliographic control, every library produced its own bibliographic records and decided what level of specificity was appropriate for each subject. If a library held few documents on a given subject, it would apply more general subject terms, helping users find what little it had. If it held many, it would apply more specific terms to keep the number of hits manageable. Bokbasen's practice appears to produce a similar result. This can be seen as an indication of a collection warrant, that is, a literary warrant in which the level of specificity is tuned to the number of documents in the collection.
In this paper, we have identified several differences between subject vocabularies and their use. These differences stem from differences in vocabulary as well as in the practices and policies of the agencies. Isolating the effect of each of these factors would require a more qualitative research design. Another avenue for further research is to compare the assigned subject descriptions with subject searches taken from libraries' search logs.
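The precision/recall trade-off described above can be made concrete with a toy index that mirrors the Supertorsken case, where a book about cod is indexed only under the broader term Fisk. The index, book identifiers, and relevance judgment below are hypothetical; precision is taken as relevant-retrieved over retrieved, recall as relevant-retrieved over relevant, and precision is set to 0.0 for an empty result (a convention, not a given).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0  # convention for empty results
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical index: book id -> assigned subject terms.
INDEX = {
    "cod-book": {"Fisk"},  # a book about cod, indexed only under the broader term
    "salmon-book": {"Fisk", "Laks"},
    "herring-book": {"Fisk", "Sild"},
    "bike-book": {"Sykkel"},
}


def search(term):
    """Exact-match subject search over the toy index."""
    return {book for book, terms in INDEX.items() if term in terms}


relevant = {"cod-book"}  # the user is looking for books about cod (Torsk)
print(precision_recall(search("Torsk"), relevant))  # specific term: (0.0, 0.0)
print(precision_recall(search("Fisk"), relevant))   # general term: (0.3333333333333333, 1.0)
```

The specific search misses the book entirely, while the general search finds it at the cost of precision; in a large collection, the general hit list would grow accordingly.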
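The collection-warrant idea can be expressed as a simple rule: start from the most specific applicable term and move to a broader term while the collection holds too few documents on the narrower one. The sketch below is a hypothetical model of that reasoning, not a description of Bokbasen's actual procedure; the hierarchy, the document counts, and the `min_docs` threshold are all assumptions.

```python
# Hypothetical broader-term hierarchy (narrower -> broader); "Dyr" means animals.
BROADER = {"Torsk": "Fisk", "Laks": "Fisk", "Fisk": "Dyr"}


def choose_term(specific, doc_counts, min_docs=3):
    """Walk up from the most specific term until the term covers at least
    min_docs documents in the collection, or the top of the hierarchy is reached."""
    term = specific
    while doc_counts.get(term, 0) < min_docs and term in BROADER:
        term = BROADER[term]
    return term


# Hypothetical counts of documents per topic in two collections of different sizes.
SMALL_LIBRARY = {"Torsk": 1, "Laks": 1, "Fisk": 4, "Dyr": 20}
LARGE_LIBRARY = {"Torsk": 12, "Laks": 9, "Fisk": 150, "Dyr": 900}

print(choose_term("Torsk", SMALL_LIBRARY))  # Fisk
print(choose_term("Torsk", LARGE_LIBRARY))  # Torsk
```

The same book thus receives a broader term in the small collection and the specific term in the large one, which is the pattern the paragraph above associates with collection warrant.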

Figure 2. An example entry from the Bokbasen vocabulary as displayed in the Skosmos interface. The figure includes the hierarchy for the term.

Figure 3. RDF/XML version of the example in Figure 2.

Figure 4. The whole dataset. (a) Number of unique terms across sets; (b) rate of parallel record pairs (y-axis) sharing n terms (x-axis).

Figure 5. Year-wise rates of record pairs (y-axis) sharing n terms (x-axis).

Figure 6. Unique terms and rates of intersection for the unclassified subset.

Figure 7. Unique terms and rates of intersection for the classified subset.

Figure 8. Unique terms and rates of intersection for the subset classified 300-399.

Figure 9. An example of annotating the Bokbasen terms assigned to one book. Two of the annotators assigned categories (representing the cells in Table 8) to each of the terms. The pink frames encircle the annotations, where, e.g., g1 corresponds to "Generics/Who." The "Biblioteksentralen" section for the same book is hidden to save space.

Table 1. Our subset: number of record pairs by year.

Table 2. Number of subject terms taken from the bibliographic dataset, along with the unique vocabulary terms in use.

Table 3. Examples of record pairs with equal, intersecting, and disjoint terms.

Table 4. Annual usage of unique terms across agencies.

Table 5. Record pairs compared by year: how many record pairs in a given year use exactly the same terms, how many intersect, and how many are disjoint.

Table 6. Usage of unique terms across the agencies by Dewey main class.

Table 7. Numbers and percentages of record pairs within class-code groups whose term usage is equal, intersecting, or disjoint.

Table 10. Distribution of broad categories by agency in our sample.[a]
[a] Hanneman, Kposowa, and Riddle, Basic Statistics for Social Research, 290-2.

Table 11. Distribution of broad facets by agency in our sample.[a]
[a] Hanneman, Kposowa, and Riddle, Basic Statistics for Social Research, 290-2.

Table 12. Distribution of broad categories for biographies by agency in our sample.[a]
[a] Hanneman, Kposowa, and Riddle, Basic Statistics for Social Research, 290-2.

Table 13. Distribution of broad facets for biographies by agency in our sample.[a]
[a] Hanneman, Kposowa, and Riddle, Basic Statistics for Social Research, 290-2.