Classification Quality Control

This vignette describes the classificationQC() function of the correspondenceTables package, which performs quality control on hierarchical classifications.

library(correspondenceTables) 

Overview

The main function classificationQC() performs structural and logical quality control on hierarchical classifications. It returns a list of data frames, including an enriched version of the classification (QC_output) and additional tables flagging potential issues such as orphan codes, duplicate labels, or sequencing problems.

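The quality‑control checks illustrated in this vignette flag several types of potential structural or logical issues commonly observed in official classifications: codes with no hierarchy level (QC_noLevels), orphan codes (QC_orphan), childless codes (QC_childless), and gaps in sibling code sequences (QC_gapBefore).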

Main arguments

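The main arguments of the function, as used in the examples in this vignette, are:

  • classification: the classification to be checked, supplied as a data frame.
  • lengths: the mandatory table of start and end positions (charb, chare) for each hierarchical level.
  • fullHierarchy, labelUniqueness, labelHierarchy: logical switches, all set to TRUE in the examples.
  • singleChildCode: an optional table of single‑child coding rules, set to NULL when not used (as in Example 1).
  • sequencing: an optional table of sequencing rules, set to NULL when not used (as in Examples 1 and 2).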

Not all detected issues necessarily indicate errors. The quality‑control checks are diagnostic signals intended to support expert review of classification quality and consistency, and they do not impose constraints on the hierarchical structure itself.

Auxiliary Tables for Classification Validation

The validation procedures rely on a small set of auxiliary tables that define structural constraints, such as expected code lengths, single‑child rules, and sequencing between levels.

The three auxiliary tables (lengths, singleChildCode and sequencing) are described in turn below.

Definition of expected code lengths using the mandatory lengths argument

The lengths table specifies the character positions at which each hierarchical level of a classification code starts and ends. Specifically, column charb indicates the starting position of the segment (character beginning), while column chare indicates the ending position (character end).

For example, the following definition indicates that:

  • level 1 codes start at the first position and end at the second,
  • level 2 codes start at the third position and end at the fourth,
  • level 3 codes start at the fifth position and end at the seventh.

An example of such a structure is shown below:

lengths_example <- data.frame(
  charb = c(1, 3, 5),
  chare = c(2, 4, 7)
)

knitr::kable(
  lengths_example,
  caption = "Example of expected code lengths by hierarchical level",
  align = "c"
)
Example of expected code lengths by hierarchical level

charb  chare
  1      2
  3      4
  5      7
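To make the charb and chare positions concrete, the following minimal sketch splits a made-up seven-character code into its three segments with base R. The code value "0102345" is purely illustrative, and this is not claimed to be how classificationQC() processes codes internally.

```r
# Same layout as the example above: level i spans characters charb[i]..chare[i]
lengths_example <- data.frame(
  charb = c(1, 3, 5),
  chare = c(2, 4, 7)
)

# Made-up seven-character code, used only for illustration
code <- "0102345"

# substring() is vectorised over its start and end positions,
# so a single call returns one segment per hierarchical level
segments <- substring(code, lengths_example$charb, lengths_example$chare)
segments
```

Each element of segments corresponds to one row of the lengths table, in order.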

Single‑child code constraints

In some classifications, specific coding conventions are used to distinguish between situations where a parent code has a single child and situations where it has multiple children. These conventions do not restrict the hierarchical structure itself and do not limit the number of children per node.

Instead, they verify whether observed codes comply with predefined coding patterns when a single‑child or multiple‑child situation occurs.

The singleChildCode table defines these admissible patterns and contains the following columns:

  • level: the hierarchical level at which the rule applies.
  • singleCode: the expected coding pattern when a parent has exactly one child (for example, retaining the same code).
  • multipleCode: the expected coding pattern when a parent has multiple children (for example, using a sequence of numeric or alphanumeric suffixes).

These checks do not modify the classification and do not enforce a specific hierarchical shape. They merely flag cases where observed coding does not match the declared conventions, which may indicate inconsistencies in code design.

singleChildCode <- read.csv(
  system.file("extdata/test", "SingleChild.csv",
              package = "correspondenceTables")
)

knitr::kable(
  singleChildCode,
  caption = "Single-child code rules",
  align = "c"
)
Single-child code rules

level  singleCode  multipleCode
  2        0            10
  3        0             1
  4        0             1
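The kind of check described in this section can be sketched on toy data with base R. The sketch below assumes that the rule applies to the last character of each code, that an only child must end in "0", and that siblings must end in one of the digits 1 to 9. These conventions, the toy codes and all variable names are hypothetical and may differ from what classificationQC() actually does.

```r
# Toy parent/child pairs (hypothetical codes, not package data)
toy <- data.frame(
  code   = c("02.10", "03.11", "03.12", "04.11"),
  parent = c("02.1",  "03.1",  "03.1",  "04.1"),
  stringsAsFactors = FALSE
)

single_code   <- "0"                                # assumed suffix for an only child
multiple_code <- strsplit("123456789", "")[[1]]     # assumed suffixes for siblings

# Number of children recorded under each row's parent
n_siblings <- as.integer(table(toy$parent)[toy$parent])

# Last character of each code
suffix <- substring(toy$code, nchar(toy$code))

# Compare the suffix with the convention that matches the sibling count
follows_rule <- ifelse(
  n_siblings == 1,
  suffix == single_code,
  suffix %in% multiple_code
)

toy$violation <- !follows_rule
toy
```

In this toy data only "04.11" is flagged: it is an only child but does not end in the assumed single-child suffix.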

Sequencing rules between hierarchical levels

Sequencing checks are not intended to impose an ordering on hierarchical trees. In a pure tree structure, only parent‑child relationships matter.

However, in many official classifications, code values themselves convey implicit structure (for example numeric or alphanumeric sequences). In such systems, sibling codes are often expected to follow predefined ranges or patterns.

The purpose of sequencing checks is therefore diagnostic, not normative: they aim to detect gaps or breaks in otherwise structured code spaces, which may indicate missing, omitted, or inconsistently defined codes.

Sequencing rules are defined through a table with the following columns:

  • level: the hierarchical level at which sequencing rules apply.
  • multipleCode: the expected pattern or range of sibling codes used to detect potential gaps under the same parent.

Sequencing anomalies do not invalidate the hierarchy, but they may point to classification maintenance issues or incomplete implementations of official coding schemes.
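As an illustration of what a gap in a sibling sequence means, the sketch below flags any sibling whose preceding position in an expected sequence is unused. It assumes that the sequence is defined on the last character of the code and that the expected values are the digits 1 to 9; both are assumptions made for this example rather than a description of the package's internal logic.

```r
# Siblings under one parent, following the pattern 01.1x
siblings <- c("01.11", "01.12", "01.13", "01.14", "01.15", "01.16", "01.19")

# Assumed expected sequence of final characters
expected <- strsplit("123456789", "")[[1]]

# Position of each sibling's final character in the expected sequence
pos <- match(substring(siblings, nchar(siblings)), expected)

# A code has a gap before it when the preceding expected position is unused
gap_before <- pos > 1 & !((pos - 1) %in% pos)

siblings[gap_before]
```

Here only "01.19" is flagged, because positions 7 and 8 are not used by any sibling.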


sequencing <- read.csv(
  system.file("extdata/test", "Sequencing.csv",
              package = "correspondenceTables")
)

knitr::kable(
  sequencing,
  caption = "Example of sequencing rules by hierarchical level",
  align = "c"
)
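Example of sequencing rules by hierarchical level

level  multipleCode
  2    1.020304e+196
  3    1.234568e+08
  4    1.234568e+08

The multipleCode values are digit strings that read.csv() has parsed as numbers, which is why they are printed in scientific notation here.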

Example 1: Basic quality control using hierarchy definitions

The following example applies classificationQC() to the NACE Rev.2 classification using additional parameters.

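In this example, the user provides the classification table and the lengths table, while singleChildCode and sequencing are left as NULL.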

This example demonstrates how different parameters of classificationQC() are used to perform structural and logical quality checks.

classification <- read.csv(
  system.file("extdata/test", "Nace2_long.csv", package = "correspondenceTables")
)

lengths <- data.frame(
  charb = c(1, 2, 3, 5),
  chare = c(1, 2, 4, 5)
)
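One way to read this lengths table is sketched below in base R. It assumes that a code's level is the row whose chare equals the code's total length, and that the implied parent is the code truncated to the previous level's chare. Both rules are assumptions made for illustration and are not confirmed as the internal logic of classificationQC(); the four codes are examples of the code shapes found in NACE Rev.2.

```r
lengths <- data.frame(
  charb = c(1, 2, 3, 5),
  chare = c(1, 2, 4, 5)
)

codes <- c("A", "01", "01.1", "01.11")

# Assumption: the level is the row of `lengths` whose chare equals the code length
level <- match(nchar(codes), lengths$chare)

# Assumption: the implied parent is the code cut at the previous level's chare
parent_end <- lengths$chare[pmax(level - 1, 1)]
implied_parent <- ifelse(
  level > 1,
  substring(codes, 1, parent_end),
  NA_character_
)

data.frame(code = codes, level = level, implied_parent = implied_parent)
```

Under these assumptions the code "01" has the implied parent "0", which is not itself a code in the toy vector.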

We now apply the classificationQC() function using the previously defined classification and hierarchy structure. The function performs structural and logical quality checks on the NACE Rev.2 classification. For illustration purposes, the output is summarised by reporting the number of detected issues for selected quality checks.

output <- classificationQC(
  classification   = classification,
  lengths          = lengths,
  fullHierarchy    = TRUE,
  labelUniqueness  = TRUE,
  labelHierarchy   = TRUE,
  singleChildCode  = NULL,
  sequencing       = NULL
)

qc_summary <- data.frame(
  Check            = c("No levels", "Orphan codes", "Childless codes"),
  Number_of_issues = c(
    nrow(output$QC_noLevels),
    nrow(output$QC_orphan),
    nrow(output$QC_childless)
  )
)

knitr::kable(
  qc_summary,
  caption = "Summary of quality control checks",
  align = "c"
)
Summary of quality control checks

Check            Number_of_issues
No levels               0
Orphan codes           88
Childless codes        21
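Because the function returns a list of data frames, the same kind of summary can be produced for every table at once. The helper below is a sketch: count_issues() is a hypothetical name, and mock_qc is a hand-made list that stands in for the value returned by classificationQC() so that the example runs without the package.

```r
# Count the rows of every data frame found in a list of QC tables
count_issues <- function(qc) {
  tables <- Filter(is.data.frame, qc)
  data.frame(
    table = names(tables),
    rows  = unname(vapply(tables, nrow, integer(1))),
    stringsAsFactors = FALSE
  )
}

# Mock list standing in for the output of classificationQC()
mock_qc <- list(
  QC_noLevels  = data.frame(code = character(0)),
  QC_orphan    = data.frame(code = c("01", "02")),
  QC_childless = data.frame(code = "A")
)

issue_counts <- count_issues(mock_qc)
issue_counts
```

When applied to a real result, QC_output would also be counted; since it holds the full enriched classification rather than a list of issues, it may be preferable to drop that element first.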

Codes with no hierarchy level (QC_noLevels)

In this example, all classification codes have a properly defined hierarchy level. As a result, the QC_noLevels table is empty.

QC_noLevels

  • Rows: 0
  • Columns: 16

Orphan codes (QC_orphan)

Orphan codes are codes that do not have a valid parent at the immediately higher hierarchical level. This usually indicates a break in the hierarchical structure.
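The definition can be illustrated on a toy classification with base R. The column names code and parent and the four codes are hypothetical, and the sketch does not reproduce the package's own implementation.

```r
# Toy classification: "B1" names a parent "B" that is not in the code list
toy <- data.frame(
  code   = c("A", "A1", "A2", "B1"),
  parent = c(NA,  "A",  "A",  "B"),
  stringsAsFactors = FALSE
)

# A code is an orphan when it names a parent that is absent from the codes
is_orphan <- !is.na(toy$parent) & !(toy$parent %in% toy$code)

toy$code[is_orphan]
```

Top-level codes have no parent by construction, so they are excluded through the is.na() test.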

QC_orphan

  • Rows: 88
  • Columns: 18
Orphan codes: first 5 rows (first 7 columns)
Code Label Level Parent Include Include_Also Exclude level
1 01 Crop and animal production, hunting and related service activities 2 0 NA This division also includes service activities incidental to agriculture, as well as hunting, trapping and related activities. Agricultural activities exclude any subsequent processing of the agricultural products (classified under divisions 10 and 11 (Manufacture of food products and beverages) and division 12 (Manufacture of tobacco products)), beyond that needed to prepare them for the primary markets. The preparation of products for the primary markets is included here. The division excludes field construction (e.g. agricultural land terracing, drainage, preparing rice paddies etc.) classified in section F (Construction) and buyers and cooperative associations engaged in the marketing of farm products classified in section G. Also excluded is the landscape care and maintenance, which is classified in class 81.30. 2
40 02 Forestry and logging 2 0 NA NA Excluded is further processing of wood beginning with sawmilling and planing of wood, see division 16. 2
49 03 Fishing and aquaculture 2 0 NA Also included are activities that are normally integrated in the process of production for own account (e.g. seeding oysters for pearl production). Service activities incidental to marine or freshwater fishery or aquaculture are included in the related fishing or aquaculture activities. This division does not include building and repairing of ships and boats (30.1, 33.15) and sport or recreational fishing activities (93.19). Processing of fish, crustaceans or molluscs is excluded, whether at land-based plants or on factory ships (10.20). 2
56 05 Mining of coal and lignite 2 0 NA NA This division does not include coking (see 19.10), services incidental to coal or lignite mining (see 09.90) or the manufacture of briquettes (see 19.20). 2
61 06 Extraction of crude petroleum and natural gas 2 0 NA NA This division excludes: - oil and gas field services, performed on a fee or contract basis, see 09.10 - oil and gas well exploration, see 09.10 - test drilling and boring, see 09.10 - refining of petroleum products, see 19.20 - geophysical, geologic and seismic surveying, see 71.12 2

Childless codes (QC_childless)

Childless codes are codes above the lowest level of the classification that have no descendants at the immediately lower hierarchical level. Having no children is expected at the lowest level, but may indicate structural issues at higher levels.
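A toy version of this check can be written in a few lines of base R. The column names and codes below are hypothetical, and the sketch only illustrates the definition rather than the package's implementation.

```r
# Toy classification: "B" sits at level 1 but no code names it as parent
toy <- data.frame(
  code   = c("A", "B", "A1", "A2"),
  level  = c(1, 1, 2, 2),
  parent = c(NA, NA, "A", "A"),
  stringsAsFactors = FALSE
)

lowest_level <- max(toy$level)

# Childless: above the lowest level and never referenced as a parent
is_childless <- toy$level < lowest_level & !(toy$code %in% toy$parent)

toy$code[is_childless]
```

Codes at the lowest level are skipped because they are not expected to have children.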

QC_childless

  • Rows: 21
  • Columns: 19
First 5 rows (first 7 columns)
Code Label Level Parent Include Include_Also Exclude level
976 A AGRICULTURE, FORESTRY AND FISHING 1 NA NA NA NA 1
977 B MINING AND QUARRYING 1 NA NA NA This section excludes: - processing of the extracted materials, see section C (Manufacturing) - usage of the extracted materials without a further transformation for construction purposes, see section F (Construction) - bottling of natural spring and mineral waters at springs and wells, see 11.07 - crushing, grinding or otherwise treating certain earths, rocks and minerals not carried on in conjunction with mining and quarrying, see 23.9 1
978 C MANUFACTURING 1 NA NA NA NA 1
979 D ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY 1 NA NA Also included is the provision of steam and air-conditioning supply. This section excludes the operation of water and sewerage utilities, see 36, 37. This section also excludes the (typically long-distance) transport of gas through pipelines. 1
980 E WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES 1 NA NA Activities of water supply are also grouped in this section, since they are often carried out in connection with, or by units also engaged in, the treatment of sewage. NA 1

Example 2: Quality control with single‑child coding rules

The following example illustrates the quality control of the NACE Rev.2 classification from CELLAR using additional parameters, including the singleChildCode argument.


singleChildCode <- read.csv(
  system.file("extdata/test", "SingleChild.csv", package = "correspondenceTables")
)
knitr::kable(
  singleChildCode,
  caption = "singleChildCode argument",
  align = "c"
)
singleChildCode argument
level singleCode multipleCode
2 0 10
3 0 1
4 0 1

output2 <- classificationQC(
  classification   = classification,
  lengths          = lengths,
  fullHierarchy    = TRUE,
  labelUniqueness  = TRUE,
  labelHierarchy   = TRUE,
  singleChildCode  = singleChildCode,
  sequencing       = NULL
)

This table lists orphan codes, i.e. codes that do not have a valid parent at the immediately higher hierarchical level.

QC_orphan

First 5 rows (first 7 columns)
Code Label Level Parent Include Include_Also Exclude level
1 01 Crop and animal production, hunting and related service activities 2 0 NA This division also includes service activities incidental to agriculture, as well as hunting, trapping and related activities. Agricultural activities exclude any subsequent processing of the agricultural products (classified under divisions 10 and 11 (Manufacture of food products and beverages) and division 12 (Manufacture of tobacco products)), beyond that needed to prepare them for the primary markets. The preparation of products for the primary markets is included here. The division excludes field construction (e.g. agricultural land terracing, drainage, preparing rice paddies etc.) classified in section F (Construction) and buyers and cooperative associations engaged in the marketing of farm products classified in section G. Also excluded is the landscape care and maintenance, which is classified in class 81.30. 2
40 02 Forestry and logging 2 0 NA NA Excluded is further processing of wood beginning with sawmilling and planing of wood, see division 16. 2
49 03 Fishing and aquaculture 2 0 NA Also included are activities that are normally integrated in the process of production for own account (e.g. seeding oysters for pearl production). Service activities incidental to marine or freshwater fishery or aquaculture are included in the related fishing or aquaculture activities. This division does not include building and repairing of ships and boats (30.1, 33.15) and sport or recreational fishing activities (93.19). Processing of fish, crustaceans or molluscs is excluded, whether at land-based plants or on factory ships (10.20). 2
56 05 Mining of coal and lignite 2 0 NA NA This division does not include coking (see 19.10), services incidental to coal or lignite mining (see 09.90) or the manufacture of briquettes (see 19.20). 2
61 06 Extraction of crude petroleum and natural gas 2 0 NA NA This division excludes: - oil and gas field services, performed on a fee or contract basis, see 09.10 - oil and gas well exploration, see 09.10 - test drilling and boring, see 09.10 - refining of petroleum products, see 19.20 - geophysical, geologic and seismic surveying, see 71.12 2

This table lists childless codes, i.e. codes that have no descendants at the immediately lower hierarchical level.

QC_childless

First 10 rows (first 7 columns)
Code Label Level Parent Include Include_Also Exclude level
976 A AGRICULTURE, FORESTRY AND FISHING 1 NA NA NA NA 1
977 B MINING AND QUARRYING 1 NA NA NA This section excludes: - processing of the extracted materials, see section C (Manufacturing) - usage of the extracted materials without a further transformation for construction purposes, see section F (Construction) - bottling of natural spring and mineral waters at springs and wells, see 11.07 - crushing, grinding or otherwise treating certain earths, rocks and minerals not carried on in conjunction with mining and quarrying, see 23.9 1
978 C MANUFACTURING 1 NA NA NA NA 1
979 D ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY 1 NA NA Also included is the provision of steam and air-conditioning supply. This section excludes the operation of water and sewerage utilities, see 36, 37. This section also excludes the (typically long-distance) transport of gas through pipelines. 1
980 E WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES 1 NA NA Activities of water supply are also grouped in this section, since they are often carried out in connection with, or by units also engaged in, the treatment of sewage. NA 1
981 F CONSTRUCTION 1 NA NA This section also includes the development of building projects for buildings or civil engineering works by bringing together financial, technical and physical means to realise the construction projects for later sale. If these activities are carried out not for later sale of the construction projects, but for their operation (e.g. rental of space in these buildings, manufacturing activities in these plants), the unit would not be classified here, but according to its operational activity, i.e. real estate, manufacturing etc. 1
982 G WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES 1 NA NA NA NA 1
983 H TRANSPORTATION AND STORAGE 1 NA NA NA This section excludes: - major repair or alteration of transport equipment, except motor vehicles, see group 33.1 - construction, maintenance and repair of roads, railways, harbours, airfields, see division 42 - maintenance and repair of motor vehicles, see 45.20 - rental of transport equipment without driver or operator, see 77.1, 77.3 1
984 I ACCOMMODATION AND FOOD SERVICE ACTIVITIES 1 NA NA NA This section excludes the provision of long-term accommodation as primary residences, which is classified in real estate activities (section L). Also excluded is the preparation of food or drinks that are either not fit for immediate consumption or that are sold through independent distribution channels, i.e. through wholesale or retail trade activities. The preparation of these foods is classified in manufacturing (section C). 1
985 J INFORMATION AND COMMUNICATION 1 NA NA NA NA 1

Example 3: Quality control with sequencing constraints

In this final example, the sequencing parameter is used to detect potential gaps in structured sequences of sibling codes within the hierarchy.

Sequencing rules are applied at the hierarchical levels listed in the sequencing input table (levels 2 to 4 in the Sequencing.csv table shown earlier). At these levels, the function identifies missing or inconsistent code values within predefined numeric or alphanumeric ranges, which may indicate incomplete or faulty classification structures.


singleChildCode <- read.csv(
  system.file("extdata/test", "SingleChild2.csv", package = "correspondenceTables")
)

sequencing <- read.csv(
  system.file("extdata/test", "Sequencing.csv",
              package = "correspondenceTables")
)

output3 <- classificationQC(
  classification   = classification, 
  lengths          = lengths,
  fullHierarchy    = TRUE,
  labelUniqueness  = TRUE,
  labelHierarchy   = TRUE,
  singleChildCode  = singleChildCode,
  sequencing       = sequencing
)

The QC_gapBefore table lists codes that are preceded by a gap in the expected sequence of sibling codes under the same parent.

QC_gapBefore

QC_gapBefore. First 10 rows (first 7 columns)
Code Label Level Parent Include Include_Also Exclude level
2 01.1 Growing of non-perennial crops 3 01 NA NA NA 3
9 01.19 Growing of other non-perennial crops 4 01.1 NA NA This class excludes: - growing of non-perennial spices, aromatic, drug and pharmaceutical crops, see 01.28 4
10 01.2 Growing of perennial crops 3 01 NA NA NA 3
20 01.3 Plant propagation 3 01 NA NA NA 3
22 01.4 Animal production 3 01 NA NA This group excludes: - farm animal boarding and care, see 01.62 - production of hides and skins from slaughterhouses, see 10.11 3
30 01.49 Raising of other animals 4 01.4 NA NA This class excludes: - production of hides and skins originating from hunting and trapping, see 01.70 - operation of frog farms, crocodile farms, marine worm farms, see 03.21, 03.22 - operation of fish farms, see 03.21, 03.22 - boarding and training of pet animals, see 96.09 - raising and breeding of poultry, see 01.47 4
31 01.5 Mixed farming 3 01 NA NA NA 3
33 01.6 Support activities to agriculture and post-harvest crop activities 3 01 NA Also included are post-harvest crop activities, aimed at preparing agricultural products for the primary market. NA 3
38 01.7 Hunting, trapping and related service activities 3 01 NA NA NA 3
41 02.1 Silviculture and other forestry activities 3 02 NA NA NA 3

This table lists the last sibling codes within each group of children, used to assess sequence completeness.

QC_lastSibling

QC_lastSibling. First 10 rows (first 7 columns)
Code Label Level Parent Include Include_Also Exclude level
9 01.19 Growing of other non-perennial crops 4 01.1 NA NA This class excludes: - growing of non-perennial spices, aromatic, drug and pharmaceutical crops, see 01.28 4
19 01.29 Growing of other perennial crops 4 01.2 NA NA This class excludes: - growing of flowers, production of cut flower buds and growing of flower seeds, see 01.19 - gathering of tree sap or rubber-like gums in the wild, see 02.30 4
30 01.49 Raising of other animals 4 01.4 NA NA This class excludes: - production of hides and skins originating from hunting and trapping, see 01.70 - operation of frog farms, crocodile farms, marine worm farms, see 03.21, 03.22 - operation of fish farms, see 03.21, 03.22 - boarding and training of pet animals, see 96.09 - raising and breeding of poultry, see 01.47 4
37 01.64 Seed processing for propagation 4 01.6 NA NA This class excludes: - growing of seeds, see groups 01.1 and 01.2 - processing of seeds to obtain oil, see 10.41 - research to develop or modify new forms of seeds, see 72.11 4
38 01.7 Hunting, trapping and related service activities 3 01 NA NA NA 3
47 02.4 Support services to forestry 3 02 NA NA NA 3
52 03.12 Freshwater fishing 4 03.1 NA This class also includes: - gathering of freshwater materials This class excludes: - processing of fish, crustaceans and molluscs, see 10.20 - fishing inspection, protection and patrol services, see 84.24 - fishing practiced for sport or recreation and related services, see 93.19 - operation of sport fishing preserves, see 93.19 4
53 03.2 Aquaculture 3 03 NA In addition, “aquaculture” also encompasses individual, corporate or state ownership of the individual organisms throughout the rearing or culture stage, up to and including harvesting. NA 3
55 03.22 Freshwater aquaculture 4 03.2 NA NA This class excludes: - aquaculture activities in salt water filled tanks and reservoirs, see 03.21 - operation of sport fishing preserves, see 93.19 4
59 05.2 Mining of lignite 3 05 NA NA NA 3
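The idea of a last sibling can be sketched with base R on a handful of codes. The sketch assumes that "last" means the largest code under each parent in character sort order; this is an assumption for illustration and may not match the rule used by classificationQC().

```r
# Toy children grouped under two parents
toy <- data.frame(
  code   = c("01.11", "01.12", "01.19", "01.21", "01.29"),
  parent = c("01.1",  "01.1",  "01.1",  "01.2",  "01.2"),
  stringsAsFactors = FALSE
)

# ave() repeats each group's maximum code on every row of that group
last_in_group <- ave(toy$code, toy$parent, FUN = max)

toy$last_sibling <- toy$code == last_in_group
toy$code[toy$last_sibling]
```

Each parent contributes exactly one last sibling, here "01.19" and "01.29".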

This table contains the full classification enriched with all quality‑control flags produced by the checks.

QC_output

First 10 rows (first 7 columns)
nace2 Label Level Parent Include Include_Also Exclude level
01 Crop and animal production, hunting and related service activities 2 0 NA This division also includes service activities incidental to agriculture, as well as hunting, trapping and related activities. Agricultural activities exclude any subsequent processing of the agricultural products (classified under divisions 10 and 11 (Manufacture of food products and beverages) and division 12 (Manufacture of tobacco products)), beyond that needed to prepare them for the primary markets. The preparation of products for the primary markets is included here. The division excludes field construction (e.g. agricultural land terracing, drainage, preparing rice paddies etc.) classified in section F (Construction) and buyers and cooperative associations engaged in the marketing of farm products classified in section G. Also excluded is the landscape care and maintenance, which is classified in class 81.30. 2
01.1 Growing of non-perennial crops 3 01 NA NA NA 3
01.11 Growing of cereals (except rice), leguminous crops and oil seeds 4 01.1 NA NA This class excludes: - growing of rice, see 01.12 - growing of sweet corn, see 01.13 - growing of maize for fodder, see 01.19 - growing of oleaginous fruits, see 01.26 4
01.12 Growing of rice 4 01.1 NA NA NA 4
01.13 Growing of vegetables and melons, roots and tubers 4 01.1 NA NA This class excludes: - growing of chillies, peppers (capsicum sop.) and other spices and aromatic crops, see 01.28 - growing of mushroom spawn, see 01.30 4
01.14 Growing of sugar cane 4 01.1 NA NA This class excludes: - growing of sugar beet, see 01.13 4
01.15 Growing of tobacco 4 01.1 NA NA This class excludes: - manufacture of tobacco products, see 12.00 4
01.16 Growing of fibre crops 4 01.1 NA NA NA 4
01.19 Growing of other non-perennial crops 4 01.1 NA NA This class excludes: - growing of non-perennial spices, aromatic, drug and pharmaceutical crops, see 01.28 4
01.2 Growing of perennial crops 3 01 NA NA NA 3