Questions of the influence of VIN quality on the safety of vehicle operation from the point of view of critical information systems

Abstract
Today, the development of modern vehicles focuses primarily on electronics, control systems, artificial intelligence and electromobility. Information exchanges are increasing between vehicles, between the vehicle and the transport infrastructure, between the vehicle and its manufacturer or service point, between the vehicle and the rescue system, and between the vehicle and critical information systems managed by the government or third parties. This communication depends on the quality of the data in the information systems involved, especially the worldwide unique identification of vehicles by the VIN (Vehicle Identification Number). The paper describes an analysis that found an average error rate of 8% in this key identifier in various governmental information systems. Incorrect vehicle identification can seriously affect the speed and quality of the intervention by the components of the integrated rescue system at a crashed vehicle and thus directly endanger human health or life. Based on statistically oriented research, the paper analyses the causes of these errors and proposes and describes ways to ensure the accuracy of the VIN in various information systems, especially for the purposes of security and rescue services, which belong to critical infrastructure.


PUBLIC INTEREST STATEMENT
The basis of information, record-keeping, security and forensic practice is always the unambiguous individual identification of every object with which we work in any way. In the case of vehicles, this is the so-called VIN (Vehicle Identification Number), a 17-character alphanumeric string. When it is entered manually into various information systems, an error rate of 3-5% occurs in practice. In critical infrastructure information systems, even this relatively low error rate can have major negative impacts on certain processes, such as searching for stolen vehicles, detecting various kinds of fraud, or providing assistance to the vehicle crew after a crash. The paper examines which methods can effectively prevent the unwanted effects of errors in the VIN identifier.

Introduction
Data quality is an important attribute of every information system from which we expect a certain functionality, efficiency and reliability (E.V, 2018; V.G, 2019; Wang et al., 2018; Reedy, 2020). Data quality is crucial especially for so-called key object identifiers (unique primary keys in computer terminology), which uniquely identify a given object. In the area of motor vehicles and the information systems that store vehicle information for various purposes, the object identifier is the VIN (Vehicle Identification Number; Cruz-Jesus et al., 2018; E.V, 2017; Kumar et al., 2021; V.a, 2018).
The quality and error-free entry of the VIN (Figure 1) in information systems determines whether a vehicle that is searched for or examined for various reasons will be found in the computer database at all (Clady et al., 2008). Today, VIN quality affects the effectiveness of the Police and other state security forces (including the fight against organised crime and terrorism) [11], the effectiveness of rescue services in the event of a vehicle accident (the pan-European eCALL project; Rak & Zrubak, 2012), the control activities of state administration bodies, the after-sales (service) and other services of the automotive industry, the services of insurance and leasing companies, and various other third parties in the commercial and non-commercial sectors.
The paper deals with the analysis of data quality in government information systems. This quality, however, lags far behind what today's data collection, acquisition and control technologies make possible. There are numerous errors in the primary VIN identifier in these information systems because the identifier is still transferred manually from vehicle documents (which may themselves contain errors or be forged or altered) into both state and private information systems, without acknowledging that a large number of errors may result (Elagin et al., 2020; Mendiboure et al., 2018). In specific cases, these errors can have fatal consequences: failure to find a stolen or safety-defective vehicle; failure to provide the information needed by the emergency services, i.e., in extreme cases, endangering the health and lives of persons involved in a serious traffic accident (Böhm et al., 2020; Matuszak et al., 2015); and fraud in car purchases, damage settlement, civil disputes, etc. (Li et al., 2018).

Material and methods
The globally unique VIN identifier (Vehicle Identification Number) is defined worldwide by internationally valid ISO standards (applied since 1986): ISO 3779:1983 "Road vehicles, Vehicle identification number (VIN), Content and structure"; ISO 3780:1983 "Road vehicles, World manufacturer identifier (WMI) code"; and ISO 4030:1983 "Road vehicles, Vehicle identification number (VIN), Location and attachment". These standards specify and implement unambiguous vehicle identification worldwide.
A VIN is a string of alphanumeric characters with a precise length of 17 characters. To avoid visual confusion and inaccuracies, the characters O, Q, and I are prohibited. The VIN has three basic components (see Figure 2):
• WMI (World Manufacturer Identifier). This three-character sequence identifies the vehicle manufacturer (factory make). This part is internationally standardised.
• VDS (Vehicle Descriptor Section). This section specifies the technical and other characteristics of the vehicle; its structure and coding depend on the vehicle manufacturer. The 9th position of the VIN (i.e., the 6th position of the VDS) can be used for the so-called check digit mechanism. If the check digit mechanism is used, a globally defined standard calculation applies. The mechanism is mandatory in the USA, but not in other countries, where it is up to the manufacturer whether or not to use it. The check digit then makes it possible to determine whether an error occurred when the VIN was entered (copied). The check digit mechanism is very important for ensuring data quality in any information system.
• VIS (Vehicle Identifier Section). This section always contains the vehicle's sequential (serial) number. At the manufacturer's discretion, it may also include the year of manufacture or so-called model year, the factory, or other information (especially for vehicles produced in small batches).
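The three-part layout described above can be sketched as a small Python helper that slices a VIN by position. This is a minimal illustration, not part of VINdecoder; the names `VinParts` and `split_vin` are hypothetical, and the helper only splits by position: whether position 9 actually holds a check digit depends on the manufacturer.

```python
from typing import NamedTuple


class VinParts(NamedTuple):
    """The three sections of a 17-character VIN (ISO 3779 layout)."""
    wmi: str         # positions 1-3: WMI (manufacturer)
    vds: str         # positions 4-9: VDS (vehicle characteristics)
    vis: str         # positions 10-17: VIS (serial number, optionally model year/plant)
    position_9: str  # 6th VDS character; a check digit only if the manufacturer uses one


def split_vin(vin: str) -> VinParts:
    """Split a VIN into WMI, VDS and VIS after normalising case and whitespace.

    Raises ValueError for anything that is not exactly 17 characters long.
    """
    vin = vin.strip().upper()
    if len(vin) != 17:
        raise ValueError(f"expected 17 characters, got {len(vin)}")
    return VinParts(wmi=vin[:3], vds=vin[3:9], vis=vin[9:], position_9=vin[8])


parts = split_vin("1M8GDM9AXKP042788")  # commonly cited sample VIN
print(parts.wmi, parts.vds, parts.vis, parts.position_9)  # 1M8 GDM9AX KP042788 X
```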
A data sample of 4,059,009 records from the Vehicle Inspection Register managed by the Ministry of Transport of the Czech Republic was used for the VIN quality analysis. The data sample is valid as of 31 December 2018.
The quality of this sample was checked using the special VINdecoder application (product name VINexpert). This application was originally designed, and is still used, in the Czech and Slovak Republics to decode the basic information contained in the VIN of a specific vehicle for the needs of the integrated rescue system within the pan-European eCALL project, in cases where a vehicle has crashed and sent a distress signal (Jurecki & Jaskiewicz, 2012) that contains, among other things, its VIN identifier (Balasubramaniam et al., 2021; Khoshavi et al., 2021).
The VIN of the crashed vehicle is transmitted from the vehicle over the networks of mobile telephone operators and subsequently decoded. This provides the emergency services with basic information about the distressed vehicle in a fraction of a second, without having to query the various national registers of every EU member state.
The VINdecoder application is based on special algorithms and a knowledge database containing VIN structures and related vehicle information. The database holds all the VIN information on a global scale, as defined by vehicle manufacturers since 1986, when the VIN was defined as a worldwide unique identifier and made mandatory in automotive practice. It contains approximately 8,900 basic type models for decoding the VIN structure of vehicles operating in the European Union and thus analytically covers 99.8 % of registered vehicles (including motorcycles, trucks, buses, tractors, semi-trailers, trailers, work machines, etc.).

Factors affecting the quality of VINs in information systems
There are two main factors, direct (objective) and indirect (subjective), that determine how well a VIN is entered into a computer application:
• Technology of entering the VIN in the information system
• Control mechanisms for verifying the VIN reality and correctness

Technology of entering the VIN in the information system
This refers to the technical means by which the VIN is entered into a computer application and its database. Historically, the entry methods and the possibilities for carrying them out have changed with the evolution of technology, computer technology and peripherals (Makridakis & Christodoulou, 2019). There are three basic options for entering VINs into records dealing with motor vehicles:
• Manual entry (RT1; see Figure 4)
• Opto-electronic entry (RT2)
• Digital entry from vehicle control units (RT3)
The entry technology is considered a direct, objective factor affecting the quality of VINs in information systems. The VIN is always entered into computer applications using some technology, which may be as simple as manually copying it from the document submitted with the vehicle (MacHardy et al., 2018).

Manual entry
The user enters the VIN with a keyboard from a paper template such as the vehicle registration document or the CoC sheet. Manual entry is still the most common way of capturing data in vehicle registration information systems, and it yields the lowest data quality, i.e., the highest VIN error rate. The documents themselves may contain incorrect or non-existent VIN values. During copying, the user may make unnoticed, unintentional errors (misreading the VIN in the document, mistyping it, swapping adjacent characters through a slip of the fingers on the keyboard, etc.). Deliberate errors in the VIN entry, made to change the identity of the vehicle so that it cannot be checked against, for example, police systems tracking stolen vehicles, are also quite common.

Opto-electronic entry
An opto-electronic interface (peripheral) connected to the computer application for vehicle registration enters the VIN electronically, without the need for a keyboard. The interface can be a digital camera with special software that converts the image of the VIN into electronic text, a 2D or 3D code scanner, or an OCR reader that extracts the text content from the vehicle document. The camera or scanner can capture the VIN from the homologation or data plates of the vehicle, from the VIN located under the windscreen, or from the VIN physically stamped into the vehicle body.
This method of VIN acquisition ensures a high-quality VIN. Unintentional errors caused by human fatigue, inattention, or inability to read or write correctly are excluded. Deliberate errors, however, cannot be completely excluded: a person may deliberately capture a VIN from another document, vehicle, etc. with the opto-electronic peripheral. To exclude this type of error, a series of additional photographs of the whole object from which the VIN was captured is taken.

Digital entry from the vehicle control units
The VIN is transferred from the vehicle control unit through the standardised OBD II interface directly to the vehicle registration application, completely automatically and without any human factor. The human role is only to plug the connector into the vehicle interface at its standardised location in the driver's workstation area (Soltani & Hosseini Seno, 2019). The data transfer is usually wireless (Corea, 2019; Mutawa et al., 2019).
This procedure eliminates both intentional and unintentional human errors. A modern vehicle currently contains, on average, over 80 electronic control units (ECUs), and many of them store a digital VIN or another identifier (Mansor et al., 2016). In addition, differences between these VIN values may reveal unauthorised fitting (replacement) of major vehicle components (Dirnbach et al., 2020), which may even come from illegal activities such as stolen or scrapped vehicles (Afgan et al., 2018; Jánský & Tušer, 2022).
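The comparison of VINs stored in different control units can be sketched as follows. This is an illustration under stated assumptions: it takes already-read VIN strings as input (how they are read over OBD II is manufacturer-specific and not shown), the ECU names and the second VIN are hypothetical, and taking the majority value as the likely true VIN is our assumption, not a procedure from the cited works.

```python
from __future__ import annotations

from collections import Counter


def majority_vin(ecu_vins: dict[str, str]) -> str:
    """Most frequent VIN among the values read from the vehicle's ECUs."""
    if not ecu_vins:
        raise ValueError("no ECU readings")
    counts = Counter(v.strip().upper() for v in ecu_vins.values())
    return counts.most_common(1)[0][0]


def vin_mismatches(ecu_vins: dict[str, str], reference_vin: str) -> dict[str, str]:
    """ECUs whose stored VIN differs from the reference VIN.

    A mismatch proves nothing on its own; it flags a component that may
    have been replaced and deserves a closer look.
    """
    ref = reference_vin.strip().upper()
    return {ecu: v for ecu, v in ecu_vins.items() if v.strip().upper() != ref}


# Hypothetical readings from three control units.
readings = {
    "engine": "1M8GDM9AXKP042788",
    "abs": "1M8GDM9AXKP042788",
    "airbag": "WVWZZZ1JZXW000001",
}
print(majority_vin(readings))                         # 1M8GDM9AXKP042788
print(vin_mismatches(readings, "1M8GDM9AXKP042788"))  # {'airbag': 'WVWZZZ1JZXW000001'}
```

Comparing against the registered VIN rather than the majority value is equally possible; the majority is useful when the registered VIN itself may have been mistyped.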

Control mechanisms for verifying the VIN reality and correctness
The basic task when entering a VIN into a computer application is to ensure that it contains no errors, i.e., that it has the necessary data quality. This matters because the VIN is the basic identifier of the vehicle and also the primary key linking various databases.
Mechanisms for verifying the reality and correctness (error-free state) of the VIN may or may not be used in practice. Whether they are implemented, and whether their application is enforced in daily practice without exception, depends on the responsibility and knowledge of the owner of the vehicle registration system. This is therefore an indirect, subjective factor influencing the final quality of VINs in information systems.
The most important vehicle register, which also serves as a reference for other information systems and related processes, is the national vehicle register. This register exists in every country and is usually under the responsibility of the Ministry of Transport (or other similar institution), exceptionally under the Ministry of the Interior. As this is the national reference register from which information on the vehicle (its owners, operators, technical condition, etc.) is taken, the quality of the VIN must be absolutely perfect.
Before the VIN is actually entered into the register database, a number of logical checks can be carried out automatically to confirm that the vehicle being registered is in order, i.e., not stolen, not wanted, etc.
In the EU countries, when a VIN is entered in the national vehicle register, it is checked in particular against:
• the national Police records of stolen vehicles (see Figure 5, ref1);
• the international Police Schengen vehicle records (ref2);
• the international Police records of stolen vehicles of the EU member states (ref3);
• the national vehicle registers of the EU member states, via the EUCARIS interface (ref4).
Searching for a vehicle by its VIN in police records has one fundamental caveat that must always be kept in mind: the fact that the vehicle cannot be found in police records (national or international) on the basis of its VIN does not mean that the vehicle is OK! The following situations must be taken into account:
(1) The vehicle owner (who is, for example, on holiday abroad by air) has not yet discovered that the vehicle has been stolen and therefore could not report the theft to the Police, so the vehicle may not yet have entered the Police search systems.
(2) The perpetrator, a well-organised gang, transported the stolen vehicle from one country to another and registered it abroad within a very short time (hours). Organised gangs are faster than Police processes, so information about the stolen vehicle is not transferred among the national and international search records in time.
(3) The perpetrator deliberately changes the vehicle identity (its VIN), or the user submitting the VIN for checking in Police (and other) information systems makes a transcription error and inadvertently creates the VIN of another vehicle; either way, the vehicle identity changes and the query returns information about a completely different vehicle.
(4) The vehicle owner is part of an organised criminal group. The owner sells the vehicle abroad, personally or through intermediaries, and waits for the new owner to register it in a national vehicle register. The reference checks in the Police information systems do not work because the vehicle has not yet been reported as stolen. Only after the vehicle has been successfully registered abroad does the perpetrator report it as stolen in the home country and fraudulently obtain the insurance payout.
In all of the aforementioned cases, a query to Police information systems results in the erroneous information that the vehicle is not searched for and has not been stolen.
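The register checks listed above, together with the caveat that a negative answer proves nothing, can be sketched as follows. This is a hypothetical illustration: the registers are stood in for by in-memory sets, the names `Lookup` and `query_registers` are invented, and real Police, Schengen or EUCARIS interfaces are not modelled. The design point is that the result type deliberately has no "vehicle is OK" value.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Iterable


class Lookup(Enum):
    HIT = "record found"
    NO_RECORD = "no record found"  # NOT the same as "vehicle is OK"


def query_registers(
    vin: str, registers: Iterable[tuple[str, Callable[[str], bool]]]
) -> dict[str, Lookup]:
    """Query each register and report HIT or NO_RECORD per register.

    There is deliberately no "CLEAN" result: as cases (1)-(4) show, the
    absence of a record cannot prove that a vehicle is not stolen.
    """
    vin = vin.strip().upper()
    return {name: Lookup.HIT if found(vin) else Lookup.NO_RECORD for name, found in registers}


# Hypothetical in-memory stand-ins for ref1 and ref3.
national_stolen = {"1M8GDM9AXKP042788"}
eu_stolen: set[str] = set()

result = query_registers("1M8GDM9AXKP042788", [
    ("national stolen vehicles (ref1)", national_stolen.__contains__),
    ("EU stolen vehicles (ref3)", eu_stolen.__contains__),
])
print({name: r.value for name, r in result.items()})
# {'national stolen vehicles (ref1)': 'record found', 'EU stolen vehicles (ref3)': 'no record found'}
```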

The issue of the complexity of interconnected information systems
Today's era is characterised by a very dynamic exchange of the data and information that are essential for correct and timely decision making. Vehicle data is stored in various information systems, so diverse systems must be linked to obtain a comprehensive picture of the overall situation (Figure 6). For vehicles, the linking key is the VIN. The VIN is globally unique and physically located on the vehicle at several standardised locations, so the physical identity of the vehicle can always be linked to its identity in the information systems. The quality of the VIN entry in each individual information system then determines the searchability of all the interlinked information systems.
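Why a single mistyped character breaks this linkage can be shown with a short sketch that joins records from two systems on the VIN key. The data and function names are hypothetical; the point is only that an exact-key join treats a VIN with one wrong character as a different vehicle.

```python
from __future__ import annotations


def link_by_vin(*systems: dict[str, dict]) -> dict[str, list[dict]]:
    """Group records from several information systems under their VIN key."""
    linked: dict[str, list[dict]] = {}
    for system in systems:
        for vin, record in system.items():
            linked.setdefault(vin, []).append(record)
    return linked


def unlinked_vins(*systems: dict[str, dict]) -> set[str]:
    """VINs that appear in some systems but not in all of them."""
    if not systems:
        return set()
    present_everywhere = set(systems[0]).intersection(*systems[1:])
    return set().union(*systems) - present_everywhere


# Hypothetical data: the inspection clerk typed 3 instead of 8 at position 16.
register = {"1M8GDM9AXKP042788": {"owner_id": 1}}
inspections = {"1M8GDM9AXKP042738": {"result": "pass"}}

print(link_by_vin(register, inspections))    # two separate keys: one vehicle looks like two
print(unlinked_vins(register, inspections))  # both VINs are left without a counterpart
```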

What is the quality (error rate) of the VIN in the information systems? Baseline study data
Answering this fundamental question was one of the main objectives of the research.
The research was carried out on a data sample of 4 million vehicles from the vehicle inspection register managed by the Ministry of Transport of the Czech Republic. In this system, no checks are carried out when a vehicle roadworthiness test is entered. The VIN is copied manually from the vehicle document presented for technical inspection, without any check against previous inspections of the same vehicle and without any technical means of obtaining the VIN from the vehicle or its documents. Nor is the existence of the inspected vehicle checked against the national vehicle register. There is also no check of the formal, logical structure of the VIN (length of the VIN string, prohibited characters), and no check digit is calculated.

Data analysis-Determining the quality (error rate) of VINs in information systems
Using the VINdecoder application (VINexpert), every record in the 4-million-record sample was analysed in batch. The VIN of every vehicle in the Czech Vehicle Inspection Register was evaluated for decodability and checked for correctness; where decoding failed, the causes were investigated. The final report assigned each record one of the statuses listed in Table 1.
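The formal part of such a batch evaluation can be approximated by the following sketch. It is an assumption-laden illustration, not VINexpert: it classifies each stored VIN only by length and prohibited characters and reports the share of each status. The status names are hypothetical and need not match Table 1, and checking decodability would additionally require a manufacturer knowledge base.

```python
from __future__ import annotations

import string
from collections import Counter
from typing import Iterable

# Digits and capital letters, minus the prohibited I, O and Q.
ALLOWED = frozenset(string.digits + string.ascii_uppercase) - frozenset("IOQ")


def formal_status(raw: str) -> str:
    """Classify one stored VIN by the two formal ISO rules only."""
    vin = raw.strip().upper()
    bad_length = len(vin) != 17
    bad_chars = any(c not in ALLOWED for c in vin)
    if bad_length and bad_chars:
        return "wrong length + prohibited characters"
    if bad_length:
        return "wrong length"
    if bad_chars:
        return "prohibited characters"
    return "formally valid"  # may still be wrong; only a decoder can tell


def status_shares(records: Iterable[str]) -> dict[str, float]:
    """Percentage of records in each status, as in a batch audit report."""
    counts = Counter(formal_status(r) for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: round(100 * n / total, 2) for status, n in counts.items()}


sample = ["1M8GDM9AXKP042788", "TNK52012024", "1M8GDM9AXKP04278O", "UU2TAQB02768"]
print(status_shares(sample))  # each of the four statuses: 25.0
```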

Discussion
We live in the age of modern information technology. One would therefore expect to work with accurate, correct, up-to-date data stored in information systems for various purposes: production, administration, control, public transparency and efficiency, health and property protection, security, etc. One would also expect data processing in public administration, including primary data entry, to be supported by automated processes and by the modern technological means already in common use in everyday practice.
In practice, however, this is not the case for state agencies (registers) working mainly with motor vehicles, for both objective and subjective reasons. Vehicle registration is very specific in that numerous manufacturers keep creating completely new products, while global standardisation in some areas (including those related to registration practice) is not sufficiently flexible and, at the same time, not consistently observed. On the one hand, there is major technological globalisation in technical and consumer terms; on the other hand, the relevant legislation and standardisation are also intended to be global (or at least pan-European), but their implementation is delayed by many years while European directives and regulations are adopted and transposed into national legislation.
In practice, the basic purposes of any systematic creation of various records and registers are often forgotten. In the past, every object entered into information systems was usually physically checked to ensure the quality of the information, especially key identifiers and other important characteristics.
Current registration processes, specifically for motor vehicles, are merely a "paper" matter, because registration is formally separated from the physical inspection of the vehicle. Vehicle registration is based only on the documents submitted, which leaves a relatively large margin for error or even deliberate manipulation. In other words, it is not possible, for example, to read the VIN technologically by optically scanning its physical stamping on the vehicle body, to read it digitally from the vehicle control units, or to scan a barcode, because the vehicle is not physically present at the place of registration. In theory, this issue can be addressed by high-quality technical inspections and vehicle originality checks, which physically take place elsewhere and at different times. Unfortunately, even there the potential of readily available opto-electronic or electronic (digital) technologies cannot be used effectively, because manufacturers provide no standardised support for the uniform use of barcodes or other technologies for recording VINs on vehicles, or for reading digital VINs from vehicles (Kubjatko et al., 2018). Not every manufacturer uses barcodes or QR codes, and there is currently no universal device capable of reading (extracting) digital VINs from the control units of all models in production. For each manufacturer (manufacturing group), its proprietary technology must be available, which is not feasible in independent inspection practice.
The basic research results, presented in Tables 1, 2 and 3, reflect the practice of manual data acquisition and, at the same time, a lack of understanding of the seriousness of vehicle identification in the design of the information system. The analysis shows that 8.79 % of the records in the VIN item do not conform to the ISO standards for this identifier. In other words, the error rate of the key identifier VIN is almost 9 %, i.e., roughly one in 11 vehicles is problematic in terms of its unambiguous identification.

Table (Source: Authors), status descriptions: "The length of the VIN string is less than or greater than 17 characters."; "Incorrect VIN length. The VIN contains prohibited characters as well."
A closer analysis reveals that 5.48 % (4.15 + 0.04 + 1.29; see 3) of all the VINs are incorrect because of an incorrect identifier length (different from the standard 17 characters) and/or the use of the prohibited characters O, Q, and I. These characters must not be used in the VIN structure in order to avoid optical confusion of character pairs such as 0-O, 0-Q, I-J, I-1, etc., which would prevent the object of interest from being found correctly in a search. The analysis also shows (see Table 3) that in practice the VIN entry is filled with items that have a completely different meaning and certainly do not belong there: clerks enter various official reference numbers and file marks. In numerous cases, the VIN is truncated (at the front or back), usually to only 6-8 positions, because the clerk believes that this sequence (reminiscent of the pre-1986 body serial numbers) is sufficient to identify a vehicle. This issue is trivially solvable at the level of information system design, because it is sufficient to check that the VIN has 17 positions and contains none of the forbidden characters O, Q, and I. Records that do not meet these criteria must be brought to the attention of the information system operator and, as a rule, must not be entered into the computer database. This type of error is objective in nature (incorrect design of the information system's functionality) and can be corrected retrospectively at minimal cost so that further errors do not occur.
The analysis also shows that a further 3.31 % of all the VINs are erroneous because of human factors (Matuszak et al., 2017; Tušer & Hoskova-Mayerova, 2020; Tušer & Hošková-Mayerová, 2020), in particular unintentional fatigue and inattention, and possibly fraudulent behaviour (Kurilovska & Hajdukova, 2021) aimed at changing the vehicle identity (Van de Weijer et al., 2019). Mistyping or misspelling a single character of the 17-character VIN creates a new, "artificial" or fictitious VIN that either does not formally exist or belongs to a completely different vehicle. These errors can only be eliminated by using the check digit mechanism in the VIN and/or by checking the entered VIN with so-called VIN decoders, which verify the VIN structure. The VIN check digit mechanism works fully only where vehicle manufacturers are legally obliged to build it into the vehicles they sell in a given country, as in the USA. In Europe and on other continents, it is therefore necessary to use suitable VIN decoders that operate in real time. Table 4 shows the analysis of VIN errors in relation to how they can be avoided in information systems. As practice shows, if trivial computer algorithms check that an entered VIN has 17 characters and at the same time reject prohibited characters (the letters O, Q, I and all special characters such as %, _, ´, *, etc.), 56.64% of all errors are eliminated. The remaining VIN errors are caused by user inattention during entry, which alters the logical structure of the VIN defined by the vehicle manufacturer; such errors cannot be detected by simple algorithms. Detecting them requires a reference knowledge base of all the permissible VIN combinations for all vehicle models, against which errors can be found.
Applications such as VINdecoder can detect and eliminate this kind of error. Checks of the formal VIN structure alone will not help, because every vehicle manufacturer has its own internal VIN structure that differs from those of other manufacturers. The VIN cannot be treated, for example, like a personal social security number, which always has the same structure within one country.
Examples of erroneous VIN entries (Source: Authors): one extra "T" character entered into the VIN when typing (TNK52012024); the user or clerk writes only the first part of the VIN instead of the entire 17-character string (UU2TAQB02768).
Rak et al., Cogent Engineering (2022)

What practical options are there for preventing errors in the VIN identifier when it is copied manually into the information system? When a VIN is entered manually, i.e., transcribed from documents, there are basically three data quality control mechanisms:
(1) Use of the check digit mechanism in the VIN;
(2) Formal check in the form field when entering the VIN for its length and the presence of prohibited characters;
(3) Use of a VINdecoder-type application to check the logical structure of the VIN.

Use of the check digit mechanism in the VIN
ISO standards give manufacturers the option to include a check digit in the VIN structure of every vehicle model. If the manufacturer does so, a simple, standard calculation can verify whether the VIN entered into the information system is correct; if the calculated value does not match the check digit, the VIN is incorrect. The user then has to recheck the vehicle document to see whether the VIN was copied correctly (and correct the mistake) or check the originality of the document (whether it has been illegally altered). If these checks do not resolve the discrepancy, the vehicle must be physically inspected or forensically examined. So much for the theory.
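For illustration, here is the calculation as used in North America (US regulation 49 CFR Part 565); the paper does not spell the formula out, so treat this as background rather than as the authors' method. Each character x_i is transliterated to a number t(x_i) (digits keep their value; letters map as A, J → 1; B, K, S → 2; C, L, T → 3; D, M, U → 4; E, N, V → 5; F, W → 6; G, P, X → 7; H, Y → 8; R, Z → 9), multiplied by a positional weight and summed; the remainder modulo 11 is the check digit, with 10 written as X. Position 9 has weight 0, so the check digit does not influence its own calculation.

```latex
c = \left( \sum_{i=1}^{17} w_i \, t(x_i) \right) \bmod 11,
\qquad (w_1, \dots, w_{17}) = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)

% Worked example for the commonly cited sample VIN 1M8GDM9AXKP042788:
\sum_{i=1}^{17} w_i \, t(x_i)
  = 8 + 28 + 48 + 35 + 16 + 12 + 18 + 10 + 0 + 18 + 56 + 0 + 24 + 10 + 28 + 24 + 16
  = 351 = 31 \cdot 11 + 10
\;\Rightarrow\; c = 10 \mapsto \mathrm{X}
```

The result matches the X stored at position 9. A single mistyped character at any position other than 9 changes the sum by w_i times the change in t(x_i); since 11 is prime and both factors are nonzero and smaller than 11, the sum changes modulo 11 and the error is detected. Substitutions between letters with the same transliterated value (for example A and J) are not detected.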
The problem is that vehicle manufacturers on the European market are not obliged to use a VIN with a check digit. According to the analyses from our research, only 40% of all vehicles (including trailers, semi-trailers, motorcycles, etc.) on the European market have a check digit in the VIN. Without detailed knowledge, we do not know which vehicles have a VIN with a check digit and which do not, so the check calculation cannot be run automatically on all vehicles; doing so would flag many correct VINs as erroneous. To run the check digit calculation automatically, a VINdecoder-type application is needed: it knows whether the vehicle type uses a check digit in its VIN, and only in that case can the calculation be run correctly. This is the situation in Europe and on all other continents, except in the USA and Canada.
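A sketch of the approach described above, running the check digit calculation only when the vehicle type is known to use one. The transliteration table and weights are those of the North American algorithm; the knowledge base `USES_CHECK_DIGIT` is hypothetical (its two WMI entries are invented for illustration, and keying by WMI alone is a simplification of what a VINdecoder-type application would store), as are the function names.

```python
from __future__ import annotations

# North American transliteration table and position weights (49 CFR Part 565).
TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)

# Hypothetical knowledge base: does the VIN structure behind this WMI use a
# check digit? A real decoder would key this by type model, not just by WMI.
USES_CHECK_DIGIT = {"1M8": True, "AAA": False}


def expected_check_digit(vin: str) -> str:
    """Weighted sum of transliterated characters modulo 11; 10 is written as X."""
    total = sum(TRANSLITERATION[c] * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def verify_check_digit(vin: str, kb: dict[str, bool] = USES_CHECK_DIGIT) -> bool | None:
    """True/False if the check applies; None if the type has no check digit or is unknown."""
    vin = vin.strip().upper()
    if len(vin) != 17 or any(c not in TRANSLITERATION for c in vin):
        return False  # fails the formal rules before any calculation
    if not kb.get(vin[:3], False):
        return None  # running the calculation here would raise false alarms
    return vin[8] == expected_check_digit(vin)


print(verify_check_digit("1M8GDM9AXKP042788"))  # True
print(verify_check_digit("1M8GDM9AXKP042789"))  # False: one mistyped character
print(verify_check_digit("AAA11111111111111"))  # None: type without a check digit
```

Returning None rather than True for types without a check digit keeps "not checkable" distinct from "checked and correct".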
In the USA and Canada, the situation is much more favourable for ensuring the quality of VIN data in information systems. Strong national legislation requires all vehicle manufacturers and importers on the North American market to use the VIN check digit, and a vehicle that does not comply cannot be sold. Every manufacturer or importer of vehicles into this market must therefore build the check digit into the VIN structure of the vehicles it produces or imports. The check calculation can then be triggered completely automatically whenever a VIN is entered into an information system (vehicle register), which prevents most incorrect VINs from entering the relevant database. This creates an interesting paradox for European and Asian vehicle manufacturers: for every model produced, vehicles delivered to North America have a different VIN structure from those sold elsewhere, one with a check digit and the other without. The exception is visionary European manufacturers (BMW, Audi, Škoda, Renault) that use a VIN with a check digit for all world markets alike.

Formal check in the form field when entering the VIN for its length and the presence of prohibited characters
This mechanism for preventing VIN errors in information systems is trivial and costs practically nothing. It is sufficient to check the length of the VIN (17 characters) and the presence of prohibited characters in the form field through which the user enters the copied VIN into the information system. Unfortunately, this basic check is missing in many information systems. This mechanism alone prevents 56.64% of errors, a very high efficiency for such a simple measure.
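The form-field check can be sketched as follows. It is a minimal, hypothetical validator (the function name and messages are illustrative); it upper-cases the input and trims surrounding whitespace, a design choice a real system might or might not make, and it rejects everything outside the digits and permitted capital letters, which also covers special characters such as % and *.

```python
from __future__ import annotations

import string

# Digits and capital letters, minus the prohibited I, O and Q.
ALLOWED = frozenset(string.digits + string.ascii_uppercase) - frozenset("IOQ")


def validate_vin_field(raw: str) -> list[str]:
    """Return the problems found in a form-field entry; an empty list means it passes.

    Passing only means the entry is formally possible, not that it is the right VIN.
    """
    vin = raw.strip().upper()
    problems = []
    if len(vin) != 17:
        problems.append(f"length is {len(vin)}, expected 17")
    bad = sorted({c for c in vin if c not in ALLOWED})
    if bad:
        problems.append("prohibited characters: " + ", ".join(repr(c) for c in bad))
    return problems


for entry in ["1M8GDM9AXKP042788", "TNK52012024", "UU2TAQB02768", "1M8GDM9AXKP0427%8"]:
    print(entry, validate_vin_field(entry) or "OK")
```

Rejecting the entry at the form field, rather than cleaning it later, is what keeps such records out of the database in the first place.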

Use of a VINdecoder-type application to check the logical structure of the VIN
When writing a VIN, the user can break the logic of the VIN structure defined by the manufacturer by mistyping a single character (see Figures 3 and 8). This type of error accounts for 43.46% of all VIN errors. Detecting it requires a comparison with a reference data source, a knowledge base, which is exactly what applications such as VINdecoder provide.
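The knowledge-base comparison can be illustrated with a deliberately small sketch. The rule table is hypothetical (a production decoder such as VINdecoder holds roughly 8,900 basic type models, and its internal format is not described in the paper), and regular expressions are simply one convenient way to encode per-position rules. The second VIN below passes the length and character checks yet is rejected because it fits no structure in the table.

```python
from __future__ import annotations

import re

# Hypothetical knowledge base: for each WMI, regular expressions describing
# the VIN combinations the manufacturer actually issues. The single rule
# below is invented for illustration.
KNOWLEDGE_BASE: dict[str, list[re.Pattern[str]]] = {
    "1M8": [re.compile(r"1M8GDM9A[0-9X]KP\d{6}")],
}


def structure_matches(vin: str, kb: dict[str, list[re.Pattern[str]]] = KNOWLEDGE_BASE) -> bool | None:
    """True if the VIN fits a known structure, False if not, None if the WMI is unknown."""
    vin = vin.strip().upper()
    rules = kb.get(vin[:3])
    if rules is None:
        return None  # no reference data: the decoder cannot decide
    return any(rule.fullmatch(vin) for rule in rules)


print(structure_matches("1M8GDM9AXKP042788"))  # True
print(structure_matches("1M8GDN9AXKP042788"))  # False: M typed as N at position 6
print(structure_matches("ZZZ11111111111111"))  # None: manufacturer not in the knowledge base
```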

Other ways to prevent VIN errors: digital input into the information system
The VIN has been stored digitally in vehicle control units for more than 10 years; it can be read by an OBD reader and transferred directly to the relevant information system. Using this technology requires two prerequisites:
(1) an OBD reader capable of reading the digital VIN (digiVIN) for all vehicle makes, which is still being addressed;
(2) an opportunity to physically inspect the vehicle so that the reading can be made.
Current registration practice, however, is no longer linked to a physical inspection of the vehicle as it was in the past; registration in the vehicle register is essentially based on presented paper documents. This carries the risk of the VIN errors discussed above and also the risk of fraud, such as putting a stolen vehicle with an altered VIN identity into service.
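Reading the digiVIN and comparing it with the documents could look roughly like the sketch below. It assumes a CAN-based vehicle whose OBD-II response to service 09, PID 02 has already been reassembled by the reader into a single payload; the function names are hypothetical, and real readers must also handle older protocols and manufacturer-specific behaviour.

```python
def vin_from_obd_response(payload: bytes) -> str:
    """Extract the VIN from a reassembled OBD-II service 09, PID 02 response.

    Assumes the CAN layout: 0x49 (positive response to service 0x09), 0x02 (PID),
    0x01 (number of data items), followed by the 17 ASCII characters of the VIN.
    """
    header, body = payload[:3], payload[3:]
    if header != b"\x49\x02\x01" or len(body) != 17:
        raise ValueError("unexpected service 09 / PID 02 response layout")
    return body.decode("ascii")


def compare_with_documents(digital_vin: str, document_vin: str) -> str:
    """Flag a mismatch between the VIN read from the control unit and the paper documents."""
    if digital_vin == document_vin.strip().upper():
        return "match"
    return "MISMATCH: check for a typing error or an altered vehicle identity"
```

A mismatch does not by itself distinguish a typing error from fraud; it only signals that the vehicle needs closer inspection before registration.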
In the European Union, the vehicle registration process is now considering the possibility of taking vehicle information of a technical and administrative nature, including the digital VIN, directly from reference databases of vehicle manufacturers or of the institutions responsible for their type approval (homologation) for the European market, based on CoC (Certificate of Conformity) documents. The vehicle's registration data would then no longer be entered manually; instead, the vehicle being registered would first be traced in the relevant reference database and all its data would be transferred automatically to the national vehicle register. A prerequisite, however, is that the data in these primary reference databases are themselves subject to quality checks.
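The "trace first, then transfer" workflow can be illustrated with a minimal sketch. The record fields, the in-memory dictionaries standing in for the reference database and the national register, and the function name are all hypothetical; the point is only that no field is retyped and that the reference data are checked before they are copied.

```python
import re
from dataclasses import dataclass

VIN_FORMAT = re.compile(r"[A-HJ-NPR-Z0-9]{17}")


@dataclass(frozen=True)
class ReferenceRecord:
    """Hypothetical CoC-based record held by a manufacturer or type-approval authority."""
    vin: str
    make: str
    model: str
    fuel: str


def register_from_reference(
    vin: str,
    reference_db: dict[str, ReferenceRecord],
    register: dict[str, ReferenceRecord],
) -> ReferenceRecord:
    """Trace a vehicle in the reference database and copy its data into the register.

    If the VIN is not found, or the stored record fails a basic quality check,
    registration stops for manual investigation instead of falling back to retyping.
    """
    record = reference_db.get(vin)
    if record is None:
        raise LookupError(f"VIN {vin} not found in the reference database")
    # Quality check on the primary data itself: the record must carry the same,
    # formally valid VIN under which it was found.
    if record.vin != vin or not VIN_FORMAT.fullmatch(record.vin):
        raise ValueError(f"reference record for VIN {vin} fails the quality check")
    register[vin] = record
    return record
```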

Transferability of knowledge gained about the occurrence of errors in the VIN to other applications
To analyse the VIN error rate, we used a Czech database containing data on 4 million vehicle technical inspections. The database was freely accessible, its size was ideal for our purposes, and, equally importantly, it has no mechanisms to detect and prevent VIN errors.
The fundamental question is whether these results are transferable to other records or vehicle registers, including foreign systems. In professional discussions, our foreign colleagues often hold the opinion that there can be no errors in their information systems, because these are national vehicle registers where errors are not tolerated. Errors certainly do exist; the question is how large they are. Manual human work still prevails in registration practice, and the effects of physiological fatigue and inattention inevitably apply there.
In the past, we compared error rates in foreign systems using insights from the EUCARIS 4 CBE (Cross Border Exchange) data exchange module, which provides European data exchange for cross-border traffic solutions. The error rate there was 3-5%. However, the analysis was performed only on samples of a few thousand records, which in our view is not sufficient, and it is desirable to repeat these studies in detail on a much larger data base in the future. In addition, in 2010-2020 we analysed VIN error rates in the databases of Czech and Slovak insurance companies such as Allianz, Generali, Uniqa, Kooperativa and Direct pojišťovna, using the data pools they provided, which contained tens of thousands of records. The resulting VIN error rates ranged from 6% to 11%, depending on the size of the insurance company and its presence in international markets: the error rate was lower for larger insurers and higher for purely national ones. For these insurers, data quality was not the priority; winning clients was.
Based on our experience, we hypothesise that data quality is highest in national vehicle registers under government control and usually worse in commercial entities, where numerous factors matter, including registration processes, the interconnection of data pools between different entities and the controls on information exchange. The quality of VIN data is also usually lower in police information systems (especially search systems) than in national registers. There is huge potential for further research on large volumes of data from different information sources of different entities (state institutions, commercial entities, police and other security forces) and different countries.

Source: Roman Rak

For the VIN analysis we used a VINdecoder with the commercial name VINexpert. In our search for a suitable VINdecoder, we tried several dozen free applications, but found that each of them is usually strong only in decoding certain makes, and only for passenger vehicles. Many of these applications were created by a single person, and their data is not officially guaranteed in any way. For the tests, we needed a VINdecoder that can decode the VIN not only for passenger vehicles but also for all other vehicle categories found in national vehicle registers (trucks, vans, buses, motorcycles, trailers and semi-trailers, tractors, and work machines). Only the VINexpert application met this criterion.
When evaluating the outputs of this paper, it is important to note that, to our knowledge, this issue has not yet been addressed in detail anywhere, so a number of open questions inevitably remain for future research. Addressing them also depends on obtaining a sufficiently large set of VINs from real systems, which is often restricted by strict national legislation.

Conclusions
The formal and factual correctness of key object identifiers (e.g., of motor vehicles) is one of the basic prerequisites for the functionality of any information system (Raghavan, 2013). If this unique identifier is incorrect, it is impossible to obtain an unambiguous result from any record (computer database) with a simple, single query (Kolitschova & Kerbic, 2018).
Is a 9% error rate in a key (vehicle) register acceptable? From our professional point of view, it is not. We must also be aware that information systems today are interconnected, especially state ones (e.g., the vehicle register administered by the Ministry of Transport and the register of stolen vehicles and vehicles of interest maintained by the Police). It can be assumed that an error rate in the VIN identifier objectively exists in all these information systems and is approximately the same. Thus, if two information systems are linked by the VIN, the probability that the link fails is almost twice as high, i.e., about 17-18%. The real world of linking records about motor vehicles is very complex (see Figure 6). Today, however, the key national registers of state and public administration are no longer merely static records kept for internal needs; they must also serve actively in the rapid resolution of situations such as security threats of various kinds. An example is the pan-European eCall project, which provides online information from the vehicle register about a vehicle in distress (e.g., after a crash) to the Integrated Rescue System forces conducting rescue operations. The key identifier for this link is the VIN (Figure 7). If it contains an error, rescuers may not receive the necessary information about the vehicle and its technical characteristics in time, which may fundamentally affect the intervention technique and therefore, in certain cases, endanger the health or lives of those involved in the accident (Mendiboure, 2022; Moravcik & Jaskiewicz, 2018). Similarly, counter-terrorism forces may fail to obtain information about a vehicle and its owner at critical moments (Mostafa, 2019).
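The "doubling" can be checked with a short calculation. Assuming, as a simplification, that VIN errors in the two linked systems occur independently with the same rate $p$, a link succeeds only if both records are correct:

$$
\begin{aligned}
P_{\text{fail}} &= 1 - (1-p)^2 = 2p - p^2,\\
p = 0.09:\quad P_{\text{fail}} &= 1 - 0.91^2 = 0.1719 \approx 17\,\%,\\
p = 0.05:\quad P_{\text{fail}} &= 1 - 0.95^2 = 0.0975 \approx 10\,\%.
\end{aligned}
$$

The result is slightly below a simple doubling of $p$ because the $p^2$ cases, where both records are wrong, are not counted twice. For a chain of $n$ linked systems, the same reasoning gives $1 - (1-p)^n$.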
Even if we eliminate trivial VIN errors caused by an incorrect length or by the presence of the forbidden characters I, O and Q (see Figure 9), an error rate of 3-5% remains. This is due to human factors (inattention, fatigue, intentional fraud in which the vehicle identity is changed, etc.). Based on our research, the 3-5% error rate generally applies to all European countries where additional, more sophisticated checks of the formal and content accuracy of the VIN, by decoding it with "VINdecoder" applications, are not in place, and where the check-digit calculation cannot simply be applied. We also observed that the error rate in private entities (banks, insurance companies, leasing companies, etc.) is significantly higher than in government information systems. To eliminate this type of error, three basic procedures can be recommended for acquiring data (VIN identifiers) into the information systems of public administration: taking data directly from vehicle manufacturers in electronic form; multiple verification of vehicle identity between different information systems; and systematic use of a VINdecoder that checks the VIN online as it enters the information system. An error rate of 5% is still high, because it almost doubles whenever two information systems are linked, which in practice means that roughly one vehicle in ten cannot be unambiguously identified. This is unacceptable for critical infrastructure information systems and must be addressed satisfactorily.
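One possible way to approach the second recommendation, multiple verification of vehicle identity between different information systems, is sketched below. It is not the procedure used in our research; it assumes, hypothetically, that both systems can export their VINs as sets, and it proposes as likely typing errors those records that have no exact match in the other system but differ from one of its VINs in exactly one character.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def cross_verify(system_a: set[str], system_b: set[str]) -> dict[str, list[str]]:
    """For each VIN in system A without an exact match in system B, list the
    VINs in B that differ in exactly one character (likely typing errors)."""
    suspects = {}
    for vin in sorted(system_a - system_b):
        suspects[vin] = sorted(
            other for other in system_b
            if len(other) == len(vin) and hamming(vin, other) == 1
        )
    return suspects
```

This brute-force comparison is adequate for a sketch, but large registers would need an index. Transposed adjacent characters give a distance of 2 and would require a looser threshold, at the cost of more false candidates; every proposed match would still have to be confirmed by a person or by a further data source.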