The Systematized Nomenclature of Medicine, Clinical Terms (SNOMED-CT) is a standard vocabulary for diseases, symptoms, signs, specimen types, living organisms, procedures (includes diagnostic, surgical, and nursing procedures), chemicals (includes biological chemicals such as enzymes and proteins and compounds used in drug preparations), drugs, anatomy, physiological processes and functions, occupations, and social contexts (e.g., religion and ethnic group). SNOMED-CT has codes for more than 360,000 concepts in human and veterinary medicine. SNOMED-CT provides significant coverage of the data elements in Table 32.2.
SNOMED International created SNOMED-CT by merging an earlier version of SNOMED called SNOMED-RT with the Read Codes, a similar standard vocabulary developed in the United Kingdom. SNOMED-CT codes for veterinary medicine supersede the now obsolete Standard Nomenclature of Veterinary Diseases and Operations (SNOVDO).
SNOMED International has an editorial board composed of representatives from several stakeholder groups, including physicians, nurses, and medical informatics specialists. No license fee is necessary to use SNOMED-CT in the United States or the United Kingdom because the governments of those countries have contracted with SNOMED International to make SNOMED-CT freely available (SNOMED International, 2004).
It will still be some time, however, before any clinical information systems send SNOMED-CT encoded data because SNOMED-CT is a relatively new terminology (it debuted in 2002) and became available for free in the United States only in 2004. Vendors are just beginning to incorporate it into their products, either to support new functionality or to replace nonstandard vocabularies and standard vocabularies such as ICD-9-CM that are insufficient for clinical use. Once incorporated in new versions of vendors' products, SNOMED-CT encoded data will still not be available from healthcare organizations for biosurveillance purposes until those organizations buy the new products and perhaps do substantial work internally to convert proprietary codes or free text into SNOMED-CT codes. Finally, as the developers of SNOMED-CT have recognized, SNOMED-CT by itself may be insufficient for the entry of diagnoses by clinicians into clinical information systems (Spackman et al., 1997). For example, they note that clinical information systems may need lists of frequently used diagnoses and abbreviations to facilitate input (Spackman et al., 1997). Because the interim period until healthcare organizations widely implement SNOMED-CT may be a decade or longer, developers of biosurveillance systems will face a choice of waiting or creating adaptors that map proprietary codes (or parse free text) to SNOMED-CT codes for diagnoses, signs, symptoms, anatomy, and living organisms such as bacteria and viruses. As with mapping proprietary laboratory test codes to LOINC, mapping proprietary codes for organisms to SNOMED-CT is resource intensive, requiring personnel with expertise in SNOMED-CT and personnel familiar with the laboratory's testing procedures and its use of proprietary codes or free text.
We discuss HL7 as a grammar in the next section. Here we discuss standard vocabularies developed by the HL7 organization. HL7 has defined nearly 300 tables of codes that have become de facto vocabulary standards for encoding such data elements as a patient's gender, marital status, race, religion, and the order status and specimen source of laboratory tests. For the most part, these codes do not overlap with SNOMED-CT, although there is overlap of codes for specimen source. Given the widespread adoption of HL7 by healthcare information technology vendors, it is likely that the specimen source code set of HL7 will be the de facto standard for some time.
Clinical information systems that send data in HL7 format are likely to use HL7 codes for those data elements for which an HL7 table exists. However, because of the optionality allowed by HL7 even for basic data elements such as the gender of a patient, the encodings may exhibit variability from hospital to hospital. In our experience, registration systems in hospitals use the HL7 codes "M" (male) and "F" (female) in the "administrative sex" field of HL7 messages consistently, but these systems inconsistently use codes for other "genders," using a blank field versus "U" (unknown), "O" (other), "A" (ambiguous), and "N" (not applicable). The inconsistent use of this field, with just six possible values in HL7 2.x standards, highlights the optionality in HL7 that leads to language incompatibilities even among systems that use the HL7 standards.
The International Classification of Diseases (ICD) is a standard vocabulary for diseases, health status, types of patient visits to doctors and other health providers, and external causes of injuries. The purpose of ICD is to enable comparison of causes of death among nations. The latest version is ICD-10 (released in 1989), which the United States has used since 1999 to report mortality statistics to the World Health Organization.
The World Health Organization has created and maintained ICD since 1948, but countries modify ICD for other purposes. In the United States, the National Center for Health Statistics (NCHS) created a "clinical modification" of ICD-9 (ICD-9-CM) in 1979 for the purpose of coding diagnoses and procedures when submitting health care insurance claims to the government. Private payers have since also adopted ICD-9-CM for the same purpose.
There are two major parts of ICD-9-CM: a classification of diseases and a classification of procedures. The NCHS maintains the classification of diseases, and the Centers for Medicare & Medicaid Services (CMS) maintains the classification of procedures. NCHS and CMS have added and deleted codes every year since 1986.
You may encounter ICD-9-CM-encoded diagnoses and procedures when obtaining data from hospital information systems. Healthcare providers (e.g., physicians and hospitals) encode diagnoses and procedures (hospitals only) with ICD-9-CM for the purpose of billing. Because providers do not get paid if they do not use standard billing vocabularies to submit claims, ICD-9-CM is a widely implemented vocabulary for the purposes of billing and reimbursement.
If you encounter nonstandard diagnosis or procedure codes (or free text), then you should convert them to SNOMED-CT and not ICD-9-CM. The reason is that SNOMED-CT is the emerging standard for encoding diseases, symptoms, and signs for purposes other than billing and reimbursement. If you encounter a mix of proprietary codes and ICD-9-CM, you can convert the ICD-9-CM codes to SNOMED-CT codes by using an adaptor created by the Unified Medical Language System (UMLS, which we discuss below). The Unified Medical Language System also provides a version of ICD-9-CM that computers can use; the NCHS releases ICD-9-CM in a text document from which it is extremely difficult to automatically create a database of ICD-9-CM codes.
Current Procedural Terminology (CPT) is a standard vocabulary for surgical procedures, minor procedures that physicians perform in the office, radiology tests, and a small number of laboratory tests (approximately 1,000). Whereas hospitals use ICD-9-CM for billing, physicians use CPT to bill for their services. Thus, CPT covers laboratory tests that physicians and/or their staff perform in office settings.
The American Medical Association (AMA) created the first version of CPT in 1966 and until 1984 released new versions every 4 years. Since 1984 it has released a new version annually. CPT requires a license fee for its use.
Because the purpose of CPT is billing, distinctions among codes often relate to the level of effort typically required to perform a procedure. For example, codes 11620 through 11624 and 11626 (six codes total) all refer to Excision, malignant lesion, except skin tag (unless listed elsewhere), scalp, neck, hands, feet, genitalia. The difference is that the codes refer to different size lesions; presumably larger lesions require more effort to remove and thus provide greater reimbursement.
You may encounter CPT-encoded procedures when obtaining claims data. If you are building or purchasing an adaptor, it should map proprietary laboratory test codes to LOINC, as LOINC is the standard for laboratory test codes. The LOINC committee, with the support of the AMA, is creating a mapping from CPT laboratory test codes to LOINC with funding from the National Library of Medicine (NLM) (Anonymous, 2004).
As mentioned at the beginning of this chapter, the UPC standard and its successor standard, the GTIN system, are standard vocabularies for manufactured products. GTIN facilitated building the NRDM because virtually every manufacturer and retailer has implemented GTINs in their systems. Every product sold—including over-the-counter healthcare products—in every major retail chain has a GTIN barcode that enables scanning of products at checkout.
The Uniform Code Council (UCC) and EAN International (the European counterpart to the UCC) created the GTIN system by merging the UPC and European Article Numbering (EAN) systems. They have also merged organizationally, forming a standards-developing organization of global scope called GS1. The UCC is a member organization of GS1 and oversees the creation and maintenance of GTINs in the United States. Similarly, other countries also have organizations that are members of GS1 and assign GTINs in their respective countries.
Each manufacturer in the United States who wishes to print a GTIN barcode on its products must first pay a fee to join the UCC. The UCC then assigns the manufacturer its own manufacturer code. The manufacturer then assigns unique product codes to each product and creates GTINs by concatenating its manufacturer code with the product codes and computing a check digit.
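The concatenate-and-check-digit step can be sketched in Python as follows. The check digit calculation is the standard GS1 one (weights alternating 3 and 1 from the rightmost digit). The function names are hypothetical, and the sketch assumes that the manufacturer code and product code together total 11 digits, as in a 12-digit UPC-style number; the digits in the usage lines are illustrative only.

```python
def gs1_check_digit(payload):
    """Compute the GS1 check digit for a digit string that does not yet have one."""
    if not payload.isdecimal():
        raise ValueError("payload must contain only digits")
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost payload digit.
    for position, char in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def build_gtin12(manufacturer_code, product_code):
    """Concatenate the two codes and append the check digit (11 + 1 digits)."""
    payload = manufacturer_code + product_code
    if len(payload) != 11:
        raise ValueError("manufacturer code plus product code must total 11 digits")
    return payload + str(gs1_check_digit(payload))


print(build_gtin12("036000", "29145"))  # prints 036000291452
```

A biosurveillance system can use the same calculation in reverse, to reject scanned or keyed codes whose final digit does not match.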
GTINs uniquely identify products down to the level of packaging; thus, there is a different GTIN for each flavor of an otherwise identical cough syrup and for two-bottle packs of cough syrup versus single bottles. Because GTINs are so specific, their use in biosurveillance requires grouping them into categories that are meaningful for surveillance. For example, the NRDM has pediatric electrolyte, antidiarrheal, and pediatric cough syrup categories.
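As a rough illustration of such grouping, the Python sketch below sums unit sales by surveillance category. The GTINs are invented placeholders, not real products, and the function name, the record layout of (GTIN, units) pairs, and the "uncategorized" bucket are all assumptions of this sketch rather than a description of how the NRDM works.

```python
from collections import defaultdict

# Hypothetical GTINs, invented for illustration only.
GTIN_CATEGORY = {
    "00000000000017": "pediatric cough syrup",  # e.g., one flavor, single bottle
    "00000000000024": "pediatric cough syrup",  # e.g., another flavor, two-bottle pack
    "00000000000031": "antidiarrheal",
}


def units_by_category(sales, categories=GTIN_CATEGORY):
    """Sum units sold per category; unmapped GTINs are counted as 'uncategorized'."""
    totals = defaultdict(int)
    for gtin, units in sales:
        totals[categories.get(gtin, "uncategorized")] += units
    return dict(totals)
```

Tracking the "uncategorized" total is a design choice that makes gaps in the category table visible instead of silently dropping sales.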
Unlike the other vocabularies we discuss here, no easily accessible database of GTINs exists. You must purchase a listing of GTINs from organizations such as AC Nielsen that analyze market share of the retail industry and thus collect data on GTINs. Even then, some GTINs will be missing because some retailers do not provide data to companies such as AC Nielsen, and thus the GTINs of products sold only by those retailers will not appear. This situation occurs, for example, when a manufacturer creates special packaging of its products for a single retail chain, resulting in the assignment of a different GTIN.
Most retailers have converted from UPCs to GTINs already. However, should you encounter both UPCs and GTINs in your work, the conversion of a UPC to a GTIN is simple: pad the UPC with two leading zeros (Uniform Code Council, 2004).
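In Python, the padding rule might look like the following minimal sketch. The function name `upc_to_gtin14` is hypothetical, and the sketch assumes the input is a 12-digit UPC held as a string (so that any leading zeros it already has are preserved).

```python
def upc_to_gtin14(upc):
    """Left-pad a 12-digit UPC with two zeros to form a 14-digit GTIN."""
    if len(upc) != 12 or not upc.isdecimal():
        raise ValueError("expected a 12-digit UPC")
    return "00" + upc


print(upc_to_gtin14("036000291452"))  # prints 00036000291452
```

The check digit does not change, because leading zeros contribute nothing to the weighted sum from which it is computed.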
The National Drug Code (NDC) directory is a vocabulary for prescription drug products in the United States. Other countries have similar vocabularies for prescription drugs (e.g., the Canadian Drug Identification Number). NDCs are similar to GTINs in that the code identifies products down to the level of the packaging. Thus a 50-tablet bottle of Cipro XR 500 mg has NDC 0026-8889-50, whereas the 100-tablet bottle has NDC 0026-8889-51.
The U.S. Food and Drug Administration (FDA) has maintained the NDC Directory since 1968 (Anonymous, 1969). The Drug Listing Act of 1972 mandated that all manufacturers of prescription drugs register with the FDA and provide on an ongoing basis a listing of all the prescription drug products they manufacture. This listing must include the NDC for each drug.
The FDA releases NDCs on its Web site as 10-digit numbers, but automated pharmacies almost always use 11-digit NDCs. Thus, if you are going to use NDCs in a biosurveillance system, it is important to know these two different formats and how to convert between them. Ten-digit NDCs consist of a four- or five-digit labeler code (analogous to the GTIN manufacturer code), a three- or four-digit product code, and a one- or two-digit package code, leading to the following possible configurations of the 10 digits: 5-4-1, 5-3-2, and 4-4-2. The de facto standard is the 11-digit NDC, in which four-digit labeler codes, three-digit product codes, and one-digit package codes are padded with a leading zero to produce a uniform 5-4-2 configuration with no hyphens. For example, the two NDCs above for Cipro XR 500 mg translate into the following 11-digit NDCs, respectively: 00026888950 and 00026888951.
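The conversion from the 10-digit to the 11-digit format can be sketched in Python as follows. The function name `ndc10_to_ndc11` is hypothetical, and the sketch assumes the 10-digit NDC arrives with its hyphens intact; without them, the three configurations cannot be told apart and the position of the padding zero is ambiguous.

```python
VALID_CONFIGURATIONS = {(5, 4, 1), (5, 3, 2), (4, 4, 2)}


def ndc10_to_ndc11(ndc):
    """Convert a hyphenated 10-digit NDC to the 11-digit 5-4-2 form without hyphens."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(part.isdecimal() for part in parts):
        raise ValueError("expected labeler-product-package separated by hyphens")
    labeler, product, package = parts
    if (len(labeler), len(product), len(package)) not in VALID_CONFIGURATIONS:
        raise ValueError("not a recognized 10-digit NDC configuration")
    # Pad whichever segment is short with a leading zero.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


print(ndc10_to_ndc11("0026-8889-50"))  # prints 00026888950
print(ndc10_to_ndc11("0026-8889-51"))  # prints 00026888951
```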
In addition to the issue of 10-digit versus 11-digit NDCs, the format of NDCs is likely to change. The FDA has plans to revise the NDC as part of its regulatory initiative to require barcodes on all prescription and some over-the-counter drug products (Food and Drug Administration, 2004). The FDA does not intend to require new NDCs for existing products, and thus current NDCs will continue to be valid. Similar to the UPC-to-GTIN conversion, however, converting existing NDCs to the new format might involve, for example, increasing the number of digits in an NDC.
If one or more of the components in your biosurveillance system is a pharmacy information system, it is likely that you will encounter NDCs. As with GTINs, however, NDCs specify products at a level of detail too specific for biosurveillance, and thus you will need to group NDCs into categories for monitoring for various public health events. No standard categories of NDCs exist for this purpose.
RxNorm is a second vocabulary for prescription drugs. RxNorm provides a set of codes for clinical drugs, which are the combination of active ingredients, dose form, and strength of a drug. For example, the RxNorm code for ciprofloxacin 500 mg 24-hour extended-release tablet (the generic name for Cipro XR 500 mg) is RX10359383, regardless of brand or packaging.
The NLM, in consultation with the FDA and the HL7 Vocabulary Technical Committee, created RxNorm in 2001 to facilitate development of electronic medical records.2 RxNorm does not require a license fee to use.
Although prescription drug information encoded with RxNorm would be far easier for a biosurveillance organization to process than are NDCs (unless it needed package-level information for trace back), RxNorm is a relatively new vocabulary that is not widely in use, and its purpose is managing clinical drug information in electronic health records, which are still relatively rare applications in hospitals (13%) and physician practices (14% to 28%). Vendors in general have not incorporated RxNorm into their products. Therefore, at present, developers of biosurveillance systems are unlikely to encounter systems that encode prescription drug information with RxNorm.
RxNorm is available as part of the UMLS (described next). If you need to use both NDCs and RxNorm, RxNorm includes a many-to-one mapping from NDCs to its clinical drug concepts, enabling you to build an adaptor that converts from NDC codes to RxNorm codes.
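A minimal Python sketch of such an adaptor appears below. The two table rows reuse the Cipro XR 500 mg codes quoted earlier in this chapter and are illustrative only; a real table would be populated from the mapping that RxNorm provides. The names `NDC_TO_RXNORM`, `ndc_to_rxnorm`, and `count_by_clinical_drug` are hypothetical.

```python
from collections import Counter

# Illustrative rows only, taken from the Cipro XR 500 mg examples in the text.
NDC_TO_RXNORM = {
    "00026888950": "RX10359383",  # 50-tablet bottle
    "00026888951": "RX10359383",  # 100-tablet bottle
}


def ndc_to_rxnorm(ndc11, mapping=NDC_TO_RXNORM):
    """Return the RxNorm code for an 11-digit NDC, or None if it is unmapped."""
    return mapping.get(ndc11)


def count_by_clinical_drug(ndcs, mapping=NDC_TO_RXNORM):
    """Count dispensing records per clinical drug, skipping unmapped NDCs."""
    return Counter(mapping[ndc] for ndc in ndcs if ndc in mapping)
```

Because the mapping is many-to-one, package-level detail is lost in the conversion; a system that needs it for trace back should retain the original NDC alongside the RxNorm code.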
The UMLS is an amalgamation of preexisting vocabularies. It includes as "source" vocabularies the code sets described here (except GTINs/UPCs and HL7 tables) and more than 80 other vocabularies.
The NLM created and maintains the UMLS to facilitate the development of computer systems that behave as if they "understand" the meaning of the language of biomedicine and health (National Library of Medicine, 2004). One must sign a license agreement to use the UMLS, but there is no fee.
With respect to biosurveillance, we mention the UMLS because (1) it is the vehicle by which the NLM is distributing SNOMED-CT for free use, (2) it is the distribution vehicle for RxNorm, (3) it contains a machine-readable version of ICD-9-CM, and (4) when vocabularies overlap in their coverage (such as the case in which two vocabularies both provide codes for diagnoses), it provides mappings among them. Future versions will soon include HL7 tables from 2.x versions of HL7 and vocabularies that are part of HL7 version 3.0. UMLS is a one-stop-shopping source for many vocabularies.
The Federal Information Processing Standards (FIPS) are vocabularies for geopolitical entities around the world, such as countries and their political subdivisions (e.g., states, provinces, and counties). FIPS 5-2 provides standard codes for the 50 states in the United States, the District of Columbia, and the various territories of the United States such as Puerto Rico and the Marshall Islands. FIPS 6-4 provides standard codes for the counties in each U.S. state and territory, and FIPS 10-4 provides standard codes for nations around the world and their principal administrative divisions (analogous to U.S. states).
In the United States, the National Institute of Standards and Technology (NIST) develops FIPS when no existing industry standard exists to meet the needs of federal government computer systems. NIST created FIPS 5-2 in 1987, FIPS 6-4 in 1990, and FIPS 10-4 in 1995. You can use FIPS for free.
Because biosurveillance data are inherently spatial, standard methods for identifying geographical entities such as counties and states are critical to representing biosurveillance data. FIPS standards provide a standard vocabulary that covers many geopolitical entities worldwide.
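As a small illustration, FIPS 6-4 county codes are three digits and are unique only within a state, so systems commonly prefix them with the two-digit FIPS 5-2 state code to obtain a nationally unique five-digit key. The Python sketch below builds such a key; the function name `county_fips` is hypothetical, and accepting integers as well as strings is a convenience assumed here.

```python
def county_fips(state_code, county_code):
    """Join a 2-digit FIPS 5-2 state code and a 3-digit FIPS 6-4 county code."""
    state = str(state_code).zfill(2)
    county = str(county_code).zfill(3)
    if len(state) != 2 or len(county) != 3 or not (state + county).isdecimal():
        raise ValueError("expected a 2-digit state code and a 3-digit county code")
    return state + county


# Pennsylvania is state 42, and Allegheny County is county 003 within it.
print(county_fips("42", "003"))  # prints 42003
print(county_fips(42, 3))        # prints 42003
```

Storing the key as a string rather than an integer preserves the leading zeros that many state and county codes carry.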
2 Because NDCs identify products down to the level of packaging, they are not suitable for use in physician order entry or decision support applications. The reason is that doctors prescribe, for example, "Cipro 500 milligrams three times a day." They do not specify whether the tablets come from a 15-tablet or a 20-tablet bottle, for example, and pharmacies may substitute an equivalent generic or brand-name drug.
ISO 3166 is another standard vocabulary for countries and their subdivisions. ISO is short for the International Organization for Standardization, the standards-developing organization that created and maintains ISO 3166.
There are three parts to ISO 3166: ISO 3166-1, ISO 3166-2, and ISO 3166-3. ISO 3166-1 contains codes for existing countries. ISO 3166-1 itself is composed of three parts: two-character country codes3, three-character country codes (not available for free), and three-digit numeric country codes (not available for free). ISO 3166-2 (not available for free) contains codes for political subdivisions of countries such as states (such as the 50 U.S. states) and provinces (such as Canadian provinces). ISO 3166-3 contains codes for countries no longer in existence, owing either to a name change4 or political changes such as subdivision or merging of countries5.
Despite the existence of FIPS, the de facto standard on the Internet is the two-character country codes that are part of ISO 3166. HL7, during the ongoing development of version 3.0, initially standardized on ISO 3166. However, it removed ISO 3166 because of concerns that ISO 3166 was not available for free. The ISO clarified its position, announced that its two-character country codes are free, and made these codes available for free download from its Web site. HL7 has not yet made a final decision about which country-code standard it will specify in version 3.0 of HL7.
If you are developing a biosurveillance system, you will have to choose between FIPS 10-4 and ISO 3166-1 for countries and their political subdivisions. Given that the Internet has adopted ISO 3166-1 two-character country codes and that these codes are now freely available, it makes sense to use ISO 3166-1 codes, except when exchanging data with U.S. federal government systems that require FIPS 10-4. The FIPS 5-2 codes for the 50 states and other territories of the United States are the standard in the United States and are free. The remaining decision is whether to use FIPS 10-4 or ISO 3166-2 for the political subdivisions of countries besides the United States. We are unaware of any reason to believe that either one is widely used or more likely than the other to become standard. Thus, based on consideration of cost alone, FIPS 10-4 is the better choice.