Medical Image Database

Data collection and database generation are among the most critical components of algorithm development, and among those that raise the most criticism and concern in the scientific community. Datasets are the basis for the initial design, training, and testing of algorithms and, together with the analysis tools, are key to any validation effort. Given the attention they usually receive and the controversy they sometimes stir, we review here some of the most important aspects of medical image databases.

Medical imaging data are usually collected retrospectively from completed patient files. Data collection and documentation involve significant amounts of time and effort that are often underestimated. The general guidelines followed in the generation of a database for the particular application of pancreatic cancer are as follows [5, 58-61]:

1. Collect both image and nonimage data and generate complete cases.

2. Review imaging, clinical, and demographic characteristics of the disease and ensure representation of the majority of case types. If this is not possible, prioritize and focus on selected groups that define the most important clinical problems.

3. Review the clinical records of the institution to ensure availability and adequacy.

4. Define the desired number of cases in the overall dataset, as well as in the subsets needed to address specific problems in addition to the main goal of the effort. The number of cases depends on the training requirements of the selected methodology and on the statistical power needs of clinical evaluation studies, such as receiver operating characteristic (ROC) experiments.

5. Digitize films from analog modalities at the highest possible resolution (spatial and dynamic) and, if necessary, reduce the resolution afterward using mathematical interpolation.

6. Generate ground truth information, preferably based on pathology information. If this is not possible, use properly screened medical experts to define and outline ground truth on the selected images. Although this approach is subject to high inter- and intraobserver variability, it is often the only possible option. Hence, it is critical for the researcher to develop standardized methods for ground truth file generation, the same for all experts, and to take every step to eliminate external sources of variability. It is also recommended that all experts' opinions be used in validation, rather than only the most frequent response, the union, or the overlap of opinions. For computer applications, ground truth information in electronic form is highly desirable. It usually contains outlines of the areas of interest drawn by one or more experts, which provide information on the type, location, and size of each area. Electronic ground truth files are discussed in more detail below.

7. Define validation criteria, namely what will be considered a true positive (TP), false positive (FP), true negative (TN), and false negative (FN) for a segmentation or classification outcome. Segmentation validation is usually the more demanding and cumbersome process. The existence of specific conventions and consistent criteria in the evaluation of segmentation results is often more important than the variability in the ground truth information provided by experts.

8. Collect all imaging information and imaging parameters associated with selected cases.

9. Obtain all available reports, e.g., radiology, pathology, clinical reports, that can assist the researcher in case documentation and evaluation of the database contents.
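Guidelines 6 and 7 can be made concrete with a small sketch. The example below assumes one possible convention that the text does not prescribe: TP, FP, TN, and FN are counted pixel by pixel between a binary segmentation mask and a binary ground truth mask, and the result is scored against each expert's outline separately rather than against a merged outline. The function names, the Dice overlap score, and the toy masks are illustrative assumptions only.

```python
from typing import Dict, List, Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel labels


def confusion_counts(predicted: Mask, truth: Mask) -> Dict[str, int]:
    """Count pixel-wise TP, FP, TN, FN for two binary masks of equal shape."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of rows")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for pred_row, truth_row in zip(predicted, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, truth_row):
            if p and t:
                counts["TP"] += 1
            elif p and not t:
                counts["FP"] += 1
            elif t:
                counts["FN"] += 1
            else:
                counts["TN"] += 1
    return counts


def dice(counts: Dict[str, int]) -> float:
    """Dice overlap, 2*TP / (2*TP + FP + FN).

    Assumption: two empty masks are treated as perfect agreement (1.0).
    """
    denominator = 2 * counts["TP"] + counts["FP"] + counts["FN"]
    return 1.0 if denominator == 0 else 2 * counts["TP"] / denominator


def score_against_each_expert(predicted: Mask, expert_masks: List[Mask]) -> List[float]:
    """Score against every expert separately instead of merging their outlines."""
    return [dice(confusion_counts(predicted, mask)) for mask in expert_masks]


# Toy 2x3 masks, invented for illustration only.
predicted = [[0, 1, 1], [0, 1, 0]]
expert_a = [[0, 1, 1], [0, 0, 0]]
expert_b = [[0, 0, 1], [0, 1, 1]]
scores = score_against_each_expert(predicted, [expert_a, expert_b])
```

Reporting one score per expert keeps the interobserver variability visible in the validation results, which is the point of using all opinions rather than a single consensus outline.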

For our development and preliminary study, data were collected retrospectively from the patient files of the H. Lee Moffitt Cancer Center & Research Institute (HLMCC). Approximately 100 patients undergo a pancreatic CT exam annually at the center. About 2/3 of these patients are diagnosed with pancreatic cancer and about 1/3 with a benign pancreatic mass or cyst. Abdominal scans are also performed for staging patients diagnosed with other cancer types, e.g., breast cancer, and these scans may turn out to be negative for metastatic disease or for any disease. Figure 4.9 shows a database design for pancreatic cancer imaging applications.
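The accrual figures quoted above translate directly into a rough estimate of how long it takes to assemble a subset of a given size. The sketch below uses the stated rates (about 100 exams per year, about 2/3 malignant and 1/3 benign); the target subset sizes are hypothetical placeholders, not numbers from this study, and the calculation ignores cases lost to incomplete records.

```python
import math
from fractions import Fraction

EXAMS_PER_YEAR = 100                 # approximate annual pancreatic CT exams
MALIGNANT_FRACTION = Fraction(2, 3)  # about 2/3 diagnosed with pancreatic cancer
BENIGN_FRACTION = Fraction(1, 3)     # about 1/3 with a benign mass or cyst


def years_to_accrue(target_cases: int, exams_per_year: int, fraction: Fraction) -> int:
    """Whole years of retrospective records needed to reach target_cases."""
    if target_cases < 0 or exams_per_year <= 0 or fraction <= 0:
        raise ValueError("target must be non-negative; rate and fraction positive")
    # Exact rational arithmetic avoids floating-point rounding at the ceiling.
    return math.ceil(Fraction(target_cases) / (exams_per_year * fraction))


# Hypothetical targets, chosen only to illustrate the arithmetic.
years_malignant = years_to_accrue(150, EXAMS_PER_YEAR, MALIGNANT_FRACTION)
years_benign = years_to_accrue(60, EXAMS_PER_YEAR, BENIGN_FRACTION)
```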

Figure 4.9: Image database design for pancreatic cancer research and CAD development. (Diagram label recovered from the figure: "Location of Tumor on the Pancreas".)

The contents of the database, e.g., numbers X, Y, and Z, are determined based on (a) the aims of the project, (b) the clinical characteristics of pancreatic cancer and benign pancreatic masses, (c) the disease statistics, (d) the demographic characteristics both nationally and locally, (e) the imaging protocols implemented at the institution, and (f) the requirements of the algorithm design as discussed earlier. Imaging protocols and surveillance procedures may differ among institutions and, hence, CAD goals may differ to accommodate specific clinical practices and requirements. The imaging protocol of the H. Lee Moffitt Cancer Center (HLMCC) for abdominal helical CT scans of patients diagnosed with or suspected of pancreatic cancer includes up to four imaging series:

• Series #1: An initial abdominal scan is done with a relatively thick slice (8-10 mm) prior to the administration of contrast material; approximately 5 slices from this series contain information on the pancreas.

• Series #2: An enhanced abdominal scan follows with the same slice thickness as in Series #1 shortly after the intravenous administration of contrast material (a second enhanced scan after a short period of time may also be acquired if requested by the physician). Similar to the first series, approximately 5 slices in this series contain information on the pancreas.

• Series #3: A high-resolution scan of the pancreas at a 4 mm or smaller slice thickness. This scan is not routinely performed and depends on the patient and the physician. This series consists of about 10 slices through the pancreas.

• Series #4: A renal delay scan that acquires images through the kidneys only. This series includes partial information on the pancreas.

Series #1 and #4 are not likely to be of value, at least in the initial algorithm development, because pancreatic tumors are clinically evaluated in contrast-enhanced scans, i.e., Series #2, and because Series #4 contains insufficient information on the pancreas.
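The series selection described above can be written down as a simple filter over the protocol. In the sketch below the `Series` record, its field names, and the decision to keep Series #3 whenever it was acquired are assumptions made for illustration; the slice counts are the approximate values quoted in the protocol, and the count for Series #4 is left unset because the text does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Series:
    number: int
    description: str
    pancreas_slices: Optional[int]  # approximate; None when not stated


PROTOCOL = [
    Series(1, "unenhanced abdominal scan, 8-10 mm slices", 5),
    Series(2, "contrast-enhanced abdominal scan, 8-10 mm slices", 5),
    Series(3, "high-resolution pancreas scan, 4 mm or thinner", 10),
    Series(4, "renal delay scan through the kidneys", None),
]

# Series set aside for initial algorithm development, per the text.
EXCLUDED = {1, 4}


def usable_series(protocol: List[Series], acquired: List[int]) -> List[Series]:
    """Keep series that were acquired for this patient and are not excluded."""
    return [s for s in protocol if s.number in acquired and s.number not in EXCLUDED]


def approximate_slice_count(series: List[Series]) -> int:
    """Sum the approximate pancreas slice counts, skipping unknown values."""
    return sum(s.pancreas_slices for s in series if s.pancreas_slices is not None)


# A patient with the optional high-resolution series, and one without it.
with_high_res = usable_series(PROTOCOL, acquired=[1, 2, 3, 4])
without_high_res = usable_series(PROTOCOL, acquired=[1, 2, 4])
```

Under these assumptions a patient contributes roughly 15 usable pancreas slices when Series #3 is available and roughly 5 otherwise, which matters when sizing the training set.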

In addition to the CT images and imaging parameters, the following information was also collected or generated: (a) radiology reports, (b) pathology reports, (c) demographic information, (d) other nonimage information including lab tests, and (e) electronic ground truth files. All data were entered in a relational database that links image and nonimage information. All patient identifiers were removed prior to any research and processing to meet confidentiality requirements.
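A relational layout of this kind can be sketched with SQLite. The schema below is not the database used in the study; the table names, column names, and sample rows are hypothetical, chosen only to show how a de-identified study key can link image series, reports, and electronic ground truth files.

```python
import sqlite3

SCHEMA = """
CREATE TABLE patient (
    study_id  TEXT PRIMARY KEY,   -- de-identified research key, not a hospital ID
    age_years INTEGER,
    sex       TEXT,
    diagnosis TEXT
);
CREATE TABLE image_series (
    series_id          INTEGER PRIMARY KEY,
    study_id           TEXT NOT NULL REFERENCES patient(study_id),
    series_number      INTEGER NOT NULL,
    slice_thickness_mm REAL
);
CREATE TABLE report (
    report_id INTEGER PRIMARY KEY,
    study_id  TEXT NOT NULL REFERENCES patient(study_id),
    kind      TEXT NOT NULL CHECK (kind IN ('radiology', 'pathology', 'lab')),
    body      TEXT
);
CREATE TABLE ground_truth (
    truth_id  INTEGER PRIMARY KEY,
    series_id INTEGER NOT NULL REFERENCES image_series(series_id),
    expert    TEXT NOT NULL,
    outline   TEXT NOT NULL         -- e.g. serialized outline coordinates
);
"""


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory database with one invented, de-identified case."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO patient VALUES ('P001', 63, 'F', 'pancreatic cancer')")
    conn.execute("INSERT INTO image_series VALUES (1, 'P001', 2, 8.0)")
    conn.execute("INSERT INTO report VALUES (1, 'P001', 'pathology', 'placeholder')")
    conn.execute(
        "INSERT INTO ground_truth VALUES (1, 1, 'expert_A', '[[10,12],[11,15],[14,13]]')"
    )
    conn.commit()
    return conn


def series_with_pathology(conn: sqlite3.Connection) -> list:
    """List image series belonging to patients that have a pathology report."""
    query = """
        SELECT p.study_id, s.series_number, p.diagnosis
        FROM patient AS p
        JOIN image_series AS s ON s.study_id = p.study_id
        JOIN report AS r ON r.study_id = p.study_id
        WHERE r.kind = 'pathology'
        ORDER BY p.study_id, s.series_number
    """
    return conn.execute(query).fetchall()


conn = build_demo_database()
rows = series_with_pathology(conn)
```

Keying every table on a research identifier rather than on a name or medical record number is one way to keep patient identifiers out of the research database while preserving the links between image and nonimage data.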
