| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Full paper |
Departments of (1) Medical Physics and (2) Radiology, School of Medicine, University of Patras, 265 00 Patras, Greece
Correspondence: Dr Lena Costaridou, Department of Medical Physics, School of Medicine, University of Patras, Patras 26 500, Greece. E-mail: costarid{at}upatras.gr
| Abstract |
|---|
|
|
|---|
| Introduction |
|---|
Microcalcification (MC) clusters are considered a strong indicator of malignancy, and they appear in 30–50% of the mammographically diagnosed cases [7]. Computer-aided detection (CADe) systems for MCs have reported high performance, while the automated interpretation of MCs (computer-aided diagnosis, CADx) remains a challenging task [8, 9]. Difficulty in MC cluster interpretation is mainly due to their fuzzy nature and low contrast (difficulty in distinguishing MCs from their surroundings) [10].
Considerable research has focused on the development of algorithms for the automated classification of MCs. These algorithms are based either on morphology and distribution features of MCs extracted by radiologists [11–13] or on computer-extracted image features [14–27]. Two main categories of computer-extracted image features are used. The first category accounts for morphology/shape features of individual MCs or of MC clusters [14–25], while the second category corresponds to texture features extracted from regions of interest (ROIs) containing the MCs [16, 26–28]. Reviews of the proposed CADx schemes can be found elsewhere [8, 10, 29]; the following paragraph summarises representative studies in terms of the features used.
Shen et al [14] developed shape features (compactness, moments and Fourier descriptors) of individual MCs, achieving 100% overall accuracy in the classification of 143 individual MCs. Jiang et al [15] used eight features of MC clusters in a neural network classifier, and achieved an area under the receiver operating characteristic (ROC) curve Az of 0.92 in a data set of 53 patients. Veldkamp et al [18] used cluster distribution, shape and location features for classification of MCs. A patient-based classification was performed by combining information from both views (mediolateral oblique (MLO) and craniocaudal (CC)), achieving an Az value of 0.83. Kallergi [23] used morphological features of individual MCs and MC clusters; when age was incorporated in the classification scheme, a high performance was achieved (Az of 0.98). Chan et al [16] developed morphological features of MCs as well as texture features (based on co-occurrence matrices) extracted from ROIs containing the MCs; the combined morphological and texture features achieved an Az of 0.89, which increased to 0.93 when averaging discriminant scores from all views of the same cluster. Dhawan et al [26], following a texture analysis approach, used co-occurrence matrices and wavelet features extracted from ROIs containing the MCs and obtained an Az of 0.86 for the classification of 191 "difficult to diagnose" cases. Soltanian-Zadeh et al [27] compared the performance of four feature sets (co-occurrence matrix-based, shape, wavelet and multiwavelet features); the multiwavelet features outperformed the other three feature sets, achieving an Az of 0.89.
The performance of the above CADx schemes varies with the features investigated, the classifiers used and the data sets analysed. The success of the schemes based on morphological features strongly depends on the robustness of the MC segmentation algorithms [8, 30, 31]. Specifically, in the case of dense breast parenchyma abutting the MCs, classification is a challenging task because segmentation itself is difficult.
The texture analysis approach seems to overcome this limitation, as no segmentation stage is required. The rationale for using texture features is to capture changes in the texture of the tissue surrounding MCs. Most texture-based classification studies include MCs in the regions to be analysed further; however, this practice is expected to introduce bias, because the MC itself, a tiny deposit of calcium in breast tissue, is neither malignant nor benign. It is the tissue surrounding or underlying the MC that can be characterized as malignant or benign. This tissue is also the one subjected to pathoanatomical and immunochemistry analysis to derive a benign or malignant outcome.
To the authors' knowledge, only one study has focused on texture analysis of the tissue surrounding MCs for breast cancer diagnosis [32]. That study used a data set of 54 images acquired digitally during stereotactic biopsy. The extracted textural features were based on co-occurrence matrices and fractal geometry, and classification was performed with linear and logistic discriminant analysis. The study achieved a sensitivity of 89% with a specificity of 83%, supporting the hypothesis that tissue surrounding MCs can be used for breast cancer diagnosis.
The current study investigates whether texture properties of the tissue surrounding MCs, as depicted on screening mammograms, can be used for breast cancer diagnosis, thus aiding radiologists in decisions concerning follow-up and biopsy. The discriminatory power of four textural feature categories is investigated using a k-nearest neighbour (kNN) classifier. An additional classification scheme combines the classification outputs of the three most discriminating feature categories. Classification performance is evaluated by means of ROC analysis.
| Methods and materials |
|---|
A medical visualization tool, developed in our department [33, 34], was used for the implementation of enhancement and segmentation procedures, while feature extraction and classification algorithms were implemented in Matlab (The MathWorks Inc., Natick, MA).
Case sample
The case sample consists of 85 mammographic images originating from the Digital Database for Screening Mammography (DDSM) [35], digitized with the LUMISYS 200 scanner at 12-bit pixel depth and 50 µm spatial resolution. The selected mammograms contain 100 MC clusters in total (46 benign and 54 malignant, according to database ground truth tables) and correspond to heterogeneously dense and extremely dense breast parenchyma (density 3 and 4 according to the American College of Radiology (ACR) BIRADS lexicon [36]). The DDSM database provides a malignancy assessment for each MC cluster, also according to the ACR BIRADS lexicon. The assessment ratings are encoded into numerical values from 1 to 5 in increasing order of their likelihood of malignancy: 1, negative; 2, benign; 3, probably benign; 4, suspicious abnormality; and 5, highly suggestive of malignancy. Figure 1 shows the distribution of the case sample with respect to malignancy rating.
[Figure 1: distribution of the case sample with respect to malignancy rating]
In some cases, the segmentation procedure resulted in overestimation of MCs, as well as the inclusion of isolated pixels of high grey level value, corresponding to normal dense tissue. These isolated pixels were removed from the labelled MCs by applying a size criterion. The use of a more robust segmentation technique was not deemed necessary for the aim of this study, as morphology analysis of individual MCs was not performed.
Texture analysis
Texture analysis was performed in a 128 x 128 pixel subregion of each ST-ROI (Figure 2d), positioned to contain the cluster at its centre. For clusters larger than a single ROI, multiple ROIs (up to three ROIs with less than 30% overlap) were used to cover the entire cluster area. The texture feature values extracted from the multiple ROIs covering a large cluster area were averaged.
In the case sample analysed, the average percentage of pixels, corresponding to MCs, excluded from each ST-ROI subregion (128 x 128 pixels) was 2.5%.
In this study, four categories of textural features were extracted: first order statistics (FOS); grey level co-occurrence matrices (GLCMs) features; grey level run length matrices (GLRLMs) features; and Laws texture energy measures (LTEMs). Prior to feature extraction, each 128 x 128 pixel subregion was stretched to a normalized grey level range of 0–255.
First order statistics
FOS provide different statistical properties (statistical moments) of the intensity histogram of an image [40]. They depend only on individual pixel values and not on the interaction or co-occurrence of neighbouring pixel values. In this study, four first order textural features were calculated: mean, standard deviation, kurtosis and skewness.
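The grey level stretching and the four first order features described above can be sketched as follows. This is an illustrative sketch rather than the authors' Matlab implementation: the function names are hypothetical, the standard deviation is assumed to be the population form, and kurtosis is assumed to be the non-excess fourth standardized moment.

```python
import math

def stretch_to_0_255(pixels):
    """Linearly stretch a flat list of grey levels to the range 0-255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # constant region: nothing to stretch
        return [0 for _ in pixels]
    return [round(255 * (p - lo) / (hi - lo)) for p in pixels]

def first_order_statistics(pixels):
    """Mean, standard deviation, skewness and kurtosis of the grey levels.

    Assumptions (not stated in the text): population standard deviation,
    and kurtosis as the plain (non-excess) fourth standardized moment.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    sd = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if sd == 0:
        return {"mean": mean, "sd": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    skewness = sum((p - mean) ** 3 for p in pixels) / (n * sd ** 3)
    kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * sd ** 4)
    return {"mean": mean, "sd": sd, "skewness": skewness, "kurtosis": kurtosis}
```

Because these features depend only on the histogram, shuffling the pixel positions leaves all four values unchanged, which is what distinguishes them from the second order features that follow.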
Grey level co-occurrence matrices features
The GLCM is a well-established robust statistical tool for extracting second order texture information from images [41, 42]. The GLCM characterizes the spatial distribution of grey levels in an image. Specifically, an element in the GLCM, $P_{d,\theta}(i,j)$, represents the probability of occurrence of the pair of grey levels $(i,j)$ separated by a distance $d$ at direction $\theta$. In this study, four GLCMs were computed, corresponding to four different directions ($\theta$ = 0°, 45°, 90° and 135°) and one distance ($d$ = 1 pixel). Thirteen features were derived from each GLCM: angular second moment, entropy, contrast, local homogeneity, correlation, shade, prominence, variance, sum average, sum entropy, difference entropy, sum variance and difference variance. The mean and range of each feature over the four GLCMs were calculated, comprising a total of 26 GLCM features.
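A minimal sketch of the GLCM computation for the four directions at d = 1, with three of the thirteen features and their mean and range over directions, is given below. It is illustrative only: the function names are hypothetical, the matrix is assumed to be symmetric (each pixel pair counted in both orders), entropy is assumed to use the natural logarithm, and the image must be at least 2 x 2 pixels with integer grey levels below `levels`.

```python
import math

# (row step, column step) to the neighbour at distance d = 1 for each direction.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, levels, angle):
    """Symmetric, normalised co-occurrence matrix for one direction (d = 1)."""
    dr, dc = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # assumption: pairs counted in both orders
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Angular second moment, contrast and entropy of a normalised GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "asm": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }

def mean_and_range(image, levels):
    """Mean and range of each feature over the four directional GLCMs."""
    per_angle = [glcm_features(glcm(image, levels, a)) for a in OFFSETS]
    out = {}
    for name in per_angle[0]:
        values = [f[name] for f in per_angle]
        out[name + "_mean"] = sum(values) / len(values)
        out[name + "_range"] = max(values) - min(values)
    return out
```

For a subregion stretched to 0–255, `levels` would be 256. Taking the mean and range over directions summarises the directional behaviour in two numbers per feature, which is how 13 features per matrix become 26 features in total.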
Grey level run length matrices features
The GLRLM provides information about the coarseness of image texture in specified directions [43]. A grey level run is a set of consecutive, collinear pixels (i.e. a pixel structure) in a given direction that have the same grey level value. The length of a run is the number of pixels in a run. Features extracted from GLRLM evaluate the distribution of small (short runs) or large (long runs) organized structures within the image. In this study, four GLRLMs were computed, corresponding to four different directions (0°, 45°, 90° and 135°). Five features were derived from each GLRLM: short runs emphasis (SRE), long runs emphasis (LRE), grey level non-uniformity (GLNU), run length non-uniformity (RLNU) and run percentage (RPERC). The mean and range of each feature over the four GLRLMs were calculated, comprising a total of 10 GLRLM features.
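The run length features can be sketched as below. This is an illustrative sketch with hypothetical function names; the five formulas are the commonly used run length definitions, which are assumed here because the text names the features without giving their equations.

```python
def lines_for_angle(image, angle):
    """Split the image into the pixel lines along which runs are counted."""
    rows, cols = len(image), len(image[0])
    if angle == 0:
        return [list(row) for row in image]
    if angle == 90:
        return [[image[r][c] for r in range(rows)] for c in range(cols)]
    if angle not in (45, 135):
        raise ValueError("angle must be 0, 45, 90 or 135")
    # 45 degrees: anti-diagonals (constant r + c); 135 degrees: diagonals (constant r - c).
    key = (lambda r, c: r + c) if angle == 45 else (lambda r, c: r - c)
    groups = {}
    for r in range(rows):
        for c in range(cols):
            groups.setdefault(key(r, c), []).append(image[r][c])
    return list(groups.values())

def glrlm_features(image, angle):
    """SRE, LRE, GLNU, RLNU and RPERC for one direction."""
    runs = {}  # (grey level, run length) -> number of such runs
    for line in lines_for_angle(image, angle):
        start = 0
        for k in range(1, len(line) + 1):
            if k == len(line) or line[k] != line[start]:
                run = (line[start], k - start)
                runs[run] = runs.get(run, 0) + 1
                start = k
    n_runs = sum(runs.values())
    n_pixels = len(image) * len(image[0])
    by_level, by_length = {}, {}
    for (g, length), n in runs.items():
        by_level[g] = by_level.get(g, 0) + n
        by_length[length] = by_length.get(length, 0) + n
    return {
        "SRE": sum(n / length ** 2 for (_, length), n in runs.items()) / n_runs,
        "LRE": sum(n * length ** 2 for (_, length), n in runs.items()) / n_runs,
        "GLNU": sum(v ** 2 for v in by_level.values()) / n_runs,
        "RLNU": sum(v ** 2 for v in by_length.values()) / n_runs,
        "RPERC": n_runs / n_pixels,
    }
```

As with the GLCM features, the mean and range of each value over the four directions would then give the 10 GLRLM features.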
Laws' texture energy measures
Textural features were extracted based on the method proposed by Laws [44]. According to this approach, textural features are extracted from images that have previously been filtered by each of the 25 Laws' masks or kernels. Five one-dimensional operators (L5 = [1 4 6 4 1], E5 = [–1 –2 0 2 1], S5 = [–1 0 2 0 –1], R5 = [1 –4 6 –4 1] and W5 = [–1 2 0 –2 1]) are used for generation of the 25 Laws' masks. Specifically, each mask is generated by convolving a vertical one-dimensional operator with a horizontal one-dimensional operator. The filtered images are characterized as texture energy images (TE images). The TE images corresponding to each pair of symmetrical kernels (such as R5L5 and L5R5) were averaged; since 20 of the 25 kernels form 10 such pairs and the remaining five are unpaired, 15 TR images were produced (R stands for "rotational invariance"). From each of the 15 TR images, five first order statistics (mean, standard deviation, range, skewness and kurtosis) were computed (i.e. five LTEM subcategories, each one containing 15 features), giving 75 LTEMs in total.
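The mask generation and the grouping into 15 TR images can be sketched as follows. Convolving a vertical 5 x 1 operator with a horizontal 1 x 5 operator is equivalent to their outer product, which is what the sketch computes. The function names are hypothetical and the filtering step itself is omitted.

```python
# The five one-dimensional operators given in the text.
OPERATORS = {
    "L5": [1, 4, 6, 4, 1],
    "E5": [-1, -2, 0, 2, 1],
    "S5": [-1, 0, 2, 0, -1],
    "R5": [1, -4, 6, -4, 1],
    "W5": [-1, 2, 0, -2, 1],
}

def laws_mask(vertical, horizontal):
    """5 x 5 mask: outer product of a vertical and a horizontal operator."""
    v, h = OPERATORS[vertical], OPERATORS[horizontal]
    return [[a * b for b in h] for a in v]

def all_masks():
    """All 25 masks, keyed by vertical name followed by horizontal name."""
    return {v + h: laws_mask(v, h) for v in OPERATORS for h in OPERATORS}

def tr_groups():
    """Kernel names whose TE images are averaged into one TR image.

    Ten groups hold a pair such as ("L5R5", "R5L5"); five groups hold a
    single kernel such as ("E5E5",) that has no distinct partner.
    """
    names = list(OPERATORS)
    groups = []
    for i, a in enumerate(names):
        for b in names[i:]:
            groups.append((a + b,) if a == b else (a + b, b + a))
    return groups
```

Counting the groups reproduces the arithmetic in the text: 10 pairs plus 5 unpaired kernels give 15 TR images, and five statistics per TR image give 75 LTEMs.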
The extracted textural features of the four aforementioned feature categories were normalized to zero mean and unit standard deviation [45] and subsequently used for classification.
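The normalization step can be sketched as a per-feature z-score. The function name is hypothetical, and the population standard deviation is an assumption; the text does not say whether the population or sample form was used.

```python
import math

def zscore_columns(rows):
    """Normalise each feature (column) to zero mean and unit standard deviation.

    `rows` is a list of samples, each a list of feature values. Constant
    features are mapped to 0.0 to avoid division by zero.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]
```

Putting all features on a common scale matters for a distance-based classifier such as kNN, since otherwise features with large numeric ranges would dominate the Euclidean distance.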
The typical computational time required to extract texture features from one ST-ROI subregion was 0.07 s for FOS, 4.65 s for GLCMs, 3.50 s for GLRLMs and 2.95 s for LTEMs, using a Pentium IV processor running at 3 GHz.
Classification of tissue surrounding microcalcifications
A k-nearest neighbour (kNN) classifier was employed for the classification of the tissue surrounding MCs, based on the extracted textural features. kNN makes a class assignment based on the classes of the k training samples nearest to the unknown sample. In this study, inverse distance-weighted voting was used [46]. In this approach, the contribution of each of the k neighbours is weighted according to its distance from the unknown sample, giving greater weight to closer neighbours. Specifically, the vote of the kth neighbour is defined as:
[Equation 1: vote of the kth neighbour as a function of its distance dk]
where dk is the Euclidean distance of the kth neighbour from the unknown sample. The votes of each class are summed, and the unknown sample is assigned to the class with the highest sum of votes. Specifically, the Decision function for classification is given by:
$$\mathrm{Decision} = \sum_{i=1}^{m} \mathrm{vote}_i - \sum_{j=1}^{b} \mathrm{vote}_j \qquad (2)$$
where m is the number of neighbours belonging to class M (malignant), b is the number of neighbours belonging to class B (benign), and m + b = k. In this study, k ranged from 1 up to 7 neighbours with step 1. If Decision is greater than zero, the unknown sample is assigned to class M; otherwise, the unknown sample is assigned to class B.
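The decision rule can be sketched as below. The exact form of the vote is not available here, so the sketch assumes a vote of 1/d for a neighbour at Euclidean distance d, with a small constant `eps` (also an assumption) guarding against division by zero. The function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def decision(train_x, train_y, sample, k, eps=1e-9):
    """Sum of votes for class "M" minus sum of votes for class "B".

    ASSUMPTION: each of the k nearest neighbours votes 1 / (d + eps);
    the form of the vote actually used in the study may differ.
    """
    neighbours = sorted((euclidean(x, sample), y) for x, y in zip(train_x, train_y))
    score = 0.0
    for d, label in neighbours[:k]:
        vote = 1.0 / (d + eps)
        score += vote if label == "M" else -vote
    return score

def classify(train_x, train_y, sample, k):
    """Assign "M" when Decision is greater than zero, otherwise "B"."""
    return "M" if decision(train_x, train_y, sample, k) > 0 else "B"
```

With training points at 0.0 (malignant), 1.0 and 3.0 (both benign), a sample at 0.2 is classified as malignant for k = 3 even though two of its three neighbours are benign, because the single close neighbour outweighs the two distant ones. This is the behaviour that distinguishes weighted voting from a plain majority count.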
The discriminating ability of each textural feature category was investigated using all the individual features of each category as inputs to the classifier. For each textural feature category, a best feature set was selected with respect to overall accuracy achieved, employing an exhaustive search procedure [45]. Specifically, combinations of two to six features were investigated, and the combination of the minimum number of features that provided the highest overall accuracy was selected. In the case of LTEMs, the exhaustive search procedure was initially performed for each LTEM subcategory (mean, standard deviation, range, skewness and kurtosis) and, then, among the selected features from the five subcategories. The training and testing of the classifier, for each textural feature category (and each LTEM subcategory), was performed using the leave-one-out methodology [45].
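The exhaustive search with leave-one-out evaluation can be sketched as follows. To keep the example short and self-contained, a plain 1-nearest-neighbour rule stands in for the weighted kNN classifier; this substitution and the function names are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations

def nearest_label(train_x, train_y, sample):
    """Label of the single nearest training sample (stand-in classifier)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], sample))
    return train_y[best]

def loo_accuracy(x, y, feature_idx):
    """Leave-one-out accuracy using only the features in `feature_idx`."""
    correct = 0
    for i in range(len(x)):
        train_x = [[row[f] for f in feature_idx] for j, row in enumerate(x) if j != i]
        train_y = [label for j, label in enumerate(y) if j != i]
        sample = [x[i][f] for f in feature_idx]
        correct += nearest_label(train_x, train_y, sample) == y[i]
    return correct / len(x)

def best_feature_set(x, y, min_size=2, max_size=6):
    """Smallest feature combination reaching the highest leave-one-out accuracy."""
    n_features = len(x[0])
    best_idx, best_acc = None, -1.0
    for size in range(min_size, min(max_size, n_features) + 1):
        for idx in combinations(range(n_features), size):
            acc = loo_accuracy(x, y, idx)
            if acc > best_acc:  # strict comparison: ties keep the smaller set
                best_idx, best_acc = idx, acc
    return best_idx, best_acc
```

The number of combinations grows quickly with the size of the feature pool, which may be one reason the search for the 75 LTEMs was staged by subcategory.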
To enhance the classification success rate, an additional classification scheme was performed by combining the classification outputs of the most discriminating feature sets, with a majority voting rule [47]. In this approach, the unknown sample is assigned to the class of the majority of the classification outputs.
The performance of the classifier for each textural feature set and the combined classification scheme was evaluated by means of ROC analysis [48].
ROC analysis
To obtain a ROC curve for classification based on individual textural feature sets, malignancy thresholds (confidence threshold values) have to be set; above the malignancy threshold, a sample is considered malignant and, below the threshold, it is considered benign. The Decision value given in Equation 2 provides a measure of malignancy for each sample; positive values reflect a high likelihood of malignancy, whereas negative values reflect a low likelihood of malignancy (benignity). Thus, we partitioned the range of Decision values over the whole case sample (maximum Decision value (most positive) minus minimum Decision value (most negative)) into 10 values to obtain 10 malignancy thresholds. In this way, 10 raw data points of a ROC curve were derived. When the threshold is set to the maximum Decision value, no sample is classified as malignant, causing both sensitivity (vertical axis in ROC representation) and 1–specificity (horizontal axis in ROC representation) to be 0. When the threshold is set to the minimum Decision value, all samples are classified as malignant, causing both sensitivity and 1–specificity to be 1.
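The derivation of raw ROC points from Decision values can be sketched as below. The function name is hypothetical, and two details are assumptions: the thresholds are taken as evenly spaced from the minimum to the maximum Decision value, and a sample counts as malignant only when its value is strictly above the threshold.

```python
def roc_points(scores, labels, n_thresholds=10):
    """Raw ROC points (1 - specificity, sensitivity), one per threshold.

    `scores` are Decision values and `labels` are "M" or "B".
    ASSUMPTIONS: evenly spaced thresholds over [min, max] and a strict
    "score > threshold" rule for calling a sample malignant.
    """
    lo, hi = min(scores), max(scores)
    step = (hi - lo) / (n_thresholds - 1)
    # The last threshold is set to hi exactly to avoid rounding drift.
    thresholds = [lo + t * step for t in range(n_thresholds - 1)] + [hi]
    n_pos = sum(1 for label in labels if label == "M")
    n_neg = len(labels) - n_pos
    points = []
    for thr in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s > thr and l == "M")
        fp = sum(1 for s, l in zip(scores, labels) if s > thr and l == "B")
        points.append((fp / n_neg, tp / n_pos))
    return points
```

Under the strict rule the maximum threshold yields the point (0, 0), as described above. At the minimum threshold the lowest-scoring sample is not counted as malignant, so reaching exactly (1, 1) at that end would require ties to be counted as malignant; the text does not specify how ties were treated.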
To obtain a ROC curve for the combined classification scheme, we defined as the malignancy threshold (confidence threshold value) for each sample the number of malignant classification outputs provided by the three textural feature sets, ranging from 0 up to 3. We changed the threshold from –1 up to 3 with step 1, to derive five raw data points of the ROC curve, according to the procedure introduced by Soltanian-Zadeh et al [27].
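For the combined scheme, the five raw points can be sketched as follows. The function names are hypothetical, and the strict "count greater than threshold" rule is an assumption, chosen because it makes the threshold of –1 label every sample malignant and the threshold of 3 label none.

```python
def malignant_votes(outputs):
    """Number of feature sets (0 to 3) that classified the sample as "M"."""
    return sum(1 for o in outputs if o == "M")

def combined_roc_points(vote_counts, labels, thresholds=(-1, 0, 1, 2, 3)):
    """Raw ROC points (1 - specificity, sensitivity) for the combined scheme.

    ASSUMPTION: a sample is called malignant when its vote count is
    strictly greater than the threshold.
    """
    n_pos = sum(1 for label in labels if label == "M")
    n_neg = len(labels) - n_pos
    points = []
    for thr in thresholds:
        tp = sum(1 for v, l in zip(vote_counts, labels) if v > thr and l == "M")
        fp = sum(1 for v, l in zip(vote_counts, labels) if v > thr and l == "B")
        points.append((fp / n_neg, tp / n_pos))
    return points
```

Under this rule, the threshold of 1 corresponds to the majority voting decision itself, since a sample is then called malignant when at least two of the three feature sets agree.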
In order to provide a baseline reference for the performance of the proposed surrounding tissue texture-based classification approach, a ROC curve was also generated for the DDSM assessment of malignancy. For this purpose, the malignancy ratings were used as malignancy thresholds (confidence threshold values), and five raw data points of the ROC curve were generated.
The ROCKIT program (Metz CE, University of Chicago, IL) was used for the generation of ROC curves. Specifically, a conventional binormal ROC curve was individually fitted to the raw data points of each textural feature set, the combined classification scheme and the DDSM assessment, with a maximum likelihood procedure. Then, the area under the estimated ROC curve (Az), the standard error (SE) as well as the asymmetric 95% confidence interval (CI) were calculated [48, 49]. Differences in Az values were analysed statistically using the area test (z-score). Derived two-tailed values of p<0.05 indicate statistically significant differences between classification schemes.
| Results |
|---|
Figure 3 shows the ROC curves for the best textural feature sets, presented in Table 1, while Table 2 provides the corresponding Az, SE and CI values. The best feature set of the LTEM category demonstrated the highest performance, 0.90 ± 0.03 (Az ± SE). The GLCM best feature set provided a sufficient classification performance (0.86 ± 0.04). The FOS best feature set provided a classification performance of 0.78 ± 0.05. The GLRLM best feature set demonstrated the poorest performance (0.46 ± 0.06), corresponding to random classification.
| Discussion |
|---|
In the present study, we have analysed the surrounding tissue as depicted on screening mammograms, in order to develop a tool that could aid radiologists in their decisions concerning biopsy and follow-up. Four categories of textural features were investigated, with the LTEMs demonstrating the highest classification performance. An additional classification scheme was performed that combined the classification outputs of the three most discriminating textural feature sets with a majority voting rule. This combined classification scheme achieved the highest Az value (0.96), significantly outperforming classification based on DDSM assessment and classification based on individual textural feature sets. GLCM features provided a sufficient performance, in accordance with reported CADx studies employing similar features extracted from ROIs containing [16, 26–28] or excluding [32] the MCs. While LTEM and GLCM features performed better than the DDSM assessment, comparison did not reveal statistically significant differences (p>0.05). However, the sparsely distributed raw data points used for generating the DDSM ROC curve (Figure 4) may have introduced bias (overestimation of Az value).
The feasibility of the proposed texture-based classification scheme was demonstrated on mammograms corresponding to heterogeneously dense and extremely dense breast parenchyma, rendering classification a difficult task. The difficulty of the data set analysed is further reflected by the fact that 80% of the benign cases (37/46) have been assigned a rating of 4 (suspicious abnormality). It is a well-known clinical fact that the presence of dense breast parenchyma degrades the diagnostic performance of radiologists, on account of increased inter- and intraobserver variability in the interpretation of lesions in both screen–film [51] and digital mammography [6], and the diagnostic performance of CAD systems [8, 10].
While a comparison with other texture-based classification studies is not possible due to different classification algorithms, textural features and data sets (MC subtlety, density categories and number of cases) used, the proposed method has shown promising results. The achieved performance suggests that texture analysis of the tissue surrounding the MCs, as depicted on screening mammograms, may contribute to computer-aided diagnosis of breast cancer by reducing the number of benign (unnecessary) biopsies, while maintaining high sensitivity.
Completion of the proposed method should include the investigation of additional classification schemes and textural features, as well as validation over a larger data set. Reinforcement of the hypothesis of the surrounding tissue texture analysis will be accomplished by investigating the correlation between computer-extracted textural features and pathoanatomical findings.
| Acknowledgments |
|---|
Received for publication September 28, 2006. Revision received December 9, 2006. Accepted for publication January 2, 2007.
| References |
|---|
|
|
|---|
This article has been cited by other articles:
BJR review of the year -- 2007. Br. J. Radiol., April 1, 2008; 81(964): 265-269.