Line-based image segmentation method: a new approach to segment VHSR remote sensing images automatically

ABSTRACT There exist different approaches for segmenting Very High Spatial Resolution (VHSR) remote sensing imagery with competitive performance, including object-based (e.g. Multiresolution), gradient-based (e.g. Watershed), and clustering-based (e.g. k-means) segmentation. However, they have a strong dependence on human assistance for tuning the required parameters (e.g. scale value, clusters number or tolerance thresholds), usually following a trial-and-error methodology that becomes tedious, hardly reproducible or transferable to other images, affecting negatively the methods’ robustness and efficiency. In this communication, we propose a novel method denominated Line-based segmentation (LBS) that automatically segments VHSR remote sensing imagery through a data-driven approach, bypassing the parameters’ definition by experts (i.e. region growing´s seeds and thresholds). The proposed algorithm offers flexibility and accuracy to segment regions with varying sizes and shapes, tested on different VHSR images, including multispectral images (WorldView-3, GeoEywe-1, Ikonos, QuickBird and SkySat), RGB aerial image (NAIP) and panchromatic image (Ikonos). The results revealed the LBS method shows a competitive performance compared against two well-known segmentation approaches, but without user intervention and generating consistent and repeatable segmentation results following an automatic fashion.


Automatic segmentation
While many steps of remote sensing image processing can be carried out with minimum human intervention, automation of image segmentation still presents several problematic aspects. First of all, the process usually requires human participation (i.e. experts) regarding the most appropriate parameter settings based on the specific image and ground characteristics, demanding the use of a trial-and-error methodology (Hay, Blaschke, Marceau, & Bouchard, 2003;Myint, Gober, Brazel, Grossman-Clarke, & Weng, 2011;Yu et al., 2006). This results in a timeconsuming process and a bottleneck for the analysis of large volumes of data (Cheng & Han, 2016). Second, although each VHSR image scene usually contains regions with distinct characteristics (e.g. size, shape or texture), also exists different regions comparting similar appearances making it difficult to properly determine their boundaries (Beneš & Zitová, 2015;Chen, Hay, & St-Onge, 2012;Cheng & Han, 2016;Gurcan et al., 2009).
Considerable research efforts have been devoted to automatizing image segmentation, many of which were originated from medical image analysis or computer vision, such as extracting retinal vessels (Zhao, Wang, Wang, & Shih, 2014), breast tumor segmentation (Rouhi, Jafari, Kasaei, & Keshavarzian, 2015) or handwritten text recognition (Saeed & Albakoor, 2009). Given the unique characteristics of remotely sensed imagery, researchers have developed many methods following a semi-automatic approach aiming to delineate image objects with minimum human intervention. For example, the Region-based Image Segmentation (RISA) method relies on clustering k-means algorithm to define seeds for growing regions (Wang, Jensen, & Im, 2010), but it still requires the definition of clusters' number and stopping criteria by a human user. Zhang and Maxwell (2006) proposed a fuzzy segmentation, which applies fuzzy theory to assist parameter selection. This algorithm requires an initial image segmentation to train a fuzzy system, and extensive testing to improve its robustness working on different types of imagery. Byun, Kim, Lee, and Kim (2011) proposed an approach to segment VHSR images using a region growing algorithm that automatically selects seeds following a block-based method, where block size needs to be defined a priori. To facilitate automatic segmentation, Akcay and Aksoy (2008) combined image's spectral and structural information along with morphological operations to build a hierarchical tree for the segmentation process. Although this algorithm showed promising results, it requires experts to define structural elements and clusters based on the objects of interest. To date, user intervention is needed for most of the algorithms to properly define segmentation parameters.

Regions' scale
Actually, multiscale analysis is a hot topic in image segmentation (Blaschke et al., 2014;Zhang, Xiao, Feng, Feng, & Ye, 2015;Zhou, Li, Feng, Zhang, & Hu, 2017). Since the proposal of Witkin (1984) who defined a concise description of a signal scale through a tree involving hierarchical relations between all observation scales; and Marr and Hildreth (1980) proposed a twostep approach to detect intensity changes at different scales, and afterwards localize them spatially, stating that identified segments are not independent, allowing to deduce rules to describe an image (raw primal sketch), resulting in a hierarchy of descriptions covering a range of scales (full primal sketch). Tabb and Ahuja (1997) presented a new concept of scale that represents image structures at different scales, combining scale and structure explicitly, making it possible to detect edges and regions. Baatz and Schape (2000) proposed its iconic and now widely used multiresolution analysis, that opened a new way to analyse images following an object-based approach (Benz, Hofmann, Willhauck, Lingenfelder, & Heynen, 2004), that is evolving into a paradigm (Blaschke et al., 2014), allowing the addition of spatial relations between objects and expert knowledge into the process (Hay & Castilla, 2006). Although the scale selection still remains a big challenge to overcome, it often depends on subjective trial-anderror methods (Blaschke, 2010;Zhou et al., 2017). Arbelaez, Maire, Fowlkes, and Malik (2011) combined contours and segmentation capturing multiscale information (three scales), without showing any significant improvement on the datasets used. Their approach consists of transforming contour detector into a hierarchical region tree, getting an image segmentation as a result. They recognized that contour detection and segmentation are related, but not identical because contour detectors offer no guarantee to produce closed contours and hence do not necessarily provide a partition of the image into regions, as image segmentation does. Drǎguţ, Tiede, and Levick (2010) presented a very promising method to automate the selection of the best scale parameter value for multiresolution segmentation (MRS) involving the intrasegment homogeneity defining a three-level hierarchy to segment regions of different sizes through a tool called ESP (Estimation of Scale Parameters). This represents an improvement for the original approach based on a user-defined scale determination (Baatz & Schape, 2000), that avoid the automatization of this process. Also, Yang, He, and Weng (2015) made a refinement of local peak method (implemented by ESP tool), adding the intersegment heterogeneity that improves the scale determination by 17%. Later, Dragut, Csillik, Eisank, and Tiede (2014) extend the ESP tool functionality, to determine three scale levels using multiple image bands.
This research aims to propose a new segmentation method that departs from previously described multiscale approaches able for monitoring, modeling, and managing complex landscapes through linking different scales hierarchically (Burnett & Blaschke, 2003) present in VHSR remote sensing images. Although multiscale analysis allows different windows of perception, it depends heavily on user assistance, which motivates us to focus on automatic region segmentation, that could benefit other segmentation approaches.
Specifically, our approach employs an entropy function to select a particular line from the image under analysis. Afterwards, an optimization process is applied to this line to determine the best scale value necessary for the automatic segmentation of the image, avoiding the intervention of experts for the parameters' tuning while still achieving a competitive performance. Although this method could be considered as singlescale, it is important to note the fact it allows the segmentation of different sized regions, starting from the small discernible regions in the image until large regions with different sizes, shapes and spatial locations.
This communication starts with a detailed description of the proposed method (Section 2). Then, its performance was tested on different VHSR images comparing it against two well-known segmentation algorithms, under a qualitative and quantitative evaluation (Section 3). The obtained results are explained, compared and discussed in Section 4, and the conclusions are presented at the end (Section 5).

Sample imagery
All selected images have very high spatial resolutions, ranging between 0.8 and 3.20 m on multispectral, visible and panchromatic modes. They were acquired from different sensors with proven performance, including images from Maxar's Satellite Constellation (Ikonos, GeoEye, WorldView), Planet (SkySat) and the U.S. National Agriculture Imagery Program (NAIP) (Figure 1). These images present areas with urban/rural mixed landscapes and a reasonable diversity of region types and sizes, such as roads, buildings, lakes, parks, etc. Table 1 highlights the images' characteristics. A complete VHSR remote sensing scene usually has a large size, so one often employed strategy is to test new methods on a small section of it (Troya-Galvis, Gançarski, & Berti-Équille, 2003) to reduce processing times and to alleviate the high labor intensity required for the manual segmentation during the extraction of ground truth necessary for comparison and evaluation purposes.

Image processing
The proposed image segmentation method is based on a region growing (RG) idea, but adopting a datadriven approach that works applying an entropy function on transversal lines over the image under analysis, reason which motivated its name as Line-based segmentation (LBS). It is composed of three main steps: (a) image preprocessing; (b) identification of segmentation parametersseeds and thresholds; and (c) line (LRG) and random region growing (RRG) segmentation. The flowchart is shown in Figure 2.
Actually, recent advances in satellite and sensor technologies have enabled considerable improvements in the radiometric resolution of the VHSR remote sensing images, from 8bit to 16 or 32 bit, but images with higher radiometric resolution require more radiometric values to record, spreading over more values (Ose, Corpetti, & Demagistri, 2016), which means larger image size. Usually, the most common radiometric image resolution is 8-bit, because the extraction of image information not always adheres to the highest possible radiometric resolution (Rama-Rao, Garg, & Ghosh, 2006;Verde, Mallinis, Tsakiri-Strati, Georgiadis, & Patias, 2018), having this into account, to keep a balance between computational efficiency and results accuracy, the multispectral and visible images were 8-bit, while panchromatic was 16-bit.
The proposed algorithm operates on grayscale imagery, meaning a color to grayscale conversion was needed for multispectral and visible images, an  operation widely used as a preprocessing step in image processing in different domains such as medical imaging, satellite imaging and agriculture/environment real-scene (Ma, Zhao, Zeng, & Wang, 2015;Nafchi, Shahkolaei, Hedjam, & Cheriet, 2017), mainly to reduce processing times and computer requirements. Also, the color often introduces some redundancies (Acharya & Ray, 2005) or unnecessary information into the process (Kanan & Cottrell, 2012). There exist several procedures to convert color to grayscale images, that can be categorized into global, local, and hybrid (Nafchi et al., 2017). The method we chose belongs to the global category, known as luma, that forms a weighted average, being more sophisticated than the average method, known as Intensity, because the former takes account that humans do not perceive all colors equally. Usually, this linear luminance is calculated as a weighted sum of the three linearintensity values, representing the brightness of a panchromatic monochrome image. It combines the red, green, and blue signals in proportion to the human eye's sensitivity to them (Pohl & Van Genderen, 1998). The conversion applied here was a version of luminance through the next formula (Kanan & Cottrell, 2012): where R(x,y), G(x,y), and B(x,y) represents gray level (intensity) values at red, green and blue spectral bands, respectively, and the coefficients represent specific spectral weighting to compute the true CIE luminance of any RGB image following the Rec.709 Color space (ITU-R, 1990). This is a default method implemented by well-known image processing software such as GIMP (https://www.gimp.org), Matlab (https://www.mathworks.com) and ImageJ (https:// imagej.net), sometimes with small coefficients changes (Bosch, Zisserman, & Munoz, 2007, Dec). This color-to-grayscale conversion is required to preserve the salient features of the color images, such as brightness, contrast and structure of the color image (Günes, Kalkan, & Durmu, 2015), and at the same time, retaining as much information about the original color image as possible, while simultaneously producing perceptually plausible grayscale results (Cadík, 2008).
Although the majority of the VHSR images presents four bands, including the near-infrared band, this band was not used because of: (a) the conversion function from color to grayscale image does not include NIR band, (b) RGB images showed satisfactory performance compared with CIR (Composite Infrared) and MS (multispectral) images (Zheng et al., 2018) and (c) some authors indicate NIR band tends to produce larger regions in comparison with RGB images, as is described below.
Torres-Sánchez, López-Granados and Peña (2015) compared 8-bit UAV images from RGB and multispectral camera, applying MRS segmentation with the same weight for each band. They considered scale as the most important factor for controlling region size, finding that for each scale, RGBNir tends to produce larger regions than RGB images, on three different crops, as scale value increases. It is important to note, their main objective was to accurately discriminate between vegetation and bare soil coverage types. It is known that NIR band is very useful to differentiate vegetation, but it was shown to affect regions' sizes, which could reduce the benefit of better image spatial resolution. Another research (Drǎguţ et al., 2010), corroborated this when their ESP tool found larger scales for NIR image than Red channel favoring the detection of larger regions, which could be harmful when used on VHSR images. Yang, Li, and He (2014) indicated that man-made materials, such as metal, asphalt and concrete, which have minimal variability in the NIR region, may feature strong variability in the visible light region on account of painting, special coatings or even age on VHSR images (QuickBird and WorldView3), allowing a better recognition for diverse land cover regions in different urban landscapes; while Van Etten (2018) compared grayscale, RGB and MS images without finding any significant difference between them, all yielding high F1 scores (F1 > 0.80) for detection of regions at different scales (0.15-3.0 m) on VHSR satellite images.
An image contrast enhancement is further applied to improve region perceptibility, improving brightness differences among regions, which was completed through a remapping of grayscale values to cover the complete images' dynamic range. This is a linear transformation (Gonzalez & Woods, 2008) using the following function: ( 2) where GL(max) and GL(min) represent the maximum and minimum gray values of the image, respectively, and g(x,y) is the pixel's gray value at coordinate (x,y).

Identification of segmentation parametersseeds and stopping threshold
The traditional region growing (RG) method requires the identification of a seeds' set that correspond to starting points of growing regions, and a threshold as the stopping criterion for every region. Liu, Li, and Wang (2015) indicate those seeds can be determined using manual selection, gradient-based or random approaches. Given the fact that the first two methods require a certain level of human assistance, we decided to choose the random-option to pursuit our objective of automatic image processing. Sánchez, Martínez, and Arquero (2015) evaluated four strategies for automatic seed selection for a region growing algorithm, including random selection, edge-based selection, distance-based selection and a combination of edge and distance-based. They found the edge-based seed selection was the better strategy. We proposed a new method based on the entropy theory in order to define the stopping thresholds. This permits to identify the maximum variability between pixels' values in the image. Shannon cited by (MacKay, 2003) formulated his entropy formula with the function defined as below: where H is used to denote entropy, P is the probability, and x represents each image grayscale value.
First, the image under analysis is fully scanned using three different straight line patterns, such as rotational, horizontal and vertical, but for illustration purposes, only the main lines are shown (Figure 3). The entropy value from each line is stored. Then, the line with maximum entropy value is selected as this line contains the highest variability in gray levels of the VHSR image under study (Singh & Singh, 2008) likely representing the largest amount of regions vital for later analysis. Figure 4 shows the selected lines in each image.
Next, a vector is created with all the gray values contained inside the selected line keeping the same order. Then, a merging process begins between neighboring pixels depending on if the gray values fall inside a particular threshold, starting from one and increasing up to a maximum predefined value. This maximum value was chosen according to the image's radiometric type (8-bit or 16-bit) to avoid values which could give too few regions (two or one final regions). With every unitary increment, the new gray levels are stored (labels) and a graph is built using both arrays (increments vs. gray levels), applying an optimization process, specifically the iterative L-method approach proposed by Salvador and Chan (2004) to find its respective inflection point. This point represents the turning point where the transition occurs from small to large regions that could be interpreted as the point where the smaller and discernible regions of the image under study can be detected. Following this strategy, allowed us to bypass the two most critical elements of RG algorithm (Fan, Zeng, Body, & Hacid, 2005), such as the determination of thresholds and selection of seeds.  Figure 5 illustrates the aforementioned steps with a simple example. The process starts with the highest entropy line extracted from a 3-bit toy image (8 different gray levels), with ascending gray levels from 0 (black) to 7 (white). To be more specific, these numeric values represent the initial vector ( Figure 5(a)) and illustrates how the gray values change through merging operations according to the actual threshold (i.e. numbers on the left), and the gray levels that were stored (numbers on the right). Figure 5(b) depicts the graph built using both number arrays, locating the inflection point at coordinates (3,2), corresponding to the selected threshold value and final regions, respectively ( Figure 5(c)).
The regions' number identified by the previously described optimization process was used to define the scale in every image under study, through the arithmetic division between the vector length and the number of regions detected. Figure 6, shows the scale values obtained for each image. There are some important aspects to note from it. First, it could be inferred that a better image's spatial resolution tends to produce higher entropy values, but it does not always seem to produce larger scale values, because scale depends strongly on the existence of large areas with similar gray values inside the image, such as grasslands or waterbodies. Second, the images' size seems not to influence the scale values, as it can be verified with the Ikonos-Pan image, which has a double size compared to others, getting a very smallscale value (1.0), but not too far from the small-sized images; by the contrary, the GeoEye image has the largest scale value (7.14), mainly obtained for the influence of a large region (roof) with a constant pixel value, reducing the number of regions detected (only 70). Third, it could be found an inverse relationship between entropy values and scale, because a greater entropy value means more regions detected, causing the scale decrease.

Region growing
The region growing process is split into two components: Line Region Growing (LRG) and Random Region Growing (RRG). Both components work  similarly, adding the pixel from its immediate neighbours (up, down, left and right), where its gray value is closest to the seed value. However, the first component (LRG) grow regions along the selected line (Section 3.2), while the second one (RRG) grow regions randomly.
The LRG starts growing regions following a sequential pattern using the threshold and seeds found in the previous section. The process starts with the first left pixel on the line, growing until the first region is extracted. Next, it continues with the second seed, growing until the second region is obtained. The same process continues until reaching the line's last pixel (Figure 7(a)). Once all the regions on the line are delineated, the algorithm adopts a random pattern (RRG) selecting seeds randomly for growing regions (Figure 7

Segmentation metrics
The metrics employed to assess the segmentation performance and make an objective comparison between our method against other segmentation approaches are known as supervised metrics (Beneš & Zitová, 2015;Zhang, Fritts, & Goldman, 2008), also called discrepancy methods by Cardoso and Corte-Real (2005) which require a reference segmentation (also known as ground truth or gold standard) as the basis. These metrics include different approaches such as statistical, positional and geometrical (Table 2), providing a solid and wide evaluation criteria to reduce  any possibility to obtain distorted results caused by insufficient metrics (Beneš & Zitová, 2015).
In our case, the reference segmentation was obtained by averaging segmentation results realized by five human experts, to get a ground truth closer to an ideal segmentation. This ground truth was not created just for one particular type of regions (like buildings or trees), but for all regions located along the selected line. This approach allows us to use a larger amount and diversity of regions without any subjective regions' selection. Figure 9 provides an example of performance evaluation, where an image includes a rounded region (having a black boundary) over a white background with an arbitrary horizontal maximum entropy line (green rectangle) (Figure 9(a)), red points represent a particular segmentation of the proposed algorithm, and yellow points represent the reference segmentation (Figure 9(b)). Each segmentation is transformed into a binary vector, where '1ʹ means "boundary" and '0ʹ is the region area (Figure 9(c)), the final binary vector resulting from an AND logic operation (intersection) between two previous vectors, with 1 and 0 representing agreements and non-agreements, respectively (Figure 9(d)).

Qualitative evaluation
The qualitative evaluation process involved the analysis of those visual features that could be easily recognized by an expert human. In this case, we selected three different attributes; boundary fidelity, area similarity and presence of artifacts inside every region. For these tests, we choose meaningful regions representing discernible regions, such as roof, street or trees.
Ideal or perfect segmentations rarely occur because any segmentation process usually produces artifacts that do not belong to some particular region, or sometimes, the regions' boundary presents a minor displacement from real border which could not be appreciated visually. To avoid a biased evaluation it was established three region sizes' categories for region delineation according to their boundary fidelity, area similarity and presence of artifacts, using large (lake and building), medium (grass field and forested area) and small regions (trees and streets) ( Figure 10).
We found the next peculiarities within the large regions (lake and building). The LBS offered a better area similarity of both, while the WS method obtained the best boundary fidelity of the building region and the MRS and LBS methods manifested some artifacts over it. Within medium regions (forested area and grass field), all methods oversegmented the forested area possibly by its rough texture, but not showed with  the grass field, where all methods gave a very good boundary delineation, with no presence of artifacts and a good area similarity. About the small regions (trees and car), the LBS merge the trees with its shallows, as one region, and also created a few artifacts inside the segments. With the car region, all methods segmented it well, although with a little difference in the boundary fidelity and area similarity.
In general, all three segmentation methods produced a high boundary fidelity, although it was appreciated that sometimes the boundaries were traced by the external border, and other times by internal borders, but without detecting any particular pattern for any of the methods tested, mostly produced by the particular parameters setting employed for each image. In relation with the area similarity feature, it was found that MRS behave better, especially when taking into account the different categories of regions' sizes, indicating that the ESP tool was a good scale selector. Although the proposed LBS method is single-scale, it showed good performance segmenting regions of any size. The presence of artifacts did not show any particular difference between these three methods, appearing in any place, as in the boundary or inside/outside of regions.
Also, to check if the proposed method behaves differently delineating regions on the panchromatic band, it was chosen three regions with different sizes, such as small (balconies), medium (street) and large (roof), as can be seen in Figure 11. The segmentations let us affirm the proposed method can recognize very small regions like balconies and can delineate them well, although it tends to produce some artifacts inside as the region gets larger. The MRS behaves very well under all regions' sizes tested, giving a well-defined boundary, while the WS method has some trouble to find a parameters' set that can obtain a good delineation of the regions under study, leading to a regions' oversegmentation ( Figure 11).
Next, it will be described the quantitative evaluation to avoid any subjective evaluation that could be biased by an experts' own judgement, and that could be controversial for some other evaluators.

Quantitative evaluation
Seven metrics were chosen to make a quantitative evaluation involving a wide spectrum of metrics (Table 3), including Rand index (Rand, 1971), Positional discrepancy (Montaghi, Larsen, & Greve, 2013), Area fit index (Lucieer, 2004), Hamming distance (Norouzi, Fleet, & Salakhutdinov, 2012), Jaccard distance (Real & Vargas, 1996), Fowlkes-Mallows index (Fowlkes & Mallows, 1983) and overall accuracy, one of the most commonly used measures for performance evaluation (Congalton, 1991). Also, we performed a t-test for means of two independent samples from descriptive statistics, which is a two-sided test for the null hypothesis that two independent samples have identical average values.
Specifically, Rand index is related to accuracy or similarity between clusters, using agreements and disagreements to calculate its value. It penalizes both false positive and false negative during clustering allowing us to measure the correspondence between edges, but at the same time, showing the correspondence between non-edges. This metric did not show a good performance for the LBS method, this could be partially explained because that under large and medium scales have less coincidences between non-boundary pixels of regions producing higher values for false positives and false negatives, and consequently reducing its Rand index substantially. The Hamming and Jaccard distances measure dissimilarity between sample sets, although it is known that Jaccard distance is usually more sensitive to small sample sizes, but that was not a problem here, because our sample size was equal or superior to 500 samples. Any of the three segmentation methods showed no superior performance under these two distances, getting values between 22% and 26%, with no statistical differences between them.
The Positional discrepancy (PD) and Area fit index (AFI) metrics, both involve spatial coincidences between regions but behave differently, because the positional discrepancy takes into account only the centroid position, while Area fit index uses the regions' boundaries. The positional discrepancy metric for the proposed method showed statistical difference, indicating their superior performance, getting positional discrepancy of 3.81 pixels only, meaning it can detect the regions' centroids easily. In relation to the AFI metric, with its values varying in the range [−1,1], showed the LBS method behaves satisfactorily (0.30), giving some space to improve.
With the Fowlkes-Mallows (FM) index, defined as the geometric mean of the pairwise precision and recall extracted from contingency table, indicated the proposed LBS method needs to reduce the interference of false negatives (FN) values, which affected statistically significant its performance.
Last, but not least, the overall accuracy (OA) metric, maybe one the most used metric to evaluate segmentation performance, showed similar performance for all the three segmentation methods, with no statistical differences between them, with values varying between 76% and 78%. It is worthy to point out that the segmentation parameters were found following a trial-anderror methodology in the case for the WS method, while for MRS method, its scale parameter was defined using the ESP tool (Dragut et al., 2014) and its shape parameter through a visual examination, meaning that possibly a different parameter setting could let them improve their performances . Figure 12 illustrates the segmentations produced for each method organized by category size, including the proposed method (LBS) and the two selected methods (WS and MRS) used for comparison purposes through the calculated metrics.    The proposed segmentation method (LBS) is based on region growing algorithm, whose parameters, thresholds and seeds, were determined completely automatic, without any user assistance, a very relevant behavior because it reached a comparable performance with two wellknown methods for segmentation of remote sensing images, such as watershed and multiresolution, both following a user-assisted approach, although the latter could be described as a semi-automatic approach, because it was used in conjunction with a tool for scale determination, as was mentioned before.

Automatic image processing
As it was mentioned before, the Line-based segmentation method (LBS) reached a similar performance with watershed (WS) and multiresolution (MRS) methods according to overall accuracy index, Hamming and Jaccard distances, and a superior performance according to the positional discrepancy metric. We consider this result very relevant because it reached following a fully automatic approach, especially with overall accuracy which evaluates the true positives cases (i.e. real regions' borders) overall samples used. While the Hamming and Jaccard distances confirmed that have not found statistical differences between the proposed method compared against the other two supervised methods, knowing these two distance metrics represent the false negatives cases (i.e. real regions' borders not identified), that can be interpreted as a complementary measure of overall accuracy discussed earlier.
The best performance for the positional discrepancy metric obtained by the proposed method, showing statistical differences against the supervised methods, indicated a superior capability for centroids' detection of segments, which is very important for forward image analysis, such as points clustering or weighting by centroids density, that could be used for estimation of region agglomeration or identification of trends in urban growth. To achieve this metric value without using any prior knowledge about image content during the segmentation process represents a real advantage point supporting its high reproducibility, eliminating the need of the user's visual image interpretation or being influenced by posterior image analysis.
It is important to mention the proposed method (LBS) does not represent a universal solution for segmentation of VHSR remote sensing images, because it is known that a good image segmentation strongly depends on the particular information the user is looking for, but it could be considered as a fair initial step when the user does not have any knowledge about the image content, as a preliminary segmentation result to compare against other image segmentation results or when it is necessary to segment a high volume of images automatically.

Scale selection
As we have explained before, the proposed segmentation (LBS) is a single-scale method, departing from the multiscale approaches that actually are highly appreciated for the analysis of very complex natural phenomena captured by remote sensing images, using hierarchical linking to represent relationships between regions that belong to different scales. Although our approach is single-scale, it has the capability to grow simultaneously different sized regions (e.g. small and large regions), because it does not have a limiting size parameter to prevent it to generate regions larger than a specific size, although it becomes fundamental to generate hierarchical levels under other segmentation approaches.
The image segmentation methods chosen for comparison purposes were the Watershed and Multiresolution segmentation, known as supervised methods, which require most of the times a trial-and-error methodology for the parameters' tuning step. The Watershed segmentation (https://imagej.net/Morphological_Segmentation) has two parameters, tolerance and gradient, varying their values according to the category region size under analysis. For the tolerance parameter, three values were chosen: 5, 10 and 15 corresponding to small, medium and large regions. As the tolerance value represents an intensity parameter, the chosen values were 10 (8-bit images) and 10, 100, 300 (16-bit pachromatic image) as starting points, knowing that lower and greater values increase or decrease the number of regions, respectively. The gradient type parameter corresponds to the difference between dilation and erosion operations within a particular neighbourhood. The selected gradient parameter value varied between 1 and 5, that were chosen through a visual examination.
The multiresolution segmentation has three main parameters, scale, shape and compactness. To define their values, we followed a mixed strategy, combining an automatic methodology (for scale parameter), a trial-and-error methodology (for shape parameter) and fixed value (for compactness parameter). For the scale definition, we used the ESP (Estimation of Scale Parameter) tool (Dragut et al., 2014), based on Local Variance (LV) of region heterogeneity within a scene, choosing a hierarchy of scale from thresholds in rates of changes. The selected scales for all images varied between these ranges for 8-bit images (10-14, 16-48 and 28-78 for small, medium and large regions, respectively) and for 16-bit image (200, 500 and 100 for small, medium and large regions, respectively). The shape parameter was selected through a visual examination, assigning values in the range 0.3-07, while compactness value was defined as 0.5 (Kavzoglu & Yildiz, 2014) for 8-bit images, while 0.1 and 0.9 for shape and compactness parameters were used for the 16-bit image. We consider very important and necessary to overcome the use of the trial-anderror methodology for parameters' definition, such as scale, to enable the automatic implementation of the image segmentation process. Usually, it involves multiple testing converting this into a subjective and timeconsuming task. Here, it was used as an optimization process to define parameters' values automatically allowing the identification of the smaller and discernible regions in the image. Although at first, it was not thought to use it on multiscale approaches, the original image could be splitted in small fractions to segment regions with larger sizes, possibly enabling its hierarchical behavior. The approaches proposed by Dragut et al. (2014) and Yang et al. (2014Yang et al. ( , 2015 represents a new and promising way to identify the scales for the images under study, but they still require the user assistance to define additional parameters such as the number of iterations, ranges or increments.

Seeds selection
The process of seeds selection sometimes impacts negatively the quality of image segmentation when region-growing algorithms are used (Fan et al., 2005). Under the present approach, these seeds were selected automatically using a new method based on the highest entropy line. Also, as the regions start to grow from that line initially, it reduces the intrinsic variability between segmentations from the same image, as is explained next.
After performing several segmentations (five in total for each image) using the proposed method (LBS), we found very small differences (0.01%) between obtained regions, as can be seen in Table 4.
To evaluate further the impact of seeds selection on the behavior of the proposed segmentation method, we selected two statistical criteria; Mean Square Error (MSE), which is a global measure that estimates the perceived error between two images with a value of zero (0) indicating perfect similarity, and Structural Similarity Index (SSIM), which tries to model the perceived changes in images' structural information using small windows, giving values between −1 (different) and 1 (perfect similarity) (Wang, Bovik, Sheikh, & Simoncelli, 2004) . Both criteria confirmed a high similarity and small changes between repeated segmentations confirming the high consistency and reproducibility of the proposed LBS image segmentation method (Table 4).
Specifically, the MSE obtained was close to 300, which represents a very low value compared to the total of regions used (13,000 approx.), meaning it is just 2% away from the optimal MSE value (zero value), also with a very low variability (2.67%). In relation with the SSIM metric, its optimal value is one, and the results obtained from LBS segmentations gave an average value of 0.80, near to the optimal value, and also with extremely low variability (0.67%).

Conclusions
In this manuscript, it is proposed a novel remote sensing image segmentation method named Line-based segmentation (LBS) adopting a region growing approach with the aim to eliminate human intervention in segmenting VHSR remote sensing imagery to make possible the automatic image segmentation. This was realized through the definition of a proper segmentation parameter without the use of a trial-and-error methodology, and using the image's highest entropy line instead, getting the required thresholds and seeds through an optimization procedure. This method was tested on several VSHR images and evaluated qualitatively and quantitatively, showing comparable segmentation performance in comparison with two well-known segmentation methods in remote sensing.
The LBS method is able to delineate simultaneously both small and large regions because it does not have any limiting size parameter, becoming a very important feature knowing that every single remote sensing image usually presents regions with high variability in size and shape. Also, it has shown the capacity to reduce or eliminate the need of prior knowledge about the scene allowing the automatic processing of large volumes of remote sensing images. Because of these qualities, we consider this line-based approach could open new ways to process and analyse VHSR remote sensing imagery effectively.

Acknowledgments
This study was possible thanks to the support of Universidad del Tolima, Colombia under grant 305-05361701. Also, the Universidad Nacional de Colombia, Sede Medellin, provided support with travelling costs to several conferences, under grants 20601009564 and 200000012970.

Disclosure statement
No potential conflict of interest was reported by the authors.