Estimating soil texture with laser-guided Bouyoucos

The main objective of estimating soil texture is to determine the proportions of the components of the soil. While coarse particles are analysed by sieving, the Bouyoucos-hydrometer method is used to determine the percentages of small particles (silt and clay, as well as sand). However, these traditional methods require expertise, a laboratory environment, sensitive equipment and a long time. All of these requirements are overcome by the “Laser-Guided Bouyoucos” proposed in this study. The proposed device estimates the amounts of the soil components by passing laser light through the beaker and evaluating the changes in light intensity. In addition, the device requires no expert or laboratory for measurements, can be built from cheap and easy-to-find equipment and, most importantly, is portable for in-situ use. Several soil samples preanalysed by the Bouyoucos-hydrometer method are used in this study. The error of the proposed device was calculated by summing the absolute errors of the sand, silt and clay components, and the average of all errors is only 2.25%. Although more soil samples are needed to test the system, given the successful results on the dataset used, we believe that soil texture can be analysed quickly without an expert or a laboratory.


Introduction
Soil is an important material from which human beings have benefited since the earliest times. Since soil is a basic component in several areas (agricultural activities for food production, construction works for building shelters, and many others related to materials science), many issues in natural science, such as drainage, water retention capacity, air capacity, erosion susceptibility, organic matter content, cation capacity, pH balance and hydraulic conductivity, can be assessed with knowledge of the physical content of the soil [1][2][3][4]. By a simple definition, estimating soil texture means measuring the amounts of sand, silt and clay particles [5]. While the amount of clay alone can be used to comment on the water retention capacity of the soil, the amount of organic nutrient waste for plants and the rate of mould, the amounts of sand and silt together are used to determine whether the soil has sufficient air flow and whether or not it is suitable for agriculture. These three particle types are generally named according to their sizes in millimetres. Table 1 shows the classification of these components according to their particle sizes.
As shown in Table 1, particles between 2 and 0.02 mm are called "sand", those between 0.02 and 0.002 mm are "silt", and particles smaller than 0.002 mm are "clay" [5]. Particles larger than 2 mm are not included in this classification. A texture analysis examines particle types, particle distributions, the basic principles underlying relations between particles, and the methods used in measuring. Though hundreds of particle size determination methods and proposals have been published to date [6], traditional and simple methods such as sieving, sedimentation, pipette and hydrometer methods are still considered the most commonly used standardized methods for estimation and measurement [5,7]. The classification of particle sizes mentioned above is based on Atterberg's study. The mechanical analysis of the soil was first standardized in 1928 by the International Society of Soil Science as a combination of sieving and pipetting methods, taking into consideration all the methods proposed until then; only particles passing through a sieve with 2 mm round holes were taken into account.
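The size limits quoted above can be written as a small lookup. The following is a minimal sketch only: the function name is hypothetical, and the handling of the exact boundary values (2, 0.02 and 0.002 mm are assigned to the coarser class) is an assumption, since the text does not state which class a boundary value belongs to.

```python
def classify_particle(diameter_mm: float) -> str:
    """Return the size class of a particle from its diameter in millimetres.

    Limits follow the classification quoted in the text. Assigning a boundary
    value to the coarser class is an assumption made for this sketch.
    """
    if diameter_mm <= 0:
        raise ValueError("diameter must be positive")
    if diameter_mm > 2.0:
        return "excluded"  # larger than 2 mm: not part of the classification
    if diameter_mm >= 0.02:
        return "sand"
    if diameter_mm >= 0.002:
        return "silt"
    return "clay"


if __name__ == "__main__":
    for d in (3.0, 0.5, 0.01, 0.001):
        print(d, classify_particle(d))
```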
The chemical structure of the particles also affects the physical content of the soil. For example, the dissolution of salts changes the particle size because the hydrolysis of the ions weakens the cohesion between particles. Since soil and water come together in many soil-related problems in the natural sciences, researchers often also want to see the effect of water on the soil texture. In contrast to soils with porous particles or a lot of organic matter, soils bearing minerals of iron (haematite, ferric hydroxide, etc.), manganese (pyrolusite, braunite, etc.) or titanium (ilmenite, titanomagnetite, etc.) are denser. Utilizing this density difference, some assessments can be made if the prepared suspension is measured repeatedly at a certain depth and at certain time intervals after mixing. Many researchers classify a soil sample by considering only the particle types present in higher ratios. Naming soils in this way facilitates general descriptions. For example, the texture analysis of clayey soils is more difficult than that of sandy ones. In addition to which particle type is abundant, the size and shape of the particles are other important factors in classification. Since a sphere can be represented mathematically by a single parameter, all particles are expressed, as far as possible, by spherical volume. Therefore, experts also classify particles according to whether they are spherical or not.
The traditional methods have several disadvantages: they can produce results with high deviation, require a lot of labour and experience, depend on equipment and a laboratory, and take a long time. These disadvantages make estimating soil texture difficult [2,8]. For this reason, many studies on technology-supported analysis have been carried out [9][10][11].
Smart and Tovey [12] studied particles below 20 μm using electron microscopy. Additionally, optical methods are frequently used in many areas such as colour analysis [13,14], plant root analysis [15,16], determination of organic matter [17,18], soluble matter analysis [19,20] and soil-water interaction [21,22]. Chaney et al. [23] compared the results of four different analysis devices (two based on laser diffraction and two using X-rays) with the traditional hydrometer method. In Monson's study, a system consisting of a camera and two light sources producing light at different wavelengths was proposed to examine the physical structure of agricultural land [24]. In another comparison study, Roberson and Weltje [25] used the sieve-pipette method and 10 particle size analyzer instruments for each sediment sample of the Goossens [26] data. Fisher et al. [27] investigated the reliability of the laser diffraction method for characterizing the size of soil particles. The results of a laser diffraction-based device and the hydrometer method were evaluated in experiments using 22 soil samples collected from four different locations. Sudarsan et al. [28] used a Continuous Wavelet Transform-based computer vision algorithm with a simple microscope connected to a computer to characterize the size of soil particles. Frei and Kruis [29] used an ANN (artificial neural network) for both regression and classification on transmission electron microscopy (TEM) images to analyse the particle size distributions of agglomerates. Although successful results were obtained with high-end equipment, such equipment is limited in terms of price, availability, ease of use and portability.
Considering all of these difficulties, a new device based on the traditional Bouyoucos experiment has been developed. The device is built by placing five laser light sources on one side of the beaker and five LDR (light dependent resistor) sensors on the other side, and it is fully automated with embedded software. Although the soil preparation steps are the same as in the traditional Bouyoucos method, the device has the following additional advantages.
• mobile and small
• works at any place (independence from laboratory)
• works by itself (independence from expert)
• cheap and easy to produce
Several soil samples which were preanalysed traditionally are used to test the device. Final results are compared with Bouyoucos-hydrometer measurements.

Soil samples and dataset preparation
Before starting the Bouyoucos-hydrometer method, 40 g of sodium hexametaphosphate is mixed with 1 L of purified water and allowed to dissolve fully for one day. A portion of the dry soil sample (25 g for clayey soil, 50 g for loamy soil and 100 g for sandy soil) is placed in an oven at about 105°C for 1 d. It is then passed through a 2 mm sieve to remove large stones, and its weight is measured. A new portion of the same amount of soil is mixed with 10 ml of the previously prepared sodium hexametaphosphate solution and 150 ml of distilled water, and left for 1 d. After being mixed in the dispersion vessel (5 min for sandy soil, 10 min for loamy soil, 15 min for clayey soil), the mixture is poured into a 1000 ml sedimentation cylinder (beaker), and distilled water is added up to the 1000 ml line. The suspension is stirred thoroughly by moving the brass mixing rod up and down 20 times. The timer is started when the mixing rod is removed from the cylinder. After exactly 20 s, the hydrometer is gently immersed into the suspension, and its reading is recorded at the 40th second. At the end of the 2nd hour of the experiment, a second reading is made (again immersing the hydrometer 20 s beforehand). The temperature is also recorded at both readings. If the temperature differs from 20°C, the value (ReadTemperature − 20) × 0.36 is added to the recorded hydrometer value as a correction. Both hydrometer readings are compared to the weight of the oven-dried soil sample. The ratio obtained from the first reading gives the percentage of "silt + clay", and the ratio obtained from the second reading gives the percentage of "clay". The sand, silt and clay percentages are then determined using simple mathematical operations [7]. Six different soil samples were supplied by the Department of Geological Engineering of Çukurova University for the calibration and testing of the device proposed in this study.
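The arithmetic described in this paragraph can be illustrated with a short sketch. The function and parameter names are hypothetical, the hydrometer is assumed to read grams of suspended soil per litre, and no blank (dispersant-only) correction is applied because the text does not mention one. The example readings are invented for illustration and are not taken from Table 2.

```python
def temperature_correction(temp_c: float) -> float:
    """Correction added to a hydrometer reading: (ReadTemperature - 20) x 0.36."""
    return (temp_c - 20.0) * 0.36


def bouyoucos_texture(reading_40s, temp_40s, reading_2h, temp_2h, dry_mass_g):
    """Return (sand, silt, clay) percentages from two hydrometer readings.

    Assumes readings in g/L taken in a 1 L suspension, so that
    reading / dry mass is the fraction of soil still in suspension.
    """
    corrected_40s = reading_40s + temperature_correction(temp_40s)
    corrected_2h = reading_2h + temperature_correction(temp_2h)

    silt_plus_clay = 100.0 * corrected_40s / dry_mass_g  # first reading
    clay = 100.0 * corrected_2h / dry_mass_g             # second reading
    sand = 100.0 - silt_plus_clay
    silt = silt_plus_clay - clay
    return sand, silt, clay


if __name__ == "__main__":
    # Illustrative values only: 50 g of oven-dried soil measured at 20 °C.
    print(bouyoucos_texture(40.0, 20.0, 20.0, 20.0, 50.0))
```

With these illustrative inputs no temperature correction is needed, and the three percentages add up to 100 by construction.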
The clay, silt and sand percentages of the supplied soils were determined by the experts from three replicate experiments with conventional Bouyoucos-hydrometer analysis. In this study, the average of these three replicate experiments is used to calibrate the device. The contents of 6 soil samples (averages of 3 replicate experiments) are given in Table 2.
In this study, mostly silt- and clay-weighted soil samples were supplied, since these are difficult to determine. As seen in Table 2, the contents of the samples vary from 4% to 22% sand, 24% to 68% silt and 10% to 68% clay. As a general opinion, it is argued that a sample set should be formed from meaningful points that represent the whole data. If the sample set is limited because of environmental conditions or subjects that are difficult to obtain, the concepts of "purposive" sampling and "representativeness" come into play. It is also possible for a small sample set to represent its parent population. In the literature there are many statistical methods for evaluating whether a small sample set can be considered reasonable or not [30]. In that work, using the sample completion rate and the binomial distribution, eight successfully evaluated samples out of nine were considered to exceed a benchmark of 70%. In other words, with 80% probability, 70% of the samples would complete the evaluation validly. This outcome was claimed to be suitable for publication. If 6 out of 6 samples in our dataset are considered successful, since the error rates produced (presented in the "Results and discussion" section) are all lower than the traditional Bouyoucos error margin (5%), the same calculation produces a 61% completion rate. This indicates that, with 76% probability, 61% of the samples will succeed. Although this is barely sufficient evidence, the promising results encourage us to overcome the drawbacks of small sample sizes in further research.

Laser-guided Bouyoucos
In a preliminary work leading to this study, a simple system was developed consisting of an LED (light emitting diode) light source, nine LDR sensors and an embedded system to determine the sand percentage in a soil sample. Various mathematical methods were applied to the recorded measurement data, and it was proposed that such a system can be used to determine the sand ratio in soil [31,32]. Although the mean squared error of about 1% obtained in these studies was acceptable, the same success was not achieved for the silt and clay ratios.
In the new system, called Laser-Guided Bouyoucos (LGB), five homogeneous red dot lasers are used instead of one LED, together with five LDR sensors. The experimental setup, shown in Figure 1, consists of an Arduino microcomputer, a laptop computer, a 1000 ml beaker, five homogeneous laser sources (flux density fixed at constant voltage) placed on one side of the beaker from bottom to top, and five LDR sensors on the other side of the beaker (each one facing only its corresponding light source).
As seen in Figure 1, the LGB is connected to a computer that gathers and interprets the data through an embedded circuit. The photographs of this experimental setup, which is produced using simple, easy-to-find and cheap equipment, are shown in Figure 2.
As seen in Figure 2, the beaker can be moved without affecting the other parts of the system. This prevents the sensors from being dislocated during agitation of the soil-water suspension. The LGB is calibrated by placing the empty beaker between the lasers and the LDR sensors in an environment isolated from any light. In this calibration, performed before each experiment, the magnitude of light reaching the LDR sensors is used to normalize the min and max values. Then, for each soil sample, the adequately stirred soil-water suspension is put into the beaker. After pure water is added until the suspension reaches the 1000 ml line, the mixture is stirred with a long stirring rod until it is homogeneous. The experiments are carried out in compliance with the standards of the Bouyoucos-hydrometer method (50 g soil preparation, sodium hexametaphosphate solution, temperature fixed at 20°C using a digital heating system, etc.). Lastly, the electronic system is started for the measurements. Figure 3 shows an overview of an experiment carried out in a dark environment.
As seen in Figure 3, the suspension at the level of the bottom-most laser is considerably denser because of excessive particle accumulation. Each LDR sensor records the light intensity for a total of 90 min at 2 measurements per second (2 Hz). Thus, a data matrix with 5 rows (one for each sensor) and 10,800 columns is obtained for a single experiment. The experiment is repeated once more with a new instance of the same soil sample in order to eliminate errors that can occur during measurement. After 2 repetitions of 90 min with 5 sensors, data with 10 rows and 10,800 columns is obtained. When this process is repeated for all 6 soil types, a dataset of 60 rows and 10,800 columns is created. For two soil types, the changes in light intensity over time are shown in Figure 4.
According to Figure 4, the fastest illumination rate is observed at the 5th sensor, at the top-most level of the beaker. On the other hand, while the illumination is already considerably high at about the 1500th sample for sensor #4 in Figure 4(a), the same value is reached only at about the 2500th sample for sensor #4 in Figure 4(b). We can roughly say that the amount of sand in Figure 4(a) is greater than that in Figure 4(b). However, when all of the sensor signals are plotted as shown in Figure 5, it is inferred that the important moments of the conventional Bouyoucos method (40th second and 2nd hour) alone cannot provide meaningful information.
In Figure 5, only the first 15 min of the signals are shown so that the first 40 s can be seen more clearly. According to the figure, almost no change is observed at the top-most sensor (#5) up to the 400th sample (approx. 200 s). In the Bouyoucos analysis, however, the sand percentage can be determined by measuring only the first 40 s. This can be interpreted as the laser light in the LGB system being blocked by small particles suspended in the water, so that it does not reach the LDR sensors for a long time (200 s), even at the top-most level of the beaker. Nevertheless, this result shows that estimating the amount of sand is possible by evaluating the whole signal. The data preparation process of the LGB is given below in an algorithmic manner.
(1) Put the empty beaker in its place, switch on all lasers and LDR sensors, and check the light intensity values read by all LDRs. If they are not equal to each other within a very small tolerance, normalize them.
(2) Put the prepared soil and distilled water solution into the beaker, fill the beaker up to the 1000 ml line with distilled water and stir it adequately.
(3) Start the electronic system, and store on the computer the values measured at 2 Hz for 90 min by the LDR sensors. At the end of each experiment, a dataset with 5 rows (one for each sensor) and 10,800 columns (2 Hz × 90 min × 60 s) is obtained.
(4) Drain and clean the beaker after each experiment. Make two repetitions of the experiment for each soil sample.
(5) If there is an untried soil sample, go to step 1 for this sample.
(6) As a result, a dataset with 60 rows and 10,800 columns is generated.
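The dataset dimensions produced by the steps above follow directly from the sampling settings. The sketch below only reproduces that bookkeeping. The constant and function names are hypothetical, and the row ordering (soil, then repetition, then sensor) is an assumption, since the text does not specify how rows are ordered.

```python
SAMPLE_RATE_HZ = 2   # measurements per second
DURATION_MIN = 90    # length of one experiment in minutes
N_SENSORS = 5        # LDR sensors, numbered 1 (bottom) to 5 (top)
N_REPETITIONS = 2    # experiments per soil sample
N_SOILS = 6          # soil samples in the dataset


def samples_per_signal(rate_hz=SAMPLE_RATE_HZ, duration_min=DURATION_MIN):
    """Number of columns: 2 Hz x 90 min x 60 s."""
    return rate_hz * duration_min * 60


def row_labels(n_soils=N_SOILS, n_repetitions=N_REPETITIONS, n_sensors=N_SENSORS):
    """One (soil, repetition, sensor) label per dataset row (assumed ordering)."""
    return [
        (soil, repetition, sensor)
        for soil in range(1, n_soils + 1)
        for repetition in range(1, n_repetitions + 1)
        for sensor in range(1, n_sensors + 1)
    ]


if __name__ == "__main__":
    print(len(row_labels()), "rows x", samples_per_signal(), "columns")
```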
Before the dataset obtained from the data preparation process is given as input to the prediction system, some meaningful features must be extracted from it. In this phase, features of each row in the dataset are extracted using the statistical functions most commonly used in the literature, such as standard deviation, entropy, mean, min and max. Based on the results of the previous study [32], the standard deviation and entropy functions are chosen to extract features. Thus, the dataset is transformed into a new format with 60 rows and 2 columns. A computerized prediction process is then started using this dataset.
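As an illustration of this feature extraction step, the sketch below reduces one sensor signal to a (standard deviation, entropy) pair. The exact definitions used in the study are not given in the text, so two assumptions are made here: the population standard deviation is used, and entropy is taken as the Shannon entropy (in bits) of a histogram of the signal with a hypothetical default of 16 equal-width bins.

```python
import math
from collections import Counter
from statistics import pstdev


def shannon_entropy(signal, n_bins=16):
    """Shannon entropy in bits of an equal-width histogram of the signal.

    The histogram-based definition and the bin count are assumptions.
    """
    lowest, highest = min(signal), max(signal)
    if highest == lowest:
        return 0.0  # a constant signal carries no information
    width = (highest - lowest) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    counts = Counter(min(int((x - lowest) / width), n_bins - 1) for x in signal)
    total = len(signal)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(signal):
    """Reduce one sensor signal to (standard deviation, entropy)."""
    return pstdev(signal), shannon_entropy(signal)


if __name__ == "__main__":
    toy_signal = [0.0, 1.0, 0.0, 1.0]  # illustrative values, not measured data
    print(extract_features(toy_signal))
```

Applying `extract_features` to each of the 60 rows would give the 60 × 2 feature table described above.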
For computational prediction, ANN models are the best-known methods in the literature. ANN models are designed to imitate the ability of the human brain to generate logical responses to new situations by using previous experiences [33]. ANNs are used in many areas such as prediction, classification, identification, generalization and association. ANNs can learn and draw inferences by themselves by retaining the sample situations they encounter. The mathematical working principle of an ANN is based on updating weight values that are determined randomly at the beginning. When properly trained, ANNs can produce decent results even with incomplete data (Figure 6).
The multilayer perceptron (MLP) is not only the most preferred ANN model, but also the basis for almost all of the deep learning tools that are popular today.

Results and discussion
In the LGB system, a traditional MLP structure with three layers (input, hidden and output) is used. There are 10 neurons in the single hidden layer, and the activation function is the hyperbolic tangent for all neurons. For validation, the 6-fold cross-validation method is applied: the data of 5 soil samples are used to train the MLP, while the data of the remaining soil sample are used as test data to measure the success. This procedure is repeated 6 times so that each soil sample serves as test data once. The average of all test results is taken as the overall system performance. Since the MLP here tries to predict the amounts of sand, silt and clay from two features (standard deviation and entropy) extracted from the LDR sensor signals, the success of the system is measured by its error, on the principle that a lower error means higher success. Among the many types of error, the mean absolute error (MAE) is chosen for this study, and the absolute differences between the expected and the obtained values are summed over the soil components. In summary, the LGB system measures signals with five LDR sensors, then extracts two features from each signal, and lastly gives these features to the MLP predictor. In order to decrease the size of the input signals, a study was first made of which sensor combinations and how short a measurement duration give the most successful result. With five sensors, there are 31 (2^5 − 1) different sensor combinations. The MAE values obtained for different time lengths within the 90 min measurement data and for different sensor combinations are shown in Figure 7 for estimating the amount of sand and in Figure 8 for estimating the amount of silt.
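The validation scheme and the error measure described above can be sketched independently of the MLP itself. The code below is not a neural network implementation: it only shows how rows could be split so that each soil sample is held out once, and how the error is accumulated. All function names are hypothetical, and the numbers in the example are invented for illustration.

```python
def leave_one_soil_out(soil_ids):
    """Yield (held_out_soil, train_rows, test_rows), one fold per soil sample.

    soil_ids[i] is the soil sample that dataset row i belongs to, so every
    row from the held-out soil goes to the test set.
    """
    for held_out in sorted(set(soil_ids)):
        train_rows = [i for i, s in enumerate(soil_ids) if s != held_out]
        test_rows = [i for i, s in enumerate(soil_ids) if s == held_out]
        yield held_out, train_rows, test_rows


def summed_absolute_error(predicted, expected):
    """Sum of absolute errors over the soil components of one sample."""
    return sum(abs(p - e) for p, e in zip(predicted, expected))


def mean_error(errors):
    """Average of the per-sample errors (overall system performance)."""
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Three soils with two rows each, just to show the fold structure.
    for fold in leave_one_soil_out([1, 1, 2, 2, 3, 3]):
        print(fold)
    # Invented (sand, silt, clay) percentages: predicted vs. expected.
    print(summed_absolute_error((10, 50, 40), (12, 49, 39)))
```

With six soil samples this yields the six folds described in the text. A regression model would be trained on `train_rows` and evaluated on `test_rows` inside the loop.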
For both Figures 7(a) and 8(a), each sensor combination is labelled with the decimal value of the binary pattern of the sensors used in that experiment; in other words, 5 means 00101. In Figure 7, both images show the minimum, the average and the maximum of the obtained errors (min-avg-max among different sensor combinations for 7(a) and min-avg-max among different time lengths for 7(b)). According to Figure 7, although the lowest error occurred at the 4th minute with an MAE of 0.004%, the MAE of 0.05% on the "average" curve at the 5th minute is considered a more consistent result. Accordingly, it has been determined that a 5-min measurement is enough to detect sand in the soil samples used in the study. A similar approach has been applied to find the sensor combinations that produce the best results on the average curve. The four best combinations, starting from the lowest MAE value, were the 29th (11101), 27th (11011), 11th (01011) and 3rd (00011). Based on their binary representations, it has been determined that the most valuable of the five sensors is the top-most LDR sensor in the device (the last bit in the four combinations), and that the most successful combination is all the sensors together except the second sensor from the top (29th combination). Figure 8 shows that, although the overall picture looks worse than for the sand test, the silt estimates are also promising. Since the minimum MAE was reached at the 81st minute, it can be said that approximately 81 min of measurement is sufficient to determine the silt percentage. Although reasonable MAE values are encountered earlier than the 81st minute, this can only be stated definitely after making sure that the MAE does not drop further in the remainder of the plot. It also turns out that the most significant sensor is the top-most sensor, and that the best sensors for detecting the silt percentage are the second and third sensors from the top. Once the system has determined the sand and silt ratios, the clay amount can be calculated as 1 − (sand + silt).
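The numbering of sensor combinations can be made concrete with a short sketch. It assumes, following the remark that the last bit stands for the top-most sensor, that the bits read from left to right as sensor 1 (bottom) to sensor 5 (top). The function names are hypothetical, and the clay helper works in percentages rather than fractions.

```python
N_SENSORS = 5


def sensors_in_combination(index, n_sensors=N_SENSORS):
    """Decode a combination number into the list of sensors it uses.

    Bit order is an assumption: leftmost bit = sensor 1 (bottom),
    last bit = top-most sensor.
    """
    if not 1 <= index < 2 ** n_sensors:
        raise ValueError("combination index out of range")
    bits = format(index, "0{}b".format(n_sensors))
    return [position + 1 for position, bit in enumerate(bits) if bit == "1"]


def clay_percentage(sand, silt):
    """Clay as the remainder once sand and silt percentages are known."""
    return 100.0 - (sand + silt)


if __name__ == "__main__":
    for index in (29, 27, 11, 3):
        print(index, format(index, "05b"), sensors_in_combination(index))
    print(len(range(1, 2 ** N_SENSORS)), "non-empty combinations")
```

Under this bit order, combination 29 (11101) uses every sensor except sensor 4, the second from the top, which matches the description in the text.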
In order to show the success of the proposed system, it is applied to the soil samples in Table 2. The signals are restricted to the best sensor combinations described above (the top-most sensor only for sand, the second and third sensors together for silt) and the shortest possible durations (5 min for sand and 81 min for silt); then the features are extracted, and all data are given to the MLP model (the two experiments done with the same soil sample are used for testing and the remaining ones for training). The obtained estimations are presented in Table 3.
In Table 3, the contents of each soil sample (sand, silt, clay) used in the experiments are given in two different columns for a better comparison. "P sand (%)" represents the sand amount predicted by the LGB and "B sand (%)" is the sand amount obtained by the Bouyoucos analysis.
The results are shown in Figure 10 for each soil component. As seen in Figure 10, the R² values are greater than 0.99. In the literature, there are studies that use R² as a measure of success, so the results can be compared with each other. According to Roberson and Weltje [25], laser diffraction-based devices (Fritsch Analysette 22C and Horiba Partica LA-950) outperformed the sediment-based systems, with R² values of 0.96 and 0.97, respectively. The approach used in [28] reached 0.87 for coarse fractions and 0.88 for fine fractions as its best R² results. Fisher et al. [27] proposed different thresholds (<9, <26 and <275 μm) for the particle size distribution in laser diffraction methods, which correspond to the <2, <20 and <200 μm thresholds in the Bouyoucos-hydrometer method, respectively. However, Lin's concordance correlation coefficient, which was used there instead of R², reached only 0.82, 0.97 and 0.88 for the <9, <26 and <275 μm thresholds, respectively. Although only a few soil samples are tested in our study, the results indicate that, compared with the studies in the literature, the LGB system is an idea worth pursuing. It is clear that more experiments are required to ensure the reliability of the system. In addition, there are some concerns for clayey soils, for reasons such as the reflection of light and the inability of the rays to reach the LDR sensors on the opposite side of the beaker for a long time.
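For readers who want to reproduce an R² comparison of this kind, a minimal sketch follows. The text does not say how R² was computed, so the squared Pearson correlation between predicted and reference values is used here as an assumption. The function name is hypothetical and the numbers in the example are invented, not taken from Table 3.

```python
def r_squared(predicted, measured):
    """Squared Pearson correlation between two equally long sequences.

    Using the squared correlation as R² is an assumption of this sketch.
    """
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    covariance = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("R² is undefined for a constant sequence")
    return covariance ** 2 / (var_p * var_m)


if __name__ == "__main__":
    # Invented values for illustration only.
    print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))
```

Note that this definition measures linear association only. A high value does not by itself show that predicted and reference percentages agree in absolute terms, which is why the summed absolute error is reported as well.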

Conclusion
Conventional analysis methods such as the Bouyoucos-hydrometer method are still used in the physical analysis of soils in fields requiring soil treatment, such as agriculture, construction and mining. These methods are as laborious and time-consuming as they are successful and reliable. It is a great disadvantage that the analysis requires both an expert and a long time. Considering today's technology, this study has been carried out to demonstrate that soil texture can be analysed more effectively by utilizing the speed, power and success of computerized systems. Thus, a new device can be produced that is easy to use, fast, independent of an expert, and highly accurate and consistent.
In this study, a device called the LGB, which is based on five sensor pairs and a simple embedded electronic system, was designed to determine the percentages of sand, silt and clay in the soil. Because the light cannot reach the LDR sensors for a long time due to the clay component, the sand ratio, which is estimated at 40 s in Bouyoucos experiments, could not be estimated before 5 min in this study. However, the whole process is shortened. When the LGB is compared with the Bouyoucos-hydrometer method, the results of the LGB system are almost identical to those of the Bouyoucos method.
Although the system needs to be tested with far more soil samples, the high success achieved is noteworthy. Furthermore, the modular design of the device allows for possible improvements. Thus, hardware revisions to address the concerns about clayey soils can easily be made in future work.

Disclosure statement
No potential conflict of interest was reported by the authors.