A comprehensive study on bid-ask spread and its determinants in India

Abstract
Determinants of the bid-ask spread have been explored extensively for low-frequency datasets in many developed markets. Researchers have identified share price, traded volume, market-capitalization, return volatility, and number of trades as the prime spread drivers. However, the validity of these determinants has not been explored in high-frequency trading. The present study examines whether the low-frequency determinants of the bid-ask spread remain valid in high-frequency trading. It uses the "bigglm" approach to estimate various determinants of spread, which is one of the study's major contributions. The study finds a positive relation between market-capitalization and spread, supporting the theory that a higher trading volume cannot decrease the bid-ask spread. The explanatory variables are all significant and show different impacts in different market conditions and sectors. For pooled data, share price, traded volume, quote return, trading frequency, and return volatility are inversely related to the spread. The study investigates sectoral determinants of the bid-ask spread to understand sector-specific influences. It also explores the influence of up and down markets, settlement cycles, opening and closing intervals, and price and market-cap on the determinants. The findings are significant for market microstructure in trading, the design of trading liquidity, and the reduction of transaction costs.


PUBLIC INTEREST STATEMENT
Bid-ask spread provides information to traders on liquidity and profit margin in the stock market. The determinants of bid-ask spread have been explored in low-frequency data. However, they have not been examined in high-frequency data in the Indian stock market. The article examines various determinants of bid-ask spread using tick-by-tick data, with a representative sample of 60 stocks from six dominating sectors over a period of 3 months. The study finds the narrowest bid-ask spread for the IT-Telecom sector, compared to a wide bid-ask spread in sectors such as Services & Healthcare and Automobile & Industrial Manufacturing. Similar to Madhavan (2000), the article finds that market-capitalization, stock price, return volatility, and trading volume are the major determinants of bid-ask spread, along with stock quote return and number of trades, which are the new determinants under high-frequency trading in the Indian context. The study will be helpful for traders placing orders in an order-driven stock market.

Introduction
Advances in electronic trading, together with the ability to store and quickly process large amounts of high-frequency data, have brought high-frequency trading (HFT) into the market. HFT is an investment strategy in which a computer algorithm buys and sells stocks and holds them for a very short time, normally milliseconds or even microseconds (O'Hara et al., 2019). The aim is to profit from extremely short-term changes in the market. Transactions are so fast that many trading organizations place their servers close to the exchange computers to minimize the time trading instructions take to travel. HFT has the following characteristics: (a) it is program-driven trading by computers, (b) it has an extremely high trading volume, and (c) the rate of return is low, but on the whole, the return is stable. Stocks can be bought and sold multiple times within a trading day. However, market participants, regulators, and academics understand HFT differently. At the extreme, any intraday activity where the trading time is measured in microseconds is high-frequency. HFT improves market quality, reduces spreads, increases market depth, and enhances price discovery (O'Hara, 2015; Rojcek & Ziegler, 2016).
In HFT, traders' decisions to buy and sell depend on different transaction costs, such as processing fees, exchange fees, and liquidity costs. Processing and exchange fees are regulated by stock exchanges. Bid-ask spread, price impact, and opportunity cost are influenced by traders' expectations, market forces, and information asymmetry. High-speed and high-volume trading narrow the bid-ask spread in the market. Market microstructure literature (Madhavan, 2000) defines the bid-ask spread as the value paid for immediacy. In an order-driven market, traders execute trades by submitting market or limit orders. The bid price is the price an investor is willing to pay for an order, and the ask price is the price an investor is willing to receive from an order. The difference between the bid and the ask prices is known as the bid-ask spread, which is the compensation for immediacy. Spreads are pertinent to high-frequency traders, because a higher spread may generate higher profits. But spread may also occasionally imply greater risk (Ghasemiyeh et al., 2017), when traders cannot exit their positions at their desired price.
There are typically three types of bid-ask spread: the quoted spread, the effective spread, and the realized spread (Harris, 2003; Su & Tokmakcioglu, 2020). The quoted spread represents economic costs in terms of barriers to trade. It is the difference between the best ask price and the best bid price at a specific moment. An effective bid-ask spread is the difference between the actual price at which a dealer buys a security and that at which they subsequently sell it, or vice versa (Harris, 2003). An effective spread can measure marketable orders executed in relation to the market center's quoted spread and considers hidden and midpoint liquidity to be available. In other words, it is the gross underwriting spread, adjusted for impact by a common stock offering's announcement on a firm's share price (NASDAQ). A realized spread can be described as the theoretical profits of a liquidity provider. For this study, an effective spread is considered the measured bid-ask spread.
The theoretical literature identifies three main factors determining spread: inventory-holding costs (Amihud & Mendelson, 1980; Ho & Stoll, 1983), adverse-selection costs (Easley & O'Hara, 1992; Glosten & Milgrom, 1985), and order processing costs (Brock & Kleidon, 1992). Jha et al. (1998) suggest that order processing costs are incurred from orders of different sizes and frequencies for the Indian equity market. The adverse-selection cost increases in the presence of traders with rich information about a security's value. But the inventory-holding cost is present whenever there is uncertainty about a security's future payoffs in an equally information-distributed market. Each cost component depends on various factors that call for an analysis of the spread determinants.
This study begins with an extensive literature review focusing on the determinants of the bid-ask spread in HFT in the Indian market. It focuses on methodologies to empirically examine the determinants of the bid-ask spread in HFT under different market conditions. This study is motivated by the recent movement of HFT in the Indian stock market as well as by previous studies by Stoll (1978), Madhavan (2000), Giouvris and Philippatos (2008), Benston and Hagerman (1974), and Kim and Ogden (1996). The study examines the effects of the various determinants for each sector to understand the sectoral effects. Gkillas et al. (2019) found a day-of-the-week effect in the adjustment speed of spreads. This study therefore conducts settlement-day-wise as well as opening- and closing-interval-wise analyses to understand the effects of days and times on spread determinants.
The paper is organized as follows. Section 2 presents the literature review on the different aspects of the bid-ask spread and its determinants under low-frequency trading as well as HFT. Section 3 discusses the objectives and hypotheses of the study. Section 4 explains the methodology used to estimate the bid-ask spread and its determinants. Details on the data selected from the National Stock Exchange and the study time period are provided in Section 5. The study findings are given in Section 6. Section 7 summarizes the findings of this study, and Section 8 presents managerial implications of the findings. Section 9 points out the limitations, and the last section concludes the paper.

Liquidity and spread
The bid-ask spread comprises profit and transaction cost; it indirectly measures liquidity or immediacy (Demsetz, 1968). An investor may face difficulty in buying or selling a security in the absence of a significant number of trades. Less liquid assets, such as small-cap stocks, generally have a higher spread compared to large-cap index stocks. A quoted spread is the generic measure of the bid-ask spread. In both quote- and order-driven markets, quote prices are just the negotiation starting point, and trades may or may not happen at the quote price itself. Frequently, trades occur within or outside the quote prices. In these scenarios, an effective spread, rather than the quoted spread, is considered a useful measure of the execution cost. An effective spread is the difference between the trade and mid-quote price and is potentially a superior measure of the execution cost (Rath, 2004). The easy availability of order, quote, and transaction data from different stock markets in different countries has stimulated research on bid-ask spread determinants. There is a significant number of studies on developed countries' stock exchanges, such as the New York Stock Exchange (NYSE), LSE, NASDAQ, FTSE, Paris Bourse, and Sydney Stock Exchange as well as on stock markets of countries like Saudi Arabia (SSM), Brazil, and India (NSE) (Al-Suhaibani & Kryzanowski, 2000; Chakrabarty & Jain, 2005; Minardi et al., 2006). Glosten and Harris (1988) present the asymmetric information model, where the bid-ask spread is broken into transitory and adverse-selection components. Results show the spread is determined by trade size and the adverse-selection component. Menyah and Paudyal (2000) find that along with the order processing cost, the spread is influenced by the inventory cost in a quote-driven market. Giouvris and Philippatos (2008) compare components of the bid-ask spread and their determinants for FTSE100 and FTSE250 stocks.
Determinants such as the number of trades have an impact on the bid-ask spread as well as on asymmetric information and order processing cost components. Trading volume has a significant effect on spread and its components for small stocks rather than big stocks. Volatility also has positive effects on all the components. Plerou et al. (2005) show that spread is influenced by order flow and volatility. McInish and Wood (1992) consider trading activity, risk, information, and competition as determinants of spread in NYSE-listed stocks. They find that spread has an inverse relation with trading size and a positive relation with asymmetric information and risk. Kim and Ogden (1996) find a positive relation of spread with negative information, volatility, and trading volume. To examine intertemporal variations in the spread, they perform time-series and cross-sectional tests, finding that the frequency of trades and order size are related to firm size. Kim and Ogden's (1996) results also show that although volatility has a positive relation to spread, it also has a relation to firm size, trade frequency, and order size. But past trade frequency has a negative relation to the bid-ask spread. Madhavan (2000) finds that price inverse is an important determinant of the cross-sectional variation in stock spreads. The author's empirical findings confirm that market-capitalization, price, stock volatility, and trading volume are the prime determinants of spread. The study also provides empirical evidence of the nonlinear effect of volume and price on spread. In the case of the Brazil market, Minardi et al. (2006) find positive relationships between spread, stock price, and liquidity; spreads are negatively related to conventional liquidity measures, such as stock turnover, the volume traded, and the number of trades.
Alzahrani (2011), in a study of the Saudi market using high-frequency data, finds volume and volatility to be positively related to the effective bid-ask spread, while trading frequency is inversely related. The study reports very low overall R-squared values for the models across different metrics of the bid-ask spread. Huang (2004) discusses the determinants of information asymmetry components and order processing costs in his comparative study of the Singapore and Taiwan stock exchanges and confirms a positive relation between the bid-ask spread and price level and return volatility. On the other hand, a rise in the number of transactions and the total volume of trading can lower order processing costs. Based on Demsetz's (1968) and Hasbrouck's (2006) studies, Huang (2004) suggests a negative relation between order processing costs and trading activities. Riedl and Serafeim (2011) find that a higher information risk will lead to higher levels of information asymmetry, leading to increasing bid-ask spreads across financial instruments. Fender and Lewrick (2015) conclude that the increased presence of informed traders significantly influences the spread.

Determinants of spread
Empirical research by Demsetz (1968) is the basis for research investigating determinants of the bid-ask spread. Demsetz examined the impact of transaction costs on transaction rates (i.e., the number of transactions over a given time period) on NYSE stocks. Benston and Hagerman (1974) estimate the bid-ask spread using trading volume, price volatility, the number of market makers, and the number of transactions as independent variables. They find that spread is positively related to price volatility and is negatively related to competition in market making. Benston and Hagerman (1974) also examine the nonlinear relationship of spreads with share price, number of stockholders, number of dealers, unsystematic risk or inventory holding risk, the number of transactions per security, and insider losses. Copeland and Galai (1983) empirically observe relations between the bid-ask spread and variables like price volatility, asset price level, and volume. The volume's negative effect on spreads is also argued by others. Easley and O'Hara (1992) and Brock and Kleidon (1992) find a positive relationship between spread and volume. Cohen and Maier (1986) find a positive relationship between transaction price volatility and bid-ask spreads when price change is measured over short intervals, such as daily intervals. Similar results have also been reported by Tinic and West (1972). However, Chordia et al. (2001) find a negative effect of volatility on spreads. Wang et al. (1994) find that bid-ask spread and price volatility are jointly determined and positively related. The results indicate that the major determinants of bid-ask spreads are price risk, volume per transaction, and competition in market making. Wang (1999) examines components of the bid-ask spread in the Sydney Futures Exchange and mentions that the bid-ask spread has a higher adverse-information component and a lower order-processing component.
Hussain (2011) finds that contemporaneous and lagged trading volume and the bid-ask spread have a statistically significant effect on return volatility. Amihud and Mendelson (1987) demonstrate that the bid-ask spread has a positive effect on return variance. They also show (Amihud & Mendelson, 1986) that asset returns increase with bid-ask spreads. Chen and Kan (1996) also adopt the Amihud and Mendelson (1986) model but cannot find any reliable relation between the CAPM risk-adjusted return and the relative bid-ask spread. Demsetz (1968) finds a positive relationship between the share price and spread, while McInish and Wood (1992) find a negative relationship. Nath et al. (2017) find a significant relationship between the bid-ask spread and its determinants, such as order imbalance, trade execution, volatility, and the trading volume in the case of the intraday trading of the government securities market. Rahman et al. (2002) find a positive relation between the intraday variations of the bid-ask spread and intraday return volatility. Zhang et al. (2008) find that the bid-ask spread increases with return volatility. Paital and Sharma (2016) investigate the relationship between return volatility, trading volume, and the bid-ask spread in the high-frequency data of the Indian stock market. The authors find a weak relation of volatility with both volume and spread and conclude that the Indian market is informationally inefficient. Ripamonti (2016), in studying the emerging market of Brazil, finds a relation between the asymmetric information measure of spread and variables such as the market-to-book ratio, debt on equity, size, and return. In the Chinese futures commodity market, Liu et al. (2016) find a positive relationship between bid-ask spreads and volatility and a negative relation with trading volume. In the Indian market, Chakrabarty and Jain (2005) examined variables affecting the bid-ask spread of NSE-listed stocks using only day-end data. The results confirmed a negative relationship between spread and trading volume, while volatility is positively related to spread. These authors' findings could be different, given high-frequency intraday data.

Research objectives and hypotheses
The literature discussed above presents an in-depth review of the determinants of bid-ask spread. It outlines share price, market-capitalization, the number of trades, the square of return, and trading volume as the prime drivers of bid-ask spread. The majority of the studies were conducted in developed markets with low-frequency datasets.
Analyzing the determinants of bid-ask spreads in the Indian market using high-frequency datasets is an evolving area of current research. This study will contribute to existing literature by studying determinants of the bid-ask spread under HFT in the Indian stock market for different sectors, market conditions, settlement periods, pools of stocks based on characteristics (volume, share price, and market-capitalization), and different timeframes. The study has framed the following objectives to examine various determinants of the bid-ask spread in HFT:
(a) Using "pooled bigglm" (bounded memory linear and generalized linear models), the study will examine various determinants of the bid-ask spread.
(b) Sectoral analysis will be conducted to articulate sector-specific influence on bid-ask spread determinants.
(c) The study will conduct settlement day-wise bid-ask spread analysis to understand the impact of settlement days on bid-ask spread determinants.
(d) The first and last ticks of the day carry different information than the other ticks of the day. Hence, the study uses the first and last ticks of the day to analyze bid-ask spread determinants.
(e) The study will separate stocks on the basis of low and high price and low and high market-capitalization.
This study also formulates the following hypotheses to be tested during the study:
HT 1: Stocks with large trading volumes will have narrower bid-ask spreads than those with low trading volumes.
HT 2: The bid-ask spread is narrow when volatility is low and risk is at a minimum.
HT 3: For low-priced stocks, the bid-ask spread will tend to be larger.
Using "pooled bigglm," the study will examine determinants of the bid-ask spread separately for each data set.

Estimations of bid-ask spread

Quoted Spread
As per Bessembinder and Venkataraman (2010), the quoted spread is computed using equation 4.1:

$$S^{Quoted}_{it} = a_{it} - b_{it} \qquad (4.1)$$

where $S^{Quoted}_{it}$ is the quoted spread, $a_{it}$ is the best quoted ask price, and $b_{it}$ is the best quoted bid price, $i = 1, \ldots, n$ is the stock being observed, and $t = 1, \ldots, T$ is the time. Equivalently, $Quotedspread_{it} = Askprice_{it} - Bidprice_{it}$, where $Quotedspread_{it}$ is the quoted spread at time t, $Askprice_{it}$ is the ask price at time t, and $Bidprice_{it}$ is the bid price at time t.
The quoted percentage spread measures the execution cost on a percentage basis. Studies (Benzennou et al., 2020; Bessembinder & Venkataraman, 2010) propose computing the quoted percentage spread as follows (equation 4.2):

$$S^{\%Quoted}_{it} = \frac{a_{it} - b_{it}}{(a_{it} + b_{it})/2} \qquad (4.2)$$

where $S^{\%Quoted}_{it}$ is the quoted percentage spread, $a_{it}$ is the best quoted ask price, and $b_{it}$ is the best quoted bid price, $i = 1, \ldots, n$ is the stock being observed, and $t = 1, \ldots, T$ is the time. Equivalently, $QuotedpercentageSpread_{it} = (Askprice_{it} - Bidprice_{it}) / Midquote_{it}$, where $QuotedpercentageSpread_{it}$ is the quoted percentage spread at time t, $Askprice_{it}$ is the ask price at time t, $Bidprice_{it}$ is the bid price at time t, and $Midquote_{it}$ is the mid-quote at time t, computed as $Midquote_{it} = (Askprice_{it} + Bidprice_{it})/2$.
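As an illustration, the two quoted-spread measures can be sketched in a few lines of Python. This is a minimal sketch, not the study's implementation: the function names and the sample quotes are hypothetical, and the percentage spread is assumed to be expressed as a fraction of the mid-quote (multiply by 100 for percent).

```python
def quoted_spread(ask: float, bid: float) -> float:
    """Quoted spread: best ask minus best bid."""
    return ask - bid


def mid_quote(ask: float, bid: float) -> float:
    """Mid-quote: average of the best ask and the best bid."""
    return (ask + bid) / 2.0


def quoted_percentage_spread(ask: float, bid: float) -> float:
    """Quoted spread relative to the mid-quote (a fraction, not multiplied by 100)."""
    return quoted_spread(ask, bid) / mid_quote(ask, bid)


# Hypothetical best quotes for one stock at one timestamp.
best_ask, best_bid = 100.5, 99.5
print(quoted_spread(best_ask, best_bid))             # 1.0
print(quoted_percentage_spread(best_ask, best_bid))  # 0.01
```

Dividing by the mid-quote makes spreads comparable across stocks with very different price levels, which matters when stocks are pooled.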

Effective Spread
The effective spread is twice the absolute value of the difference between the actual trade price and the midpoint of the market quote (i.e., between the quoted bid price and the quoted ask price), divided by the midpoint between these two prices. Using the trade indicator (buy or sell), as in Roll (1984) and Huang and Stoll (1997), the effective spread is computed from the difference between the trade price and the quoted midpoint:

$$EffectiveSpread_{it} = \frac{2 D_{it} \left(P_{it} - P^{M}_{it}\right)}{P^{M}_{it}}$$

where $i = 1, \ldots, n$ is the stock being observed, $t = 1, \ldots, T$ is the time, $P_{it}$ is the transaction price, $P^{M}_{it}$ is the midpoint of the posted bid and ask quotes, and $D_{it}$ is the trade indicator, whose values are "+1" for a buyer-initiated order and "-1" for a seller-initiated order. The quote midpoint is

$$P^{M}_{it} = \frac{P^{A}_{it} + P^{B}_{it}}{2}$$

where $P^{A}_{it}$ is the posted ask price and $P^{B}_{it}$ is the posted bid price.
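The verbal definition above (twice the signed distance of the trade price from the quote midpoint, divided by the midpoint) can be sketched as follows. This is an illustrative sketch under that reading of the definition; the function name and the sample prices are hypothetical.

```python
def effective_spread(trade_price: float, ask: float, bid: float, direction: int) -> float:
    """Relative effective spread: 2 * D * (P - M) / M.

    direction is the trade indicator D: +1 for a buyer-initiated trade,
    -1 for a seller-initiated trade.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 (buy) or -1 (sell)")
    mid = (ask + bid) / 2.0  # quote midpoint M
    return 2.0 * direction * (trade_price - mid) / mid


# Hypothetical quotes: bid 99.5, ask 100.5, so the midpoint is 100.0.
print(effective_spread(100.25, ask=100.5, bid=99.5, direction=1))   # buy above the midpoint
print(effective_spread(99.75, ask=100.5, bid=99.5, direction=-1))   # sell below the midpoint
```

Both hypothetical trades execute inside the quotes, so the effective spread (0.5% of the midpoint) is half the quoted percentage spread (1%), which is why the effective spread is often preferred as a measure of execution cost.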

Classification of Buy and Sell Transactions
In the high-frequency quote and trade data of the NSE, taken from the Bloomberg server, there are no flags for classifying whether a trade was buyer-initiated or seller-initiated. This classification is important in studying the market microstructure, as supported by different empirical studies. To calculate the effective spread, the trade indicator must be computed first, to capture the impact on the spread of buyer- or seller-initiated trades. There are two approaches to computing this indicator: (1) the tick test, which compares the trade price to adjacent trades, and (2) the quote test, which compares the trade price to the bid and ask prices of the prevailing quote. Lee and Ready (1991) propose an algorithm for computing trade direction based on the tick test, with the trade indicator taking the value (+1) for buy-initiated orders and (-1) for sell-initiated orders. The computation is based on several conditions comparing trade prices to preceding trades. Trades fall into four categories: uptick, downtick, zero-uptick, and zero-downtick. If the current trade price is higher (lower) than the previous price, the trade is an uptick (downtick). If the current price equals the previous price, the trade is a zero tick, and the last price change must be checked: if that change was an uptick (downtick), the current trade is a zero-uptick (zero-downtick). Lee and Ready (1991) argue that a better result is obtained from a tick test than from other tests. Buy orders are then identified from the trade price $P_t$ and the mid-quote price $P_m$, using a tick test of preceding trades.
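The tick test described above can be sketched as a single pass over the trade prices. The sketch assumes the usual convention that upticks and zero-upticks are classified as buys (+1) and downticks and zero-downticks as sells (-1). The function name is hypothetical, and trades that occur before any price change are left unclassified (0).

```python
def tick_test(prices):
    """Classify each trade as +1 (buy), -1 (sell), or 0 (unclassified).

    An uptick or zero-uptick is treated as buyer-initiated, a downtick or
    zero-downtick as seller-initiated.
    """
    directions = []
    last_direction = 0   # sign of the most recent non-zero price change
    previous = None
    for price in prices:
        if previous is not None:
            if price > previous:
                last_direction = 1    # uptick
            elif price < previous:
                last_direction = -1   # downtick
            # zero tick: keep last_direction (zero-uptick or zero-downtick)
        directions.append(last_direction)
        previous = price
    return directions


# Hypothetical sequence of trade prices for one stock.
print(tick_test([100.0, 101.0, 101.0, 100.0, 100.0, 102.0]))
# [0, 1, 1, -1, -1, 1]
```

The first trade has no predecessor, so it cannot be signed by the tick test alone; a study would either drop such trades or sign them with the quote test.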
In high-frequency data, one major issue is that of duplicate timestamps. To handle it, for ticks sharing a timestamp, this study selected the best bid and ask from the quote data and the mean trade from the trade data for computation.
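A minimal sketch of this de-duplication step is given below. It assumes that the "best" bid is the highest bid and the "best" ask is the lowest ask among quotes sharing a timestamp, and that the "mean trade" is the arithmetic mean of trade prices at that timestamp. The function names, tuple layouts, and sample ticks are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_quotes(quotes):
    """quotes: iterable of (timestamp, bid, ask).

    Returns {timestamp: (best_bid, best_ask)}, keeping the highest bid and
    the lowest ask among quotes that share a timestamp (an assumption).
    """
    best = {}
    for ts, bid, ask in quotes:
        if ts in best:
            best_bid, best_ask = best[ts]
            best[ts] = (max(best_bid, bid), min(best_ask, ask))
        else:
            best[ts] = (bid, ask)
    return best


def collapse_trades(trades):
    """trades: iterable of (timestamp, price). Returns {timestamp: mean price}."""
    grouped = defaultdict(list)
    for ts, price in trades:
        grouped[ts].append(price)
    return {ts: mean(prices) for ts, prices in grouped.items()}


# Hypothetical ticks with a duplicated timestamp.
quotes = [("09:15:00.001", 99.5, 100.5),
          ("09:15:00.001", 99.75, 100.25),
          ("09:15:00.002", 99.5, 100.0)]
trades = [("09:15:00.001", 100.0),
          ("09:15:00.001", 100.5),
          ("09:15:00.002", 99.75)]
print(collapse_quotes(quotes))
print(collapse_trades(trades))
```

After this step each timestamp carries one quote and one trade price, so the trade price can be matched to a single midpoint when computing the effective spread.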

Determinants of bid-ask spread
For high-frequency determinants of bid-ask spreads, the study used a modified Madhavan (2000) model to estimate a regression equation for the tick spread from the big data perspective, where, for security i at time t, the stock spread is the dependent variable.
Regarding the independent variables, $M_{it}$ is market-capitalization, $P_{it}$ is price, $QR_{it}$ is the quote return of the stock, $NT_{it}$ is the number of trades, $RV_{it}$ is the volatility of return, and $V_{it}$ is the trading volume. The quote return is estimated as the absolute value of the return at day t, calculated from the mid-points of the bid-ask quotes. Brown et al. (2010) also tested a similar model for the bid-ask spread.
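For orientation, one linear specification consistent with the variables listed above is sketched below. It is an assumption rather than the study's exact equation: the symbol $S_{it}$ for the spread, the coefficient names, and the error term are introduced here for illustration, and all regressors are assumed to enter in levels, without transformations such as inverse price or logarithms. The quote return is written as a simple (not logarithmic) return on the quote midpoint $P^{M}_{it}$, which is also an assumption.

```latex
S_{it} = \beta_0 + \beta_1 M_{it} + \beta_2 P_{it} + \beta_3 QR_{it}
       + \beta_4 NT_{it} + \beta_5 RV_{it} + \beta_6 V_{it} + \varepsilon_{it},
\qquad
QR_{it} = \left| \frac{P^{M}_{it} - P^{M}_{i,t-1}}{P^{M}_{i,t-1}} \right|
```

Under this sketch, a negative estimate of $\beta_6$ would correspond to spreads narrowing as trading volume rises.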
Several factors influence the bid-ask spread, the most evident being trading volume. Stocks with large trading volumes will have narrower bid-ask spreads than those that are infrequently traded (Brock & Kleidon, 1992; Copeland & Galai, 1983; Easley & O'Hara, 1992; Kim & Ogden, 1996; McInish & Wood, 1992). A stock with a low trading volume is considered illiquid, demanding more compensation in the form of a larger spread. Copeland and Galai (1983) and McInish and Wood (1992) found a negative relation between spread and trading volume. Another important aspect affecting bid-ask spread is return volatility, which usually increases during periods of rapid market decline or advancement. During periods of high return volatility, the bid-ask spread is much wider, because liquidity providers want to book profit; when volatility is low and uncertainty and risk are at a minimum, the bid-ask spread is narrow. Many empirical studies have documented a positive relation between bid-ask spread and volatility (Cohen & Maier, 1986; G. H. Wang et al., 1994; Kim & Ogden, 1996; Tinic & West, 1972; Zhang et al., 2008), but Chordia et al. (2001) found a negative relation. While McInish and Wood (1992) found a negative relationship between spread and price, Minardi et al. (2006), in the Brazilian market, found a positive relation. With a higher trading frequency, the bid-ask spread tends to be narrow, as a higher trading frequency means a large trading volume for a stock. Minardi et al. (2006) found a similar result in the Brazilian market. The empirical literature cannot establish conclusive evidence for the impact of market-capitalization, price, and volume on the bid-ask spread.
For analysis of large-scale data, the study used bigglm, provided by the biglm package for R, for the regression and classification models. This package was specifically designed by Lumley (2011) to build generalized linear models on big data. It takes the following steps to reduce the load on memory and process large-scale datasets efficiently: (a) data is loaded into memory chunk by chunk; (b) after each chunk of data is processed, the model statistics are updated; (c) the processed chunk is then discarded and the next chunk is loaded; and (d) these steps are repeated until the end of the file.
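The chunk-by-chunk idea can be illustrated with a small Python sketch. Note that this is not bigglm's algorithm: bigglm is documented as updating an incremental QR decomposition (Miller, 1992), whereas the sketch below accumulates the normal-equation sums X'X and X'y, a simpler and numerically less robust technique that has the same bounded-memory property. All function names and the sample data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def read_chunks(rows, chunk_size):
    """Yield rows in chunks; in practice each chunk would be read from a file."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def chunked_ols(chunks, p):
    """Least-squares fit that only keeps p*p + p running sums in memory.

    Each row is (x, y), where x has length p and includes 1.0 for the intercept.
    """
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for chunk in chunks:          # (a) load one chunk
        for x, y in chunk:        # (b) update the model statistics
            for i in range(p):
                xty[i] += x[i] * y
                for j in range(p):
                    xtx[i][j] += x[i] * x[j]
        # (c) the chunk is discarded when the loop moves to the next one
    return solve(xtx, xty)        # (d) finish once all chunks are consumed


# Hypothetical data generated from y = 2 + 3 * x.
rows = [([1.0, float(x)], 2.0 + 3.0 * x) for x in range(6)]
beta = chunked_ols(read_chunks(rows, chunk_size=2), p=2)
print([round(b, 6) for b in beta])  # [2.0, 3.0]
```

Only the running sums survive between chunks, so the peak memory depends on the chunk size and the number of variables rather than on the total number of ticks.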
An algorithm can be classified by the amount of time or space it requires (Narahari, 2002). The growth of these requirements can be specified as a function of the input size. There are two kinds of algorithm complexity: time complexity, the execution time of the program as a function of the size of the input; and space complexity, the amount of memory required during the execution as a function of input size. Big O notation (O) is a convenient way to describe the complexity of an algorithm or program. The time taken by linear regression is O(np^2 + p^3), which is not easily reducible, where n is the sample size and p is the number of variables. In R, the standard implementation consumes O(np + p^2) memory, but building the model matrix in chunks (Miller, 1992) can reduce the memory requirement. Bigglm has generally been used in studies for the computation of big data (Muggeo & Adelfio, 2010; Myers et al., 2016; C. Wang et al., 2016).
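A rough back-of-the-envelope calculation shows why the memory term matters at this scale. The numbers below are assumptions for illustration only: each of the roughly 85 million cleaned data points reported in the data section is treated as one regression row, p = 7 columns are assumed (an intercept plus six regressors), values are stored as 8-byte doubles, and the chunk size of one million rows is arbitrary.

```python
BYTES_PER_DOUBLE = 8


def matrix_bytes(n_rows: int, n_cols: int) -> int:
    """Memory needed to hold an n_rows x n_cols matrix of doubles."""
    return n_rows * n_cols * BYTES_PER_DOUBLE


n = 85_000_000        # assumed number of regression rows
p = 7                 # assumed number of model-matrix columns
chunk_rows = 1_000_000  # arbitrary chunk size

# Full model matrix held in memory: the O(np) term.
full_bytes = matrix_bytes(n, p)

# Chunked processing: one chunk plus a p x p summary matrix, the O(p^2) term.
chunked_bytes = matrix_bytes(chunk_rows, p) + matrix_bytes(p, p)

print(full_bytes / 1e9)     # 4.76 (GB)
print(chunked_bytes / 1e6)  # about 56 (MB)
```

Under these assumptions the full model matrix alone needs several gigabytes, while chunked processing needs tens of megabytes; the running time still grows with n, consistent with the O(np^2 + p^3) time bound.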

Data and period of study
Tick-by-tick data for the selected stocks listed on the NSE CNX500, covering the period of June 2016 to August 2016, was collected from the Bloomberg servers and was processed as a large-scale database. The data covered major stocks from dominating sectors, such as Consumer Goods, Financial Services, IT & Telecom Services, Services & Healthcare, Pharmaceuticals, and Automobile & Industrial Manufacturing. Sixty companies were considered for the study, ten from each market sector. The period also included the ups and downs of market conditions in Indian equity markets. The trade and quote data collected had more than 100 million ticks in total. After a cleaning process, 85 million data points remained for processing. This satisfied all seven Vs of big data: volume, velocity, variety, variability, veracity, visualization, and value. The HFT technological infrastructure enables it to compete on speed, time, and superiority, an advantage over low-frequency trading (Seddon & Currie, 2017).
Besides using the pooled dataset, this study also includes datasets for different time periods and market conditions for analysis. Sector-wise analysis was done to capture the behavior and information absorption capacity of different sectors. This study also includes a settlement-day-wise analysis of the bid-ask spread to capture different trading behaviors during different settlement periods of the month. To capture the behavior of small market-capitalization stocks and low-priced stocks, separate spread determinant analyses were conducted on the subsample datasets.

Summary statistics: effective spread of tick-by-tick data
Effective spread was measured for 60 stocks over 64 trading days for 6 different sectors from 1 June 2016 to 31 August 2016. Table 1 presents the sector-wise estimates of the mean and standard deviation of the effective spread for the time period. Table 2 and Table 3 present summary statistics and correlation analysis of the independent variables.

Determinants of bid-ask spread
Determinants of the bid-ask spread were discussed in the literature review. The study used a modified version of the Madhavan (2000) model to study these determinants, where the effective spread was considered a dependent variable, and share price, traded volume, volatility, market-capitalization, number of trades, and squared return were independent variables. The study analyzed spread determinants across sectors, up and down markets, settlement periods, beginning and closing intervals of the day, high- and low-priced stocks, and market-cap-wise stocks.
Explanatory variables of the bid-ask spread were found to be less volatile; there was no significant correlation among them.

Bid-ask spread determinants: pooled datasets
All explanatory variables of the pooled regression of the bid-ask spread (Table 4), along with their expected relations, were found to be significant, as per the literature (Benston & Hagerman, 1974; Giouvris & Philippatos, 2008; Kim & Ogden, 1996; Madhavan, 2000; Stoll, 1978). The pooled regression satisfied the basic requirements of a robustness test. For pooled datasets, we performed an ARCH test for heteroskedasticity and an augmented Dickey-Fuller test to examine stationarity. These tests were conducted for all pooled datasets.
The negative relation of spread to number of trades and volume implies a lower bid-ask spread for a higher number of trades and volume, consistent with the literature (Benston & Hagerman, 1974; Kim & Ogden, 1996, in the NYSE; Giouvris & Philippatos, 2008, in the LSE). This supports Hypothesis (HT) 1, stating that with large traded volumes, spread tends to be narrower. Similar to Stoll (1978) for NASDAQ, Jegadeesh and Subrahmanyam (1993) for the NYSE, and Heflin and Shaw (2000), the study found an inverse relation between spread and stock price in the Indian market. This supports Hypothesis (HT) 3, implying that for lower-priced stocks, spread tends to be wider. A negative relation between spread and quote return is consistent with Narayan et al. (2014); however, the negative relationship indicates that the bid-ask spread does not price the risk. Return volatility has an inverse relationship to spread, indicating that it reduces the spread. A positive relation between market-capitalization and spread indicates that despite a higher trading volume, spread does not decrease. This is consistent with Kim and Ogden (1996) and Heflin and Shaw (2000).

Bid-ask spread determinants: pooled datasets-sector-wise analysis
Analysis of determinants for bid-ask spread was conducted sector-wise to capture the impact of sector-specific information and behavior on spread. The study conducted pooled regression analysis for Consumer Goods (sector 1), Financial Services (sector 2), IT-Telecom (sector 3), Services & Healthcare (sector 4), Pharmaceuticals (sector 5), and Automobile & Industrial Manufacturing (sector 6), where each sector had 10 identified stocks having a larger market share.
In the sectoral effective spread graph, Figure 1, differences in spread levels are clearly visible among different sectors.

Source-Estimated Figure
In sector 2 (Financial Services) and sector 4 (Services & Healthcare), there were more ups and downs in spread during the study period. There was no common trend in spread among different sectors of the economy, and each sector had its own way of movement; these needed to be analyzed separately. This study tried to capture the impact of different sectoral behaviors on effective spread.
The sector-specific pooled regression revealed similar results in a larger perspective; however, price had a direct relation to spread in the Automobile & Industrial Manufacturing sector, unlike other sectors. Similarly, quote return was statistically insignificant in the Consumer Goods sector, and return volatility had a direct relation in the Automobile & Industrial Manufacturing sector, unlike in other sectors. Price as an explanatory variable comparatively had more explanatory power in the Pharmaceuticals sector than in others. Similarly, quote return as an explanatory variable comparatively had less explanatory power in the Services & Healthcare sector than in others. The pooled regression satisfied the basic requirements of a robustness test. Sector-wise analysis captured the behavior and information absorption capacity of different sectors.

Bid-ask spread determinants: pooled datasets-settlement days-wise analysis
Investors square off their positions, and delivery of stocks occurs on settlement days. This regulatory requirement might have some impact on spread. To observe this, the study conducted a settlement day-wise analysis of spread. For June 2016, the last week of settlement days was considered; for July 2016, the mid-month settlement week was considered; and for August 2016, the first week of the settlement period was considered. This covered the full cycle of the settlement period over 3 months to discern different behavioral changes in spread.
Unlike pooled regression, price had a positive relation to spread in the July/August settlement days. Quote return had a negative relation to spread in the beginning of the August settlement days, not found in the mid-and end-settlement days of June/July. Meanwhile, quote return had a higher explanatory power at the beginning of the August settlement days. Return volatility had a significant inverse relation in the beginning settlement days of August compared to other settlement days of June/July. Thus, the study concludes that the settlement period has some impact on the spread. The pooled regression satisfied the basic requirements of a robustness test.

Bid-ask spread determinants: pooled datasets-opening and closing interval-wise analysis
Investors' buying and selling behaviors are pronounced during the day's opening and closing. To factor this behavior into the spread, separate pooled regressions were conducted for the opening and closing 30 minutes. During the opening and closing 30-min intervals, the market exhibits different trends in different sectors, as per the graphs. While the opening 30 minutes show a smooth trend, though differently for different sectors, the closing 30 minutes exhibit abnormal behavior. To capture the opening and closing interval behaviors of spread, the study analyzed them separately. Table 7 represents the relationship between spread and various determinants.
Source-Estimated Figure: June-August, 2016
Table 3. Correlation of independent variables
Price had a direct relation to spread in the closing interval, the opposite of its relation in the opening interval. Similarly, the squared return had a direct relation to spread in the closing interval but an inverse relation in the opening interval. The pooled regression satisfied the basic requirements of a robustness test.
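The split into opening and closing subsamples can be sketched as follows. The window boundaries (9:30 am to 10 am and 3 pm to 3:30 pm) are taken from the captions of Figures 2 and 3; the tick layout `(sector, time, effective_spread)`, the function names, and the sample values are hypothetical and serve only to show how observations would be routed to the two interval-wise regressions.

```python
from collections import defaultdict
from datetime import time

# Window boundaries as given in the captions of Figures 2 and 3.
OPENING = (time(9, 30), time(10, 0))
CLOSING = (time(15, 0), time(15, 30))


def label_interval(t):
    """Return 'opening', 'closing', or None for a tick timestamp."""
    if OPENING[0] <= t < OPENING[1]:
        return "opening"
    if CLOSING[0] <= t <= CLOSING[1]:
        return "closing"
    return None


def mean_spread_by_interval(ticks):
    """Average effective spread per (sector, interval).

    ticks: iterable of (sector, time, effective_spread) tuples
    (a hypothetical layout, not the study's data format).
    """
    totals = defaultdict(lambda: [0.0, 0])  # (sector, label) -> [sum, count]
    for sector, t, spread in ticks:
        label = label_interval(t)
        if label is None:
            continue  # mid-day ticks belong to neither subsample
        acc = totals[(sector, label)]
        acc[0] += spread
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up ticks for illustration only.
sample = [
    ("IT-Telecom", time(9, 45), 0.02),
    ("IT-Telecom", time(9, 50), 0.04),
    ("IT-Telecom", time(12, 0), 0.50),   # ignored: outside both windows
    ("IT-Telecom", time(15, 20), 0.06),
]
print(mean_spread_by_interval(sample))
```

The same labels can be used to filter the regressors, so that one pooled regression is run on the "opening" rows and another on the "closing" rows.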

Price range-wise analysis
High-priced and lower-volume stocks generally have a higher spread, while low-priced and high-volume stocks have a low spread. Usually, low-priced and high-volume stocks are more prone to spread changes. To capture this effect of price on spread, the study analyzed the determinants of the bid-ask spread separately for stocks in different price ranges.
As per the pooled regression, the price range-wise determinant analysis revealed similar results in a larger perspective, except for a difference in the relation of quote return to spread. Quote return had a direct relation to spread for mid- and low-priced stocks but an inverse relation for high-priced stocks. Also, price, traded volume, return volatility, and market-cap as explanatory variables comparatively had more explanatory power for high-priced stocks than for other stock price ranges. The pooled regression satisfied the basic requirements of a robustness test.

Market cap-wise analysis
High market-capitalization can be from either a high volume or high price of a stock. Similarly, low market-capitalization can be from either a low volume or low price of a stock. The study conducted different market-capitalization analyses on spread to capture the behavior of small stocks on spread determinants. Spread determinant regressions for high, mid and low market-capitalization stocks revealed similar results except in the relations of volatility and number of trades to spread. Though quote return and number of trades had an inverse relation to spread for mid- and small-market-cap stocks, both had a direct relation to spread for high-market-cap stocks. The study results for low-priced stocks also support Hypothesis (HT) 3, where for lower-priced stocks, the spread tends to be wider. The pooled regression satisfied the basic requirements of a robustness test.

Summary of findings
In this study, bid-ask spread and its determinants have been assessed under different market conditions as well as stock level properties. Past studies (Chakrabarty & Jain, 2005; Giouvris & Philippatos, 2008; Kim & Ogden, 1996; Madhavan, 2000) have found a strong relation between spread and determinants like volatility, trading volume, frequency of trades, order size, and stock price. The empirical findings by Madhavan (2000) have confirmed that market-capitalization, price, stock volatility, and trading volume are the prime determinants of spread. This study finds return volatility, share price, trading volume, and number of trades to be significant determinants of spread, as per the prior studies (Benston & Hagerman, 1974; Giouvris & Philippatos, 2008; Kim & Ogden, 1996; Madhavan, 2000; Stoll, 1978). The negative relations of spread with the number of trades and trading volume imply that bid-ask spread will be lower for higher numbers of trades and volume, consistent with the findings from Benston and Hagerman (1974), Kim and Ogden (1996), and Giouvris and Philippatos (2008). We found an inverse relation between spread and stock price. Similar relations were established by Stoll (1978) in NASDAQ, Jegadeesh and Subrahmanyam (1993) in the NYSE, and Heflin and Shaw (2000).
Table 6. Bid-Ask spread determinants: pooled data sets-settlement days-wise analysis. Method: pooled bigglm: settlement days-wise analysis; period: June-August, 2016; dependent variable: effective spread. Table 6 represents Settlement Day wise analysis of determinants of effective spread. We randomly chose three different settlement weeks from 3 months covered under this study to analyse the impact of determinants in various phases of settlement cycles. Variables like quote return and volatility tend to act differently in different settlement periods. This confirms the effect of settlement periods on spread management. For all three experiments, we have seen similar level of explained variation.
Higher volatility increases risk, so a spread that priced risk would be expected to widen with volatility. Instead, consistent with findings from Narayan et al. (2014), we found return volatility to have an inverse relationship with spread; that is, return volatility reduces the spread, which indicates that the bid-ask spread is not pricing the risk. As spread has a positive relation to market-capitalization, it indicates that despite higher trading volume, spread does not decrease. This is consistent with findings by Kim and Ogden (1996) and Heflin and Shaw (2000).
The sector-specific analysis reveals similar results in a larger sense; however, price has a direct relation with spread in the Automobile & Industrial Manufacturing sector, unlike in other sectors. Return volatility has a positive relation with spread in the Automobile & Industrial Manufacturing sector, opposite to the other sectors. Price, as an explanatory variable, comparatively has more explanatory power in the Pharmaceuticals sector than in other sectors. Settlement days-wise analysis of spread indicates that the settlement periods have some impact on the spread in terms of different behaviors by determinants. In the price-range wise analysis, the study found that volatility has a positive relation with spread in the case of mid-priced and low-priced stocks, but an inverse relation in the case of high-priced stocks. Stock price, traded volume, quote return and market-capitalization have comparatively more explanatory power in the case of high-priced stocks than other price ranges. Similarly, market-capitalization specific analysis reveals that volatility and number of trades have an inverse relation with spread in the case of mid-cap stocks.
Table 7. Bid-Ask spread determinants: pooled data sets-opening and closing interval-wise analysis. Method: pooled bigglm: opening and closing interval-wise analysis; period: June-August, 2016; dependent variable: effective spread. Table 7 represents Settlement Day wise analysis of determinants of effective spread. We randomly chose 3 different settlement weeks from 3 months covered under this study to analyse the impact of determinants in various phases of settlement cycles. Variables like quote return and volatility tend to act differently in different settlement periods. This confirms the effect of settlement periods on spread management. For all three experiments, we have seen similar level of explained variation.

Managerial implications
Traders generally manage their inventory of cash and stocks dynamically through continuous buying and selling. Buy orders reduce the cash inventory, and sell orders increase it. While placing buy and sell orders, traders generally adjust their inventories with the motive of increasing trading profit. In this context, our article provides information to traders on improving trading profit through spread management. The findings reveal that exchanges can manage the bid-ask spread by improving trading volume and trading frequency. The study has found that higher priced stocks have low trading volume and high spread. In this context, exchanges should design a policy for the splitting of high-priced stocks to reduce their spread. The study has made a significant contribution in estimating and analyzing the high-frequency bid-ask spread of selected CNX 500 stocks, across the six sectors, listed on the National Stock Exchange of India (NSE). It discerns the behavior of spread and assesses the factors impacting such behavior on the basis of market-microstructure theories. The results are more pervasive than those of previous studies. The sectoral analysis of spread provides information to traders for the systematic management of buy and sell orders: the spread is very narrow in the IT-Telecom sector, while the Automobile & Industrial Manufacturing sector shows a widened spread. A sector with a widened spread might be at a high risk of illiquidity, and our article provides empirical evidence to help traders manage the trading profit of high-spread stocks. Traders should be very careful while churning the inventory of stocks.
To maximize trading profit, the timing of buy and sell is not symmetrical in cases of large-cap stocks, due to a positive relation between quote return and spread. However, in cases of mid-cap stocks, the timing of buy and sell is very important, as quote return and spread are inversely related. Exchanges should also advise companies with high-priced stocks to adopt share splitting to improve trading and to reduce their spread. Exchanges should specifically design monitoring parameters for very low price and small market-capitalization stocks, which are more prone to high spreads and low market liquidity. Since the bid-ask spread indicates the trading and liquidity levels in the market, exchanges should monitor transaction costs and information asymmetry, which have significant impacts on the bid-ask spread.

Limitations and future directions
The study has a few limitations with respect to the sample data and modelling. We have not considered the non-linearity aspect of financial time series in the study. The non-linearity aspect of time series data can be factored in to understand the evolution of the bid-ask spread in the stock market. The study was confined to the cash market only and ignored the implications of options and other derivative segments on the bid-ask spread. However, the study can be extended to capture the intraday bid-ask spread and expected returns in the options, futures, and other derivatives segments of the Indian equity market. Apart from that, the actual behavior of traders through primary data collection was not captured in this article. This could be another potential area of research in future. A deep dive analysis for selected sectors (Birjandi et al., 2019) can bring out more factors like information asymmetry, company conditions, opinions, etc., which were beyond the scope of this study.
Figure 2 shows the variation in average effective spread in different sectors from 9:30 am to 10 am in the stock market. The Consumer Goods sector shows an interesting "W"-like shape during this period.

Conclusion
The study examined the determinants of the bid-ask spread in HFT for selected sample stocks of the NSE, India. To the authors' knowledge, this is the first study that has considered such a deep analysis of the determinants of bid-ask spread in the Indian order-driven market under HFT. This study is unique in terms of analyzing the effects of various market conditions and stock properties on spread and its determinants. The literature has empirically established return volatility, share price, trading volume, number of trades, and quote return to be significant determinants of the bid-ask spread in the NYSE, NASDAQ, and many emerging stock exchanges. The empirical analysis in this study reveals that in the Indian equity market, return volatility, share price, trading volume, and number of trades are significant determinants of spread. This study used the "bigglm" method to model spread determinants, an approach that is unique and that has not been performed in earlier studies. This method is designed for running regression models on datasets too large to be processed in memory at once.
The findings indicate that the bid-ask spread is not pricing the risk in the Indian market. The positive relation between market-capitalization and spread reveals that higher trading volumes cannot decrease spread. The month-wise analysis on the pooled datasets revealed that explanatory variables have different impacts on spread in different trading time periods; the pricing of information on the spread is not uniform across time periods. The results of sector-wise pooled regression were not similar across sectors, indicating that sector-specific information is captured in the spread determinants. The impact of brisk trading during settlement days reveals that beginning and end settlement cycles have a larger impact on spread. The analysis found that opening and closing intervals both have an impact on spread. The study's empirical analysis reveals that price and return volatility have direct and inverse relations, respectively, with spread, with respect to closing intervals, while the relations are the opposite for opening intervals. The spreads of mid-priced and low-priced stocks are more prone to return volatility. Price, traded volume, volatility, and market-cap as explanatory variables have comparatively more explanatory power for high-priced stocks than for other stock price ranges. The analysis of spread determinants could help stock exchanges design the market microstructure for trading. The findings on NSE-listed sample stocks reveal that exchanges need to enhance trading volumes and free-float shares and to reduce return volatility in order to reduce spread.
Figure 3 shows the variation in average effective spread in different sectors during the closing 30 minutes of the stock market (3 pm to 3:30 pm). The average effective spread of almost all the sectors goes up during the closing of the market, as captured in the study period of June to August 2016. So, towards the very end of market closing, the effective spread tends to widen, which is due to the unavailability of buyers or the abrupt squaring-off of positions by a few traders.