Exploring the NFT market on Ethereum: a comprehensive analysis and daily volume forecasting

Non-Fungible Tokens (NFTs) have grown in popularity to form a financial market that has attracted extensive attention worldwide. A large number of investors and creators are flocking to this emerging market in search of investment opportunities. Many studies have analysed this phenomenon from an economic perspective. However, we know little about the players and ecosystem characteristics of this market. To fill this knowledge gap, we first provide a processed large-scale dataset of the Ethereum blockchain-based NFT market, containing more than 80 million NFT transaction records from January 2018 to April 2022. Second, we construct the NFT creator graph (NCG) and NFT holder graph (NHG) to delve into the characteristics of the NFT market. Further, we analyse market preferences using statistical methods to reveal the development trends of the NFT market. Finally, we focus on predicting the transaction volume of the NFT market and analyse its influencing factors. This study provides data support for participants and researchers to explore the NFT market, while our analysis promotes a deeper understanding of the NFT market among the public.


Introduction
Since its inception in 2009 (Nakamoto, 2008), Bitcoin, one of the most popular cryptocurrencies, has sparked widespread interest from investors, developers, and researchers, placing cryptocurrencies in the spotlight (Sarode et al., 2023). In this realm, blockchain technology has revolutionised our understanding of trust, transparency, and security in financial transactions (Y. Wang et al., 2019; J. Zhou, Tian, et al., 2023). By decentralising record-keeping, this system enhances safety, reducing the risk of fraud and ensuring the integrity of asset documentation during ownership transfers (Dang et al., 2023; Taherdoost, 2023). With the rise of cryptocurrencies and the transformative impact of blockchain, the NFT, an encrypted digital asset based on blockchain technology, debuted in 2014 with "Quantum", introduced by artist Kevin McCoy and technologist Anil Dash (Dash, 2021). The NFT market gained significant traction in 2017 with CryptoKitties. 1 Since then, it has grown steadily. The introduction of the ERC-721 protocol 2 in 2018 further structured and unified the NFT market. Nowadays, NFTs have become a preferred medium for artists, and the market has seen exponential growth, exceeding $50 billion in 2022. 3 Notable sales, like Beeple's "Everydays: The First 5000 Days" for $69 million, 4 and Trevor Jones' "Crossroad" for $6.6 million, 5 have captured global attention.

Unlike traditional cryptocurrencies like Bitcoin, NFTs have distinct characteristics. Each NFT represents a unique and irreplaceable digital item, including art, collectibles, or other creations. Because of this uniqueness, NFTs cannot be exchanged one-to-one, which has sparked excitement among artists who can showcase and sell their work without traditional gatekeepers (Bao & Roubaud, 2022; Fridgen et al., 2023). NFTs offer transparency, security, and ownership through blockchain integration, protecting creators' rights and providing immutable ownership records.

There is no doubt that the NFT market has become a non-negligible economic phenomenon. Much research has focussed on this emerging ecosystem, covering topics such as intellectual property (Dong & Wang, 2023), valuation (Kapoor et al., 2022), and digital collectibles (Wilson et al., 2022). However, most researchers analyse NFTs in terms of economic issues, and little is known about the traders and holders in the NFT market or about NFT performance, trends, and influencing factors. Furthermore, gathering and handling NFT market data remains challenging due to its decentralised, fragmented nature and rapid evolution. Consequently, a large-scale public dataset for the NFT market is currently lacking.

To fill this gap, we first collected all NFT transaction records and event logs on the Ethereum blockchain from January 2018 to April 2022 to build a large-scale dataset of the NFT market. Next, to analyse the characteristics of the NFT market in depth, we constructed the NFT Creator Graph (NCG) and NFT Holder Graph (NHG). Furthermore, by applying statistical methods, we analysed the performance and trends of the NFT market. Finally, we predicted the NFT market transaction volume and analysed its influencing factors. In summary, this article makes the following contributions:
• The study provides a comprehensive analysis of the NFT market's structure, performance, and trends, offering valuable insights for market participants and increasing public awareness of this emerging market while sharing datasets 6 for further research.
• The study investigates correlations between NFT minting and transaction volumes, creating creator and holder graphs from Ethereum account data to uncover buyer and seller behaviour patterns and to facilitate informed decisions by market participants.
• The study utilises statistical analysis and an ARIMA model to predict future daily volume changes in NFT collections. Additionally, it explores transaction volume factors through feature extraction and machine learning models, using interpretable SHAP techniques to assess feature contributions.
The rest of the paper is organised as follows. Section 2 provides the background knowledge related to NFTs that is necessary for the subsequent analysis. Section 3 introduces the research methods and framework of this article. In Section 4, we collect and prepare all NFT transaction records and event logs for study. In Section 5, we analyse the actions of NFT market participants and the performance of NFT collections. Section 6 focuses on the distribution of transaction prices in the NFT market and market trends. Section 7 uses the ARIMA model to predict the total daily transaction volume of the NFT market. Section 8 uses different machine learning models to analyse the factors influencing the NFT market transaction volume. Section 9 summarises related work. In Section 10, we summarise the full text and

Smart contracts
Smart contracts utilise blockchain technology and serve as computer programs for managing and executing transactions within a blockchain network (Estevam et al., 2021). These contracts execute automatically, eliminating the need for intermediaries and ensuring transaction security and reliability (Jitendra Singh Yadav & Sharma, 2022). By implementing contractual terms through code, smart contracts enhance transparency and fairness in transaction processes. They can execute operations automatically based on predefined conditions, such as transferring currency to a designated account or updating blockchain data. Numerous programming languages are available for developing smart contracts, with Solidity being the most popular choice on Ethereum.
The applications of smart contracts span various domains, including digital currency transactions, privacy preservation (Li et al., 2020, 2021), digital identity verification, and property registration (Arora & Kumar, 2022). These contracts offer several benefits, such as removing intermediaries, enhancing transaction security and reliability, reducing transaction costs, and mitigating fraud risks (Taherdoost, 2023). Nevertheless, challenges persist in the smart contract landscape, including the absence of standardised programming languages and tools, potential vulnerabilities, and legal risks. Despite these challenges, the significance of smart contracts in enabling automated transactions and driving the digital economy is increasingly acknowledged, thanks to the ongoing advancement of blockchain technology.

NFT
Non-fungible tokens (NFTs) represent a distinctive class of digital assets that signify ownership of specific digital content, including artworks, videos, or tweets. These tokens leverage blockchain technology and smart contracts to substantiate ownership and validate authenticity. On the Ethereum blockchain, the ERC-721 standard facilitates the generation of these one-of-a-kind NFTs (Fridgen et al., 2023). NFT minting, a process that involves associating digital assets with smart contracts and then releasing them on a blockchain, reinforces the individuality and inimitability of these tokens (Catindig et al., 2023). The term "NFT collection" refers to a group of NFTs that share a common source, usually the same issuing smart contract.
Prominent NFT trading platforms like OpenSea, 7 Rarible, 8 SuperRare, 9 and Foundation 10 serve as decentralised marketplaces where users can actively engage in buying, selling, and trading NFTs. These platforms offer a range of features, such as access to trading histories, price charts, and the opportunity to participate in auctions, enhancing the NFT investment and trading experience.
In summary, NFTs leverage blockchain technology to establish digital ownership and confirm the provenance of unique digital assets. Major NFT marketplaces provide platforms for the purchase, sale, and exchange of diverse NFTs, facilitating a vibrant ecosystem for NFT trading and investment.

Analysis methods and framework
This paper presents a comprehensive data-driven analysis of the NFT market using statistical and machine-learning techniques. We adopt the analysis framework shown in Figure 1 to analyse the NFT market on Ethereum. Specifically, our analysis comprises three stages. In the first stage, by parsing NFT transaction events and logs on Ethereum from January 2018 to April 2022, we collected over 80 million NFT transaction records. Based on these data, in the second stage, we analyse the market structure by constructing the NFT Creator Graph (NCG) and NFT Holder Graph (NHG) to discuss market characteristics, while applying statistical methods to analyse market performance and trends.
In the third stage, we focus on predicting NFT market transaction volume and analysing its influencing factors. We first forecast the trading volume trends using ARIMA models, then construct market trading features and leverage machine learning models (LightGBM, XGBoost, Random Forest, etc.) to analyse the factors impacting volume. Finally, we adopt SHAP, an interpretable machine learning method, to explain the prediction results and analyse the contributions of different features. Overall, our analysis enhances the public understanding of NFT market conditions, trends, and influencing factors. The shared dataset also enables more researchers to explore the NFT market.

Data collection and preprocessing
To gather comprehensive data for our study, we employed the Parity client provided by OpenEthereum 11 to establish a full node on a local machine. This software enabled us to synchronise the complete transaction data of the entire Ethereum network. Our data collection process encompassed downloading all block data from the initial block until block 1,450,000, encompassing transactions recorded until April 2022.
Furthermore, it is important to note that the NFT market's emergence dates back to 2017 with the popularity of CryptoKitties. 12 Since then, the NFT market has witnessed rapid growth and continuous evolution. In 2018, the standardised ERC-721 protocol 13 was introduced, adding structure and interoperability to the NFT market and driving its further development. As a result, the NFT data analysed in this article encompasses all transfer data on the Ethereum blockchain from January 2018 to April 2022.
We utilised Parity APIs to retrieve NFT transaction data, internal contract transaction details, and contract call information from the block data. By scanning the bytecode of each contract, we identified the contracts that implemented the ERC-721 protocol standard. Consequently, we identified over 50,000 NFT contracts and their respective creators within the Ethereum network.
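The text does not say how the bytecode scan decided that a contract implements the standard. One common heuristic, shown here purely as an illustrative assumption, is to check that the deployed bytecode contains the 4-byte function selectors of functions that ERC-721 defines. The function name `looks_like_erc721` and the chosen subset of selectors are hypothetical; substring matching can yield false positives, and it misses proxy contracts that delegate to another implementation.

```python
# Hypothetical heuristic, NOT the authors' documented method: treat a contract
# as ERC-721-like if its bytecode contains a set of ERC-721 function selectors.
# Each selector is the first 4 bytes of keccak256(function signature).
ERC721_SELECTORS = {
    "ownerOf(uint256)": "6352211e",
    "getApproved(uint256)": "081812fc",
    "setApprovalForAll(address,bool)": "a22cb465",
    "safeTransferFrom(address,address,uint256)": "42842e0e",
    "safeTransferFrom(address,address,uint256,bytes)": "b88d4fde",
}


def looks_like_erc721(bytecode_hex):
    """Return True if every selector above occurs in the hex-encoded bytecode."""
    code = bytecode_hex.lower()
    if code.startswith("0x"):
        code = code[2:]
    return all(selector in code for selector in ERC721_SELECTORS.values())


if __name__ == "__main__":
    # Toy bytecode: a prefix followed by PUSH4 (0x63) + selector for each function.
    toy = "0x6080604052" + "".join("63" + s for s in ERC721_SELECTORS.values())
    print(looks_like_erc721(toy))             # all selectors present
    print(looks_like_erc721("0x6080604052"))  # none present
```

A stricter check would call the contract's `supportsInterface` function with the ERC-721 interface identifier, but that requires a live node and only works for contracts that implement ERC-165.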
For instance, Figure 2 illustrates a representative event log of a transfer event from a standard ERC-721 NFT collection. The event log consists of several topics: topic 0 is the hash value of the transfer event, topics 1 and 2 are the sender and recipient addresses, respectively, and topic 3 is the token ID of the NFT within the collection. We parsed the external transaction data and scrutinised the event logs to identify all transfer events adhering to the ERC-721 protocol. Specifically, we searched for event logs in which topic 0 matched the hash value of the ERC-721 transfer event, consistently denoted by 0xdd..ef. We then examined the remaining topics to extract further information about each transfer, checking that topics 1 and 2 gave the sender and recipient addresses and that topic 3 gave the token ID of the NFT in the collection. By adopting this approach, we accurately identified all transfer events that complied with the ERC-721 protocol.
Leveraging these methodologies, we amassed an extensive dataset comprising over 80 million NFT transfer records and 55,417 smart contracts responsible for NFT creation. This wealth of data allowed us to comprehensively analyse the NFT market as of April 2022, using 32 transaction data features and 11 contract data features.
Table 1 provides essential information about transactions involving non-fungible tokens (NFTs) on the Ethereum network, while Table 2 provides information about the smart contracts that create NFTs on the Ethereum network. Thirty-two transaction data features were obtained, including timestamps, the addresses of both transfer parties, and transaction volume, among others. In the contract dataset, 11 features were obtained, including token address, NFT name, and creator information. A detailed description of each field can be found in Table 3, although only the most essential features are included due to space constraints.
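The topic-matching procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes each log is a dictionary with an `address` and a list of hex-string `topics`, as returned by typical Ethereum JSON-RPC clients, and the function name `parse_erc721_transfer` is hypothetical. The constant is the Keccak-256 hash of `Transfer(address,address,uint256)`. ERC-20 transfers share this hash but index only the two addresses, so requiring a fourth topic is what separates ERC-721 transfers from them.

```python
# keccak256("Transfer(address,address,uint256)"); shared by ERC-20 and ERC-721.
TRANSFER_TOPIC0 = (
    "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
)


def parse_erc721_transfer(log):
    """Return the decoded transfer, or None if the log is not an ERC-721 Transfer.

    `log` is assumed to look like {"address": "0x...", "topics": ["0x...", ...]}.
    """
    topics = [t.lower() for t in log.get("topics", [])]
    # ERC-721 indexes from, to, and tokenId, so exactly four topics are expected.
    if len(topics) != 4 or topics[0] != TRANSFER_TOPIC0:
        return None
    return {
        "tokenAddress": log.get("address", "").lower(),
        # Addresses are left-padded to 32 bytes; keep the last 20 bytes.
        "from": "0x" + topics[1][-40:],
        "to": "0x" + topics[2][-40:],
        "tokenId": int(topics[3], 16),
    }


if __name__ == "__main__":
    pad = "0x" + "0" * 24
    example = {
        "address": "0x" + "11" * 20,
        "topics": [
            TRANSFER_TOPIC0,
            pad + "ab" * 20,            # sender
            pad + "cd" * 20,            # recipient
            "0x" + format(42, "064x"),  # token ID
        ],
    }
    print(parse_erc721_transfer(example))
```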

NFT market analysis
This section presents a comprehensive analysis of the NFT market from various perspectives. We examine the correlation between NFT minting and transactions, analyse the actions of creators and holders, and evaluate the performance of different categories of NFT collections. Our analysis is based on transaction blocks and log data, allowing us to explore the involvement of 55,417 NFT contracts and 3,314,430 addresses within the NFT ecosystem. In total, we identified 81,161,983 NFT transaction records. While there are over 60,000 NFT contracts, we observed that some contracts deviate from the ERC-721 standard. Therefore, our study primarily focuses on the transactional data of the discovered NFTs to gain insights into the structural characteristics of the NFT market.

Table 3. Description of the main dataset fields.
[…]: The timestamp of the block in which the transaction occurred.
transactionHash: The unique identifier of the transaction.
from: The address that sent the NFT.
to: The address that received the NFT.
tokenId: The unique identifier of the NFT.
tokenAddress: The address of the smart contract that manages the NFT.
valueToUsd: The value of the transaction in USD.
name: The name of the NFT collection.
symbol: The symbol of the NFT collection.
totalSupply: The total number of NFTs in existence.
createdTimestamp: The timestamp of when the smart contract was created.
creator: The address that created the smart contract.
creatorIsContract: A binary flag indicating whether the creator address is a smart contract.

To provide a comprehensive overview, we summarise the two main types of data in our dataset, mint data and transfer data, as shown in Table 4. The mint data comprises 52,647,432 records involving 2,424,510 NFT holders and 54,184 NFT collections. The transfer data contains 28,514,551 records involving 2,257,027 NFT holders and 41,731 NFT collections.
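A summary of this kind can be sketched in a few lines. The split rule below is an assumption rather than something this section states: following common ERC-721 practice, a record whose `from` field is the zero address is treated as a mint, and everything else as a transfer. Counting distinct recipient addresses as "holders" is likewise an assumption, and the function name is hypothetical.

```python
ZERO_ADDRESS = "0x" + "0" * 40


def summarise_mints_and_transfers(records):
    """Count records, distinct recipients, and distinct collections per type.

    Each record is assumed to be a dict with "from", "to", and "tokenAddress".
    Assumption: a mint is a record sent from the zero address.
    """
    groups = {
        kind: {"records": 0, "holders": set(), "collections": set()}
        for kind in ("mint", "transfer")
    }
    for rec in records:
        kind = "mint" if rec["from"].lower() == ZERO_ADDRESS else "transfer"
        group = groups[kind]
        group["records"] += 1
        group["holders"].add(rec["to"].lower())
        group["collections"].add(rec["tokenAddress"].lower())
    return {
        kind: {
            "records": g["records"],
            "holders": len(g["holders"]),
            "collections": len(g["collections"]),
        }
        for kind, g in groups.items()
    }


if __name__ == "__main__":
    toy = [
        {"from": ZERO_ADDRESS, "to": "0xA1", "tokenAddress": "0xC1"},
        {"from": ZERO_ADDRESS, "to": "0xA2", "tokenAddress": "0xC1"},
        {"from": "0xA1", "to": "0xA2", "tokenAddress": "0xC1"},
    ]
    print(summarise_mints_and_transfers(toy))
```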

Active NFT collections: what are they?
When issuing an NFT collection, the number of mints determines the collection's supply, while the number of transactions reflects its activity level. Generally, NFT collections with a larger supply tend to have higher transaction volumes. However, the supply of NFT collections in different categories can vary significantly. For example, some game-related NFT collections, such as Gods Unchained 14 with a total supply of 2 million, may have millions of items available. Despite this ample supply, their actual transaction volumes may be lower than those of some popular NFT projects with smaller supplies. Several factors contribute to this difference, including the type and popularity of the NFT collection and its perceived collectible value.
To investigate the supply and transfer quantities of each NFT project, this study utilised a dataset comprising 52,647,432 mints and 28,514,551 actual transactions. Pearson and Spearman correlation coefficients were employed to understand the relationship between total supply and actual transaction volumes. The results revealed a strong linear correlation between the two variables, with a Pearson coefficient of 0.82, and a moderate rank correlation, with a Spearman coefficient of 0.54. These findings confirm that total supply and actual transaction volumes are positively related within the NFT market.
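For readers who want to reproduce this kind of comparison, the following standard-library sketch computes both coefficients for two equal-length lists, such as per-collection mint counts and transfer counts. It is not the authors' code, and the toy numbers are invented for illustration. Spearman's coefficient is computed as the Pearson coefficient of the ranks, with tied values given their average rank.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    mints = [1, 10, 100, 5_000, 2_000_000]      # invented toy values
    transfers = [0, 3, 250, 900, 3_500_000]     # invented toy values
    print(round(pearson(mints, transfers), 3), round(spearman(mints, transfers), 3))
```

Because Pearson's coefficient is sensitive to a few very large collections while Spearman's depends only on ordering, the two can differ substantially on heavy-tailed data, which is consistent with the gap between the values reported above.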
Meanwhile, we analyse the distribution of NFT collections by number of mints. Figure 3 presents the power-law distributions for NFT collections. Figure 3(a) reveals that 25.3% of NFT collections were minted only once, while 77.8% were minted fewer than 100 times. Notably, 18 NFT collections had minting quantities exceeding 100,000, whereas only three surpassed 1 million mints. These three collections are "Gods Unchained", a card game with a minting quantity of 6.97 million; "CryptoKitties", with a minting quantity of 2.01 million; and "ENS", a domain registration service with a minting quantity exceeding 1 million.
A similar pattern appears in the power-law distribution of transfer transactions for NFT collections, shown in Figure 3(b). Only 23.4% of NFT collections had been transferred once, and 64.9% had been transferred fewer than ten times, accounting for 84.9% of the total. Furthermore, only three NFT collections had transfer volumes exceeding 1 million, while 14 NFT collections had transfer volumes surpassing 100,000. The top three NFT collections in terms of transfer volume were "CryptoKitties" with over 3.55 million transfers, followed by "ENS" with over 1.2 million transfers, and "Gods Unchained Cards" with over 1.02 million transfers.
The analysis reveals that most NFT collections underperform in terms of user activity. Additionally, the distinct power-law distribution of minting and transaction quantities suggests a small number of highly active NFT projects alongside numerous inactive ones. A fitted distribution line of $y \sim x^{-\alpha}$ was plotted, where a larger value of $\alpha$ indicates a faster-decaying tail, that is, a smaller share of highly active NFT collections.
Among the more than 50,000 NFT collections, many correspond to inactive smart contracts deployed on the Ethereum blockchain. Approximately 25% of the contracts were minted only once, and 24% were transferred either once or not at all. Two of the three NFT contracts with the highest transfer volumes, "CryptoKitties" and "Gods Unchained", are blockchain games; the third, "ENS", is used for blockchain domain services. The Pearson coefficient of 0.82 indicates a strong correlation between the number of mints and actual transfer volumes.
Finding 1: The number of mints and the number of transactions are closely related in NFT collections, with collections with larger supplies tending to have higher transaction volumes. However, both follow a power-law distribution, with a small number of highly active NFT projects and many inactive ones. The correlation between the number of mints and actual transfer volumes is strong. We also find that "CryptoKitties" and "Gods Unchained", two of the three NFT contracts with the highest transfer volumes, are both blockchain games.
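The fitting procedure is not specified in the text. A simple way to estimate the exponent, given here only as an assumed stand-in, is an ordinary least-squares fit on the log-log frequency plot: if the number of collections with activity level x behaves like x^(-α), then log y is linear in log x with slope -α. Maximum-likelihood estimators are generally considered more reliable for heavy-tailed data, so this sketch is best read as a quick diagnostic.

```python
from collections import Counter
from math import log


def fit_power_law_exponent(counts):
    """Estimate alpha in y ~ x^(-alpha) by least squares on the log-log plot.

    `counts` holds one activity value per collection (e.g. its number of mints);
    y is the number of collections sharing a given value x. Zeros are ignored.
    """
    frequency = Counter(c for c in counts if c > 0)
    if len(frequency) < 2:
        raise ValueError("need at least two distinct positive values")
    xs = [log(value) for value in frequency]
    ys = [log(n) for n in frequency.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / sum(
        (a - mean_x) ** 2 for a in xs
    )
    return -slope


if __name__ == "__main__":
    # Synthetic data in which the frequency of value k is exactly 1000 * k^(-2).
    toy = [1] * 1000 + [2] * 250 + [10] * 10
    print(round(fit_power_law_exponent(toy), 6))
```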

NFT collections: who are the creators?
The anonymity provided by blockchain technology makes it challenging to reveal the identity of an NFT collection creator. On the Ethereum platform, a single address can be used to create multiple NFT collections, making it difficult to establish relationships between creators. To better understand the supply and demand relationship of the NFT market and the flow of NFT transactions, this paper introduces the concept of the NFT creator graph (NCG). By constructing the NCG, different NFT collection creators can be linked to their minted NFTs.
The NCG is defined and constructed as NCG = (V, E), where V is a set of external, contract, and NFT collection contract accounts, and E is a set of edges. $E = \{(v_i, v_j) \mid v_i, v_j \in V\}$ is a set of ordered node pairs, where the order of each pair represents the creation relationship, i.e. an address $v_i$ created an NFT collection $v_j$. Therefore, the NCG is a directed graph. The NCG can be easily constructed by examining the smart contract information dataset for creating NFT collections.
To get an overall impression of the NCG, we randomly select 10,000 edges and show the result in Figure 4(a). NFT collection nodes are highlighted in red, while creators are shown in blue. The size of each node is proportional to the number of NFT collections created by that particular creator. Notably, some nodes created a significant number of NFT collections, which is surprising given that NFT collections usually represent unique digital assets that can be used to describe specific rights or serve as digital currencies in decentralised applications.
To better understand the NCG, we analysed the distribution of outdegrees, where the outdegree of a node is the number of NFT collections it created. Figure 4(b) shows the outdegree distribution of the NCG, which conforms to a power-law distribution. This means there are a few nodes with a large outdegree and many with a small outdegree. We fitted a line $y \sim x^{-\alpha}$ to this distribution, where a larger value of $\alpha$ means that nodes with a large outdegree are rarer. By analysing the relevant data, we found that the vast majority of creators in the NCG create only a small number of NFT collections. Specifically, 81.4% of creators created only one NFT collection, 98.4% created no more than 5, and 99.4% created no more than 10. This suggests that factors such as minting costs and market demand limit the creation of NFT collections by most creators. However, we also found that a few accounts in the NCG have created a vast number of NFT collections, with some creating over 1,000. Despite this, these large-scale creators represent a tiny proportion of the overall creator population.
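As a rough illustration of this construction, the sketch below builds the directed creator-to-collection edges from contract records and derives each creator's outdegree. It assumes each record carries the `creator` and `tokenAddress` fields listed in the dataset description; the function names and the adjacency-set representation are illustrative choices, not the authors' implementation.

```python
from collections import defaultdict


def build_ncg(contracts):
    """Build the NCG as an adjacency map: creator address -> set of collections.

    Each contract record is assumed to be a dict with "creator" and
    "tokenAddress". An edge (creator, collection) means "creator created collection".
    """
    ncg = defaultdict(set)
    for contract in contracts:
        ncg[contract["creator"].lower()].add(contract["tokenAddress"].lower())
    return ncg


def out_degrees(ncg):
    """Outdegree of each creator: the number of collections it created."""
    return {creator: len(collections) for creator, collections in ncg.items()}


if __name__ == "__main__":
    toy_contracts = [
        {"creator": "0xAA", "tokenAddress": "0x01"},
        {"creator": "0xAA", "tokenAddress": "0x02"},
        {"creator": "0xBB", "tokenAddress": "0x03"},
    ]
    graph = build_ncg(toy_contracts)
    print(out_degrees(graph))
```

Storing the targets in a set means a collection that appears twice in the input contributes a single edge, which keeps the outdegree equal to the number of distinct collections per creator.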
Given the vast number of creators involved in the NFT space, we focussed our analysis on the accounts that have created the most NFT collections. The account with the most NFT collections is 0x3b61, which has created 3253 collections. Our investigation, conducted through a Google search, revealed that this contract account creates new NFT collections by invoking the "createCollection(string name, string symbol, uint256 nonce)" function. Our experiment discovered that 17,641 NFT collections were created, but 90% had ten transfers or fewer. This observation suggests the presence of numerous inactive NFT collections on this platform, which serves as a venue for users to create NFT smart contracts and trade NFTs via Foundation Labs. However, a substantial disparity exists in activity levels between these traded NFTs and those sold on OpenSea, the largest NFT marketplace. The second most prolific NFT creator, 0x957a, is a proxy contract officially created by OpenSea, and it has generated 1107 NFT collections. Among these collections, 67% had ten transfers or fewer. Given that OpenSea is the most prominent platform in the NFT market, NFTs created through this platform exhibit higher levels of activity and influence.
To gain insights into the characteristics of NFT creators in the market, we analysed the number of creators of each account type and the corresponding number of NFT collections they have generated. Table 5 presents the findings, indicating that approximately 53% of NFT collections are created by human creators operating external accounts, which make up 98% of creator accounts. Additionally, around 47% of NFT collections are generated by smart contracts, since a contract can automatically create NFT collections when necessary. Interestingly, the smart contracts that make up the remaining 2% of creator accounts have contributed 47% of all NFT collections, which is a remarkable observation.
Figure 4(c) illustrates the developmental trajectory of external and smart contract accounts regarding NFT collection creation. The number of creations was tracked by year, and the statistical data show that smart contracts have gradually emerged as the primary method for NFT creation, particularly since 2021. Notably, in 2022, the number of NFT smart contracts created by third-party platforms experienced a significant surge. This trend signifies the vital role played by smart contracts in NFT issuance, as they can lower the entry barrier for ordinary users seeking to create NFTs, thereby facilitating their entry into the market. As smart contract technology continues to evolve and gain wider adoption, NFT issuance increasingly relies on smart contracts. Using smart contracts makes NFT creation and distribution simpler, faster, more transparent, and more secure. Consequently, an increasing number of individuals are choosing to employ smart contracts for creating and issuing their NFTs, thereby fuelling the growth of the NFT market.
Finding 2: To gain a deeper insight into the dynamics of supply and demand in the NFT market and how transactions flow, one can utilise the NFT Creator Graph (NCG). The NCG is a directed graph that connects NFT collection creators with their minted NFTs. Most creators only create a small number of NFT collections due to factors such as minting costs and market demand, but a few accounts have created a massive number of NFT collections. Smart contracts have gradually become the primary means of NFT creation since 2021, particularly in 2022, indicating that smart contracts play a crucial role in NFT issuance.

NFT holders: who holds them?
If an NFT collection is viewed as a small-scale economy, having a broad range of holders is crucial for ensuring a healthy and stable market. This section focuses on the characteristics and attributes of NFT holders. To this end, we introduce the concept of an NFT Holder Graph (NHG), as illustrated in Figure 5, which provides insights into the interactions and relationships among holders, including the transfer and transaction history of NFTs. Analysing the NHG uncovers vital information such as the distribution of holders, their transaction behaviour, and the number of NFTs they hold. This analysis aids in developing a better understanding of the NFT market and the conduct of its participants.
To create the holder graph, we analysed all the transfer logs of NFTs. For each address and corresponding NFT in the logs, we aggregated the total number of received NFTs and subtracted the total number of sent NFTs. This calculation gives the current number of NFTs held by each address. If the remaining amount is greater than zero, the address is recognised as a holder of NFTs. This approach identifies all NFT holders and their respective holdings. With this information, constructing the NHG becomes a straightforward task.
The NHG can be represented as NHG = (V, E), where V is a set comprising holder accounts and NFTs, and E is a set of edges, defined in the same way as for the NCG. Specifically, $E = \{(v_i, v_j) \mid v_i, v_j \in V\}$ denotes a set of ordered node pairs, where the ordering of each pair signifies the holding relationship, i.e. a node $v_i$ holds an NFT $v_j$. Therefore, the NHG is a directed graph. Constructing the NHG is a straightforward process, as it involves examining the dataset of holders possessing NFTs.
To give an overall impression of the NHG, we randomly selected 10,000 edges and show them in Figure 5(a). Blue dots signify accounts, whereas red dots represent NFTs. The size of each dot corresponds to the number of NFTs held by the account. While some NFTs have numerous holders holding a substantial number of NFTs, the number of NFTs held by some accounts could be much higher in comparison, resulting in most blue dots appearing very small. The degree distribution of the NHG is presented in Figure 5(b), where the degree of an NFT indicates its number of holders. This distribution also adheres to a power law. Statistical analysis revealed that slightly over half of the holders (53%) hold only one NFT, with 90% of holders possessing no more than ten NFTs. In other words, most users are not very active, indicating that the NFT economy is still developing.
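The received-minus-sent calculation can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: it assumes each transfer record has `from`, `to`, and `tokenAddress` fields, and it tracks balances per (address, collection) pair, which is one plausible interpretation of "each address and corresponding NFT". The function name is hypothetical.

```python
from collections import defaultdict


def current_holdings(transfers):
    """Return {address: {collection: count}} for addresses with a positive balance.

    Balance = number of NFTs received minus number sent, per address and
    collection. Each transfer is assumed to be a dict with "from", "to",
    and "tokenAddress".
    """
    balance = defaultdict(int)
    for t in transfers:
        collection = t["tokenAddress"].lower()
        balance[(t["to"].lower(), collection)] += 1
        balance[(t["from"].lower(), collection)] -= 1

    holders = defaultdict(dict)
    for (address, collection), count in balance.items():
        # Only positive balances count as holdings; senders that never
        # received the token end up at or below zero and are dropped.
        if count > 0:
            holders[address][collection] = count
    return holders


if __name__ == "__main__":
    zero = "0x" + "0" * 40
    toy = [
        {"from": zero, "to": "0xa1", "tokenAddress": "0xc1"},    # mint to a1
        {"from": zero, "to": "0xa1", "tokenAddress": "0xc1"},    # mint to a1
        {"from": "0xa1", "to": "0xb2", "tokenAddress": "0xc1"},  # a1 -> b2
    ]
    print(dict(current_holdings(toy)))
```

The resulting map gives the NHG edges directly: every (address, collection) entry with a positive count is an edge from a holder to what it holds, and the count can serve as an edge weight.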
Considering many holders, we focus on accounts with the largest NFT types.The first address is the all-zero black hole address, which received 3,879 types of NFTs from other owners, removing them from circulation and causing permanent destruction.In second place is the Illiquid DAO address, where users can send NFTs with poor returns and no liquidity to the designated address 0xeea89. 15As compensation, Illiquid DAO will provide a certain amount of JPEGSNFT to each sender based on the number of NFTs sent; however, the NFTs must have been minted before December 28, 2021.The third-ranked address is the 0xdead address, and this is another black hole address containing 2243 types of NFTs sent to the zero address by their owners for permanent destruction.The fourth-ranked address is 0xe0521, 16 which holds 2054 different types of NFTs and is stored in the Nifty Gateway comprehensive wallet.Eligible NFTs will be transferred from the verified wallet address to the Nifty Gateway wallet for custody, allowing them to be sold and purchased simultaneously.This data is consistent with DappRadar's count of 2068 NFTs, indicating our dataset is completeness and reliability.
Simultaneously, we analysed the number of holders for all NFT collections, resulting in the power law distribution depicted in Figure 5(c).Our analysis indicates that 69% of NFT collections have ten or fewer holders, while 83% of NFT collections have 100 or fewer holders.Moreover, three NFT collections boast over 100,000 holders, 209 NFT collections have holders between 10,000 and 100,000, and 3044 NFT collections have over 1000 holders.ENS, with 354,354 holders, is the NFT collection with the most significant number of holders, followed by Crypto stamp, an encrypted stamp issued by Austrian Post, with 150,160 holders, and CryptoKitties, an Ethereum-based virtual cat breeding game, with 137,226 holders.
These statistical results unveil the distribution of NFT holders in the market.Most NFT collections have a limited number of holders, whereas only a tiny proportion of NFT collections have a substantial number of holders.This pattern may arise from the distinctive types and attributes of the NFT market, as different types of NFT collections may appeal to varying holders.This demonstrates that a few crucial NFT collections and holders could significantly influence the NFT market.Therefore, when analysing the NFT market, it is essential to consider this holder distribution phenomenon.
In conclusion, analysing the characteristics and attributes of NFT holders is crucial for understanding the NFT market and its participants' behaviour.The NFT Holder Graph (NHG), constructed from transfer and transaction logs, provides essential information, such as the distribution, transaction behaviour, and holding amounts of NFT holders.Our analysis shows that most NFT holders have only one NFT, indicating that the NFT economy is still developing.Moreover, we identified the accounts with the most extensive NFT types, including black hole addresses and comprehensive wallets.Finally, we unveiled the distribution of NFT holders in the market, indicating that most NFT collections have a limited number of holders.In contrast, only a tiny proportion of NFT collections have a substantial number of holders.These statistical results provide insights into the behaviour and preferences of NFT holders and can help market participants make better-informed decisions.
Finding 3: NFT holders play a crucial role in ensuring a healthy and stable market. The NFT Holder Graph (NHG) can be constructed by analysing NFT transfer logs to identify all NFT holders and their respective holdings, providing insights into their distribution, transaction behaviour, and holding amounts. The NHG is a weighted directed graph, and statistical analysis shows that most NFT holders possess no more than ten NFTs, suggesting that the NFT economy is still in its nascent stage. Moreover, most NFT collections have a limited number of holders, whereas only a tiny proportion have a substantial number.
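As a minimal sketch of how a holder graph of this kind could be derived, the Python below replays a list of transfer records to obtain each address's current holdings and the weighted directed edges between addresses. The record layout `(sender, receiver, collection, token_id)`, the sample addresses, and the function names are illustrative assumptions, not the pipeline used in this study.

```python
from collections import Counter, defaultdict

# Conventional "zero address"; assumed here to be the sender of minting transfers.
ZERO_ADDRESS = "0x" + "0" * 40


def build_holder_graph(transfers):
    """Replay transfers in order and return (holdings, edge_weights).

    transfers: iterable of (sender, receiver, collection, token_id) tuples,
    a hypothetical simplified view of ERC-721 transfer logs.
    holdings: address -> set of (collection, token_id) currently held.
    edge_weights: (sender, receiver) -> number of transfers along that edge.
    """
    holdings = defaultdict(set)
    edge_weights = Counter()
    for sender, receiver, collection, token_id in transfers:
        token = (collection, token_id)
        holdings[sender].discard(token)  # no-op when the sender never held it (mint)
        holdings[receiver].add(token)
        if sender != ZERO_ADDRESS:
            edge_weights[(sender, receiver)] += 1
    # Drop addresses that no longer hold anything.
    holdings = {addr: toks for addr, toks in holdings.items() if toks}
    return holdings, edge_weights


def holders_per_collection(holdings):
    """Count distinct holder addresses for each collection."""
    counts = Counter()
    for tokens in holdings.values():
        for collection in {c for c, _ in tokens}:
            counts[collection] += 1
    return counts


sample = [
    (ZERO_ADDRESS, "0xA", "kitties", 1),
    (ZERO_ADDRESS, "0xA", "kitties", 2),
    ("0xA", "0xB", "kitties", 2),
    (ZERO_ADDRESS, "0xC", "ens", 7),
]
holdings, edges = build_holder_graph(sample)
print(holders_per_collection(holdings))
```

The per-collection holder counts produced this way are the quantity whose distribution is summarised in the holder analysis above.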

Performance of different NFT categories: what is it?
On the Ethereum blockchain, each NFT collection has a unique smart contract address, but the address itself carries no information about the type of collection. The category therefore cannot be obtained directly from the blockchain; it usually has to be determined from other sources, such as the collection's website, social media, and NFT market listings. In this section, we use scraping techniques to obtain data on the 3777 most active NFT collections on nftgo.io (footnote 17) before April 2022, which we then classified into 14 different types for analysis. Table 6 provides a detailed data summary, including the number of NFT collections, holders, and the total transaction volume. Figure 6 shows the results, from which we draw the following conclusions:
Conclusion 1: Pfp-type NFT collections have the highest proportion of collections, accounting for 43.94% of the total, with the highest number of holders and transaction volume, accounting for 43.42% and 21.48%, respectively. These results indicate that Pfp NFT collections are highly valued and in demand in the market, possibly due to their perceived value for collecting and identity recognition.
Conclusion 2: Utility-type NFT collections account for 12.22% of the total but have a lower total number of holders and transaction volume than the Pfp type, accounting for 9.63% and 24.36%, respectively. Although these NFT collections are fewer than Pfp collections, their value and demand in the market are high due to their practical functionality, such as access to a specific platform or service.
Conclusion 3: Art and Game type NFT collections have relatively high proportions of the total number of collections and holders, but lower total transaction volumes, accounting for 8.06% and 7.09% of the total, respectively. This indicates that these NFT collections may have high value but relatively low demand, possibly because they are primarily appreciated by art and gaming enthusiasts, who are not large-scale NFT traders.
This section provides a data visualisation and analysis of 14 different types of NFT collections.It reveals that Pfp-type NFT collections are the most popular, Utility-type NFT collections provide practical functionality, and Art and Game-type NFT collections are primarily appreciated by art and gaming enthusiasts.These findings are essential for understanding the performance and trends of different types of NFT collections in the market.They can aid investors, collectors, and developers in making more informed decisions.
Finding 4: Our analysis revealed that Pfp-type NFT collections are the most popular, accounting for 43.94% of the total, with the highest number of holders and transaction volume.Utility-type NFT collections, which offer practical functionality, accounted for 12.22% of the total but had a lower number of holders and transaction volume than the Pfp type.Art and Game type NFT collections had relatively high proportions of the total number of collections and holders but lower total transaction volumes, indicating that they may have high value but relatively low demand.

NFT market trend analysis
In this section, we cleaned the data by removing minting records, transactions involving contract addresses in the "from" or "to" fields, and transactions with zero amounts. After cleaning, we were left with 12,173,686 transactions from 13,808 unique NFT contracts. We then used this data to analyse the NFT market's price distribution, transaction volume, and daily average transaction prices. Finally, to better understand market trends, we analysed the historical distribution of prices for various NFT collections.
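The three cleaning rules can be sketched as a simple filter. This is an illustration under stated assumptions: the record keys `from`, `to`, and `value`, the treatment of a transfer from the zero address as a minting record, and the availability of a set of known contract addresses are all hypothetical choices, not details given in the study.

```python
# Assumption: a transfer sent from the zero address is treated as a mint.
ZERO_ADDRESS = "0x" + "0" * 40


def clean_transactions(records, contract_addresses):
    """Keep only non-mint, wallet-to-wallet records with a non-zero amount."""
    cleaned = []
    for rec in records:
        if rec["from"] == ZERO_ADDRESS:
            continue  # minting record
        if rec["from"] in contract_addresses or rec["to"] in contract_addresses:
            continue  # a contract address appears on either side
        if rec["value"] == 0:
            continue  # zero-amount transaction
        cleaned.append(rec)
    return cleaned


records = [
    {"from": ZERO_ADDRESS, "to": "0xA", "value": 0.0},   # mint
    {"from": "0xA", "to": "0xB", "value": 1.5},          # kept
    {"from": "0xMarket", "to": "0xB", "value": 2.0},     # contract sender
    {"from": "0xB", "to": "0xC", "value": 0},            # zero amount
]
cleaned = clean_transactions(records, contract_addresses={"0xMarket"})
print(len(cleaned))
```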
By analysing the transaction volume and average prices of NFT collections, we can gain valuable insights into the market's behaviour and identify potential trends and patterns.This analysis can help investors and collectors decide which NFTs to invest in or collect and at what price points.

NFT market transaction price distribution
To gain a deeper understanding of the price distribution in the NFT market, we conducted a comprehensive analysis of the price distribution for all NFT transaction records and the top 1000 NFT collections ranked by transaction volume.
We first conducted a statistical analysis of the price distribution for all NFT transactions on the market, as shown in Figure 7(a). NFTs priced between $100 and $1000 account for the largest share, 55%. NFTs priced below $100 account for approximately 13%. NFTs above $1000 account for 32%, and those above $10,000 make up only 4%. These results indicate that although most transactions concentrate in the low to mid-price range, high-value NFTs also exist in the market.
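Shares of this kind can be computed by bucketing transaction prices. The sketch below is illustrative only: the bucket boundaries (whether a price of exactly $100 or $1000 falls in the middle bucket) and the sample prices are assumptions, since the text does not specify them.

```python
def price_bucket_shares(prices_usd):
    """Return the fraction of transactions falling in each price bucket."""
    buckets = {"<100": 0, "100-1000": 0, ">1000": 0}
    for price in prices_usd:
        if price < 100:
            buckets["<100"] += 1
        elif price <= 1000:          # assumption: both boundaries are inclusive
            buckets["100-1000"] += 1
        else:
            buckets[">1000"] += 1
    total = len(prices_usd)
    return {name: count / total for name, count in buckets.items()}


# Made-up prices, used only to exercise the function.
sample_prices = [50, 150, 500, 999, 2000, 20000, 300, 80, 1200, 700]
shares = price_bucket_shares(sample_prices)
print(shares)
```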
Additionally, we conducted a statistical analysis of the price distribution for the top 1000 NFT collections ranked by transaction frequency, as shown in Figure 7(b). These NFT collections typically represent the market's most active and popular assets. For 90% of the top 1000 NFT collections, the median transaction price is above $149 and the average transaction price is above $291. For 50% of them, the average transaction price is above $977 and the median is above $527. For about 10%, the average transaction price is above $4400 and the median is above $2190.
These findings suggest that the NFT market is relatively stable, with prices concentrated in the mid-range. The high proportion of NFTs priced between $100 and $1000 may indicate that this is the price range where most buyers and sellers are willing to participate in the market. However, the presence of high-value NFTs indicates that some buyers and sellers are also willing to engage in high-priced transactions. One possible explanation for the concentration of prices in the mid-range is that the entry barrier for buyers and sellers is relatively low at these price points. As NFTs become more mainstream, more people may be willing to participate in this price range.
In conclusion, our study comprehensively analysed the price distribution for all NFT transactions and the top 1000 NFT collections ranked by transaction frequency.The results indicate that the NFT market is relatively stable, with prices concentrated in the mid-range.However, high-value NFTs indicate buyers and sellers are willing to engage in high-priced transactions.The concentration of prices in the mid-range may be due to the low entry barrier for buyers and sellers at these price points.As NFTs become more mainstream, more people may be willing to participate in this price range.The popularity of NFT collections and related events may also lead to higher transaction prices for the top 1000 NFT collections.Overall, these findings reveal the pricing dynamics of the NFT market and provide insights for investors and enthusiasts.
Finding 5: The NFT market has a relatively stable price distribution, with prices concentrated in the mid-range, but high-value NFTs also exist.The high proportion of NFTs priced between $100 and $1000 indicates that this is the price range where most buyers and sellers are willing to participate.The popularity of NFT collections and related events may lead to higher transaction prices for the top 1000 NFT collections.Overall, the study provides insights for investors and enthusiasts.

NFT market development trends
With the rapid development of the cryptocurrency market, the NFT market has gradually emerged in recent years. The unique attributes of digital artworks helped the NFT market experience unprecedented growth in 2021. As global media reported on high-priced digital artwork transactions, the concept of NFTs became increasingly well known. This led many artists, creators, and investors to enter the market, and NFTs have become an important class of digital asset. In this market, people can seek investment returns and trading profits by buying and selling unique digital artworks, virtual real estate, virtual game items, and more.
To better understand prices and transactions in the NFT market, we analysed and visualised the daily average transaction price and volume, presented in Figure 8(a). The figure shows that the NFT market grew briefly in early 2018 and then entered a relatively stable period until 2021. In March 2021, the NFT market exploded, with the daily total transaction volume quickly climbing to billions of dollars. High-priced transactions, such as Beeple's "Everydays: The First 5000 Days" and Twitter co-founder Jack Dorsey's first tweet, together with the celebrity effect they brought, attracted widespread attention from the international press and the public, prompting more people to enter the NFT market and further driving its growth. In addition, the application field of NFTs is continuously expanding, from digital artworks to music, sports, games, and other areas, bringing more opportunities and potential to the NFT market.
To better reflect the market trend, we took the logarithm of the price and display the price change in Figure 8(b). The figure shows that the market price declined briefly from 2018 to 2019; from 2019 onwards, it rose gradually and exhibited a certain degree of stability by 2022. This indicates that, over time, the market has slowly recognised the value of digital assets and begun to see the potential of NFTs. Specifically, in 2018, CryptoKitties became the first popular NFT project, driving the market's development. The market then faced challenges such as large price fluctuations and small market size, which led to the downward trend from 2018 to 2019. As the market gradually matured, the value and potential of digital assets became more widely recognised, and the market price began to rise slowly. In 2021, with the sharp expansion of the NFT market, transactions of some high-priced digital artworks attracted global attention, which also drove the rise in market prices. In early 2022, the market experienced a brief decline, possibly due to cooling interest and the relatively small market size, but it quickly recovered and entered a stage of relatively stable growth.
Overall, the price trend of the NFT market shows a development process from initial decline to gradual recovery and entering a relatively stable stage.This also indicates that the value and potential of digital assets are increasingly widely recognised, and the market is confident in the future of NFTs.
Finding 6: The NFT market experienced rapid growth in 2021, driven by high-priced digital artwork transactions that attracted global media attention.The market trend shows a development process from initial decline to gradual recovery and entering a relatively stable stage, indicating that the value and potential of digital assets are increasingly widely recognised, and the market is confident in the future of NFTs.

NFT market daily volume forecasting
In this section, we use the ARIMA model to forecast the daily transaction volume of the NFT market, based on data from January 2018 to April 2022. Such forecasts give stakeholders a better understanding of market trends and changes, enabling them to make more informed decisions in areas such as investment, business operations, and research. For investors, the forecasting results offer market insights to guide investment decisions; for businesses, they help in meeting market demand and making strategic decisions. Researchers gain data support for better understanding the characteristics and operating mechanisms of the NFT market.

ARIMA model
The ARIMA model is a prominent tool in time series analysis (Zhang, 2003), extensively used for forecasting and modelling time series data. Known for its strong performance in time series forecasting, it has been widely adopted in research studies such as flood risk analysis (Yan et al., 2022), pertussis incidence prediction (M. Wang et al., 2022), and forecasting trends in monetary funds (Fan, 2022). These examples underscore the model's versatility and effectiveness in addressing diverse forecasting challenges across domains.
It involves three key parameters, p, d, and q, representing the order of the autoregressive term, the degree of differencing, and the order of the moving average term, respectively. The autoregressive term, p, captures the linear relationship between the current value and the previous p values. The moving average term, q, captures the linear relationship between the current value and the preceding q forecast errors. The degree of differencing, d, is the number of times the time series needs to be differenced to attain stationarity. By introducing differencing, the ARIMA model stabilises the time series and thereby improves the accuracy of the forecasting results.
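For reference, the three parameters can be read off the standard textbook form of the ARIMA(p, d, q) model. The notation below is the conventional one and is not reproduced from this study: $y_t$ is the series, $L$ is the lag operator ($L y_t = y_{t-1}$), $\alpha_i$ and $\theta_j$ are the autoregressive and moving average coefficients, $c$ is a constant, and $\varepsilon_t$ is the error term.

$$\left(1 - \sum_{i=1}^{p} \alpha_i L^{i}\right)(1 - L)^{d}\, y_t = c + \left(1 + \sum_{j=1}^{q} \theta_j L^{j}\right)\varepsilon_t$$

Here $(1 - L)^{d}$ applies differencing d times, the left-hand sum contains the p autoregressive terms, and the right-hand sum contains the q moving average terms.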
Compared to other time series models, the ARIMA model has several advantages. Firstly, it only considers a single feature, making it more convenient for forecasting. Secondly, it applies to a broad range of real-world data because the original time series need not be stationary; differencing is built into the model. However, the ARIMA model also has some disadvantages: it is a linear model and cannot handle nonlinear relationships. Additionally, before applying the ARIMA model, it is essential to test for stationarity.
Since the daily transaction volume of the NFT market exhibits significant fluctuations over time, it is initially considered non-stationary. This experiment employs differencing to transform the non-stationary time series into a stationary one, and further analysis is conducted on this stationary series. Unit root testing, specifically the ADF test, is used to confirm stationarity and ensure the accuracy of the experiment's results.
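Differencing itself is a simple operation: each value is replaced by its change from the previous value, and the step can be repeated d times. The following sketch uses a hypothetical helper name and made-up numbers purely for illustration.

```python
def difference(series, d=1):
    """Apply first-order differencing d times and return the resulting list."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with consecutive changes,
        # shortening it by one element.
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


volumes = [1, 4, 9, 16, 25]          # made-up series with a growing trend
print(difference(volumes, d=1))      # consecutive changes
print(difference(volumes, d=2))      # changes of the changes
```

Each round of differencing removes one level of trend, which is why a trending series can become stationary after one or two passes.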

ADF test
The ADF test is a commonly used time series analysis method for detecting whether a sequence contains a unit root (Cheung & Lai, 1995), and hence for determining the sequence's stationarity. Stationarity is necessary for building predictive models, so the ADF test is often used to determine whether a time series is suitable for modelling and what degree of differencing is required.
Initially proposed by Dickey and Fuller in 1979, the test was later extended and improved, and it is now widely used in time series analysis. In a forecasting workflow, the dataset is typically divided into training and testing sets, with the former used for model building and the latter for testing the model's predictive ability.
Hypothesis testing is required when conducting the ADF test. The null hypothesis is that the sequence has a unit root and is therefore non-stationary, while the alternative hypothesis is that the sequence is stationary. If the test statistic is less than the critical value at a given confidence level, the null hypothesis can be rejected, and the sequence is considered stationary. Commonly used confidence levels are 99%, 95%, and 90%.
In this study, we applied first-order differencing to the NFT transaction data and obtained the ADF test result in Table 7. The ADF statistic is less than the critical value at the 1% significance level, so the null hypothesis is rejected and the differenced sequence is considered stationary. The critical values are the same as those for the original data before differencing.
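The decision rule can be illustrated with a simplified, non-augmented Dickey-Fuller regression written in plain Python. This is a sketch under stated assumptions, not the test configuration used in the study: it omits the lagged-difference terms of the full ADF test, uses a constant-only regression, compares against the commonly tabulated asymptotic critical values for that case, and runs on a synthetic random walk rather than NFT data.

```python
import math
import random

# Asymptotic Dickey-Fuller critical values (regression with a constant, no trend).
CRITICAL_VALUES = {"1%": -3.43, "5%": -2.86, "10%": -2.57}


def dickey_fuller_stat(series):
    """t-statistic of gamma in: y_t - y_{t-1} = alpha + gamma * y_{t-1} + e_t."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series, series[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, diffs))
    gamma = sxy / sxx                       # ordinary least squares slope
    alpha = mean_y - gamma * mean_x         # ordinary least squares intercept
    rss = sum((y - alpha - gamma * x) ** 2 for x, y in zip(lagged, diffs))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma


# Synthetic random walk (non-stationary by construction) and its first difference.
rng = random.Random(0)
walk = [0.0]
for _ in range(300):
    walk.append(walk[-1] + rng.gauss(0, 1))
differenced = [b - a for a, b in zip(walk, walk[1:])]

stat = dickey_fuller_stat(differenced)
# Reject the unit-root null when the statistic is below the critical value.
print(stat, stat < CRITICAL_VALUES["1%"])
```

A statistic far below the 1% critical value, as for the differenced series here, is what leads to rejecting the unit-root null hypothesis.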

Evaluation of experimental results
We used the ARIMA model to predict the daily transaction volume of the NFT market, with the data divided into an 80% training set and a 20% testing set. After differencing the time series and conducting hypothesis testing, we found the series to be stationary. We further applied a logarithmic transformation to smooth the series, reducing volatility and increasing stability. We optimised the model parameters using grid search and generated forecasts on the testing set. To evaluate the forecasting performance, we used four metrics: root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and the R² score.
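The four metrics follow their usual definitions, sketched below in plain Python. The function name and the sample values are illustrative assumptions; MAPE is expressed in percent and is undefined when an actual value is zero.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE (in percent), and R2 for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "R2": 1 - ss_res / ss_tot,
    }


# Made-up values, used only to exercise the function.
metrics = regression_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(metrics)
```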
After optimising the parameters, we obtained an ARIMA model with p = 2, d = 2, and q = 2. The forecasting results are shown in Table 8, which indicates that the model has good forecasting accuracy, with an RMSE of 0.31, MAPE of 1.39%, MAE of 0.24, and R² of 0.95. These metrics indicate that the model makes only minor errors when forecasting future transaction volume. Table 9 shows a sample of the daily transaction volume in the NFT market alongside the predicted values obtained by fitting the ARIMA(2,2,2) model. Figure 9 illustrates the fit of the model by comparing the actual data with the forecasted values: Figure 9(a) presents both the actual and predicted daily transaction volume, while Figure 9(b) shows the 5-day, 10-day, and 20-day moving averages of the predicted transaction volume, as well as the 5-day moving average of the actual transaction volume. This comparison enables analysts to observe changes in transaction volume across different time scales, aiding their understanding of market shifts and future trends. The closeness of these lines also demonstrates the accuracy and applicability of the predictive model.
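The moving averages used for this comparison can be computed with a sliding window. The sketch below assumes a simple (unweighted) trailing moving average, which the text does not state explicitly, and uses a made-up series.

```python
def moving_average(series, window):
    """Simple trailing moving average; returns len(series) - window + 1 values."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    running = sum(series[:window])
    averages = [running / window]
    for i in range(window, len(series)):
        # Slide the window: add the newest value, drop the oldest.
        running += series[i] - series[i - window]
        averages.append(running / window)
    return averages


daily_volume = [1, 2, 3, 4, 5]            # made-up values
print(moving_average(daily_volume, 3))
```

Longer windows (10 or 20 days) smooth out more of the day-to-day noise, at the cost of reacting more slowly to genuine shifts in volume.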
Overall, the optimised ARIMA model with d = 2, p = 2, and q = 2 shows good predictive power and is suitable for forecasting the daily transaction volume of NFT markets.The evaluation metrics suggest that the model has minor prediction errors, and the moving averages help analysts to better understand market changes at different time scales.The results of this analysis can provide valuable insights for investors to make informed decisions.
Finding 7: An ARIMA model was used to predict the daily transaction volume of the NFT market, and after optimisation, it showed good predictive accuracy with minor errors. We used RMSE, MAE, MAPE, and R² scores to assess the model's performance. Moving averages were also used to reflect changes in transaction volume at different time scales and help analysts understand market changes. Overall, the optimised ARIMA model is suitable for forecasting the daily transaction volume of NFT markets and can provide valuable insights for investors.

Analysis of factors influencing transaction volume
In this section, we aim to gain a deeper understanding of the factors that affect the transaction volume of the NFT market. To achieve this, we first manually extract features from the 13 trading markets (refer to Table 10). We then employ various machine learning models to predict the transaction volume through regression analysis. Finally, we assess the extent to which different features contribute to the prediction results using an interpretable machine learning technique, the SHAP method.

Feature extraction
In the feature extraction process, various factors of the NFT market are analysed to provide a comprehensive understanding of its dynamics.This involves capturing key metrics such as daily address count, token ID count, transaction count, and contract sender and receiver counts.These metrics offer insights into the daily activities and interactions within the NFT ecosystem.
Additionally, the extraction process includes capturing daily averages and maximum values, such as daily average transaction value, daily maximum transaction value, daily average gas price, daily average gas limit, and daily average gas used.These metrics shed light on the financial and computational aspects of NFT transactions, helping us understand the economic dynamics of the market.
Furthermore, the feature extraction process involves identifying unique elements within the NFT market, such as the daily count of unique NFT collection addresses, the daily count of unique seller addresses (sending addresses), and the daily count of unique buyer addresses (receiving addresses).These unique counts highlight the diversity and participation of different actors within the NFT ecosystem.
For specific details on the extracted features, please refer to Table 3.
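A minimal sketch of this kind of daily aggregation is shown below. The feature names follow those that appear later in the feature importance analysis; the record keys (`date`, `from`, `to`, `collection`, `token_id`, `value`) and the sample data are illustrative assumptions, and only a subset of the features is computed.

```python
from collections import defaultdict


def daily_features(transactions):
    """Aggregate per-transaction records into per-day features."""
    by_day = defaultdict(list)
    for tx in transactions:
        by_day[tx["date"]].append(tx)

    features = {}
    for day, txs in sorted(by_day.items()):
        values = [tx["value"] for tx in txs]
        features[day] = {
            "daily_transaction_count": len(txs),
            "daily_avg_value": sum(values) / len(values),
            "daily_max_value": max(values),
            # Distinct tokens traded that day, identified by collection and token id.
            "daily_token_id_count": len({(tx["collection"], tx["token_id"]) for tx in txs}),
            "daily_unique_sending_addresses": len({tx["from"] for tx in txs}),
            "daily_unique_receiving_addresses": len({tx["to"] for tx in txs}),
        }
    return features


# Made-up records, used only to exercise the function.
sample_txs = [
    {"date": "2022-04-01", "from": "0xA", "to": "0xB", "collection": "c1", "token_id": 1, "value": 100.0},
    {"date": "2022-04-01", "from": "0xA", "to": "0xC", "collection": "c1", "token_id": 2, "value": 300.0},
    {"date": "2022-04-02", "from": "0xB", "to": "0xC", "collection": "c1", "token_id": 1, "value": 50.0},
]
features = daily_features(sample_txs)
print(features["2022-04-01"])
```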

Methodology
We outline the principles of the employed models as follows.
Support Vector Machine (SVM) (Platt, 1998) is a powerful supervised machine learning algorithm for classification and regression tasks. It seeks a hyperplane that optimally separates data points into distinct classes, or predicts a continuous output, with maximum margin, that is, the distance between the hyperplane and the nearest data points from each class. SVM handles data that are not linearly separable by using the kernel trick, which implicitly maps the data into a higher-dimensional space. Its strengths lie in its capacity to handle complex data distributions and generalise effectively to new, unseen data, making it valuable in domains like image classification, text analysis, and bioinformatics.
Linear regression (Weisberg, 2005) is a fundamental statistical and machine learning technique used to model the relationship between a dependent variable and one or more independent variables through a linear equation. The main goal is to determine the equation's coefficients so as to minimise the disparity between predicted and actual values. This approach provides valuable insights into the strength and direction of relationships and can be extended to multiple linear regression when multiple predictors are involved. Linear regression has widespread applications in fields such as economics, finance, and social sciences, where it is employed for tasks including sales forecasting, risk assessment, and trend analysis.
Random Forest (Biau & Scornet, 2016) is an ensemble learning algorithm that combines multiple decision trees for prediction. In a random forest, each decision tree is trained on a different subset of the data and uses a randomly selected subset of features for splitting. The final prediction is obtained by averaging or voting among the trees. This approach reduces the risk of overfitting and enhances the stability and accuracy of forecasts.
LightGBM (Ke et al., 2017) is a gradient-boosting framework that focuses on efficiently constructing gradient-boosting decision trees.It employs a histogram-based training algorithm for faster training and supports parallel computation.LightGBM also introduces techniques like GOSS (Gradient-based One-Side Sampling) and EFB (Exclusive Feature Bundling) to improve training speed and prediction performance.
XGBoost (T. Chen & Guestrin, 2016), short for eXtreme Gradient Boosting, is another gradient-boosting framework. It trains multiple decision trees iteratively, with each iteration attempting to correct errors from the previous round of predictions. XGBoost also incorporates techniques such as regularisation and limits on tree depth to control overfitting. It excels in prediction performance and supports feature importance assessment.
CatBoost (Prokhorenkova et al., 2018) is a gradient-boosting framework designed to handle categorical features (such as nominal and ordinal).It can automatically manage the encoding of categorical features, reducing the burden of feature engineering.CatBoost also employs techniques like symmetric tree learning and permutation-based feature importance calculation, contributing to improved predictive performance and stability of the model.

Explainable artificial intelligence
This study uses Explainable Artificial Intelligence to analyse multi-category features and examine their influence on the NFT market. It aims to investigate the predictability of transaction volume in the underlying NFT market. Assessing the effects of various features on the NFT market is challenging because relevant research is still lacking.
Despite generating highly accurate forecasts, predictive ensemble models function as opaque systems, offering limited interpretability (Stiglic et al., 2020). This highlights the practical need to explain the predictive influence of factors such as Twitter sentiment, Twitter features, and trading behaviour. Such insights would prove advantageous to market participants at different levels, facilitating short-term and long-term strategic planning. To address this issue, we employ the Shapley additive explanation (SHAP) technique (Ullah et al., 2022) to explain how different features influence the NFT market.
The SHAP measure builds on the Shapley value, initially conceived by Shapley (1953) to evaluate the contribution of individual players in a cooperative game, and has recently found new applications in the field of Explainable Artificial Intelligence, where it is used for feature evaluation. Mathematically, the SHAP measure is computed as follows:

$$\phi_i = \sum_{S \subseteq N,\; i \in S} \frac{(|S|-1)!\,(n-|S|)!}{n!}\,\bigl[v(S) - v(S \setminus \{i\})\bigr] \tag{2}$$

where $\phi_i$ denotes the contribution of the ith feature, N is the set of all features with cardinality n, S is a subset of N that contains feature i, and $v(S)$ is the predicted outcome obtained using the features in S, so that $v(S) - v(S \setminus \{i\})$ is the marginal effect of including the ith feature.
The explanation is specified as follows:

$$g(z) = \phi_0 + \sum_{i=1}^{M} \phi_i z_i \tag{3}$$

where $z \in \{0, 1\}^M$, M denotes the number of features under consideration, $\phi_0$ is the base value obtained when no feature is present, and $\phi_i$ is determined using Equation (2). SHAP offers different model explainers for accomplishing this task. In the present research, we used SHAP to draw insights into the relative importance of the features used to build the LightGBM model.
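To make the Shapley computation concrete, the sketch below evaluates it exactly for a toy two-feature value function by enumerating every subset that contains the feature of interest. The payoff numbers are invented for illustration, and the function name is hypothetical; exact enumeration grows exponentially with the number of features, so it is shown only to clarify the definition and is not how feature contributions were computed in this study.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values; `value` maps a frozenset of features to a payoff."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        # Enumerate every subset S of the features that contains i.
        for size in range(1, n + 1):
            weight = factorial(size - 1) * factorial(n - size) / factorial(n)
            for rest in combinations(others, size - 1):
                without_i = frozenset(rest)
                with_i = without_i | {i}
                total += weight * (value(with_i) - value(without_i))
        phi[i] = total
    return phi


# Toy payoffs (made up): the two features are worth more together than apart.
payoffs = {
    frozenset(): 0.0,
    frozenset({"daily_avg_value"}): 10.0,
    frozenset({"daily_transaction_count"}): 20.0,
    frozenset({"daily_avg_value", "daily_transaction_count"}): 50.0,
}
phi = shapley_values(["daily_avg_value", "daily_transaction_count"], payoffs.__getitem__)
print(phi)
```

The contributions sum to the payoff of the full feature set minus that of the empty set, which is the additivity property that the explanation model relies on.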

Influential factors results analysis
In this section, we employ six machine learning models and six evaluation metrics to analyse the influence of these features on the NFT market's transaction volume. The detailed experimental results are presented in Table 11, and the fit of the six models' predictions is shown in Figure 10. We then summarise the results and analyse the significance of the features.
Our analysis indicates that the MSE, RMSE, MAE, MAPE, and SMAPE metrics for four models (CatBoost, XGBoost, RandomForest, and LightGBM) are notably lower than those of the SVM and LinearRegression models. This suggests that the former four models consistently demonstrate superior predictive accuracy compared to the latter two.
Specifically, CatBoost exhibits an MSE of 0.0781, signifying a 95.56% reduction compared to SVM and a 95.24% reduction compared to LinearRegression. The RMSE stands at 0.1733, indicating an 80.51% decrease compared to SVM and an 81.79% decrease compared to LinearRegression. Additionally, MAE, MAPE, and SMAPE are all at least 80% lower than the corresponding metrics for SVM and LinearRegression. Compared to CatBoost, XGBoost shows a reduction of 27.66% in MSE, a decrease of 7.56% in RMSE, and a decline of 14.77% in MAE. MAPE and SMAPE also exhibit a noticeable decrease. Concerning the R-squared (R²) metric, CatBoost, XGBoost, RandomForest, and LightGBM consistently achieve values exceeding 0.99, in contrast to SVM's 0.8173 and LinearRegression's 0.8299. This suggests that the former four models exhibit a remarkable fit and superior predictive accuracy compared to the latter two models.
In an overall assessment, considering all metrics, LightGBM emerges as the top-performing model, demonstrating the highest precision. XGBoost and RandomForest closely follow, with only slightly lower performance than LightGBM. CatBoost ranks fourth, while SVM and LinearRegression exhibit relatively poorer predictive performance.
Meanwhile, we employed the SHAP library to conduct an in-depth feature importance analysis.Specifically, our focus was on examining the unique contributions of individual features to transaction volume and using visualisation techniques to unveil their significant roles within the predictive model.To vividly showcase the relative significance of these features, we present the ranking contributions of all features based on SHAP values in Figure 11, and Table 12 shows the contribution score of each feature in the LightGBM model.
The feature importance ranking, as revealed by SHAP values, offers valuable insights into the factors influencing transaction volume.Notably, the feature "daily_avg_value" holds the highest importance score of 1.0819, indicating that it plays a pivotal role in predicting transaction volume.This result suggests that the average daily transaction value is a dominant factor in influencing transaction volumes in the NFT market.
Following closely, "daily_transaction_count" with an importance score of 0.4208 and "daily_max_value" with a score of 0.3790 underscore their significance as crucial contributors to transaction volume predictions.These metrics suggest that the volume and frequency of daily transactions and the maximum daily transaction value are crucial determinants of transaction volume dynamics.
Additionally, features such as "daily_token_id_count", "daily_unique_receiving_addresses", and "daily_unique_sending_addresses" exhibit notable importance scores, reinforcing their roles in understanding transaction volume fluctuations. These insights into feature importance provide a deeper understanding of the factors driving transaction volume within the NFT market, aiding in more informed decision-making and predictive modelling.
In summary, this analysis of influential factors in the NFT market's transaction volume used six machine learning models and six evaluation metrics, revealing superior predictive accuracy in four models, CatBoost, XGBoost, RandomForest, and LightGBM, compared to SVM and LinearRegression. These models consistently achieved significantly lower MSE, RMSE, MAE, MAPE, and SMAPE values, with R-squared values exceeding 0.99, demonstrating their remarkable fit and predictive accuracy. Among them, LightGBM emerged as the top-performing model.
Furthermore, an in-depth feature importance analysis employing the SHAP library highlighted "daily_avg_value" as the most pivotal feature in predicting transaction volume, closely followed by "daily_transaction_count" and "daily_max_value".These findings offer valuable insights into the factors influencing transaction volume, facilitating more informed decision-making and predictive modelling in the dynamic NFT market.
Finding 8: In this comprehensive analysis of the NFT market's transaction volume, the LightGBM model achieved a remarkable fit with R-squared values exceeding 0.99.Additionally, feature contribution analysis identified "daily_avg_value" as the most pivotal predictor, followed by "daily_transaction_count" and "daily_max_value", providing valuable insights for informed decision-making in the dynamic NFT market.

Related work
Ethereum smart contracts have become crucial in developing decentralised applications (Estevam et al., 2021) and facilitating digital asset transactions in recent years (Jitendra Singh Yadav & Sharma, 2022).Smart contracts have enabled the creation of NFTs (Murray, 2022), unique digital assets people can verify on the blockchain.These NFTs are backed by smart contracts allowing ownership, transfer, and use in various applications (Ghaffari, 2023).
Research on Ethereum smart contracts has explored various technical aspects of NFT protocols, including the ERC-721 standard, which defines the basic rules for creating and trading NFTs (Bamakan et al., 2022). Additionally, researchers have studied the security of smart contracts used to back NFTs and the challenges related to scalability and interoperability with other blockchain platforms (Arora & Kumar, 2022; Spataru et al., 2021).
As the NFT market grows, researchers are also studying the relationship between NFTs and traditional crypto assets. For example, some studies have investigated the potential impact of NFTs on cryptocurrency prices and the correlation between NFT prices and broader market trends (Ante, 2022; Zhong & Hamilton, 2023). Furthermore, specific NFT collections such as Axie Infinity, CryptoPunks, and Bored Ape Yacht Club have gained significant attention and become valuable in the secondary market (Apostu et al., 2022). Researchers are analysing the factors that contribute to the success of these collections and their influence on the NFT market (Kräussl & Tugnetti, 2022). Moreover, some research has focussed on the fairness of NFT sub-markets, such as CryptoKitties, to ensure that all participants in the NFT market have equal opportunities and are not unfairly disadvantaged (Smith, 2022).
However, most of these studies focus on protocols, security, and economics (Bhujel & Rahulamathavan, 2022). There is little analysis of NFT market participants, performance, trends, and the multifaceted factors influencing this burgeoning sector. Moreover, a significant hurdle exists in obtaining and handling vast amounts of NFT market data. This difficulty arises from the decentralised and fragmented nature of the NFT ecosystem, wherein data is dispersed across multiple platforms and lacks uniform formatting. Furthermore, the NFT market is expanding and evolving rapidly, which makes collecting and analysing its data a dynamic and demanding undertaking. Consequently, there is currently no publicly accessible comprehensive dataset for the NFT market.
Therefore, this research addresses that gap by sharing and analysing an extensive dataset of the NFT market on the Ethereum blockchain, covering January 2018 to April 2022. The study comprises data collection, exploration of market characteristics through NFT creator and holder graphs, and prediction of transaction volume together with an analysis of its influencing factors. The dataset provided not only opens up opportunities for researchers but also enriches the public's understanding of the NFT market.

Conclusion and future work
This research provides a comprehensive analysis of the NFT market on the Ethereum blockchain, addressing a gap in the existing literature. We first build a large-scale dataset of over 80 million NFT transactions from 2018 to 2022. Our analysis then encompasses the market structure, performance, trends, and influence factors. Key findings include the strong mint-volume correlation, emergence of power-law distribution in projects, and dominance of certain blockchain games. Constructing the NFT Creator and Holder Graphs also reveals insights into creator behaviour and holder distribution, emphasising their importance.
Furthermore, we analyse different NFT categories, with Pfp being the most popular, followed by utility, art, and game types. This provides valuable information for investors. We also delve into the pricing distribution, showing that most NFTs are mid-range but that high-value ones exist among top collections. Moreover, we trace the rapid growth of 2021, the subsequent stabilisation, and the increasing recognition of the value of digital assets.
Finally, we use ARIMA for volume prediction and machine learning to analyse feature impact on the market. The feature "daily_avg_value" emerges as the most pivotal predictor, followed by "daily_transaction_count" and "daily_max_value", providing actionable insights.
As for future work, further analysis and exploration can be undertaken to delve deeper into the evolving NFT market. Potential avenues include examining the impact of specific events and celebrity endorsements on NFT prices, studying the evolution of NFT contracts and standards, and exploring the potential regulatory challenges and solutions for this emerging market.

Figure 1. This figure illustrates our NFT market analysis framework based on the Ethereum research outlined in this paper.

Figure 2. A log of an ERC-721 NFT transfer event.

Figure 3. This figure illustrates the distribution of mints and transfers in the NFT market. The x-axis represents the number of NFT collections, and the y-axis indicates the proportion of mints or transfers. Panel (a) presents the distribution of NFT collection mint quantities, while panel (b) depicts the distribution of NFT collection transfer quantities. (a) Distribution of NFT collection mint quantity, and (b) Distribution of NFT collection transfer quantity.

Figure 4. This figure shows the NFT creator features. Panel (a) visualises the NFT Creator Graph (NCG), panel (b) shows the outdegree distribution of the graph, and panel (c) shows the distribution of the number of Contract Addresses (CA) and Externally Owned Accounts (EOA) by year. (a) NCG, (b) Outdegree distribution of NCG, and (c) CA and EOA accounts by year.

Figure 5. This figure shows the NFT holder features. Panel (a) visualises the NFT Holder Graph (NHG), panel (b) shows the degree distribution of the graph, and panel (c) shows the distribution of the NFT collection holders. (a) NHG, (b) Degree distribution of NHG, and (c) Distribution of the NFT collection holders.

Figure 6. This figure shows the number of NFT holders of each category and the proportion of NFT collections to the total. Panel (a) displays the ratio of NFT collections in each category to the total, while panel (b) shows the proportion of NFT collection holders in each category to the total. (a) The proportion of NFT collections in each category to the total, and (b) The proportion of NFT collection holders in each category to the total.

Figure 7. This figure illustrates the price distribution of NFT transactions and ranks the top 1000 NFT collections based on transaction frequency. In panel (a), we observe the distribution of NFT transaction prices, while panel (b) presents the average and median prices for the top 1000 NFT collections with the highest number of transactions. (a) The distribution of NFT transaction prices, (b) The average and median prices for the top 1000 NFT collections with the most transactions.

Figure 8. This figure illustrates the daily transaction prices and total transaction volume trend in the NFT market from January 2018 to April 2022. Panel (a) presents the trend of the daily transaction price and total transaction volume of the NFT market, while panel (b) displays the logarithm of the daily transaction price and total transaction volume of the NFT market. (a) Daily transaction price and total transaction volume trend of the NFT market, (b) The logarithm of the daily transaction price and total transaction volume in the NFT market.

Figure 9. This figure illustrates the daily transaction volume forecasting using the ARIMA model, comparing the actual data and the forecasted values. Panel (a) presents both the actual daily transaction volume and the predicted values. Panel (b) displays the 5-day, 10-day, and 20-day moving averages of the predicted transaction volume and the actual 5-day moving average transaction volume. (a) Actual and forecasted daily volume, (b) The 5-day, 10-day, and 20-day moving averages of both the actual and predicted daily transaction volume.

Figure 10. This figure compares the predicted and actual values of the transaction volume of the NFT market for each machine learning model. Each panel compares true and predicted transaction volumes to assess one model's fit: (a) LinearRegression, (b) SVM, (c) CatBoost, (d) XGBoost, (e) RandomForest, and (f) LightGBM.

Figure 11. This figure shows the relative contribution of each feature to the model predictions when using all features as inputs, with the features ranked vertically from most to least influential. (a) SHAP summary plot of the LightGBM model, and (b) The relative importance of each feature, determined by calculating the average absolute SHAP values.

Table 1. Example of NFT transactions dataset.

Table 2. Example of NFT contracts dataset.

Table 3. Description of significant feature names in the dataset.

Table 4. Number of mint and transfer data in our dataset.

Table 5. Categories of NFT collection creators and the number created.

Table 6. NFT category and statistics on the number of collections, holders, and volume.

Table 9. Sample of the daily transaction volume in the NFT market, with predicted values from ARIMA(2,2,2).

Table 10. Description of features.

Table 11. Comparison of performance metrics for different models.

Table 12. Feature contribution scores using SHAP in LightGBM.