Analysis of Tweet Form’s effect on users’ engagement on Twitter

Abstract This research examines how different tweet forms, including text length, text sentiment and the use of hashtags, mentions, and video or picture URLs, affect users' engagement. In the first part, we analyze the tweets of five companies from the apparel industry and find that no universal form boosts users' engagement; at the company level, however, the effects of different forms vary between companies because of company attributes. Hence, in the second part, we expand the dataset and analyze the formats of tweets from 70 brands, focusing on the attribute of industry sector. The results show that industries such as luxury and hardware technology are more digitally sensitive and benefit more from using more hashtags and video or picture URLs, while industries such as software are less digitally sensitive. The results could provide evidence and guidance for different categories of companies in designing tweets with high customer engagement, and could serve as a reference for enterprises on other media platforms.

ABOUT THE AUTHORS Xu Han, Xingyu Gu and Shuai Peng hold master's degrees in Marketing from the Johns Hopkins University Carey Business School. While pursuing the degree, they learned data analytics with R and conducted a course project that studied data patterns and predicted heart disease using the Cleveland dataset. After the course, they looked for an interesting and realistic problem and turned their attention to social media marketing. They focused on Twitter and analyzed the effect of tweet form on users' engagement as their first academic research for publication. Currently, Xu Han and Xingyu Gu are continuing their studies at the Carey Business School as master's students in Information Systems and devote their time to data analysis with more advanced tools. Meanwhile, Shuai Peng works as a marketing specialist in a digital marketing company, applying his knowledge to the analysis of social media.

PUBLIC INTEREST STATEMENT
With the prevalence of social media, more brands use their official accounts on platforms such as Twitter to enhance brand loyalty and consumer engagement. This perspective article describes some tips for companies on using their Twitter accounts, based on data gathered via TwitteR, an R package for obtaining public tweet data through the Twitter API. It finds that shortening or lengthening a tweet, adding more hashtags or mentions, or inserting a picture or video does not affect users' engagement in the same way for different companies. Instead, the effects of different tweet forms vary by industry. Industries such as software technology and luxury are more digitally sensitive and can benefit more from using more hashtags and video or picture URLs. The retail industry is less sensitive to these changes in tweet form. So, a company should first identify the features of its industry when managing its Twitter account.
media to social media (Montague, Gazal, Wiedenbeck, & Shepherd, 2016), due to the lower cost and diverse presentation forms for marketing content (Momany & Alshboul, 2016). Twitter, by virtue of its 68 million monthly active users in the United States (Twitter, 2018), is currently employed by numerous companies as a marketing channel in America (Momany & Alshboul, 2016). In order to establish a deep relationship with users, companies are seeking ways to enhance users' engagement with tweets and replies, so that the interactions can impress users and thus build stronger brand loyalty (Syrdal & Briggs, 2018; Burton & Soboleva, 2011).
To find out which tweet formats attract more customer engagement, a number of scholars have conducted qualitative analyses of users' engagement with different advertising formats (Tutaj & Van Reijmersdal, 2012; Kalro, Sivakumaran, & Marathe, 2017), but there are few quantitative studies, and no quantitative findings show how different tweet formats influence customer engagement. Rather than simply assuming that richer tweet content yields better engagement, or that simpler is better, there should be a customized strategy for the specific industry of the company. Therefore, building on previous research, this thesis analyzes the main elements of tweet format and how each of these elements affects users' engagement, so as to provide advice and direction for companies to devise tweets that arouse more interaction with users.

Theoretical basis and model
Tweets include several different components: not only text but also URLs, hashtags and mentions. Their contents also carry different sentiment scores, which could be an important factor influencing the customer engagement of a tweet. To build a relationship between tweet formats and customer engagement, we review prior research below for theoretical support.
People today rarely communicate with text only; pictures and videos are used more and more widely in communication (Burnett, 2015), including communication in social media marketing. According to the statistics, press releases that include a photo or video get 45% more views than those with text only (James, 2012). These media serve as stimuli that influence viewers' perception and consequent engagement (Byrum, 2014). In tweets, a URL indicates the presence of a picture or video, so we set the number of URLs as an independent variable for the engagement measurement.
Another unique and important feature of the tweet is the hashtag. It serves as a metadata tag used on Twitter to facilitate searching for a specified topic of interest (Antoine, 2016). For companies, the hashtag can supplement brand image, build community and evoke an efficient response, and companies use hashtags to launch campaigns and engage directly with consumers (Kleinberg, 2013). Thus, it is reasonable to expect the number of hashtags to influence consumers' engagement.
Based on the observation of some typical brand tweets, the mention function ("@") is sometimes used to mention brands' stakeholders, who most of the time are celebrities or influencers (Bao, Hua-Wei, Huang, & Chen, 2013). Similar to celebrity endorsement, a mention indicates the public endorsement of the brand by the celebrity, thus enhancing the followers' positive attitude (Tang, Ni, Xiong, & Zhu, 2015).
When Twitter extended its tweet length limit, many studies examined the effect of longer tweets. Some literature already indicates that longer text increases the complexity of perception (Arrabal-Sánchez & De-Aguilera-Moyano, 2016), but according to SocialFlow, tweets longer than the old 140-character limit are great for getting attention and engagement (Cho, 2012). Tweets below 140 characters were retweeted on average 13.71 times and received 29.96 likes; above 140 characters, the retweets jump to 26.52 and the likes to 50.28 (Sherman, 2017). So, tweet length is set as an independent variable.
Twitter offers a unique dataset in the world of brand sentiment, as companies can receive sentiment messages directly from consumers. Both the targeted and competing brands can dissect these messages to determine changes in consumer sentiment (Ghiassi, Skinner, & Zimbra, 2013). But the sentiment of tweets posted by the brand can also influence consumers' engagement, as a strong argument delivered with emotion is more convincing (Byrum, 2014). To measure the sentiment of tweets, we use the sentiment score as an independent variable.
The concept of customer engagement provides a construct that comprises the total set of behavioral activities toward a firm (Johanna, Veronica, Emil, & Minna, 2012). It reflects the broadest theoretical perspectives of the service and relational marketing paradigms and is able to provide enhanced predictive and explanatory power for focal consumer behavior outcomes such as brand loyalty (Linda, Mark, Glynn, & Roderick, 2014).
As for the measurement of users' engagement, we start from a comprehensive algorithm formula. The algorithm covers all the possible actions through which Twitter users can interact with the brand and thus reduces the inaccuracy of the result. Moreover, such a formula is flexible and can be adapted to the practical situation. Therefore, we choose it as the original algorithm for the Twitter engagement calculation.
In this study, we focus on the engagement for every single tweet, so there is no time period for the measurement of the mentioned actions, impressions and reach, or of other interactions. Besides, only very few tweets contain replies, so we remove the reply term from the algorithm to reduce further bias. Therefore, only Interaction of diffusion and Interaction of approval are left in terms of user interactions, and their original ratio of 3:1.5 can be simplified to 2:1. Furthermore, the number of impressions is almost the same as that of reach, and hence the term "impressions/reach" in the algorithm is equal to 1. So, a revised algorithm formula reflecting these simplifications is adopted in the calculation and analysis.
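The revised formula itself did not survive in the text, so the following is only a minimal sketch of what the stated simplifications imply. It assumes that "Interaction of diffusion" corresponds to the retweet count and "Interaction of approval" to the favorite count, weighted 2:1, with the impressions/reach term equal to 1. The function name `engagement_score` is hypothetical, and any normalization used in the original formula (for example by followers) is not recoverable here and is left out.

```python
def engagement_score(retweets: int, favorites: int) -> int:
    """Weighted interaction score for a single tweet.

    Assumption: diffusion (retweets) and approval (favorites) are
    weighted 2:1, replies are dropped, and impressions/reach == 1.
    """
    if retweets < 0 or favorites < 0:
        raise ValueError("counts must be non-negative")
    return 2 * retweets + favorites


# Illustrative counts only, not values from the study's dataset.
print(engagement_score(retweets=18, favorites=142))  # 2*18 + 142 = 178
```

Under these weights, one retweet contributes as much to the score as two favorites.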

Hypothesis generation
Therefore, to find out how the content of tweets influences customer engagement, we formulate several hypotheses to test. Each tweet form may influence engagement, which is demonstrated by two attributes, Favorite and Retweet behavior (Figure 3).
H1: the number of URLs in a tweet will influence the engagement.
H2: the number of hashtags in a tweet will influence the engagement.
H3: the number of mentions ("@") in a tweet will influence the engagement.
H4: the length of a tweet will influence the engagement.
H5: the sentiment of a tweet will influence the engagement.

Data collection
To extract information such as tweets and followers from each account, we use the R language and the TwitteR package. By requesting access to the Twitter APIs (authentication through the Twitter developer website, apps.twitter.com), we obtain at most 3200 tweets for each account, including text content, favorite count, creation time, retweet count and so on. To set up a dataset for further analysis, several pre-processing steps are needed (see Figure A1 in the Appendix).

Data sample selection
In deciding which companies' Twitter accounts should be chosen, we examine several criteria to guarantee the quality of the raw data for further analysis. First of all, the Twitter accounts of these companies must be active, with abundant tweets daily or weekly, which also demonstrates that these companies indeed spend marketing effort on social media. Besides, all selected companies should be in the same industry to eliminate the potential risk of heterogeneity across industries (Sevin, 2013). Also, the companies should be leading companies in the industry so that the result can be more instructive for that industry. To measure this criterion, we focus on the number of followers of each account, as more followers mean greater brand awareness and a higher possibility of quality communication.
Following Sevin (2013), we examine Twitter accounts in several industries such as airlines, smartphone manufacturers, supermarkets and apparel. Ultimately, we decide to collect data from the apparel industry, specifically from Forever 21 (@Forever21), Gap (@Gap), H&M (@hm), Abercrombie & Fitch (@Abercrombie) and GUESS (@GUESS). Their Twitter accounts all have hundreds of thousands of followers and thousands of "favorites".

Data pre-analysis processing
First, the tweets returned by a TwitteR query contain both original tweets posted by the brand and tweets sent to other users. As the latter are hard for the brand's followers to observe and are usually irrelevant to its own marketing attempts, we exclude from the raw dataset the tweets whose text begins with a mention.
Before data cleaning, we count the number of hashtags and URLs, as they are variables that we need in the SPSS analysis. Then, in preparation for sentiment analysis, we clean the text, eliminating irrelevant information in the corpus, including URLs, punctuation, numbers and stop words (Feinerer, Hornik, & Software, 2017). After the data cleaning, some rows in the data frame have empty text, so we delete these rows as well before sentiment analysis (Pak & Paroubek, 2010). Using the Sentiment140 package, we categorize the dataset into three classes: positive sentiment, negative sentiment and objective text (no sentiment), assigned 1, −1 and 0, respectively (see Figure A2 in the Appendix).
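The pre-processing steps above can be sketched as follows. This is an illustrative Python approximation, not the R code used in the study: the regular expressions, the tiny `STOP_WORDS` set and all function names are assumptions, and tweet length is taken here as the raw character count, which the text does not specify.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
# Hypothetical, deliberately small stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "the", "with", "to", "of", "in", "for", "on", "is"}


def starts_with_mention(text: str) -> bool:
    """True for tweets whose text begins with a mention (excluded above)."""
    return text.lstrip().startswith("@")


def extract_features(text: str) -> dict:
    """Count format elements on the raw text, before any cleaning."""
    without_urls = URL_RE.sub(" ", text)  # avoid counting '#' inside URLs
    return {
        "url_count": len(URL_RE.findall(text)),
        "hashtag_count": len(HASHTAG_RE.findall(without_urls)),
        "mention_count": len(MENTION_RE.findall(without_urls)),
        "length": len(text),
    }


def clean_text(text: str) -> str:
    """Drop URLs, punctuation, numbers and stop words for sentiment analysis."""
    text = URL_RE.sub(" ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)


tweet = "New drop! Shop the #summer look with @someone https://t.co/abc123 now"
if not starts_with_mention(tweet):
    print(extract_features(tweet))
    print(clean_text(tweet))
```

Counting is done before cleaning because the cleaning step removes the very symbols (URLs, "#", "@") that define the format variables. Rows whose cleaned text is empty would then be dropped, as described above.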

Data description
After the data collection and preliminary data processing, we finally obtain a dataset of 316 rows. The columns of the table include (1) relevant information: post-cleaning text and brand name; (2) independent variables: URL number, hashtag number, mention number, tweet length and sentiment score (with positive and negative set separately as dummy variables); and (3) dependent variables: favorite count, retweet count, reach, impressions and engagement ratio.
From the description of the data (Table 1), we can see that the average favorite count of a tweet is 141.5 and the average retweet count of a tweet is 17.75. In our case, the number of favorites therefore contributes more to engagement than the number of retweets. Besides, most tweets have no mention or hashtag but have more than one URL. The medians of the favorite count, retweet count and engagement score are lower than their means, which indicates that these variables are right-skewed, with some extremely large values.
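The sentiment coding used in this dataset (a score of 1, −1 or 0, with positive and negative entered as separate dummy variables) can be illustrated with a short sketch. The label strings and the function name `sentiment_dummies` are assumptions made for the example; the study's own coding was done in R.

```python
# Assumed class labels; the 1 / -1 / 0 scores follow the text.
SENTIMENT_SCORE = {"positive": 1, "negative": -1, "neutral": 0}


def sentiment_dummies(label: str) -> dict:
    """Map a sentiment class to its score and two dummy variables.

    Neutral (objective) text is the baseline: both dummies are 0.
    """
    score = SENTIMENT_SCORE[label]
    return {
        "sentiment_score": score,
        "is_positive": int(score == 1),
        "is_negative": int(score == -1),
    }


for label in ("positive", "negative", "neutral"):
    print(label, sentiment_dummies(label))
```

Using two dummies rather than the single score lets a regression estimate separate effects for positive and negative tweets instead of forcing them to be symmetric.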

Empirical analysis
Following the previous hypotheses, and considering the law of diminishing marginal utility, we use log-linear regression to test these hypotheses. To control for differences between companies, we create dummy variables for the different companies, which reduces the influence of the companies themselves on the estimates. Finally, we obtain the following results. We use a 95% confidence level and compare the p-values of all independent variables from the three regressions against the corresponding 0.05 threshold, which lets us draw a conclusion on each hypothesis based on statistical significance.
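As a rough illustration of this model, the sketch below fits an ordinary least squares regression of the log of engagement on tweet features plus company dummies, using only the Python standard library. It is not the SPSS analysis used in the study. The `log(1 + y)` transform, the choice of the first company as the baseline, the function names and the synthetic data are all assumptions, and the p-value computation is omitted (a statistics package would normally supply it).

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def design_row(features, company, companies):
    """Intercept, numeric features, then dummies (first company = baseline)."""
    dummies = [1.0 if company == c else 0.0 for c in companies[1:]]
    return [1.0] + [float(f) for f in features] + dummies


def fit_log_linear(rows, engagement):
    """OLS of log(1 + engagement) on the design rows via normal equations."""
    y = [math.log1p(e) for e in engagement]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic example: features are (url_count, hashtag_count).
companies = ["A", "B"]
observations = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"), ((1, 2), "A"),
                ((1, 1), "B"), ((2, 0), "B"), ((0, 2), "B")]
true_coefs = [1.0, 0.5, -0.2, 0.3]  # intercept, url, hashtag, company B
rows = [design_row(f, c, companies) for f, c in observations]
engagement = [math.expm1(sum(b * x for b, x in zip(true_coefs, r))) for r in rows]
print([round(c, 3) for c in fit_log_linear(rows, engagement)])
```

One dummy is left out as the baseline so that the dummies and the intercept are not perfectly collinear; each remaining dummy coefficient is then the company's shift in log engagement relative to that baseline.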

Conclusion
Through in-depth evaluation and comprehensive analysis, we find that none of the five independent variables (number of URLs, number of hashtags, number of mentions, length of the tweet, and sentiment of the tweet) has a direct influence on users' engagement with the tweet, which differs from common belief. Namely, there is no general recommendation for these five companies to invest effort in a specific tweet format, as the results show that no independent variable affects the engagement (Table 3). But from the regression results, we find that the company factors (dummy variables in Table 2) are statistically significant, which means that different companies have different Twitter performance depending on their own tweet formats, their brand image and other individual factors, such as industry sector. Rather than analyzing the relationship between the contents of individual tweets and their engagement performance, it is more necessary to explore the differences in tweets between companies. So, instead of analyzing individual tweets, we conduct a further analysis based on individual companies, to examine whether there is any pattern among groups of companies that have similar Twitter performance.

Data sample selection
In this section, we expand our selection of companies and their Twitter accounts. Because the number of followers can indicate the activity of companies' social media marketing to some extent

Data description
We collect data from 70 different companies and retain 46,822 tweets as our dataset, all of which have the factors we need to analyze. We summarize all data in Table 4; the tweets have diversified components, which gives us more opportunity to analyze the influence of tweet format on engagement. As we concluded above, for different companies, Twitter performance (tweet engagement, number of retweets and number of favorites) is significantly determined by different factors. For example, some companies' Twitter performance may be significantly influenced by the number of URLs, while that of others may be influenced not by URLs but by the number of hashtags. Also, the influence can be positive or negative for different companies.
So, for each company, we summarize all the factors that are significant, whether positively or negatively, using 1 for a significant positive coefficient, −1 for a significant negative coefficient and 0 for an independent variable that is not statistically significant for that company. A company with a result of 1 should then make more use of this variable. If a factor is significant, we call the company sensitive to that factor, and the proportion of such companies within an industry is the sensitive rate of the factor for that industry.
Generally speaking, for the influence of tweet contents on engagement, almost one-third of the companies are sensitive to the URL number and the hashtag number, at 31.43% and 34.29%, respectively. For about two-thirds of these companies, the influence on engagement is positive. The mention number has equal positive and negative sensitive rates, while length has mainly a negative influence on engagement, with a 7.14% positive sensitive rate and a 17.14% negative sensitive rate (Table 5). As for the sentiment score, it affects the engagement score of companies only in a positive way.
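The coding and the sensitive rate defined above can be written out as a small sketch. The function names, the 0.05 significance threshold (carried over from the 95% confidence level used in the first study) and the example codes are assumptions for illustration, not the study's data.

```python
def sensitivity_code(coefficient: float, p_value: float, alpha: float = 0.05) -> int:
    """Return 1 / -1 for a significant positive / negative effect, else 0."""
    if p_value >= alpha:
        return 0
    return 1 if coefficient > 0 else -1


def sensitive_rates(codes: list) -> dict:
    """Share of companies in one industry that are sensitive to a factor."""
    n = len(codes)
    positive = sum(1 for c in codes if c == 1) / n
    negative = sum(1 for c in codes if c == -1) / n
    return {"positive": positive, "negative": negative,
            "overall": positive + negative}


# Hypothetical hashtag codes for an industry with seven companies.
hashtag_codes = [1, 1, 0, 0, 0, 0, 0]
rates = sensitive_rates(hashtag_codes)
print({k: f"{v:.2%}" for k, v in rates.items()})
```

With two of seven companies coded 1, the positive sensitive rate is 2/7, or about 28.57%, which shows how percentages of this form arise from a small number of companies per industry.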

Data empirical analysis
From Table 6, we can see that retail is a category of companies in which message design can be used to increase customer engagement. The number of URLs, the number of hashtags and the sentiment score of tweets have a positive influence on the customer engagement of 14.29%, 28.57% and 42.86% of the companies, respectively, and no negative influence on any company's customer engagement. Besides, the number of mentions and the length of a tweet each have a negative influence on the customer engagement of 14.29% of the companies and no positive influence on any company's customer engagement. All in all, to increase customer engagement, retail companies should include more URLs, hashtags and positive words in their tweets while reducing mentions and length.
Besides, the sentiment score has no negative influence on any company's customer engagement and has a positive influence on the customer engagement of 18.57% of all companies. It positively influences the customer engagement of 42.86% of the retail companies, making retail the industry most strongly influenced by sentiment score. Generally, people prefer positive content, and companies should take care in designing their tweets to make the content more positive.
Technology companies based on a hardware business are influenced by tweet format design differently from those based on a software business. Different formats have some influence on the customer engagement of 40% of the technology companies based on a hardware business, but only 14.67% of those based on a software business. The format design of tweets therefore matters more for hardware-based technology companies than for software-based ones.

[Table: Engagement-Tweets format relationship results]
Finally, game companies can increase customer engagement by using more hashtags and fewer mentions in their tweets, and technology companies can do so by using fewer words in their tweet content.

Conclusion
Based on the comprehensive analysis of tweets from the above-mentioned brands and companies, we conclude that company marketing teams should design tweets with consideration of the category their company belongs to, since different tweet forms lead to different engagement scores for different categories of companies. For managers, when considering Twitter as part of a social media marketing strategy, it is better to analyze the Twitter performance of companies with similar patterns (in our study, companies in the same industry) before generating their own tweet content. In that way, managers can recognize whether managing the Twitter account is worthwhile given the company's social media sensitivity, and which tweet format will improve Twitter engagement the most.

[Table: Engagement-Tweets format relationship summary. Surviving row: Overall relative sensitive rate (tech-soft): 6.67%, 6.67%, -6.67%, -13.33%, 0.00%, -1.33%]

Limitation
In this study, we focus on the engagement of tweets as our main dependent variable, and we implicitly assume that engagement is the only goal of a company using Twitter. But sometimes a company will use Twitter for other purposes, such as announcements, for which engagement matters little. Also, we do not consider the time period for the measurement of the actions, which is hard to obtain from TwitteR. Further study can include these considerations.
Based on our first data analysis, we find that further research is needed to examine the relationship between tweets' engagement performance under different formats and companies' different attributes. In this thesis, we focus only on industry sector as one feature of companies. Further study can explore other factors that lead to different results in tweet performance, to provide other perspectives for companies considering the use of Twitter.
Besides, the number of companies we select is limited and not sufficient for log-linear analysis. So our conclusion about how a company should design the format of its tweets is based on descriptive analysis. We will expand our database with more companies from different industries to continue our research and test the statistical significance of our conclusion.
The figures below show the R code this study used to collect, clean and reorganize the data into formats suitable for further analysis. This code can be reused to replicate this study or to address further or related questions.