Exploring Philippine Presidents’ speeches: A sentiment analysis and topic modeling approach

Abstract This study analyzed the annual, obligatory, and traditional speeches, referred to as the State of the Nation Address (SONA), of 13 past Philippine presidents. The study determined the sentiments, as well as the emergent topics, expressed in these speeches. The SONAs generally expressed positive sentiments, while the lowest (most negative) sentiment score was recorded in 1974, during the martial law period. “Development” was the most frequent word across the speeches. The study also revealed that the sentiments of incoming presidents were lower than those of outgoing presidents. Moreover, the SONAs mainly focused on the following concerns of the country: (a) economic development; (b) enhancement of public services; and (c) addressing challenges. These results underscore the importance of the SONA as a venue for discussing the nation’s state and direction and for engaging its people in that discussion.


ABOUT THE AUTHOR
John Paul P. Miranda is an instructor at the Don Honorio Ventura State University (DHVSU) Mexico Campus. His publications are in data science, data mining, computer science/IT education, and software development. Rex P. Bringula is a professor at the University of the East (UE) College of Computer Studies and Systems. His publications are in computer science/IT education, mobile learning, intelligent tutoring systems, Internet studies, cyberbehavior, e-commerce, web usability, and environmental issues.

PUBLIC INTEREST STATEMENT
This paper employed data mining to further understand the speeches delivered by Philippine presidents from 1935 to 2016 during their State of the Nation Address (SONA). Several insights were derived from analyzing these textual data, which are readily available on the Internet. Data mining identified the major themes and patterns conveyed in the SONAs: economic development, enhancement of public services, and addressing challenges. Hence, the data mining approaches presented in the study can be valuable tools for understanding broader and more complex topics in materials such as the speeches of past Philippine presidents.

Introduction
A state of the nation address (SONA) is a speech delivered by the president of the Philippines. It is a constitutional obligation of the head of state in a democratic government (Official Gazette of the Philippines, n.d.). As an important annual political event (Xue et al., 2013), it aims to inform the nation about its current political and economic state as well as its future direction under the current administration. A SONA is a venue to assert the political and economic policies of the government and to reassure the people that the commitments presented during the campaign period are being accomplished. For instance, a US leader used the inaugural speech (i.e., equivalent of SONA in the Philippines) to describe himself as the one who can save the nation and address its economic security and challenges (Loko, 2018). Moreover, the SONA serves as a platform to inform, inspire, appease, persuade, assert, divide, set expectations, or sow enmity among the people (Denton & Woodward, 1998).
Presidential speeches serve as a venue for the incumbent president to shape the political agenda and are sometimes considered a means of persuading Congress, the administration, and the public on important issues (Al-Faki, 2014; Eshbaugh-Soha, 2008, 2010a, 2010b; Horváth, 2009). Likewise, presidential speeches contain a substantial amount of information (Banguis-Bantawig, 2019; Eshbaugh-Soha, 2010b) and are more influential than speeches made by any other individual (Eriyanto, 2000; Eshbaugh-Soha, 2008, 2010a; Grice, 2010; Perloff, 1998), as they can sway people to take action (Eshbaugh-Soha, 2010a; Rezaei & Nourali, 2016) and are likely to represent different events and developments in a country across different timelines and scopes (Chinwendu Israel & Botchwey, 2017; Mazlum & Afshin, 2016; Shaw, 2017). Rubic-Remorosa (2018) and McDougal (2013) revealed that the perennial problems of a country can be seen in the president's speeches, which are also used to convey possible solutions to these problems (Noermanzah et al., 2018). On the same note, Eshbaugh-Soha (2010b) and Shaw (2017) suggested that presidential speeches are vital to governance and are effective only when delivered publicly (Eshbaugh-Soha, 2008), allowing the president to manifest desired intentions, policies, and capabilities (Sharififar & Rahimi, 2015) as well as what was achieved during the presidency (Shayegh & Nabifar, 2012). Several studies have sought to understand and analyze presidents' speeches (Horváth, 2009; D. Liu & Lei, 2018; Najarzadegan et al., 2017; Rydeen, 2018; Van Dijk, 1997), but only a few have focused on analyzing the sentiments and topics in the annual addresses (e.g., the State of the Nation Address, the State of the Union) delivered by presidents.
As Banguis-Bantawig (2019) found, speeches of Filipino presidents usually offer more detailed information, a characteristic shared by most of the Asian presidents included in that study.
Data mining approaches (i.e., topic modeling and sentiment analysis) allow a past SONA, for example, to be analyzed and understood further. From a historical point of view, data mining may provide insights into the issues confronted by previous leaders and help assess the relevance of these issues to current situations. Toward this goal, this study analyzed the SONAs of 13 past Philippine presidents: (1) Quezon; (2) Osmeña; (3) Roxas; (4) Quirino; (5) Magsaysay; (6) Garcia; (7) Macapagal; (8) Marcos; (9) Aquino; (10) Ramos; (11) Estrada; (12) Arroyo; and (13) Aquino III. Specifically, it aimed to: (a) determine and analyze the sentiments of the Philippine presidents' SONAs; and (b) determine the topics that emerged in the speeches during their terms.

Analyses of presidents' speeches
Using a scenario planning approach, Capistrano and Notorio (2020) examined the SONAs of six Philippine presidents from 1987 to 2019 to understand the present and future of Philippine tourism. They identified three major drivers (i.e., tourism policy, development, and prospects for the future), which informed the development of a model. The study further implied that the adoption of the model by public institutions can significantly assist both decision makers and policymakers. Additionally, Capistrano and Notorio predicted that transportation and sustainability will be the focus of future developments in the Philippines.
In a separate study, Ancho et al. (2020) examined the first three SONAs of three Philippine presidents (i.e., Arroyo, Aquino III, and Duterte) and their relation to the issues and concerns of the education sector. Using the NVivo software, they identified one prevailing theme for each president. Their analyses of the SONAs, in relation to achieving quality education, suggested the following: (a) Arroyo assumed that it is achieved through strategic planning and sound policies; (b) Aquino III, on the other hand, believed that it is achieved through government support; and lastly, (c) Duterte thought that it is achieved by devising preventive measures (i.e., programs and laws) against social ills (e.g., crime, violence, and drugs). Ancho et al. concluded that a SONA is vital as it highlights the issues requiring immediate attention (i.e., social and political issues).
Both Capistrano and Notorio and Ancho et al. gathered the SONAs of Philippine presidents from online resources, and both implied that careful analysis can yield various assumptions and constructs.
Budiharto and Meiliana (2018) suggested a framework for collecting opinions on relevant topics from Twitter, a social media platform, and applying sentiment analysis to predict election results (Bermingham & Smeaton, 2011; Budiharto & Meiliana, 2018; Mehndiratta et al., 2014); sentiment analysis has also been used to monitor citizens' mood and perception (Li et al., 2016; Umali et al., 2020). Other studies on sentiment analysis include: measuring the impact of presidential candidates' pronouncements (Le et al., 2017); determining which presidential candidate is favored based on widely discussed topics (Hamling & Agrawal, 2017); investigating candidates' political behavior based on social media content (DiGrazia et al., 2013); outlining a process for summarizing important political issues (Stieglitz & Dang-Xuan, 2013); examining the relationships between public mood levels and major social, political, cultural, and economic events (Bollen et al., 2011); understanding politicians' discourse on local gun policy using news articles (M'Bareck, 2019); detecting fear in texts (e.g., speeches and documents) of political, economic, and humanitarian leaders (Hogenraad, 2019); and gaining insights from the tone used by presidents of the United States of America in their addresses (Rydeen, 2018). Several studies suggested that sentiment analysis can be used to classify phenomena in text data (Bringula et al., 2019; B. Liu, 2012; Miranda & Martin, 2020; Rahab et al., 2019; Ye et al., 2017).

Topic modeling
Many historians and political analysts seek valuable insights into political events. These insights can include, but are not limited to, the topics found in presidents' speeches across different contexts and timelines. As cited in Gautrais et al. (2017), Van Dijk (1997) suggested that political discourse reveals salient features of the key events that happened in a country and how these events evolve over time. In addition, Liu and Lei (2018) and Horváth (2009) showed that a president's inaugural address can be summarized into important topics. Moreover, ideologies, and their recurrence, can also be found within these speeches (Gautrais et al., 2017).
Guha (2017) noted that computational techniques can reveal underlying phenomena within a corpus of documents. One of the most widely used techniques for identifying topics within large amounts of textual data is topic modeling (Dahal et al., 2019). Blei (2012) described topic modeling as algorithms that enable researchers to organize and summarize large amounts of data, and described Latent Dirichlet Allocation (LDA) as the simplest probabilistic topic model. Villadsen (2016) emphasized that topic modeling is a good technique for exploring new insights from presidents' speeches. Other studies have used LDA in multiple contexts, such as eliciting key insights and relevant topics (Villadsen, 2016; Zirn & Stuckenschmidt, 2014), finding associated events and key themes in e-petitions (Hagen, 2018), analyzing opinions toward video tutorials (Bringula et al., 2019; Miranda & Martin, 2020), revealing interesting information in consumer complaints (Bastani et al., 2019), and obtaining forensic details on criminal activities and their methods from public forums (Porter, 2018).
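For reference, the generative process that LDA assumes, following the presentation in Blei (2012), can be written as below. The symbols are the conventional ones rather than anything defined in this paper: K topics, D documents with N_d words each, Dirichlet hyperparameters α and η, topic-word distributions β_k, per-document topic proportions θ_d, topic assignments z_{d,n}, and observed words w_{d,n}. Fitting LDA means inverting this process, i.e., inferring β, θ, and z from the observed words.

```latex
\begin{aligned}
\beta_k &\sim \operatorname{Dirichlet}(\eta), && k = 1, \dots, K \\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), && d = 1, \dots, D \\
z_{d,n} \mid \theta_d &\sim \operatorname{Categorical}(\theta_d), && n = 1, \dots, N_d \\
w_{d,n} \mid z_{d,n}, \beta &\sim \operatorname{Categorical}\left(\beta_{z_{d,n}}\right)
\end{aligned}
```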

Data collection procedure, sample size, and sampling design
The SONAs were collected, processed, and analyzed using Python in Jupyter Notebook. The speech transcripts were collected manually, and each was stored in a separate text file. A total of 77 SONAs were delivered from 1935 to 1941 and from 1945 to 2016: 71 were collected from the Official Gazette of the Philippines, three from ABS-CBN News, and one each from GMA News Online, Rappler, and the Presidential Communications Office website. One president, Laurel, did not deliver an official SONA during the Japanese occupation of the country.
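The paper does not describe how the transcript files were organized. As a minimal sketch, assuming each transcript is saved as a UTF-8 file named `<year>_<president>.txt` (a hypothetical naming convention, e.g., `1935_Quezon.txt`), the speeches can be loaded into chronologically sorted records tagged with year and president:

```python
from pathlib import Path
from typing import NamedTuple


class Speech(NamedTuple):
    year: int
    president: str
    text: str


def load_speeches(folder):
    """Load one SONA per file named '<year>_<president>.txt' (hypothetical convention)."""
    speeches = []
    for path in Path(folder).glob("*.txt"):
        year_part, _, president = path.stem.partition("_")
        if not year_part.isdigit() or not president:
            continue  # skip files that do not follow the naming convention
        text = path.read_text(encoding="utf-8")
        speeches.append(Speech(int(year_part), president, text))
    # Chronological order makes year-to-year comparisons straightforward later on.
    return sorted(speeches, key=lambda s: (s.year, s.president))
```

Keeping the year and president in the file name means the metadata travels with each transcript without a separate index file; a CSV index would work equally well.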

Data cleaning and data preprocessing
Data cleaning and preprocessing were performed on the collected speeches. All speeches were combined to form the dataset of the study. Next, the combined speeches were tokenized and converted to lower case. Punctuation marks (e.g., "!" and "?"), special characters, and numbers were removed, followed by word stemming. Afterwards, a list of English and Filipino stop words from the Natural Language Toolkit (NLTK) library was used to filter out uninformative words (e.g., "and," "the," and "a") from the speeches. The cleaned and preprocessed corpus contained 656,873 words.
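A minimal sketch of these preprocessing steps, for illustration only. To stay self-contained it uses a regular-expression tokenizer and a tiny inline stop word list instead of NLTK's tokenizer and stop word corpus (both need separately downloaded data); the Filipino entries are a hypothetical subset, not the list the study used. The optional `stem` argument stands in for a stemmer. Unlike the order described above, the sketch removes stop words before stemming, because the stemmed forms of some stop words no longer match the list. A frequency count over the cleaned tokens gives the kind of top-10 list reported in Table 1.

```python
import re
from collections import Counter

# Illustrative stop words only; the study reports using English and Filipino lists.
STOP_WORDS = {
    "and", "the", "a", "of", "to", "in", "is", "we", "our", "that",  # English
    "ang", "ng", "sa", "mga", "at",  # Filipino (hypothetical subset)
}


def preprocess(text, stop_words=STOP_WORDS, stem=None):
    """Lower-case, tokenize, drop punctuation/numbers/stop words, then optionally stem."""
    text = text.lower()
    # Keep runs of letters only: punctuation, special characters, and digits are dropped.
    tokens = re.findall(r"[^\W\d_]+", text)
    tokens = [t for t in tokens if t not in stop_words]
    if stem is not None:
        tokens = [stem(t) for t in tokens]
    return tokens


def top_words(texts, n=10, **kwargs):
    """Return the n most frequent tokens across all speeches."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text, **kwargs))
    return counts.most_common(n)
```

With NLTK installed, passing `stem=PorterStemmer().stem` (from `nltk.stem`) would apply Porter stemming to the filtered tokens.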

Data analysis
The sentiment value of each speech in the dataset was analyzed. Bigrams were used to analyze the words in order to arrive at clearer results. Using NLTK's sentiment intensity analyzer (NLTK being a Python library for analyzing human language data and making it understandable to machines), the study computed the sentiment values (i.e., positivity and negativity scores) of each sentence. These values were summed and tagged with the year and the president who delivered the SONA (e.g., sentiment value: 14.28, year: 1935, president: Quezon). To determine how much the sentiment of each president changed during their term, the percentage change was computed by subtracting the sentiment value of the first year from that of the last year, dividing the difference by the first-year value, and multiplying by 100. In addition, the first- and last-year sentiment values of each president were compared with those of the other presidents to identify possible patterns.
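A minimal sketch of the scoring and percentage-change steps, under stated assumptions: sentences are split with a simple regular expression (an assumption; the paper does not say how sentences were segmented), the per-sentence value is taken to be VADER's `compound` score (the paper says only "positivity and negativity scores"), and the scorer is passed in as a function so that the arithmetic can be checked without NLTK's downloadable `vader_lexicon` data.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (assumption); NLTK's sent_tokenize is a sturdier choice."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def speech_sentiment(text, score):
    """Sum per-sentence sentiment scores into one value for the whole SONA."""
    return sum(score(sentence) for sentence in split_sentences(text))


def percent_change(first, last):
    """(last - first) / first * 100, the change over a president's term."""
    if first == 0:
        raise ValueError("percentage change is undefined when the first value is 0")
    return (last - first) / first * 100


def vader_scorer():
    """Scorer backed by NLTK's VADER; needs nltk and its 'vader_lexicon' data."""
    from nltk.sentiment import SentimentIntensityAnalyzer  # imported lazily
    analyzer = SentimentIntensityAnalyzer()
    return lambda sentence: analyzer.polarity_scores(sentence)["compound"]
```

For example, `speech_sentiment(text, vader_scorer())` would score one speech once NLTK and the `vader_lexicon` resource (`nltk.download("vader_lexicon")`) are installed. Because the formula divides by the first value, a negative first-year total flips the sign of the result; the paper does not say how such cases were handled, and dividing by `abs(first)` is one option.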
For topic modeling, an LDA dictionary was used to infer the topics, while multidimensional scaling was used to identify the number of topics. Subsequently, the speeches of each president were joined into a single document (e.g., President Aquino III's speeches from 2010 to 2015 were aggregated) to identify which topic emerged for each president.
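The paper does not name its LDA implementation. For illustration only, the sketch below fits LDA with collapsed Gibbs sampling in pure Python (a swapped-in inference method; gensim's `LdaModel` and scikit-learn's `LatentDirichletAllocation` use variational inference instead) and then reads off each topic's top words and each document's dominant topic. Passing one token list per president mirrors the aggregation step described above; the hyperparameter values are assumptions.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, eta=0.01, iterations=300, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    K, V = num_topics, len(vocab)
    doc_topic = [[0] * K for _ in docs]       # tokens in document d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # occurrences of word w assigned to topic k
    topic_total = [0] * K                     # tokens assigned to topic k
    z = []                                    # current topic of every token
    for d, doc in enumerate(docs):            # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + eta)
                    / (topic_total[t] + V * eta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def top_words_per_topic(vocab, topic_word, n=10):
    """Most frequent words of each topic, i.e., what the authors would then label."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


def dominant_topic(doc_topic):
    """Index of the topic holding the most tokens in each document."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]
```

Labeling then stays a manual step, as in the study: the top words of each topic are read alongside sentences that contain them.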
The authors labeled the LDA results. To label them, sentences containing each topic's words were gathered and analyzed, and the labels were based on the context of these sentences. The authors deliberated until they reached a consensus on the labels.

Results and discussion
Table 1 shows the 10 most frequent words in the corpus. Notably, "development" is the most frequent word in the SONAs. This suggests that for 82 years (1935 to 2016), the presidents focused on the growth of the country; whether the desired growth was achieved is beyond the scope of this paper. As shown in the same table, the second most frequent word is "people." This suggests that the SONAs addressed the needs of the citizens, as public officials in a democratic country such as the Philippines are expected to serve the Filipino people. Figure 1 shows the sentiment values of the SONAs from 1935 to 2016. The highest sentiment score, 82, was from President Garcia's speech in 1961, while the lowest, −41, was from President Marcos' speech in 1974. This can be attributed to the highly optimistic tone of President Garcia's speech, whereas the 1974 speech was delivered during the martial law period (Table 2), when the country was experiencing civil unrest. The sentiments also show an interesting pattern: the presidents' outlooks became less positive in the following year and then increased again in the year after that. This pattern has been consistent since 1935. The years with higher sentiment values indicate the aspirations of the presidents, who were hopeful that the country would improve because of the programs about to be implemented.
In the following year, however, the speech would have to report the problems, concerns, and challenges the administration had encountered in implementing these programs. The speech of President Aquino in 1987 supports this claim (Table 2).
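The up-and-down pattern in the yearly sentiment values can be checked mechanically. A minimal sketch, assuming the yearly totals are available as a mapping from year to sentiment value (the values in the checks are made up, not the study's data): it computes the sign of each change between consecutive available years and reports the years where the alternation breaks, i.e., where a change has the same sign as the previous one or is zero. Gaps such as 1942 to 1944 are simply skipped, so 1941 is compared with 1945.

```python
def yearly_changes(scores):
    """(previous year, year, sign of change) for consecutive years present in scores."""
    years = sorted(scores)
    return [
        # (a > b) - (a < b) yields +1, -1, or 0.
        (prev, year, (scores[year] > scores[prev]) - (scores[year] < scores[prev]))
        for prev, year in zip(years, years[1:])
    ]


def alternation_breaks(scores):
    """Years whose change repeats the previous change's sign (or is zero)."""
    changes = yearly_changes(scores)
    return [
        cur[1]
        for prev, cur in zip(changes, changes[1:])
        if cur[2] == prev[2] or cur[2] == 0
    ]
```

An empty result from `alternation_breaks` would mean the series alternates strictly between decreases and increases.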
Figure 1 also shows that, as gleaned from the presidents' speeches, the Philippines has experienced challenges since the Commonwealth period (1935-1945). Nonetheless, Filipinos remained resilient and hopeful for a better situation. With the exception of 1974, Figure 1 shows that the overall sentiments throughout the 13 presidencies were generally positive (29.2% for the 77 speeches).
Among the 13 presidents, Quirino's first SONA had the most positive sentiment, while Estrada's had the least positive sentiment. The largest change in sentiment (344%) was observed in the speeches of Aquino III. With the exception of President Quirino (−56%), all presidents showed positive changes in sentiment as they reported their accomplishments during their presidency. This is reasonable since a SONA mainly focuses on reporting the positive outcomes of government programs rather than the negative ones. It can be inferred that, throughout Philippine history, all presidents except Quirino used the same political strategy in delivering their SONAs.
When the sentiments of the first speech of an incoming president were compared with those of the last speech of the outgoing president, a pattern also emerged (Figure 2): the first speech of an incoming president tended to have lower sentiment than the last speech of the outgoing president. This may be because an incoming president informs the citizens of the problems inherited by the new administration, which result from the previous administration's failure to address them. Within presidencies, the largest change in sentiment was that of President Aquino III (344%) and the lowest was that of President Quirino (−56%), the only president with a negative change; President Osmeña had only one SONA (Table 3).
Sample sentences from each president's SONAs:
Quezon: "They nurture patriotism, loyalty, courage, and discipline." / "This necessary control, direction, and coordination of a myriad of essential details represents one of the most difficult and critical problems of the entire project."
Osmeña: "That great and distinguished friend of the Filipino people, General of the Army Douglas MacArthur, once said that they are only fit to live who are not afraid to die." / "In the passing of President Roosevelt we, with the entire world, have suffered an irreparable loss."
Roxas: "I feel perfectly at home here." / "Most of our railroads are depleted of rolling stock and the lines themselves are in a sad state of disrepair."
Quirino: "This past year additional relief accrued to our people from a substantial increase in employment." / "They began again to harass the people during harvest time."
Magsaysay: "Today, there is a new spirit of confidence in our land." / "In the past, such programs have not made adequate progress because of ineffective implementation and insufficient support."
Garcia: "It is with deep satisfaction and warranted optimism that I now open the fourth regular session of this Congress." / "The Administration before us suffered a rude interruption with the tragic death of my illustrious predecessor, Ramon Magsaysay."
Macapagal: "We are laying the basis for stable and orderly growth with a stable and orderly currency." / "To discourage conspicuous consumption by imposing prohibitive tariffs on luxuries."
Marcos: "We have still the threat in Mindanao and Sulu but we are in control. Yes, there are ambuscades, you all know what is happening there, and you have the barangays." / "I still remember when I was young, my grandfather used to take me on his knee and tell me of the battles of the Revolution."
Aquino: "I responded with an economic reform program aimed at recovery in the short, and sustainable growth in the long run." / "In short, I inherited an economy in shambles and a polity with no institutions save my presidency to serve as the cornerstone of the new democracy that we set out to build."
Ramos: "Peace and security are the first urgent problem." / "But this much we must realize: reform will not come easy."
Estrada: "Democracy, freedom, and the Constitution are alive and well in this country." / "One problem we faced was how to finance the deficit."
Arroyo: "The third focus to resolve the crises and build a strong republic was to restore macroeconomic stability and win back investor confidence." / "And we must be viable if we are to win the most fundamental war, the war against poverty."
Aquino III: "We recognize the efforts of the MILF to discipline those within its ranks. We are hopeful that the negotiations will begin after Ramadan." / "For a long time, our country lost its way in the crooked path. As days go by (since I became President), the massive scope of the problems we have inherited becomes much clearer. I could almost feel the weight of my responsibilities."
The findings suggest that Philippine presidents start and end their SONAs with positive outlooks on what they have accomplished during their presidency. With the exception of President Quirino, the past presidents have always started their presidencies with lower sentiments than their predecessors and have used their last speech to highlight their accomplishments.
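The incoming-versus-outgoing comparison can be sketched as follows, assuming each SONA is available as a `(year, president, sentiment)` record. The function pairs the last SONA of each president with the first SONA of the next president in chronological order. The names and values in the checks are placeholders, not the study's data.

```python
from itertools import groupby


def presidencies(records):
    """Group chronologically sorted (year, president, score) records by consecutive president."""
    ordered = sorted(records)
    return [(name, list(group)) for name, group in groupby(ordered, key=lambda r: r[1])]


def handovers(records):
    """For each transition, compare the outgoing president's last SONA with the incoming's first."""
    terms = presidencies(records)
    result = []
    for (out_name, out_speeches), (in_name, in_speeches) in zip(terms, terms[1:]):
        last_out, first_in = out_speeches[-1], in_speeches[0]
        result.append({
            "outgoing": out_name,
            "incoming": in_name,
            "outgoing_last": last_out[2],
            "incoming_first": first_in[2],
            "incoming_lower": first_in[2] < last_out[2],
        })
    return result
```

Counting how many transitions have `incoming_lower` set to true gives a simple summary of the pattern; a president with a single SONA, such as Osmeña, serves as both the first and the last speech of that presidency.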
The LDA results in Table 4 show that three topics emerged from the SONAs. The speeches of the past presidents can be summarized into these three topics: (a) economic development; (b) enhancement of public services; and (c) addressing challenges. As shown in Table 5, all presidents cited these topics throughout their presidencies. However, the majority of the presidents (7 of 13) focused more on the problems in implementing the country's existing laws and economic programs.

Table 4. Topics identified by LDA
Economic development
Top words: "year", "development", "increase", "program", "economy", "land", "fund", "project", "production", "last"
Sample sentences:
"We must achieve not only a balanced budget with respect to current expenditures but, if possible, generate a surplus which we can use to finance capital expenditures for economic development." (Marcos)
"Our people were reaping the benefits of schools and sanitation, expanded irrigation systems, increased credit facilities, and improved transportation." (Garcia)
"In order to encourage agricultural production and eliminate absentee landlordism, I propose that Congress study the advisability of imposing special taxes on lands left uncultivated for an unreasonably long period and without justifiable cause." (Magsaysay)
Enhancement of public services
Top words: "government", "make", "public", "law", "give", "need", "system", "also", "shall", "measure"
Sample sentences:
"Our system of public education must be inspired in Filipino patriotism and consecrated to the formation of citizens of high moral character and civic virtues." (Quezon)
"Bridges and roads are in crying need of reconstruction." (Roxas)
"This social bias consists in immediate measures for the poor as well as improving and ensuring the quality of life of the masses." (Arroyo)
Addressing challenges
Top words: "must", "people", "nation", "country", "world", "national", "many", "problem", "today", "time"
Sample sentences:
"They are working together to creatively solve the problems that have long plagued our country." (Aquino III)
"This country was entirely preoccupied with the problems of her mighty war effort and her attention was concentrated on the European front." (Osmeña)
"My principal object has been to know all that one can possibly know, in that brief time, of the problems facing the nation." (Ramos)

Conclusion, implications, and recommendations
This study aimed to determine the sentiments of Philippine presidents' SONAs and to uncover the topics that emerged from their speeches during their respective terms. It was found that the speeches reflect the presidents' feelings about the state of the country. Furthermore, a pattern emerged from the sentiments: an incoming president's first SONA had lower sentiment than the outgoing president's last SONA. In terms of the topics discussed, the past presidents' speeches, though with different foci, mainly focused on economic development, public services improvement, and addressing challenges. It can be concluded that the Philippine presidents used SONAs as fora to express their aspirations and frustrations, as well as their visions in serving the country.
Considering that the same topics appeared in the speeches of all 13 presidents, the country appears to still be facing the same challenges it has faced since the presidency of Quezon. Similarly, the analyses imply that the country has not achieved its desired economic growth and public services, a perennial problem that confronted all 13 presidents.
While the study provided insights into the sentiments and goals of previous administrations, some areas remain to be investigated. Future researchers may apply word association algorithms to further understand the context of the speeches. Automatic keyword extraction that can classify topics in real time may also be developed. Lastly, extracting verbs and analyzing their usage in the speeches is also worth investigating.