eWOM for public institutions: application to the case of the Portuguese Army

Social media platforms provide easy access to public opinion (called electronic word-of-mouth, eWOM), which can be collected and analyzed to extract knowledge about the reputation of an organization. Monitoring this reputation in the public sector may bring several benefits for its institutions, especially in supporting decision-making and developing marketing campaigns. Thus, to offer a solution aimed at the needs of this sector, the goal of this research was to develop a methodology capable of extracting relevant information about eWOM in social media, using text mining and natural language processing techniques. This goal was achieved through a methodology capable of handling the small amount of information regarding public state organizations in social media. Additionally, our work was validated using the context of the Portuguese Army and revealed the potential to provide indicators of institutional reputation. Our results present one of the first applications of these techniques to an Army organization and help to understand its negative reputation among the population.


Introduction
With the widespread use of the Internet, our social behavior was reproduced in the digital world. Social media applications are an example of this phenomenon, providing an environment designed to stimulate behaviors of online sharing and interaction that are natural to humans. The growing communication through these applications (blogs, social networks or online communities) offers several benefits for its users, including the possibility of learning information about organizations, services and products (Hennig-Thurau et al. 2004).
The search for information online gives access to electronic word-of-mouth (eWOM), defined as "any negative or positive statement, made by potential, actual or former customer about a product or organization and that is available for several people and institutions through the Internet" (Hennig-Thurau et al. 2004). This publicly available information plays an important role in users' decision-making process (Ash 1951; Banerjee 1992), thus becoming crucial to the performance of organizations. Together, the high volume of data generated in social media and the existing eWOM become a source of information with the potential to support reputation management and increase business competitiveness through business intelligence (He et al. 2015).
Although the relevance of eWOM in the private sector is well known, its monitoring in the public sector is limited (Gurău 2008) and relies mainly on methods that are not sufficiently customized or comprehensive and cannot provide up-to-date information. The problem in this sector is worsened by its organizational culture, limited human resources, budget constraints and poor technological conditions, which make it difficult to develop efficient and accessible solutions (Gurău 2008). Thus, to meet the needs of the public sector, this study aims to develop a methodology that extracts relevant information about eWOM in social media, using text mining and natural language processing techniques. The resulting solution must be accessible, customizable and systematic, to overcome the identified constraints and to allow the monitoring of institutional reputation.
To demonstrate and evaluate the proposed methodology, we will use the case of the Portuguese Army, which currently faces recruitment and retention difficulties that

eWOM
Word-of-mouth (WOM) refers to the interpersonal influence exercised by the sharing of information about products, services or brands, which impacts the customer's decisions. With the use of the Internet, these public opinions have also come to be expressed and consulted online, giving rise to the concept of electronic word-of-mouth (eWOM): "any negative or positive statement, made by potential, current or former customer, about a product or organization and which is available to various people and institutions through the Internet" (Hennig-Thurau et al. 2004). This eWOM is shared through several platforms, such as forums, social networks, microblogs, discussion groups, news sites or review platforms, and is intended to provide information or recommendations (Cheung and Thadani 2012).
Potential customers tend to trust WOM and eWOM more than traditional advertising media (Cheung and Thadani 2012) and to rely on this interpersonal communication to shape their expectations, create perceptions, and make choices and acquisitions (Jalilvand et al. 2011). In addition to these effects on the potential client's decision, eWOM is also characterized by fast message dissemination, lack of control by the organizations and a high cost of reversing the consequences of negative opinions. Therefore, monitoring these phenomena is essential to manage the reputation, image and success of organizations (Kirby and Marsden 2006) and, consequently, to improve customer and employee recruitment, supplier relationships and influencer support (Christopher et al. 2002).

Social media
The term social media emerged with Web 2.0 and can be defined as a set of applications that allow creating, sharing and consulting content generated by users through the Internet. Currently, a great diversity of applications can be included in the context of social media. As an example, they can be grouped into blogs, collaborative projects, social networks, online communities, virtual games and virtual worlds (Kaplan and Haenlein 2010). With the emergence of new platforms, the literature also includes microblogs, social bookmarks and media sharing, among others (Sterne 2010). Nowadays, social media has facilitated open communication between users and organizations, leading to a constant flow of data useful for developing strategies to increase profit, decrease costs and improve proximity to the client (Sterne 2010). To monitor such data, eWOM has been considered one of the main indicators of the market and of public opinion, suggesting the use of social media as a tool for analysis through the discussions and publications made by its users.

eWOM, social media and reputation in the public sector
Currently, the study of institutional reputation in the public sector has limited scientific evidence, usually refers to situations limited in time (e.g., analyzing the results of a marketing campaign) (Hoye and Lievens 2007) and uses self-report assessments such as questionnaires (Luomaaho 2007). Despite the small amount of similar work, the literature shows that measuring eWOM as a way of monitoring an organization's reputation is a modern and increasingly used practice, with documented cases in the banking (Augusto and Torres 2018), hotel (Anagnostopoulou et al. 2020), commerce (Kousheshi et al. 2019), academic (Le et al. 2019) and health sectors (Purcărea et al. 2015). The solutions used for evaluating users' opinions can be divided into two components: collection methods and analysis methods. Regarding data collection, eWOM is usually obtained from conventional sources such as interviews (Royo-Vela and Casamassima 2011), face-to-face questionnaires (Kirby and Marsden 2006; Christopher et al. 2002; Hoye and Lievens 2007; Vázquez-Casielles et al. 2013), telephone questions (Anderson 1998) or internet surveys (Lee and Youn 2015; Poulis et al. 2019). Other alternatives include metrics obtained from social networking sites (Constantinides 2014) or online comments/reviews (Le et al. 2019) and, although less frequent, automated data collection using the application program interface (API) of applications such as Twitter (Jansen et al. 2009), Facebook (Lo and Lin 2017) or TripAdvisor (Saura et al. 2018). This automation stands out for its significant benefits, namely, the fast extraction of a larger amount of data (due to the process automation), with greater accuracy (it does not rely on respondents' memory) and without restrictions (it is not limited to concrete questions or socially accepted answers) (Vargo et al. 2019).
However, it is recommended that social networks are not the only source of information used for eWOM analysis, with some authors suggesting the use of other online platforms to expand the population and the information included in the online reputation analysis (Aramendia-Muneta 2017). Regarding the methods of analysis, models developed for eWOM mainly use conventional statistics, with confirmatory and/or exploratory analysis. These methods are suitable for low data volumes, for answering concrete questions and for processes with low automation (Hand 1999). Alternatively, text mining techniques are being used to analyze large unstructured data sets and discover existing patterns (Aggarwal 2011). Their application to social media content and eWOM analysis shows potential for discovering the most commonly used terms, opinions, message characteristics and their value related to brand products or services (Sterne 2010; Jansen et al. 2009).
Overall, the state of the art suggests that it is possible, and still innovative, to develop a solution capable of studying reputation using automatic extraction and text mining methods to analyze the eWOM from social media.

Our proposal
In this section, we propose a methodology based on the Cross Industry Standard Process for Data Mining (CRISP-DM) (Wirth and Hipp 2000), as shown in Fig. 1. This adaptation of the original model aims to respond specifically to public sector needs and presupposes the lack of available data, the difficulty of implementing a fully automated system and the importance of developing personalized recommendations. Thus, we used a set of phases and tasks for extracting a comprehensive sample of opinions in social media which, through text mining and natural language processing, makes it possible to evaluate an institution's reputation. In order to demonstrate its application, we will describe each phase and present the results obtained with its application to the Portuguese Army. Additionally, this section includes the replication of results for the Portuguese Navy, Air Force and Armed Forces.

Institutional characteristics
The first stage refers to the analysis of the institutional context and its requirements. In the case of the Portuguese military forces, they are responsible for the readiness and operational use of forces to perform national defense missions. They are organized in different institutions, among which are three military branches: Navy, Army and Air Force.
In 2004, Portugal abolished mandatory military service and adopted an all-volunteer force. Since then, military institutions have needed to compete in the labor market and differentiate themselves from other employers in order to attract human resources capable of maintaining their operational readiness. For this purpose, the defense institutions have implemented several perks, benefits and marketing campaigns aimed at raising recruitment rates. Such measures included the increase of online presence through social networks, websites and other communication channels. Despite those efforts, the Portuguese Armed Forces experience low rates of recruitment and retention, which has led to a shortage of military personnel and compromises the proper fulfillment of defense missions. This need is particularly severe for junior enlisted personnel ("Praças"): the population between 18 and 24 years of age, with a middle or high school diploma, adequate physical and mental fitness and willingness to serve under a contract of at most 6 years.
This negative trend has been reported in the Portuguese Army, where, in 2018, recruitment reached less than one applicant per opening. Faced with this situation, the Army studied the most important influences on the enlistment of new recruits and concluded that most applications for military service come from the advice of family and friends (civilian or military) or information obtained through social media platforms (Silva et al. 2019). This influence exerted by external sources of information has already been reported in studies regarding the military context (Lievens 2007) and, together, these results suggest the importance of eWOM for analyzing the attractiveness of military institutions and for supporting institutional image management.

Identification of information sources
After collecting institutional information and according to the proposed methodology, this phase aims to identify sources of information in social media containing enough content for the application of text mining techniques, but also capable of reflecting the broad spectrum of online users' opinion about the institution.
In the scope of this study, a specialized army officer supervised and provided help during the development of each one of the following tasks:

Select target population
This step aims to define the characteristics of users who create or search online information and that are relevant to the organization success. In the case of the Portuguese Army, previous studies on recruits have suggested that the decision to enlist is often influenced by referrers (parents or friends of the potential candidate) and social networks (Silva et al. 2019). Thus, in addition to the opinion of citizens eligible for military service, we intended to access a broad public opinion and chose not to restrict information by users.

Search terms
This task consists of identifying terms that can be used to search for online information sources. For the Portuguese Army, Sect. 3.1 provides the context understanding required to develop an extended list of search terms. During the search process, supervised by an army officer, terms that did not return results or did not meet the selection criteria were eliminated (e.g., "serviço militar", "tropa", "treino militar"). The final list included 10 military-related terms comprehensive enough to produce many search results but also related to the case study (e.g., "exército", "ministério da defesa", "forças armadas portuguesas").

Selection of information sources
This task consists of selecting the sources of information capable of providing unstructured data in sufficient quantity and diversity, suitable for the analysis of the reputation of an institution.
In the case of the Portuguese Army, the selection of eligible information sources was an iterative task, requiring the definition of inclusion and exclusion criteria. Thus, we excluded information sources having few interactions, deactivated comments section or excessively specific information (e.g., armored brigade Facebook community). In addition, we included sources with at least 3 posts and 50 comments related to the previously defined search terms.
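The quantitative part of these criteria can be sketched as a simple filter. This is only an illustration: the record fields (`posts`, `comments`, `comments_enabled`) and the candidate sources are hypothetical, and only the thresholds of 3 posts and 50 comments come from the text. The "excessively specific information" criterion is a manual judgment and is not modeled here.

```python
from dataclasses import dataclass

MIN_POSTS = 3       # threshold stated in the text
MIN_COMMENTS = 50   # threshold stated in the text


@dataclass
class Source:
    name: str               # hypothetical identifier of the source
    posts: int              # posts matching the search terms
    comments: int           # comments on those posts
    comments_enabled: bool  # False if the comments section is deactivated


def is_eligible(src: Source) -> bool:
    """Apply the exclusion rule first, then the inclusion thresholds."""
    if not src.comments_enabled:
        return False
    return src.posts >= MIN_POSTS and src.comments >= MIN_COMMENTS


# Made-up candidates, for illustration only.
candidates = [
    Source("forum_a", posts=12, comments=340, comments_enabled=True),
    Source("page_b", posts=2, comments=80, comments_enabled=True),
    Source("page_c", posts=40, comments=900, comments_enabled=False),
]
selected = [s.name for s in candidates if is_eligible(s)]
print(selected)
```

Because the selection was iterative, such a filter would likely be rerun each time the list of search terms changed.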

Information collection
Following the selection of information sources, this phase intends to develop the necessary processes for extracting information. Overall, this phase included the following steps:

Information extraction
This step refers to a set of tasks for designing and performing the process of collecting information. These tasks included the selection of which attributes to extract in each information source, the development of derived attributes, the selection of a date range for data extraction, the definition of criteria for filtering publications and the development of the extraction process with web scraping.
For this study, we used web scraping to automatically collect information from each selected page in social media applications (e.g., Facebook page @ExercitoRecrutamento). It must be noted that the diversity of information sources, the type of content and the need for inclusion/exclusion criteria make this a complex process. Thus, total automation was not achieved, and human intervention was required for validation. To illustrate these difficulties, during information extraction the Armed Forces official pages revealed several interactions but with little content; in contrast, the independent sources had fewer publications about the subject but generated more information exchange between users. Thus, to extract relevant and sufficient content, the web scraping covered a time span of 2 years (2018-2019) for official sources and 5 years (2014-2019) for independent sources. In total, our dataset contained 13,358 comments and 706 posts, collected from 19 information sources.

Information exploration
This task aims to analyze the characteristics of the information extracted. During its development, the information collected in social media presented a number of constraints, of which we highlight texts with a low number of characters; the identification of users in comments; text with links and/or quotations; the existence of promotional publications, repetitive responses or user-specific actions; publications without answers; and texts with themes not related to the institution. Additionally, for the present study, the frequent use of emojis without any additional text in some social media platforms (e.g., Instagram) was also considered a constraint, not only due to the limitations of adopting an emoji lexicon (Fernández-Gavilanes et al. 2018), but especially because emojis alone provide little meaningful information regarding the institutional reputation.

Information description
This step aims to describe and evaluate the extracted information, analyzing its feasibility for this study and building the first hypotheses about its patterns. The main results obtained during this stage are shown in Table 1 and reveal two different types of information sources: the official social media pages, controlled by the institutions and showing a higher number of posts; and the independent information sources, which include news channels, online communities and forums, where the comments shared by users were more frequent, more extensive and came from a higher number of commenting users.

Information preparation
This phase aims to transform the collected information into data ready for computation and modeling. The tasks of preparing information refer to a set of procedures which should include the following: (1) standardization of attributes from different sources; (2) merge of all extracted information; (3) deletion of rows with empty records; (4) word transformation (e.g., abbreviations into words); and (5) detection and elimination of usernames in text.
After the preparation, we recommend the application of the following preprocessing techniques: (1) transformation of text to lower case; (2) elimination of punctuation marks; (3) tokenization; and (4) stemming or lemmatization, as a way to reduce the feature space.
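The text-level steps listed above can be sketched with the Python standard library. This is a minimal illustration, not the pipeline used in the study: the abbreviation map and the username patterns (`@name`, Reddit-style `u/name`) are assumptions, and abbreviations are expanded after lowercasing so that the lookup is case-insensitive. Stemming is left out because it needs an external resource (for example, the RSLP stemmer mentioned later in the paper).

```python
import re
import string

# Hypothetical abbreviation map; the paper does not list the ones it expanded.
ABBREVIATIONS = {"ffaa": "forças armadas", "vc": "você"}

# Assumed username formats: "@name" and Reddit-style "u/name".
USERNAME = re.compile(r"(?<!\w)(?:@|u/)\w+")

# Translation table that turns every ASCII punctuation mark into a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> list[str]:
    """Remove usernames, lowercase, strip punctuation, tokenize, expand abbreviations."""
    text = USERNAME.sub(" ", text)       # detection and elimination of usernames
    text = text.lower()                  # transformation to lower case
    text = text.translate(PUNCT_TABLE)   # elimination of punctuation marks
    tokens = text.split()                # whitespace tokenization
    expanded = []
    for tok in tokens:
        # An abbreviation may expand into several tokens.
        expanded.extend(ABBREVIATIONS.get(tok, tok).split())
    return expanded


print(preprocess("@joao O Exército, nas FFAA, é BOM!"))
```

Whitespace tokenization is the simplest choice here; a language-aware tokenizer would handle clitics and hyphenated Portuguese forms better.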

Knowledge extraction
The knowledge extraction phase aims to analyze institutional eWOM. To demonstrate this capability, we used the collected dataset to analyze opinions and polarities related to the Portuguese Army and then tested our methodology's generalizability through replication to other branches and the Armed Forces in general. Finally, we evaluated the possibility of training automatic classification models and topic analysis.

Manual sentiment analysis
To overcome the lack of trained models for Portuguese sentiment classification and the specific features of the collected comments (such as the frequent use of military vocabulary, irony and spelling errors), we decided to perform a manual classification. This task must be executed by experts in the field and is time-consuming; thus, a subset of the collected information was selected to demonstrate the proposed eWOM analysis.
To perform this sentiment analysis, we used the criteria of information independence, higher number of comments and higher number of characters to select the best source available: the Reddit community. The dataset extracted from Reddit is described in Table 2 and reveals a balanced number of publications and comments for each institution. In the same table, user engagement is calculated by averaging the number of comments per publication and suggests that the topic of defense (evaluated through the "Armed Forces") has reached the highest number of users, followed by the Army and Navy branches. These results suggest that shared information on the Armed Forces, Navy and Army stimulates greater interaction than posts about the Air Force.
In addition to interaction metrics, the content and polarity of these comments were analyzed. Regarding the polarity, Fig. 2 reveals a prevalence of negative feelings over positive ones, suggesting an unfavorable eWOM toward the military institution. This negative opinion occurred most frequently for the Armed Forces, with 70% (N = 199) of comments revealing negative opinions, but also for the branches, with a higher incidence in the Navy with 62% (N = 172) and the Army with 61% (N = 144). These results are congruent with the recruitment shortfall reported by the branches but also with studies showing greater organizational satisfaction among the military serving in the Air Force. Regarding the text length for each category in Table 3, the average comment length showed that negative opinions were the most elaborate. These results also revealed that information shared about the Armed Forces is the most extensive, the Navy had significantly shorter comments, and the Army showed the most balanced size of positive and negative comments. Overall, the analysis revealed that negative opinions were more frequent and extensive, which, combined with the superior impact of negative eWOM over positive eWOM, may have a significant impact on the reputation of defense institutions and their attractiveness.
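The engagement measure described above is a simple ratio, sketched below. The institution labels and counts are made up for illustration and are not the values reported in Table 2.

```python
def engagement(n_comments: int, n_publications: int) -> float:
    """Average number of comments per publication (0.0 if there are no publications)."""
    if n_publications == 0:
        return 0.0
    return n_comments / n_publications


# Hypothetical counts, not taken from the paper.
counts = {
    "inst_a": {"publications": 10, "comments": 120},
    "inst_b": {"publications": 8, "comments": 40},
}

# Rank institutions from highest to lowest engagement.
ranking = sorted(
    counts,
    key=lambda name: engagement(counts[name]["comments"], counts[name]["publications"]),
    reverse=True,
)
print(ranking)
```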
After the dataset description, we performed a sentiment analysis using comments about the Portuguese Army. At first, the annual eWOM variation was visualized using Fig. 3, which revealed differences in the frequency and polarity of comments for each year and relevant events.
Next, we computed term frequencies and displayed them as a word cloud in Fig. 4. In this figure, red marks words found mainly in negative comments about the Army, green refers to mostly positive terms, and black represents words used equally in both categories. Regarding the negative eWOM, our results suggest an overall negative perception of the institution, shown by a predominance of red terms and with the word "Army" standing out in the word cloud. These negative opinions also included terms such as "being", "Portugal", "force", "armed forces", "money", "homeland" and "defend", referring to a perception of low public utility and questioning the usefulness of the public funds the institution receives. Additionally, they also mentioned the terms "contract", "years", "personnel", "job market" and "life", which reveal the Army's image as an employer regarding its contractual terms for enlisted personnel, military life and staffing shortage.
In the opposite direction, positive opinions (in green) highlight the Army's strengths, such as the training programs provided after basic training ("course", "training"), the duty and the soldier and officer ranks.
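The color coding of the word cloud can be sketched as a comparison of term counts between polarities. This is an assumption about how such a coloring could be derived, not the authors' procedure: in particular, "used equally" is interpreted here as exactly equal counts, and the toy comments are invented.

```python
from collections import Counter


def term_colors(neg_docs, pos_docs):
    """Map each term to 'red', 'green' or 'black' from its counts per polarity.

    neg_docs and pos_docs are lists of token lists (already preprocessed).
    """
    neg = Counter(tok for doc in neg_docs for tok in doc)
    pos = Counter(tok for doc in pos_docs for tok in doc)
    colors = {}
    for term in set(neg) | set(pos):
        if neg[term] > pos[term]:
            colors[term] = "red"      # mainly negative comments
        elif pos[term] > neg[term]:
            colors[term] = "green"    # mainly positive comments
        else:
            colors[term] = "black"    # equal use in both categories
    return colors


# Toy tokenized comments, for illustration only.
negative = [["contrato", "exército"], ["contrato", "vida"]]
positive = [["curso", "exército"], ["curso"]]
colors = term_colors(negative, positive)
print(sorted(colors.items()))
```

With unbalanced classes, comparing relative frequencies (counts divided by the number of comments in each class) rather than raw counts may be a fairer choice.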
In addition to this interpretation, Table 4 presents a list of key topics and the corresponding most informative terms, also including a percentage of the occurrence of these terms within negative or positive comments.  Next, we evaluated the results regarding the application of the proposed methodology in new contexts. Initially, we used comments related to the Portuguese Navy and developed the sentiment analysis visualized through the word cloud in Fig. 5.
Results show a different pattern from the Army, with the term "Navy" being neutral. The negative eWOM (in red) refers to the Navy's public utility ("country", "festival"), working conditions or staffing shortage ("shit", "personnel") and performance ("drone", "sinking"). On the positive side, the green terms revealed that the academic training ("naval" (academy), "mathematics") and the institution's equipment and mission ("ships", "submarine", "patrol", "base") were positively perceived by users. In addition to this interpretation, Table 5 presents a list of key topics and the corresponding most informative terms, also including a percentage of the occurrence of these terms within negative or positive comments.
Secondly, we replicated the analysis for the Portuguese Air Force. The resulting word cloud is shown in Fig. 6 and suggests a positive perception (in green) of its military training ("course", "academy"), the professional characteristics ("pilot", "work") and the civilian professional opportunities it provides ("airline"). In the opposite direction, the negative (red) terms demonstrate the impact of criminal activity inside the Air Force publicized throughout the media ("system", "millions"), but also the lack of readiness in defense support of civil authorities ("competence", "civilian", "military", "aircraft"). Table 6 presents a list of key topics and the corresponding most informative terms, also including a percentage of the occurrence of these terms within negative or positive comments.
Finally, the global reputation of the Armed Forces was also analyzed using the data extracted on this subject. As for the military branches, it was possible to explore the eWOM communication related to the defense sector through the terms found in positive and negative comments shown in Fig. 7. For negative opinions (in red), we find terms related to the possibility of reintroducing compulsory military service ("compulsory military", "against"), the presence of Portugal in the North Atlantic Treaty Organization (NATO), the cost/benefit ratio of spending public money on military institutions ("defense", "portugal", "the need", "state", "pay", "money") and the possibility of an armed conflict ("war"). Additionally, the shortage of military personnel ("lack", "problem", "less", "military personnel") and the working conditions or contractual terms offered ("salary", "life", "conditions") were unfavorable issues among the population. In the opposite sense, the positive terms are less frequent in the resulting word cloud; however, they present a positive opinion related to the basic and specialized training ("basic training" and "course") and the deployment opportunities ("missions", "countries"). It is important to note that, while a positive image of military service is necessary to increase recruitment, the terms "military service", "armed forces", "youth" and "army" are often used in both positive and negative opinions. To support this interpretation, Table 7 presents a list of key topics and the corresponding most informative terms, also including a percentage of the occurrence of these terms within negative or positive comments.

Automatic sentiment analysis
After confirming the applicability of manual sentiment analysis, we tested the possibility of developing an automatic sentiment classification model using the Reddit comments dataset. Initially, the Portuguese lexicon "SentiLexPT-02" (Silva et al. 2012) was used to automatically classify comments. However, the model performance was below expectations, which motivated us to use supervised learning techniques, namely generative classifiers (Naive Bayes) and discriminative classifiers (maximum entropy/logistic regression and support vector machine). The use of these traditional machine learning techniques, as opposed to deep learning approaches, was based on their performance without larger datasets (O'Mahony et al. 2019). Different models were tested with the previously mentioned classifiers, using 80% of the data for training and 20% for evaluation. We used word-based features with the bag-of-words and TF-IDF weighting schemes. We also used stemming for word normalization by applying the RSLP Stemmer (Orengo 2001), a rule-based suffix stripping algorithm created for the Portuguese language that works well even with out-of-vocabulary words. We used the RSLP implementation included in the NLTK library.
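As an illustration of the generative classifier mentioned above, the following is a minimal multinomial Naive Bayes over bag-of-words counts with add-one (Laplace) smoothing and an 80/20 split, written with the standard library only. It is a sketch under stated assumptions, not the implementation used in the study: the random, non-stratified split, the fixed seed, the smoothing constant and the toy comments are all assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def split_80_20(items, seed=0):
    """Shuffle and split into 80% training and 20% evaluation (assumed random split)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]


def train_nb(docs, labels):
    """Count documents per class and words per class from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {tok for counts in word_counts.values() for tok in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)                 # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:                                  # skip unseen words
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, for illustration only.
model = train_nb([["bom", "curso"], ["mau", "salário"]], ["pos", "neg"])
print(predict_nb(model, ["curso", "bom"]))
```

In practice the tokens would first be stemmed, and the same split would be reused for each classifier so that their scores are comparable.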

Topic modeling
Finally, topic modeling analysis was performed to test the methodology's ability to detect emerging themes from social media comments. The optimal topic model was obtained using TF-IDF for feature extraction (terms with minimum frequency of 3 and trigrams) and LSA for dimensional reduction. This model included 5 topics detected in the comments on the institutions under analysis and which are described in Table 8. The first topic presents generic terms related to military service in the Army. The 2nd topic is more specific and reveals the discussion on the usefulness of the Armed Forces, which covers compulsory military service and membership in NATO. Regarding topics 3 and 4, they address respectively the operational component of the Armed Forces and the branches (Air Force and Navy). Finally, topic 5, although less specific, denotes a negative image about the public defense institutions.
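The feature-extraction step can be sketched as TF-IDF weighting over n-grams with a minimum-frequency cutoff. Several points here are assumptions, since the text does not specify them: n-grams of length 1 to 3 are used, the cutoff is applied to the total corpus count of a term, and the weight is raw term count times log(N / document frequency). The LSA step (a truncated singular value decomposition of the resulting matrix) is not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all 1..n_max-grams of a token list as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, min_freq=3, n_max=3):
    """Return one {term: weight} dict per document (docs are token lists)."""
    grams = [Counter(ngrams(doc, n_max)) for doc in docs]
    total = Counter()
    for g in grams:
        total.update(g)
    # Keep only terms that occur at least min_freq times in the whole corpus.
    vocab = {term for term, count in total.items() if count >= min_freq}
    # Document frequency: number of documents containing each kept term.
    df = Counter(term for g in grams for term in g if term in vocab)
    n_docs = len(docs)
    return [
        {term: count * math.log(n_docs / df[term])
         for term, count in g.items() if term in vocab}
        for g in grams
    ]


# Toy corpus with a lower cutoff so that some terms survive.
weights = tfidf([["a", "b"], ["a", "c"], ["a", "b"]], min_freq=2)
print(weights[0])
```

A term that appears in every document receives weight zero under this formula, which is why very generic words contribute little to the topics.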

Evaluation
After the knowledge extraction, the evaluation phase aims to evaluate and select the results and models obtained during the previous phase. To do so, this section begins by assessing the suitability of manual sentiment classification for this

Manual sentiment analysis
Regarding the manual categorization of sentiments, the level of subjectivity was evaluated through interrater agreement. Due to the limited number of experts available, we asked a second rater to classify each sentiment in the Reddit dataset and then measured the resulting Cohen's Kappa. The agreement between the two raters was moderate and acceptable (k = 0.62) for this type of technique (McHugh 2012).
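For reference, Cohen's Kappa compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. The sketch below uses made-up labels, not the study's annotations.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equally long sequences of categorical labels."""
    n = len(rater_a)
    # Proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the marginal label frequencies of each rater.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # Undefined (division by zero) when expected == 1, i.e. both raters
    # always use the same single label.
    return (observed - expected) / (1 - expected)


# Hypothetical ratings, for illustration only.
first = ["neg", "neg", "pos", "pos"]
second = ["neg", "neg", "pos", "neg"]
kappa = cohen_kappa(first, second)
print(kappa)
```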

Automatic sentiment analysis
Regarding the models for automatic sentiment classification, Table 9 reveals the performance measures calculated from the comparison between the reference and the automatic classifications obtained. In general, most models presented acceptable results; however, the NB, LR and SVM classifiers showed better performance when text was normalized using stemming and features were extracted using bag-of-words.
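The comparison between reference and automatic labels can be sketched as follows. The specific measures reported in Table 9 are not listed in this section, so accuracy, precision, recall and F1 are assumed here as common choices; the labels are invented.

```python
def evaluate(reference, predicted, positive="neg"):
    """Return accuracy, precision, recall and F1 for one target class."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    accuracy = sum(r == p for r, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only.
reference = ["neg", "neg", "pos", "pos"]
predicted = ["neg", "pos", "pos", "neg"]
scores = evaluate(reference, predicted)
print(scores)
```

Since negative comments were the majority class in this dataset, per-class precision and recall are more informative than accuracy alone.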

Topic modeling
Regarding the topic model developed, its evaluation was based on the estimation obtained in 10 comments reserved for testing. When comparing the results with the textual content, it was confirmed that the topic was correctly assigned for 80% of the test sample.

Recommendations
In this section, the extracted knowledge is used to carry out a set of recommendations for monitoring and managing the institutional image. To demonstrate this application, this section describes customized suggestions for the Portuguese Army, the remaining branches and Armed Forces General Staff.

Recommendations for managing the Army's reputation
The shortage of military personnel in the Army together with the level of negative eWOM detected in social media suggest the need for an institutional reputation management strategy capable of improving its attractiveness. This strategy should be developed by a dedicated department with access to up-to-date information and responsible for: • Implementing a marketing intelligence tool based in our methodology to analyze both independent and official social media. The application of eWOM to provide insight on public opinion (Anagnostopoulou et al. 2020) and the benefits of automatic extraction and analysis (Aggarwal 2011) have been previously described and, together, have proven to be capable of providing useful decision support for institutional reputation management. Thus, the continuous monitorization would allow to quickly intervene when online communication is potentially harmful to the institution's goals. • Supporting the development of a national strategy for the Army's brand, ensuring consistent and coherent communication. The importance of managing the Army's brand to improve its attractiveness has been previously suggested (Hoye and Lievens 2007;Lievens 2007) and, according to our findings during information source selection and analysis, this would allow to reduce the number of official social media platforms, centralize the institutional message, reach a broader audience and enhance the user's interaction. • Supporting the Army's command structure, proposing new policies and measures to maintain or improve positively perceived Army features (such as training programs), but also, to intervene in the negative aspects that can damage its institutional reputation (e.g., improving its attractiveness in the labor market) (Kirby and Marsden 2006).
Additionally, it is important to comment on the current state of the Army's image. Overall, the institution's eWOM suggests a negative reputation that may contribute to the continued difficulties of attracting and retaining human resources. To enhance that reputation, it is necessary to make organizational changes based on the positive and negative aspects mirrored in public opinion. Thus, our findings highlight the importance of developing policies aimed at increasing the Army's competitiveness in the labor market (e.g., improving the conditions offered) and improving the perception of its patriotic and homeland defense institutional values. In addition to addressing its existing vulnerabilities, the Army should take advantage of its strengths in Portuguese public opinion, for instance, by advertising the education opportunities and Military Occupational Specialties offered to military personnel.
Overall, our results and recommendations are in line with the MoD professionalization action plan. 5 However, they rely on evidence from public opinion and provide a new methodology capable of monitoring the impact of the action plan over time.
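As an illustration of how such monitoring over time could be operationalized, the following minimal Python sketch aggregates comments that have already been labeled with a sentiment polarity and flags the weeks in which the share of negative comments exceeds a threshold. It is not the tool used in this study: the function names, the input format (a date paired with a polarity label), the sample data and the 0.6 threshold are all hypothetical assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import date


def negative_share_by_week(comments):
    """Map each ISO (year, week) to the share of comments labeled negative.

    `comments` is an iterable of (date, polarity) pairs; this input format
    is an assumption made for the sketch, not the study's data format.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for day, polarity in comments:
        iso = day.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        totals[key] += 1
        if polarity == "negative":
            negatives[key] += 1
    return {key: negatives[key] / totals[key] for key in sorted(totals)}


def weeks_to_review(shares, threshold=0.6):
    """Return the weeks whose negative share exceeds a hypothetical threshold."""
    return [week for week, share in shares.items() if share > threshold]


# Hypothetical labeled comments, invented purely for illustration.
sample = [
    (date(2020, 1, 6), "negative"),
    (date(2020, 1, 7), "positive"),
    (date(2020, 1, 8), "negative"),
    (date(2020, 1, 9), "negative"),
    (date(2020, 1, 13), "positive"),
    (date(2020, 1, 14), "negative"),
]

shares = negative_share_by_week(sample)
flagged = weeks_to_review(shares)
```

In practice, the threshold and the aggregation window would need to be calibrated by the department responsible for reputation management, and a small number of comments per week would make the share unstable.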

Other recommendations
For the other branches and the Armed Forces as a whole, our findings suggest that they could benefit from reputation management. In this sense, and considering the similarity between defense institutions' problems and characteristics, we suggest standardizing communication channels and developing a common strategy for improving the image of the Portuguese Armed Forces.
Additionally, our results from sentiment analysis and topic modeling suggest that the reputation of the Armed Forces would benefit from promoting public support for NATO membership and from publicly promoting military enlistment as a professional opportunity, as opposed to the former mandatory military service.

Conclusion
The aim of this study was to develop a methodology capable of extracting relevant information about eWOM in social media, using text mining strategies and natural language processing. The result was a methodology that proposes a set of information sources in social media, has a process of semiautomatic information extraction and applies non-conventional techniques capable of obtaining relevant knowledge about eWOM and its sentiment polarity. The proposed solution is based on the public sector context, and it stands out for being innovative, cost free, partially automated, capable of accessing a large sample and of handling the problem of insufficient and sparse information available in social media in this type of organization where users do not often express their opinion online.
Our methodology was tested through the case of the Portuguese Army and replicated for other defense institutions. Our results supported the use of online communication to obtain institutional eWOM and demonstrated its ability to support decision-making through the information extracted from a set of information sources. Additionally, the methodology's replicability was tested, confirming the flexibility and systematization necessary to operate in different public institutions. Overall, our findings suggest that a negative reputation has been formed for the Armed Forces and some of its branches, which is congruent with the difficulty of obtaining military personnel in recent years.
According to our analysis, opinions shared online revealed that defense institutions are perceived negatively in terms of the cost-benefit ratio of public funds allocated to them and regarding the possibility of reintroducing compulsory military service. In addition, it was observed that public confidence declines when information incongruent with military values and mission is released, for example, when corruption schemes, poorly trained military exercises and a lack of response to the needs of the population are publicly exposed. Regarding the difference between institutions, we have been able to show that eWOM of the Armed Forces and the Army demonstrates a negative employer brand image, mainly related to the working conditions offered and lack of human resources. However, the analysis of positive opinions revealed that training and education opportunities are the main strengths of both the Army and the Air Force. In addition, the Army is also valued for the nature of the military profession, while the Air Force has been able to enhance its reputation by supporting civil society.
Considering the difficulty in controlling information shared online and the costs of reversing its negative effect, our methodology proved to be useful and applicable to eWOM monitoring in public sector institutions. Regarding the context of the Portuguese national defense, it has been shown that it can benefit from measures directed at the organizational functioning and its public image among the Portuguese citizens and, in particular, youth of recruitable age and their referrers (family and friends).
Although useful, the methodology presented here still has some limitations. The main limitation of this study concerns the level of subjectivity involved in the tasks of selecting information, performing manual sentiment analysis and interpreting the results. This subjectivity should be controlled by involving experts from the evaluated institution.
Taking these results into account, we believe it would be interesting to test the performance of our methodology in a public sector institution outside the context of Defense, which would provide more insight into its application. Future work could also include the analysis of posts and different comment levels (responses to the posts or previous comments) to gain more detailed knowledge of the institution's reputation, as well as the development of deep learning approaches to improve automatic sentiment classification. The use of embeddings in general, and the most recent advances concerning contextual embeddings, may be a promising way of tackling irony and exploiting the real semantic context hidden in the context-specific terms found in the comments.