According to the reasons given in Section 2.1.3 and the comparison in Table 1, most of the retrieved reviews were not conducted systematically, their paper selection processes were unclear, and they did not propose a clear classification. To the best of our knowledge, only three SLRs have been conducted on this topic (Sebei et al., 2018; Al-Garadi, 2019; Lerena et al., 2019), and none of them provides a complete systematic review of SNA techniques, tools, strengths, weaknesses, open issues, evaluation parameters, and the application and critical role of big data in social networks. The two most similar efforts are (Ghani et al., 2018), which is a survey rather than an SLR and covers only journal papers published between 2011 and 2017, excluding conference papers, and (Sebei et al., 2018), an SLR that covers works published between 2008 and 2018 but does not present the evaluation parameters used in each studied paper. In (Al-Garadi, 2019), the researchers examined only cyber-attacks and security issues in social big data, which differs from the scope of our paper, and the time range of the studied papers was not specified. Additionally, open issues were not specified in (Lerena et al., 2019), and the researchers in (Al-Garadi, 2019; Lerena et al., 2019) did not investigate evaluation parameters or applied tools. An SLR that addresses these weaknesses and precisely highlights open issues and future research directions is therefore timely.
Researchers have conducted various studies on social networks and big data, their applications, and their challenges. To study big data analytic approaches in social networks comprehensively, this section presents the SLR method applied to this topic. An SLR is a methodology to identify, classify, assess, and synthesize a comparative overview of the state of the art in a specific subject (Brereton et al., 2007; Kitchenham et al., 2009). In contrast to other types of review papers, an SLR presents a taxonomical review and performs a methodological analysis of the research literature to answer the given research questions on a specific research topic. SLRs were first used in the medical field (Aznoli and Navimipour, 2017) and can be conducted in any field of study to gain an accurate understanding, reduce bias, and identify open issues and future directions (Rahimi et al., 2020; Haghi Kashani et al., 2020). Since most review articles on big data analytic approaches in social networks did not follow a structured procedure, this paper aims to provide a rigorous, methodological process for reviewing the literature in this scope.
In this systematic process, the three-phase guideline of Brereton et al. (2007), namely planning, conducting, and documenting, is adopted, as depicted in Fig. 2. The outcome of each phase is evaluated externally. In the planning phase, we first identify the questions and needs that motivate this SLR. In the conducting phase, the articles on this subject are selected based on inclusion/exclusion criteria. Finally, in the documenting phase, the observations are documented and the results are analysed, compared, and visualized to answer the research questions, after which the final reports are presented. The three phases of the research methodology followed in this SLR are discussed below:
Overview of research methodology.
Planning begins with determining the research motivation for this SLR and ends with a review protocol, as follows:
Stage 1- Specifying the research motivation. In the first stage, the motivation is specified based on the contribution of this SLR, which is justified by comparing the available reviews discussed in Section 2.2.
Stage 2- Defining research questions. In the second stage, the research questions are defined according to the motivation of this paper; they assist in developing and validating the review protocol. The research questions are stated below. Answering them reveals the existing gaps on this subject, which can help generate new ideas in the documenting phase.
Q1: What are the existing big data analytic approaches applied in social networks?
Q2: What parameters do the researchers employ to evaluate big data analytics in social networks?
Q3: What are the tools used in the social network analysis and big data areas?
Q4: What are the social big data analysis applications in the studied papers?
Q5: What are the datasets and case studies used in social big data analysis?
Q6: What evaluation methods are applied to measure big data analytic approaches in social networks?
Q7: What are the challenges and future perspectives of big data analytic approaches in social networks?
Stage 3- Determining the review protocol. In line with the goals of this SLR, the research questions and the review scope identified in the previous stage were used to adjust the search strings for literature extraction (Brereton et al., 2007). Moreover, a protocol was developed following (Calero et al., 2013) and our previous experience with SLRs (Haghi Kashani et al., 2020; Rahimi et al., 2020). To evaluate the defined protocol before its execution, we asked an external specialist experienced in conducting SLRs in this area for feedback, and his feedback was incorporated into the revised protocol. A pilot study on approximately 25% of the included papers was performed to reduce bias between researchers and to improve the data extraction process. During the pilot stage, we also refined the review scope, search strategies, and inclusion/exclusion criteria.
The second phase of the research methodology is conducting, which starts with paper selection and culminates in data extraction. This section describes the process of searching for and selecting papers in the second phase of the SLR. The paper selection process follows the three-step guideline depicted in Fig. 3.
Inclusion/Exclusion criteria.
Type | Criterion | Rationale |
---|---|---|
Inclusion | Studies that focus on social big data analytics | To obtain a clear picture of big data analytic approaches in social networks |
Inclusion | Papers published online from 2013 to August 2020 | The results of classical and fundamental literature on this subject are covered in recent papers |
Exclusion | Short papers of fewer than six pages | These studies do not provide enough information to be used in our research |
Exclusion | Surveys and review papers | These studies do not offer reasonable, significant, or novel solutions and information |
Exclusion | Unjudged papers or papers not written in English | The quality of unjudged papers cannot be trusted, and non-English papers could not be examined |
Exclusion | Book chapters and theses | The results of book chapters and theses are reported in journal and conference papers |
Paper selection process.
As shown in Fig. 2, in the documenting phase, after the observations are documented, threats to validity and limitations are explored, as presented in Section 7. The results are then analysed, visualized, and reported in Section 5.
In this section, the 74 selected papers are explored to examine the objectives, techniques, and innovations of social big data analysis, and the advantages and disadvantages of each approach are reviewed. A taxonomy of the related literature is proposed, and its pictorial description is shown in Fig. 4. Offering a taxonomy for social big data analysis is not trivial: because researchers look at the problems in this area from various perspectives, each researcher classifies them differently. With this categorization, the reader can easily locate each paper by its category. The selected papers use big data analytic techniques to analyse social networks, and these techniques are categorized into two major groups: content-oriented approaches and network-oriented approaches.
Taxonomy of social big data analysis.
Content-oriented approaches are classified into two subgroups, namely topical learning and opinion/sentiment learning. Topical learning can be performed in a single-modal or a multimodal manner, whereas opinion/sentiment learning can be carried out with lexicon-based, learning-based, or hybrid approaches. Network-oriented approaches are likewise classified into two groups: embedding learning and community learning. Embedding learning comprises graph-based, non-graph-based, and explanatory models, while community learning is node-based or group-based. The papers on content-oriented and network-oriented approaches are reviewed in Sections 4.1 and 4.2, respectively. In this study, the methods of big data analysis on social networks are examined and evaluated against a list of important evaluation parameters. The definitions of the evaluation parameters used in the reviewed papers, together with their formulas, are given in Appendix A.
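As a rough illustration of the confusion-matrix-based parameters that recur in Tables 4 and 6 (accuracy, precision, recall, F-measure, specificity, and Matthews correlation coefficient), the following Python sketch computes them from binary counts using their standard textbook definitions. It is not taken from Appendix A or from any reviewed paper; the function name `classification_metrics` and the convention of returning 0.0 for an undefined ratio are assumptions made here.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common evaluation parameters from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives, and true
    negatives. Any ratio whose denominator is zero is reported as 0.0 (an
    assumed convention; other tools may return NaN or raise instead).
    """
    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)            # also called sensitivity
    specificity = safe_div(tn, tn + fp)       # true negative rate
    f_measure = safe_div(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient: ranges from -1 to 1; 0 means chance level.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for an illustrative binary classifier.
    print(classification_metrics(tp=40, fp=10, fn=5, tn=45))
```

Scalability, time, and cost are not derived from a confusion matrix and therefore fall outside this sketch.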
Nowadays, the explosion of data in social networks provides researchers with types of content beyond traditional books and libraries, which makes analysing this immense volume of data essential. The selected papers on topical learning and opinion/sentiment learning are reviewed in Sections 4.1.1 and 4.1.2, respectively; each section classifies the techniques, defines the methods, and discusses the related papers.
In content-oriented approaches, topical learning focuses on the communication contents of social networks and includes text mining, video content analysis, and image analysis. It is the process of analysing various types of unstructured data, such as images, audio and video files, or text in different forms (Word documents, PDF files, PowerPoint slides, weblog posts, and social network posts), as well as semi-structured data such as XML, HTML, JSON, and CSV files, in order to uncover underlying similarities and hidden associations and transform them into structured data for further analysis. Topical learning may be performed in a single-modal or a multimodal manner: a single-modal approach collects and analyses one modality (text, audio, image, or video), whereas a multimodal approach analyses a combination of modalities such as text, audio, image, and video. Based on the reviewed papers, the specifications and evaluation parameters are compared in Tables 3 and 4. Table 3 summarizes the main ideas, advantages, disadvantages, evaluation methods, tools, and case studies of the papers in this approach, along with their categories. Table 4 presents a side-by-side comparison of the evaluation parameters used in papers on topical learning approaches.
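To make the unstructured-to-structured step concrete for the single-modal text case, here is a minimal Python sketch that turns raw posts into a document-term count matrix (a bag-of-words representation). The tokenizer (lowercasing and keeping only ASCII letters and digits) and the function names are illustrative assumptions, not the pipeline of any reviewed paper; real systems add stop-word removal, stemming, and language-specific handling.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of ASCII letters/digits; '#' and '@' are dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def to_term_matrix(posts: list[str]) -> tuple[list[str], list[list[int]]]:
    """Turn free-text posts into (vocabulary, document-term count matrix)."""
    counts = [Counter(tokenize(p)) for p in posts]
    vocabulary = sorted(set().union(*counts))
    # One row per post, one column per vocabulary term.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


vocab, matrix = to_term_matrix(["Flu season is here #flu", "Got my flu shot today"])
print(vocab)
print(matrix)
```

Each row of the matrix is a structured record that downstream clustering or classification can consume directly.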
Reviewing and comparing papers with topical learning approaches.
Category | Ref. | Main ideas | Advantages | Disadvantages | Evaluation methods | Tools | Case studies |
---|---|---|---|---|---|---|---|
Single modal | ( ) | Creating a linear network autocorrelation model | | | Real test bed | MySQL database, RMySQL, RStudio, R programming language | The proED forum on www.reddit.com |
Single modal | ( ) | Proposing a novel, flexible logic (TS_u_Datalog) | | | Example application | Not mentioned | Not mentioned |
Single modal | ( ) | Presenting spatio-temporal big data analysis to detect real-time behavioral patterns during the flu season | | | Real test bed | Hadoop, Big R released by IBM, Sqoop, Apache Flume | Twitter, Cerner HealthFacts data warehouse |
Single modal | ( ) | Presenting a password creation and validation system for social media platforms | | | Real test bed | C#, SQL Server 2014 | |
Single modal | ( ) | Proposing a new early warning system for adverse drug reactions | | | Data sets | Not mentioned | The online health community MedHelp |
Single modal | ( ) | Proposing a recommendation system using big data of user-shared images in social media | | | Simulation | Matlab | Skyrock, Sina Weibo, Flickr |
Single modal | ( ) | Presenting a multiclass classification to reveal mental disorders by investigating people's posts on the Reddit website | | | Data sets | Python, Scikit-learn library | Reddit website |
Single modal | ( ) | Presenting an algorithmic model employing social media analytics and statistical machine learning to predict cyber risks | | | Data sets | MySQL, RWeka package, RStudio (R statistical software) | |
Single modal | ( ) | Applying artificial neural networks and deep learning to predict Facebook posts | | | Data sets | Not mentioned | |
Single modal | ( ) | Analyzing Turkish news on Twitter with Apache Spark | | | Data sets | Python, Apache Spark | |
Single modal | ( ) | Presenting a framework for trend detection in social networks | | | Real test bed | Hadoop, Apache Drill, Apache Storm | |
Single modal | ( ) | A novel ML-based face recognition framework in social networks | | | Data sets | Apache Giraph, Apache Hive | |
Single modal | ( ) | Presenting a hybrid content-based cyberbullying detection model based on a metaheuristic approach in social networks | | | Data sets | Python | Twitter, ASKfm, FormSpring |
Single modal | ( ) | Applying a genetic algorithm to cluster social big data | | | Data sets | Hadoop, Java, Mahout | |
Single modal | ( ) | Offering a cloud-based framework for analyzing video transcoding | | | Real test bed | Hadoop, NoSQL, Amazon S (Amazon cloud storage provider), CLEVER (Cloud-Enabled Virtual Environment) | Not mentioned |
Single modal | ( ) | Presenting a traffic event detection tool | | | Data sets | Apache Spark, MongoDB, Python | |
Multimodal | ( ) | Introducing a private video recommendation system based on cloud and online learning | | | Simulation | Not mentioned | Sina microblog, Youku (video sharing site) |
Multimodal | ( ) | Proposing a content-centric networking architecture based on Monte Carlo Tree Search | | | Simulation | Not mentioned | Sina Weibo |
Multimodal | ( ) | Presenting a Facebook fake profile detection framework | | | Data sets | Weka | |
Multimodal | ( ) | Presenting a multi-modal microblog emotion analyzer based on deep learning | | | Data sets | Not mentioned | Sina Weibo |
An overview of the evaluation parameters in papers with topical learning approaches.
Category | Ref. | Centrality Measures | Security | Accuracy | Precision | Recall | F-measure | Scalability | Time | Cost | ROC (AUC) | Specificity | Matthews correlation coefficient |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Single modal | ( ) | ✓ | |||||||||||
( ) | ✓ | ✓ | |||||||||||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||||
( ) | ✓ | ✓ | ✓ | ✓ | |||||||||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||||
( ) | ✓ | ✓ | ✓ | ✓ | |||||||||
( ) | ✓ | ✓ | ✓ | ✓ | |||||||||
( ) | ✓ | ✓ | ✓ | ||||||||||
( ) | ✓ | ✓ | |||||||||||
( ) | ✓ | ✓ | |||||||||||
( ) | ✓ | ✓ | ✓ | ||||||||||
( ) | ✓ | ||||||||||||
( ) | ✓ | ✓ | ✓ | ✓ | |||||||||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||||
( ) | ✓ | ||||||||||||
( ) | ✓ | ✓ | ✓ | ✓ | |||||||||
Multimodal | ( ) | ✓ | ✓ | ||||||||||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
( ) | ✓ | ✓ | ✓ |
To investigate the effects of social media on Eating Disorders (ED), Moessner et al. (2018) applied text, linguistic, and lexical analyses with an unsupervised, bottom-up method to identify harmful posts. They did not analyse social media data in real time; doing so could have improved the safety of ED-related communication. Further, to execute balance policies in business applications of social networks, Huo et al. (2018) presented TS_u_Datalog, a new Datalog-based logic that they described as the most appropriate among Datalog variants, together with a new programming language combining Active_U_Datalog and Distributed Temporal Logic (DTL) to implement contractual policies in dynamic social media. The execution time of TS_u_Datalog could have been improved, and the logic could have been applied to blockchain systems, privacy preservation on smartphones, and fault tolerance in wireless sensor networks.
To improve health monitoring systems for detecting infectious diseases and taking preventive actions, Zadeh et al. (2019) presented a spatio-temporal platform to examine whether social media posts could reveal flu outbreaks in a particular area during the flu season. Because some users do not enable GPS or do not disclose their geographic location in their social network profiles, the geographic analysis could not be made deeper or more accurate. More efficient ML techniques were also needed to perform more in-depth analysis and to filter out noise and unrelated posts. To recognize all repetitive and non-repetitive substrings in passwords, Xylogiannopoulos et al. (2020) designed an efficient pattern detection system that can be embedded in social network platforms to generate more robust and valid passwords. The results indicated that, contrary to common belief, long passwords are not necessarily safe, whereas passwords combining lowercase and uppercase letters, numbers, and symbols are stronger than others. The methodology imposed no limitation on the length or type of characters. However, the proposed system was not tested on other datasets, which might have led to different results.
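The core idea of flagging repeated patterns in a password can be illustrated with a brute-force Python sketch that counts every substring of a minimum length and reports those occurring more than once. This is a naive stand-in written for clarity, not the efficient detection system of Xylogiannopoulos et al. (2020); the function name `repeated_substrings` and the default minimum length of 2 are assumptions.

```python
from collections import Counter


def repeated_substrings(password: str, min_len: int = 2) -> dict[str, int]:
    """Return every substring of length >= min_len that occurs at least twice.

    Overlapping occurrences are counted. The scan enumerates O(n^2) substrings,
    which is acceptable only for short strings such as passwords.
    """
    counts = Counter()
    n = len(password)
    # A substring as long as the whole password can occur only once, so stop at n - 1.
    for length in range(min_len, n):
        for start in range(n - length + 1):
            counts[password[start:start + length]] += 1
    return {s: c for s, c in counts.items() if c >= 2}


print(repeated_substrings("abcabc"))
```

For example, `repeated_substrings("abcabc")` reports `ab`, `bc`, and `abc`, each occurring twice, whereas a password such as `aB3$` yields an empty result.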
To prevent deaths caused by Adverse Drug Reactions (ADRs), Yang et al. (2015) used text classification to build an automated framework that filters ADR-related posts. A supervised learning method was applied to classify the extracted posts into positive and negative examples, and the classification results were used as input to an early warning system intended to prevent future ADRs. Although the method generally performed better in precision, recall, and F-measure, the framework was not extended to other types of drugs. Furthermore, Cheung et al. (2015) presented a connection discovery system for follower/followee recommendations that relies on user-shared images instead of user-generated tags and social graphs. They used Bag-of-Features Tagging (BoFT) to label user-shared images, and a computer vision approach was employed to model the characteristics of these images. In addition to identifying users' gender, the system achieved higher image classification performance than K-means, without requiring the number of clusters K to be known in advance. However, the runtime of clustering and feature extraction was high; consequently, supporting more users and user-shared images would require a big data system to manage and discover data.
Furthermore, to identify mental disorders early, Thorstad and Wolff (2019) scrutinized people's everyday posts on mental and non-mental health topics on the Reddit website. The accuracy assessment indicated that people's posts on clinical and non-clinical subreddits were highly and moderately predictive of mental illness, respectively. It also revealed that predictions were more precise for posts from the recent past than for those from the distant past. A limitation was that posting on a clinical topic may not be a significant criterion for early diagnosis of psychological disorders, as some people may already be affected by mental illness before posting. Besides, to identify vulnerabilities, Subroto and Apriyana (2019) offered an algorithmic model that applies social media analytics and ML algorithms to protect against cyber-attacks. Although the model built with artificial neural networks achieved the highest accuracy, it was not scalable, faced hardware limitations, and was tested only on a small sample of a Twitter dataset; the authors claimed, however, that this did not affect the model's accuracy.
Moreover, many other studies adopted clustering and ML algorithms for text mining and trending topic detection on social platform big data (Straton et al., 2017; Makaroğlu et al., 2019; Vakali et al., 2016; Aa et al., 2015). The researchers in (Singh and Kaur, 2019; Sachar and Khullar, 2017) also proposed hybrid models that apply a metaheuristic approach to improve classification performance in the content analysis of social big data. As millions of users now produce and share videos on various social media, Panarello et al. (2020) developed a framework for processing video transcoding in a short time. They applied Hadoop in their cloud federation framework to transcode videos so that shared videos are compatible with users' different hardware/software devices. The evaluation results on a real testbed demonstrated improved performance in terms of speed, scalability, and transcoding time, but security and privacy issues were neglected.
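Several of the tools listed in Table 3 (for example, Hadoop and Apache Spark) build on the map/shuffle/reduce programming model. The following single-process Python sketch imitates that model to count hashtags across posts, a simple form of trending-topic detection. It does not use the Hadoop or Spark APIs and is not the framework of any reviewed paper; the whitespace tokenization, the hashtag rule, and the function names are simplifying assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(post: str):
    # Emit a (hashtag, 1) pair for every hashtag token in one post.
    for token in post.lower().split():
        if token.startswith("#") and len(token) > 1:
            yield token, 1


def shuffle_phase(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Aggregate all values for one key.
    return key, sum(values)


def trending_hashtags(posts, top_k=3):
    intermediate = chain.from_iterable(map_phase(p) for p in posts)
    grouped = shuffle_phase(intermediate)
    totals = [reduce_phase(k, v) for k, v in grouped.items()]
    # Sort by descending count, then alphabetically so ties are deterministic.
    totals.sort(key=lambda kv: (-kv[1], kv[0]))
    return totals[:top_k]


posts = ["#flu is spreading #health", "stay safe #flu",
         "#traffic jam on the bridge", "#Flu shots available"]
print(trending_hashtags(posts, top_k=2))
```

In a real cluster, the map and reduce functions run in parallel on different nodes, and the shuffle moves data across the network; here all three run sequentially in one process.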
Alomari et al. (2020) developed a text-mining methodology that uses big data technologies to detect road traffic events from Arabic tweets. The authors applied three machine learning algorithms, namely Logistic Regression, Support Vector Machine, and Naïve Bayes, to classify eight types of events. The evaluation results showed improved text processing, leading to more accurate event detection without prior knowledge of those events. The methodology could also be used to identify events beyond road transportation. However, the authors did not focus on improving the scalability and data management of the proposed method.
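Of the three classifiers named above, Naïve Bayes is the simplest to write out. Below is a minimal multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing in plain Python. The class name, the whitespace tokenizer, and the toy English training posts are hypothetical, and none of the Arabic preprocessing or big data tooling used by Alomari et al. (2020) is reproduced.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # class -> word -> count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set()
        for counter in self.word_counts.values():
            self.vocab.update(counter)
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_label / self.n_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = ["accident on the highway lanes closed", "heavy traffic jam near the bridge",
        "road closed due to accident", "congestion and traffic on main road"]
labels = ["accident", "jam", "accident", "jam"]
clf = MultinomialNaiveBayes().fit(docs, labels)
print(clf.predict("accident blocked the highway"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied.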
Zhou et al. (2016) proposed a private video recommendation system based on distributed online learning. Multimedia such as images, audio, and videos produced by users were sent to and stored in remote, decentralized data centers. The users' context vectors were extracted by BoFT (bag-of-features tagging) and forwarded to distributed video service servers. Finally, the recommended video was delivered to multimedia applications in online social networks. The evaluation results on real datasets from Sina microblog and Youku, a Chinese video sharing site (VSS), achieved a sublinear regret bound and established a trade-off between performance loss and the level of privacy protection. However, for simplicity, a small dataset from these social networks was chosen, so the approach suffered from low scalability. In another study, Feng et al. (2018) proposed a Content-Centric Networking (CCN) architecture based on the Monte Carlo Tree Search (MCTS) algorithm. Since the volume and variety of both users and contents are growing rapidly, the MCTS algorithm was used to solve the accurate content push problem in big data. In experiments on an offline Sina Weibo dataset, their algorithm performed better in push accuracy, scalability, and robustness to user arrivals. Although the proposed architecture could be evaluated in real-world CCN-based social media, energy efficiency was neglected.
Sahoo and Gupta (2020) proposed a framework to distinguish fake profiles on Facebook. The authors applied various ML algorithms along with content analysis and account-based features to separate suspicious accounts from genuine ones. The evaluation results indicated that the framework performed best in terms of accuracy, precision, recall, F-measure, and Matthews correlation coefficient, but its response time was not evaluated. Moreover, applying this approach to other platforms such as Twitter and Google+, or adding an aggregator module to compare various account features and their activities, may lead to different results. Since many microblogs contain videos, emoticons, and pictures as well as text, Zhang et al. (2019) proposed a multi-modal emotion analyzer based on deep learning. The authors applied a bidirectional Long Short-Term Memory (LSTM) network to integrate content and user features. The model attained higher accuracy, precision, and F-measure than previous models, but users' personalities were not considered, and user-based emotions could not be classified.
In this section, the selected papers with opinion/sentiment learning approaches are reviewed. Opinion/sentiment learning approaches use Natural Language Processing (NLP) to extract opinions from text and classify their polarity as positive, negative, or neutral, in order to determine what people are talking about and to identify the perceptions of public groups. With the help of sentiment analysis, opinions about products, services, brands, politics, or any other topic people care about can be extracted. These data can be used in many applications, such as marketing analysis, product reviews and feedback, emotion detection, intent analysis, customer support and services, social media monitoring, and brand monitoring (Shirdastian et al., 2019).
By reviewing the papers on opinion/sentiment learning, we identified three methods employed to extract and analyse opinions and sentiments in social media content: lexicon-based, learning-based, and hybrid approaches. Lexicon-based approaches use predefined word lists, corpora, and dictionaries to extract the subjectivity, orientation, and polarity of opinions and sentiments. Learning-based approaches use various ML algorithms (supervised or unsupervised) to classify text into positive or negative classes. Some of the reviewed papers combine learning-based and lexicon-based approaches; we refer to these as hybrid approaches. Table 5 compares the selected papers with opinion/sentiment learning approaches in terms of main ideas, advantages, disadvantages, evaluation methods, tools, and case studies, along with their categories. Some studies did not mention the tools applied for analysis and implementation. Table 6 shows the parameters used by papers on opinion/sentiment learning approaches to evaluate their methods.
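The lexicon-based idea can be sketched in a few lines of Python: look up each token's prior polarity in a word list, sum the scores, and map the total to a label. The tiny lexicon and its integer scores below are hypothetical placeholders in the style of AFINN (not values taken from AFINN itself), and the one-word negation rule is a simplifying assumption; learning-based and hybrid approaches would replace or complement the lookup with a trained classifier.

```python
import re

# Hypothetical miniature lexicon with AFINN-style integer scores; a real system
# would load a published lexicon instead.
LEXICON = {"good": 3, "great": 3, "love": 3, "bad": -3, "terrible": -3, "delay": -2}
NEGATORS = {"not", "no", "never"}


def lexicon_polarity(text: str) -> tuple[int, str]:
    """Sum word scores, flipping the sign of a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return score, label


print(lexicon_polarity("I love this airline, great service"))
print(lexicon_polarity("The food was not good"))
```

The simplicity is also the weakness noted in the reviewed papers: results depend heavily on which lexicon is chosen, and sarcasm or domain-specific wording is not captured.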
Reviewing and comparing papers with opinion/sentiment learning approaches.
Category | Ref. | Main ideas | Advantages | Disadvantages | Evaluation methods | Tools | Case studies |
---|---|---|---|---|---|---|---|
Lexicon-based | ( ) | Presenting a fake review detection framework for sentiment analysis of social networks | | | Data sets | R language, NLP tools such as Stanford CoreNLP, OpenNR, Tidytext, AFINN sentiment lexicon | Amazon website |
Lexicon-based | ( ) | Introducing a sentiment computing method based on social media big data | | | Data sets | Not mentioned | Sina microblog |
Learning-based | ( ) | Introducing a calibration-based data integration approach | | | Data sets | GeNIe Software V 2.1, R package ROSE | San Francisco International Airport passengers dataset, Skytrax dataset |
Learning-based | ( ) | Proposing a two-stage big data and ML framework to analyse social media content | | | Data sets | Spark, Python, MySQL, Natural Language Toolkit (NLTK), Pandas package | Tourism data from the Yelp dataset (Yelp.com) |
Learning-based | ( ) | Analyzing users' opinions on the COVID-19 epidemic on a microblog | | | Data sets | Python | Sina Weibo |
Learning-based | ( ) | Presenting an ML model to analyse the tweets of English national team fans during the 2018 FIFA World Cup | | | Data sets | Python | |
Learning-based | ( ) | Offering a framework to explore brand validity sentiments | | | Data sets | Python | |
Learning-based | ( ) | Presenting a real-time processing system to analyse stock market tweets | | | Data sets | Apache Spark, Apache Kafka | |
Learning-based | ( ) | Presenting a metaheuristic approach to sentiment analysis of tweets | | | Data sets | Apache Spark | |
Hybrid | ( ) | Presenting a framework to examine the relationship between stock market volatility and UGCs | | | Data sets | Matlab | |
Hybrid | ( ) | Offering a new sentiment analysis methodology to explore the impact of social sensing on weather events | | | Data sets | Python | |
Hybrid | ( ) | Introducing a distributed and parallel parsing system on the MapReduce framework | | | Data sets | Hadoop, Java | KISTI, NDSL |
Hybrid | ( ) | Analyzing vaccination sentiments in tweets and their trends on Twitter | | | Real test bed | Hadoop, Mahout | |
Hybrid | ( ) | Mining tweets that report drug side effects | | | Data sets | Apache Spark's ML library (MLlib), Python | |
Hybrid | ( ) | Presenting a sentiment analysis framework by applying ML techniques | | | Data sets | Apache Spark's ML library (MLlib) | |
Hybrid | ( ) | Designing a microblog abnormal emotion detection model based on a neural network and CNN-LSTM | Improving the threshold selection, which is a time-consuming process | | Data sets | Not mentioned | Sina Weibo |
Hybrid | ( ) | Presenting a mechanism to gather and visualize social media information for big data | | | Data sets | Apache Flume, Hadoop, Java platform | Twitter, Facebook, Amazon dataset, Kaggle dataset |
Hybrid | ( ) | Proposing a topic classification and sentiment analysis framework | | | Data sets | Hadoop platform, Apache Flume, Apache Hive | |
An overview of the evaluation parameters in papers with opinion/sentiment learning approaches.
Category | Ref. | Accuracy | Precision | Recall | F-measure | Scalability | Time | Cost |
---|---|---|---|---|---|---|---|---|
Lexicon-based | ( ) | ✓ | ✓ | ✓ | ||||
( ) | ✓ | |||||||
Learning-based | ( ) | ✓ | ||||||
( ) | ✓ | ✓ | ✓ | ✓ | ||||
( ) | ✓ | |||||||
( ) | ✓ | ✓ | ✓ | ✓ | ||||
( ) | ✓ | ✓ | ||||||
( ) | ✓ | ✓ | ||||||
( ) | ✓ | ✓ | ||||||
Hybrid | ( ) | ✓ | ||||||
( ) | ✓ | ✓ | ||||||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
( ) | ✓ | |||||||
( ) | ✓ | ✓ | ✓ | ✓ | ||||
( ) | ✓ | |||||||
( ) | ✓ | ✓ | ✓ | |||||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ |
Kauffmann et al. (2019) offered a modular framework for the qualitative interpretation of user-generated content (UGC) that employs NLP techniques and cosine similarity measures to recognize fake reviews. Their Fake Review Detection Framework (FRDF) used NLP techniques to discover similarities between reviews and eliminate fake and unreliable reviews of a product. The major weakness was that FRDF set a threshold on the cosine similarity measure to detect fake reviews; other thresholds, or sentiment analysis tools other than the AFINN lexicon, may produce different outcomes. Furthermore, Jiang et al. (2017) suggested a method for computing the sentiment of news events in social big data. First, a Word Emotion Association Network (WEAN) was constructed to compute both word and text emotions at a specific time. After emotions were categorized, a questionnaire was designed to collect opinions on the six-dimensional emotions of emoticons, and the emoticons were used to calculate the emotion of each sentence. Second, based on WEAN, a word emotion computation algorithm was presented to obtain the primary word emotions. Then an emotion refinement algorithm employing a standard emotional thesaurus was offered to refine the sentiments of news with high accuracy; however, emotion distance and word emotion patterns were not considered in text sentiment computation.
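The similarity test at the heart of such a framework can be illustrated with cosine similarity over bag-of-words vectors: pairs of reviews whose similarity reaches a threshold are flagged as likely duplicates. This Python sketch is not FRDF itself; the tokenizer, the function names, and the default threshold of 0.8 are assumptions, and, as noted above for FRDF, choosing a different threshold changes which reviews are flagged.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    # Term-frequency vector over lowercase ASCII alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0  # empty text is treated as dissimilar


def flag_similar_reviews(reviews, threshold=0.8):
    """Return index pairs (i, j) of reviews whose cosine similarity >= threshold."""
    vectors = [bag_of_words(r) for r in reviews]
    return [(i, j) for i, j in combinations(range(len(reviews)), 2)
            if cosine_similarity(vectors[i], vectors[j]) >= threshold]


reviews = ["Great phone, fast delivery, great price",
           "Great phone fast delivery great price!",
           "Battery died after two days"]
print(flag_similar_reviews(reviews))
```

The pairwise loop is quadratic in the number of reviews; at big data scale, approximate methods such as locality-sensitive hashing are usually needed.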
Moreover, Dalla Valle and Kenett (2018) presented a new approach to integrate online review data with customer survey data. In their method, the sentiments of online users were calibrated with customer surveys by resampling and merging data via Bayesian networks. This approach was used in various areas, and integrating online blog data with customer satisfaction data improved sentiment analysis. However, it did not consider methods for integrating vast data sources to improve the accuracy of the results. In addition, Jimenez-Marquez et al. (2019) presented a two-stage framework to analyse UGC in social media. The first stage, which aimed at managing big data and processing UGC, built a Machine Learning Model (MLM). The second stage took the MLM from the first stage and involved a series of layers to build a big data architecture that analysed unstructured and heterogeneous data. The proposed framework outperformed its competitors in both quantitative and qualitative analysis. Despite its high accuracy, better results might be obtained by integrating more advanced ML algorithms and applying them to different domains.
Despite advances in medical science, COVID-19 has become one of the most perilous diseases of the 21st century worldwide and a critical threat to individuals' physical and mental health. In this respect, Zhu et al. (2020) analysed COVID-19-related topics on Weibo from January 24 to February 25, 2020, aiming to grasp users' opinions about the epidemic in China from temporal and spatial perspectives. However, the study had some drawbacks. The spatial analysis of opinions was limited to the provincial level. The age and gender of Weibo users were not considered and were therefore not reflected in the results. Moreover, since not everyone uses Sina Weibo to express opinions, the results cannot be generalized. Employing a larger volume of data may lead to more predictive and accurate opinion analysis for relevant organizations in emergencies.
Fan et al. (2020) introduced a novel method for exploring real-time sentiments, team identification, and national identification in tweets during the 2018 FIFA World Cup. The authors observed how the sentiments of fans' tweets fluctuated during two matches (England vs. Croatia and England vs. Colombia). They used Python and ensemble methods both to build a highly accurate model for sentiment analysis at different points during the match and to analyse emojis and their valence. However, since 4% of the collected tweets were in Spanish or Croatian and the ML approaches cannot perform properly on multilingual datasets, their method had low reliability. Moreover, they analysed only two matches involving England; other international matches and other countries were not considered. Finally, none of the ML techniques could detect sarcasm in tweets, which lowered precision, recall, and F-measure.
Shirdastian et al. (2019) presented a framework to explore brand authenticity and the related sentiment polarity both qualitatively and quantitatively. The authors explored opinions and sentiment polarity towards brand authenticity on a Twitter dataset in terms of uniqueness, heritage, quality commitment, and symbolism. The results indicate that the framework improved precision and accuracy in assessing brand authenticity from the related brand sentiments. The main drawback of this study was that it neither explored the variation of sentiments over time nor excluded bot-created content from the sentiment mining. Sayed et al. (2020) presented a hybrid approach that combined ML and lexicon techniques for sentiment analysis of tweets. The authors suggested a new metaheuristic approach based on Particle Swarm Optimization (PSO) and K-means to optimize data clustering. They evaluated their approach on four Twitter datasets covering various topics using Spark Streaming and achieved better accuracy in real-time analytics than previous approaches; however, deep learning methods might yield more accurate predictions.
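The PSO-plus-K-means idea can be illustrated with a minimal sketch. It is not the clustering procedure of Sayed et al.; it assumes that each particle encodes k candidate centroids, that fitness is the sum of squared errors (SSE) to the nearest centroid, and that the best particle is then refined with ordinary K-means (Lloyd) iterations. The inertia and acceleration coefficients (`w`, `c1`, `c2`) are common textbook defaults chosen for illustration, and all function names are hypothetical.

```python
import random

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sum((p - c) ** 2 for p, c in zip(pt, cen)) for cen in centroids)
               for pt in points)

def kmeans_refine(points, centroids, iters=10):
    """Plain Lloyd (K-means) iterations starting from the given centroids."""
    cents = [list(c) for c in centroids]
    for _ in range(iters):
        clusters = [[] for _ in cents]
        for pt in points:
            j = min(range(len(cents)),
                    key=lambda i: sum((p - c) ** 2 for p, c in zip(pt, cents[i])))
            clusters[j].append(pt)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                cents[j] = [sum(col) / len(members) for col in zip(*members)]
    return cents

def pso_kmeans(points, k, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """PSO search over k centroids (fitness = SSE), then K-means refinement of the best particle."""
    rng = random.Random(seed)
    dim = len(points[0])

    def unflatten(vec):
        return [vec[i * dim:(i + 1) * dim] for i in range(k)]

    # each particle is a flat vector of k centroids, initialised on randomly chosen data points
    pos = [[x for pt in rng.sample(points, k) for x in pt] for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [sse(points, unflatten(p)) for p in pos]
    gbest = pbest[min(range(n_particles), key=pbest_fit.__getitem__)][:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull towards personal best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull towards global best
                pos[i][d] += vel[i][d]
            fit = sse(points, unflatten(pos[i]))
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
        gbest = pbest[min(range(n_particles), key=pbest_fit.__getitem__)][:]
    return kmeans_refine(points, unflatten(gbest))
```

Here PSO only supplies a good starting point and K-means polishes it, reflecting the usual motivation for such hybrids: K-means converges quickly but is sensitive to its initial centroids.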
To examine the relationship between stock market volatility and UGC, van Dieijen et al. (2020) presented a framework using multivariate regression analysis and the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model. The results showed an asymmetric impact of UGC on volatility: negative comments increased volatility more than positive ones and had a significant effect on customers. For future research, scaling up the framework may lead to practical implementation. Spruce et al. (2020) presented a new methodology that used social sensing and sentiment analysis of social data to explore the real-world impacts of named storms in the United Kingdom and Ireland. The authors collected tweets posted in the winters of 2017 and 2018. Then time-zone, bot, and weather-related filters were applied to extract data related to weather incidents. By analysing the sentiments of tweets during extreme weather events, the study revealed the effects of weather incidents and their social impacts from physical, emotional, spatial, and temporal perspectives. The main limitation of this study was low scalability, because only a small number of tweets were retained after the weather-related filtering. Further, the results were somewhat unreliable because Python's sentiment analysis package TextBlob, whose training corpus is based on movie review datasets, was applied.
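For readers unfamiliar with GARCH, the sketch below shows only the conditional-variance recursion of the common GARCH(1,1) specification, assuming mean-zero return innovations and hypothetical parameter values. Whether van Dieijen et al. used this exact order, or added UGC as an exogenous term in the variance equation, is not stated here, and parameter estimation (normally by maximum likelihood) is omitted.

```python
def garch11_variance(eps, omega, alpha, beta, init_var=None):
    """Conditional variances of a GARCH(1,1) model,
        sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1],
    where eps are mean-zero return innovations. sigma2[0] defaults to the
    unconditional variance omega / (1 - alpha - beta)."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("require omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    if not eps:
        return []
    sigma2 = [omega / (1 - alpha - beta) if init_var is None else init_var]
    for e in eps[:-1]:
        # yesterday's squared shock and yesterday's variance drive today's variance
        sigma2.append(omega + alpha * e ** 2 + beta * sigma2[-1])
    return sigma2
```

With `omega=0.25, alpha=0.25, beta=0.5`, a shock of 2.0 raises the next variance from 1.0 to 1.75, after which it decays back towards the unconditional level at rate `beta` when no further shocks arrive.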
Um et al. (2013) introduced a distributed, parallel parsing system based on MapReduce to analyse users' sentences in social sensor networks. A loosely coupled Stanford parser was applied, which led to high scalability. Owing to the parallel environment, parsing time was low, and the proposed system had high precision and high portability. The main limitation was that actual social sensor network data, such as Twitter data, were not considered, and technical sentences were not analysed in the same way as ordinary users' phrases. Moreover, researchers in (Baltas et al., 2016, Lee and Paik, 2017, Moise, 2016) employed ML along with NLP for opinion and polarity mining of social big data; these sentiment analyses served various decision-making purposes, including marketing and health care issues such as reporting drug side effects. To conduct sentiment analysis on a microblog big data platform, Sun et al. (2018) presented a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model. Each type of emotion was modeled with a Single Gaussian Model (SGM). The authors used CNN to extract local features and LSTM to extract global features. The findings indicated that sentiment analysis of social language with the CNN-LSTM model achieved high accuracy, but time was neglected in their model, and threshold selection was still time-consuming.
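Modelling each emotion with a single Gaussian amounts to a class-conditional Gaussian classifier. The following minimal sketch assumes diagonal covariances and fixed-length feature vectors (for example, features produced by an upstream network such as CNN-LSTM); it is not Sun et al.'s implementation, and the class and parameter names are hypothetical.

```python
import math
from collections import defaultdict

class SingleGaussianEmotionModel:
    """One diagonal-covariance Gaussian per emotion label; predict the label whose
    Gaussian gives the highest log-likelihood."""

    def fit(self, X, y, var_floor=1e-6):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.params = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # population variance per feature, floored so constant features do not divide by zero
            variances = [sum((v - m) ** 2 for v in c) / len(c) + var_floor
                         for c, m in zip(cols, means)]
            self.params[label] = (means, variances)
        return self

    def log_likelihood(self, x, label):
        means, variances = self.params[label]
        return -0.5 * sum((xi - m) ** 2 / v + math.log(2 * math.pi * v)
                          for xi, m, v in zip(x, means, variances))

    def predict(self, x):
        return max(self.params, key=lambda label: self.log_likelihood(x, label))
```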
Also, BalaAnand et al. (2019) presented a mechanism to collect content from social media using big sheets, big vision schemes, and sentiment assessment. A Deep Learning Modified Neural Network (DMNN) was used to investigate sentiments, and a Modified Threshold-based Cuckoo Search Algorithm (MTCSA) was applied as a heuristic search algorithm for weight optimization. The experimental results showed that the proposed DMNN outperformed other algorithms in reliability, robustness, scalability, accuracy, precision, recall, F-measure, and computational time, but the cost of the method was not assessed. For topic classification and sentiment analysis of social big data, Rodrigues and Chiplunkar (2019) presented a distributed Hadoop framework. The bag-of-words method was used to classify the relevant tweets into six groups. Then four NLP methods, namely uni-gram lexicon, bi-gram lexicon, uni-gram Naive Bayes (NB), and bi-gram NB, together with a Hybrid Lexicon-Naive Bayesian Classifier (HL-NBC), were employed. HL-NBC was the most effective, outperforming the other classifiers in accuracy, execution time, and response time. However, separating and classifying sarcastic sentences and handling cross-lingual opinions in sentiment analysis remained unsolved challenges.
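The difference between uni-gram and bi-gram Naive Bayes features, and one possible way of combining a lexicon with NB, can be sketched as follows. The lexicon entries, the `margin` rule (trust the lexicon only when its score is decisive, otherwise fall back to NB), and all names are hypothetical illustrations; HL-NBC may combine the two components differently.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n_max=2):
    """Uni-gram features, plus bi-grams when n_max >= 2."""
    tokens = text.lower().split()
    feats = list(tokens)
    if n_max >= 2:
        feats += [a + " " + b for a, b in zip(tokens, tokens[1:])]
    return feats

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-n-grams with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.counts = defaultdict(Counter)
        self.priors = Counter(labels)
        for doc, lab in zip(docs, labels):
            self.counts[lab].update(ngrams(doc))
        self.vocab = {f for c in self.counts.values() for f in c}
        self.n = len(labels)
        return self

    def predict(self, doc):
        scores = {}
        for lab, cnt in self.counts.items():
            total = sum(cnt.values())
            s = math.log(self.priors[lab] / self.n)
            for f in ngrams(doc):
                s += math.log((cnt[f] + 1) / (total + len(self.vocab)))
            scores[lab] = s
        return max(scores, key=scores.get)

# Hypothetical sentiment lexicon; a real system would use a curated resource.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}

def hybrid_predict(nb, doc, margin=2):
    """Use the lexicon score when it is decisive, otherwise fall back to Naive Bayes."""
    score = sum(LEXICON.get(t, 0) for t in doc.lower().split())
    if score >= margin:
        return "pos"
    if score <= -margin:
        return "neg"
    return nb.predict(doc)
```

In this toy setting the phrase "not good" carries a positive lexicon score, but the bi-gram feature "not good" lets the NB component assign it to the negative class.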
Network-oriented approaches analyse social big data based on nodes or entities and their relations within social networks. They are classified into two groups: embedding learning and community learning. We review the selected papers with embedding learning and community learning approaches in Sections 4.2.1 and 4.2.2, respectively, where the classification of techniques, the definitions of methods, and the related papers are discussed.
Some of the reviewed papers presented embedding learning approaches that focus on extracting valuable information about users and nodes in a network for link prediction, influence analysis, and information diffusion in social networks. Social influence is an individual's ability to influence other users in a network; the more influential a person is, the more followers they have (Kumaran and Chitrakala, 2017). The embedding learning approach analyses a network based on users and their features and models the process of information diffusion in online social networks by learning users' characteristics and how information spreads among them. Embedding learning approaches estimate the influence of nodes by identifying a node's position on paths and the number of paths on which it occurs; a node that lies most often at the center of the network and on more paths is more influential.
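The path-based notion of influence described above is close to betweenness centrality, which counts how often a node lies on shortest paths between other nodes. The minimal sketch below uses Brandes' algorithm for an unweighted, undirected graph given as an adjacency dictionary (an assumed input format); it illustrates the measure and is not the specific procedure of any reviewed paper.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality of an unweighted, undirected graph via Brandes' algorithm.
    adj: node -> list of neighbours, with every edge listed in both directions."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors on them
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # accumulate pair dependencies, farthest nodes first
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # in an undirected graph every pair was counted once from each endpoint
    return {v: c / 2 for v, c in bc.items()}
```

On a star with three leaves, for example, the hub lies on the only shortest path of each of the three leaf pairs and receives a score of 3, while every leaf scores 0.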
From the perspective of predicting the underlying diffusion process, embedding learning approaches fall into three categories: graph-based, non-graph based, and explanatory. Graph-based and non-graph based models are predictive models that, by investigating previous information propagation, predict information dissemination from spatial and/or temporal points of view. Graph-based approaches focus on the static graph structure of the network over which information is transmitted and predict who influences whom. In these approaches, each node is either active or inactive, as in the Independent Cascade (IC) and Linear Threshold (LT) models. In non-graph based approaches, by contrast, the topology and structure of the network are not taken into account, and each node is assumed to connect randomly to other nodes with equal probability, as in epidemic models, the Linear Influence Model (LIM), and Partial Differential Equation (PDE) models. The main goal of explanatory models is to infer the information propagation path and to show how information propagates in social networks. These models explore propagation characteristics such as pairwise transmission rates, pairwise transmission probabilities, and cascade properties when the network over which information diffuses is unknown.
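To make the graph-based models concrete, the following sketch runs the Linear Threshold model. It assumes that each node's incoming edge weights sum to at most 1 and, unlike the classic formulation in which thresholds are drawn uniformly at random, takes the thresholds as input so that a run is reproducible. Names and data are hypothetical.

```python
def linear_threshold(in_weights, thresholds, seeds):
    """Linear Threshold diffusion.
    in_weights: node -> {in-neighbour: weight}, weights summing to at most 1 per node.
    thresholds: node -> threshold in (0, 1]; drawn at random in the classic model,
    passed in here for reproducibility. Returns the final set of active nodes."""
    active = set(seeds)
    changed = True
    # activation is monotone, so sweeping until nothing changes yields the same
    # final set as synchronous rounds
    while changed:
        changed = False
        for v, weights in in_weights.items():
            if v in active:
                continue
            # v activates once the total weight of its active in-neighbours reaches its threshold
            if sum(w for u, w in weights.items() if u in active) >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```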
This section reviews the selected papers that apply embedding learning approaches to social big data analysis and compares and summarizes them in Tables 7 and 8. Table 7 compares them in terms of main ideas, advantages, disadvantages, evaluation methods, tools, and case studies, along with their categories; some studies did not mention the tools used to analyse and implement their approaches. Table 8 specifies the evaluation parameters.
Reviewing and comparing papers with embedding learning approaches.
Category | Ref. | Main ideas | Advantages | Disadvantages | Evaluation methods | Tools | Case studies |
---|---|---|---|---|---|---|---|
Graph-based | (Kumaran and Chitrakala, 2017) | Introducing a social influence rank-based determination method on big data streams in online social networks | | | Data sets | Python, Hadoop, MongoDB | |
Graph-based | (Persico et al., 2018) | Introducing an influence maximization and diffusion algorithm | | | Real test bed | Apache Storm, Apache Spark, Microsoft Azure HDInsight | Yahoo Flickr Creative Commons 100 Million (YFCC100M) |
Graph-based | (Gao et al., 2017) | Presenting an information-dependent embedding-based diffusion prediction model | | | Real test bed | Not mentioned | Digg, MemeTracker, Google+ |
Graph-based | (Elkin et al., 2017) | Introducing a network-based model to predict disease activity across geographical locations | | | Real test bed | Not mentioned | |
Graph-based | (Wang et al., 2014) | Presenting a heuristic approach to maximize influence in social networks | | | Simulation | Not mentioned | Political blogs, Netscience dataset |
Graph-based | (Talukder and Hong, 2019) | Proposing a heuristic model for minimizing viral marketing costs in social networks | | | Simulation | Python | Facebook, Epinions |
Graph-based | (Chen et al., 2020) | Presenting a topic-aware influence maximization model based on cloud computing | | | Simulation | Not mentioned | NetHEPT, Epinions, DBLP, LiveJournal, Friendster |
Non-graph based | (Wu et al., 2020) | Offering a protection and recovery model, examining influential users, and studying virus propagation | | | Simulation | Matlab | Undirected BlogCatalog network and directed AS-level network |
Non-graph based | (Wu et al., 2018) | Proposing an algorithm and calculation model for searching the relationships between nodes, big data, and small data | | | Simulation | Not mentioned | Population map of Beijing, China |
Non-graph based | (Wu et al., 2018) | Presenting mobile nodes to explore and limit the spread of rumors in social networks | | | Simulation | C# | |
Non-graph based | (Peng et al., 2017) | Introducing an immunization framework for mobile social networks | | | Simulation | C# simulator | Largest cellular network in China |
Explanatory | (Óskarsdóttir et al., 2019) | Proposing a financial credit scoring model that uses mobile phone data and social network analytics | | | Data sets | Not mentioned | CDR data of cell phone numbers and customer data of a bank, both from the same country |
Explanatory | (Raj and Babu, 2015) | Proposing FIAEC and mathematical models to compute the probability of staying in social networks | | | Real test bed | Not mentioned | |
Explanatory | (Su et al., 2016) | Introducing a framework to deliver mobile social data over content-centric mobile social networks | | | Simulation | Not mentioned | Not mentioned |
Explanatory | (Kumar et al., 2016) | Recognizing influential users on Twitter from their numbers of followers and friends | | | Real test bed | R, Hadoop, Python | |
Explanatory | (Zhang et al., 2017) | Analyzing real-world device-to-device datasets in mobile social networks | | | Real test bed | Apache Spark, Apache Kafka, Hadoop | Not mentioned |
Explanatory | (Xu et al., 2015) | Analyzing the impact of various sampling approaches on influence diffusion in social big data | | | Real test bed | Not mentioned | |
Explanatory | (Yang et al., 2020) | Suggesting a depression detection framework that applies ML techniques | | | Data sets | Apache Spark, R | |
Explanatory | (Maireder et al., 2017) | Proposing two social network measures of communicative activities to characterize information diffusion | | | Real test bed | Gephi (network analysis software) | Twitter discussion of TTIP in Europe |
An overview of the evaluation parameters in papers with embedding learning approaches.
Category | Ref. | Accuracy | Precision | Recall | F-measure | Scalability | Time | Cost | Influence Diffusion | ROC (AUC) | Kappa | Security |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Graph-based | ( ) | ✓ | ✓ | ✓ | ✓ | |||||||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
( ) | ✓ | ✓ | ✓ | |||||||||
( ) | ✓ | ✓ | ||||||||||
( ) | ✓ | |||||||||||
( ) | ✓ | ✓ | ||||||||||
( ) | ✓ | ✓ | ✓ | |||||||||
Non-graph based | ( ) | ✓ | ✓ | |||||||||
( ) | ✓ | ✓ | ||||||||||
( ) | ✓ | |||||||||||
( ) | ✓ | ✓ | ||||||||||
Explanatory | ( ) | ✓ | ||||||||||
( ) | ✓ | |||||||||||
( ) | ✓ | |||||||||||
( ) | ✓ | |||||||||||
( ) | ✓ | ✓ | ||||||||||
( ) | ✓ | ✓ | ✓ | |||||||||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
( ) | ✓ |
Kumaran and Chitrakala (2017) offered a social influence ranking method based on a rank-sampling approach. After Twitter data were collected, parallel information diffusion modelling took users' queries as input, determined the forwarding nodes, and calculated the path of information flow. The next component, influential spreader ranking, took a search query and used topological and user attributes to calculate users' feature scores. Finally, two solutions were provided for the influence maximization problem. Ranking-based sampling was applied to ensure accuracy, and MapReduce with parallel processing to reduce running time. Although the method was scalable, the sample size was fixed, so an approach that determines the most appropriate sample size is still needed.
In another study, Persico et al. (2018) analysed the efficiency of two big data architectures, Lambda and Kappa. Although dataset size affected performance, both architectures scaled well; as the input size increased, however, Lambda performed better than Kappa owing to its in-memory computation. Deploying Kappa with the same number of executors was also more expensive than deploying Lambda. In both architectures, performance improved when the algorithm was executed on larger clusters, and with resource-richer virtual machines (VMs), Kappa improved significantly (vertical scaling). Overall, Lambda performed better, and both architectures supported social network applications properly. To predict information diffusion in the context of social big data, Gao et al. (2017) offered an efficient Information-dependent Embedding-based Diffusion Prediction (IEDP) model. They also extended a typical margin-based optimization algorithm and presented an efficient learning algorithm based on Stochastic Gradient Descent (SGD). The complexity of the model was significantly reduced, but social structure was not considered in the embedding model.
Additionally, to support illness control and advance prediction, Elkin et al. (2017) introduced a network-based approach for modeling illness activity and generated predictions of Influenza-Like Illness (ILI) activity across geographical locations. The model could assist illness control and provided predictions one week in advance. However, it was unsuccessful in predicting ILI activity across geographies using airline traffic data, had low scalability, and did not consider factors other than geographical location, such as weather patterns or low population density; incorporating more factors could strengthen the model. Moreover, a heuristic model called PRDiscount was proposed in (Wang et al., 2014) to select initial seeds that maximize influence diffusion in social networks. In contrast, Talukder and Hong (2019) introduced a mixed heuristic approach to minimize and optimize viral marketing costs in social media.
Since social networks now strongly affect the dissemination of information and users' comments, as well as individuals' daily lives, Chen et al. (2020) suggested a topic-aware influence maximization model based on cloud computing. They employed a sketching technique together with a greedy algorithm to discover the top-k seed users that maximize the spread of information within a network. Compared with existing influence maximization approaches, the proposed approach achieved low running time and low storage, but only a limited number of evaluation parameters were used to verify the accuracy of the model.
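The greedy seed selection underlying many influence maximization methods can be sketched with Monte Carlo estimates of Independent Cascade spread. This sketch assumes a single uniform propagation probability `p` and omits both the sketching technique and the topic awareness of Chen et al., so it illustrates the plain greedy baseline rather than their model; all names are hypothetical.

```python
import random

def ic_spread(adj, seeds, p, rng):
    """Size of one Independent Cascade run: each newly activated node gets a single
    chance to activate each inactive out-neighbour, succeeding with probability p."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def greedy_seeds(adj, k, p=0.1, runs=200, seed=42):
    """Pick k seeds one at a time, each maximizing the Monte Carlo estimate of expected spread."""
    rng = random.Random(seed)
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for cand in sorted(nodes - set(chosen)):  # sorted for reproducible tie-breaking
            est = sum(ic_spread(adj, chosen + [cand], p, rng) for _ in range(runs)) / runs
            if est > best_spread:
                best, best_spread = cand, est
        chosen.append(best)
    return chosen
```

Plain greedy needs many simulations per candidate and per round, which is the cost that sketching-based methods aim to reduce.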
Moreover, to discover influential users, Wu et al. (2020) offered a Protection and Recovery Strategy (PRS) model to study virus propagation in social networks. In this mechanism, users were divided into five groups based on their reactions to the virus: Susceptible, Contagious, Doubt, Immune, and Recoverable (SCDIR). The PRS model made it possible to control viruses and reduce the number of infected users. Despite its low running time and low cost, the model assumed a fixed number of nodes and connections; dynamic changes in nodes and their connections may lead to different results. Wu et al. (2018) suggested a model to search for small data and compute the effect of small-data nodes so that they can be used instead of big data, arguing that obtaining small data reduces the complexity of big data. The results showed that 1% of small data could connect 15% of communication nodes, and 20% of small data could broadcast 80% of data packets, leaving the other nodes in a waiting state. Although complexity decreased and the delivery ratio improved, a new algorithm is needed to balance reliability, delivery ratio, delay, and the use of limited network resources.
Wu et al. (2018) presented an extended model to recognize and restrict rumor dissemination among users by considering all users' behaviors. A time threshold was assigned to each user to represent delays in users' reactions. The authors suggested a mobile node that propagates authorized information to reduce the penetration of rumors. They simulated the model on a Facebook dataset to investigate how the speed, arrival time, and strategy of the mobile node affect rumors. The speed and strategy of mobile nodes could not bring forward the time point at which rumor spreading is curbed, but in general they reduced the spread time of rumors; therefore, the most effective strategy for limiting rumors is to send mobile nodes to the neighbor nodes with the highest degree.
Furthermore, to prevent the spread of malware, Peng et al. (2017) presented a big data-based framework in which social interactions, namely people's daily SMS/MMS exchanges, were transformed into a bidirectional weighted graph. Social influence, both direct and indirect, was then measured. A set of immunization algorithms was designed, and the Susceptible-Infectious-Recovered (SIR) model was developed, because the top-k influential nodes had a greater effect on malware propagation. Based on the presented immunization strategy, the top-k influential nodes were immunized to minimize malware propagation; however, the framework did not detect social media malware in real time.
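The effect of immunizing selected nodes in an SIR process can be illustrated with a discrete-time simulation on a contact graph. In this sketch, which is not Peng et al.'s framework, each infected node infects each susceptible neighbour with probability `beta` per step and recovers with probability `gamma`; immunized nodes (for example, hypothetical top-k influential users) start in the recovered state and can never be infected.

```python
import random

def sir_on_graph(adj, infected, beta, gamma, immunized=(), seed=0, max_steps=1000):
    """Discrete-time stochastic SIR on a contact graph (adj: node -> neighbours).
    Immunized nodes start in the recovered state and can never be infected.
    Returns the set of nodes that were ever infected."""
    rng = random.Random(seed)
    recovered = set(immunized)
    infected = set(infected) - recovered
    ever = set(infected)
    for _ in range(max_steps):
        if not infected:
            break
        new_inf = set()
        for u in sorted(infected):  # sorted so that a given seed gives a reproducible run
            for v in adj.get(u, ()):
                if (v not in infected and v not in recovered
                        and v not in new_inf and rng.random() < beta):
                    new_inf.add(v)
        # each node infected at the start of this step recovers with probability gamma
        recov = {u for u in sorted(infected) if rng.random() < gamma}
        infected = (infected - recov) | new_inf
        recovered |= recov
        ever |= new_inf
    return ever
```

With `beta=gamma=1` on a path a-b-c-d, infection starting at a reaches all four nodes, but immunizing c confines it to a and b, which shows why immunizing well-placed nodes limits propagation.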
To improve both the statistical and economic performance of credit scoring applications, Óskarsdóttir et al. (2019) applied personalized PageRank (PR) and SPreading Activation (SPA) to Call-Detail Records (CDR) and credit and debit account information. The results showed that calling-behavior features were the most effective, and the information extracted from CDR data in terms of “value” facilitated financial prediction. The major challenge was preserving the privacy of customers' data. Moreover, only one type of credit was analysed; other types of credit may lead to different results.
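Personalized PageRank, as used in network-based credit scoring, can be computed by power iteration with restarts to a personalization vector. The sketch assumes a damping factor of 0.85, a restart vector concentrated on hypothetical known defaulters, and that the mass on dangling nodes returns to the restart vector; it does not reproduce Óskarsdóttir et al.'s feature construction.

```python
def personalized_pagerank(adj, restart, alpha=0.85, tol=1e-10, max_iter=1000):
    """Personalized PageRank by power iteration.
    adj: node -> list of out-neighbours; restart: node -> non-negative restart weight.
    With probability 1 - alpha the walk jumps back to the (normalized) restart vector;
    the mass sitting on dangling nodes (no out-links) is also sent back there."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs} | set(restart)
    total = sum(restart.values())
    r = {v: restart.get(v, 0.0) / total for v in nodes}
    x = dict(r)
    for _ in range(max_iter):
        dangling = sum(x[u] for u in nodes if not adj.get(u))
        nxt = {v: (1 - alpha + alpha * dangling) * r[v] for v in nodes}
        for u, outs in adj.items():
            if outs:
                share = alpha * x[u] / len(outs)
                for v in outs:
                    nxt[v] += share
        # stop when successive score vectors differ by less than tol in L1 norm
        if sum(abs(nxt[v] - x[v]) for v in nodes) < tol:
            return nxt
        x = nxt
    return x
```

Scores decay with network distance from the restart nodes, so a customer whose calling neighbourhood is close to known defaulters receives a higher score; such scores would then serve as features in a credit model.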
Furthermore, Raj and Babu (2015) proposed a Firefly Inspired Algorithm for Establishing Connections (FIAEC) and mathematical models for computing the probability of staying in social networks. The goal of the algorithm was to maximize the number of connections for n individuals on social network sites. With the algorithm, the number of connections increased, and so did the interaction between connections. On the other hand, FIAEC was not scalable, and it was only tested on sample sizes of 10,200 and 600.
Su et al. (2016) studied the characteristics of mobile big data and presented a new framework to disseminate these data over content-centric Mobile Social Networks (MSNs). To address the volume and variety challenges and to control and manage mobile big data, the framework was delivered over Content-Centric Networks (CCNs). The results showed that a low weight coefficient for a data packet led to low delay. Because the framework was based on static characteristics, it did not consider dynamic mobile social users, and it was tested on a limited number of users, so it was not scalable. Limited resources, such as bandwidth and buffer space, were not considered in resource allocation, and security was not maintained for data stored outside users' own mobile devices. In addition, to recognize influential users, Kumar et al. (2016) developed a methodology based on the numbers of friends and followers of accounts. In another study, Zhang et al. (2017) analysed an offline device-to-device dataset of mobile social big data and pushed interesting content to the most influential users.
Besides, Xu et al. (2015) investigated the impact of various sampling approaches on the distribution of tweets and measured retweets to identify influence diffusion in social network analysis. Since a notable amount of social network data is generated by people expressing their opinions and thoughts, Yang et al. (2020) offered a social big data analysis framework to diagnose depression efficiently. The authors used a large Facebook dataset to evaluate the framework by investigating the effects of friendship influence and of users' intentions and interactions on users' mental health. They evaluated the framework with various subsets of social and user-level features, showing that users' social interactions with their friends on social networks can reveal their mental states. Unlike other researchers, they investigated both direct and indirect neighbors of a user to analyse friendship influence; however, the topics of users' posts were not considered, nor were gender, age group, or depression risk level.
Additionally, to investigate the diffusion structure of networks, Maireder et al. (2017) presented two new social network measures: the Audience Diversity Score (ADS) and the Communication Connector Bridging Score (CCBS). ADS captured the diversity of a particular actor's followers, and CCBS highlighted the accounts that bridge and diffuse information throughout the entire network. The results demonstrated that the network was divided not by a single factor but by a set of influential ones, such as language, geo-identity, and political orientation. Despite the advances in analysing communication patterns, the content and types of tweets broadcast across the network were not analysed. Moreover, the ADS and CCBS measures were not combined to detect two-factor interactions in the spread of information.
As stated earlier, social networks comprise a set of vertices or nodes that represent users and individuals, connected by numerous edges that represent their relations and interactions (Leung and Zhang, 2016). A “community” is a group of individuals who share similar interests, attitudes, or characteristics (Wu et al., 2018). From a social perspective, detecting groups of individuals in a network based on its structural and topological properties is known as community learning, which is crucial in many areas of society, such as business and recommendation systems. This has led to innovative approaches for identifying communities at the micro (micro-communities) or macro (macro-communities) level of network structure. Community detection assumes that people in one community interact more with one another than with members of other communities because of their shared interests, so the network can be divided into communities.
In community learning, after clusters of nodes are identified, the number of clusters is determined. Each cluster is mapped to a community, and then the probability distribution of interactions among users, both within and between clusters, is estimated. Community learning approaches can be categorized into node-based and group-based approaches. Node-based approaches rely on the properties of network nodes: since similar nodes belong to the same communities, node degree, node similarity, or node reachability is considered. Group-based approaches, in contrast, disregard node-level characteristics and consider the characteristics and connections of the whole group and network, recognizing balanced, robust, modular, dense, or hierarchical communities.
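Group-based approaches often look for modular communities, and modularity is the standard score of how modular a partition is. The sketch below computes Newman-Girvan modularity for an undirected, unweighted graph; the function and its edge-list input format are illustrative assumptions rather than the method of any reviewed paper.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.
    edges: list of (u, v) pairs; community: node -> community label.
    Q = sum over communities c of  L_c / m - (d_c / (2m))**2,
    where m = number of edges, L_c = edges inside c, d_c = total degree of nodes in c."""
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 2(3/7 - 1/4) = 5/14, whereas putting every node in one community gives Q = 0.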
This section reviews the selected papers with community learning approaches. Table 9 compares them in terms of main ideas, advantages, disadvantages, evaluation methods, tools, and case studies, along with their categories. Table 10 shows the parameters these papers used to evaluate their methods.
Reviewing and comparing papers with community learning approaches.
Category | Ref. | Main ideas | Advantages | Disadvantages | Evaluation methods | Tools | Case studies |
---|---|---|---|---|---|---|---|
Node-based | (Aksu et al., 2013) | Presenting a multi-resolution community detection algorithm on the Hadoop platform | | | Real test bed | Java, Hadoop, Apache HBase | Orkut, LiveJournal, Flickr, Patents, Skitter, BerkStan, YouTube, WikiTalk, DBLP |
Node-based | (Wu et al., 2018) | Proposing an incremental community detection model | | | Real test bed | Not mentioned | DBLP (Digital Bibliography & Library Project) dataset |
Node-based | (Dabas, 2017) | Presenting big data analytics for exploratory social network analysis | | | Real test bed | Pajek | An electronic store with 98 employees |
Node-based | (Yousfi et al., 2016) | Presenting a cloud-based service to manage social big data | | | Prototype | MySQL, Apache Hadoop, GraphLab (Java), Apache Flume | |
Group-based | (Sun et al., 2015) | Introducing an expert finder system based on big data analytics | | | Prototype | Hadoop | ScholarMate |
Group-based | (Lin et al., 2016) | Proposing a social-based localization algorithm and the OHSC model | | | Simulation | Java SE development | Not mentioned |
Group-based | (Kuang et al., 2016) | Introducing a tweet ranking model | | | Real test bed | Not mentioned | Sina microblog |
Group-based | (Jin et al., 2015) | Designing a distributed community structure mining framework using MapReduce | | | Real test bed | Hadoop | Large-scale artificial dataset, real-world social media networks |
Group-based | (Li et al., 2016) | Presenting a cloud-based online learning algorithm for social big data analysis | | | Simulation | Hadoop | Not mentioned |
Group-based | (Paik et al., 2017) | Proposing a parallel approach for creating a graph network | | | Real test bed | Hadoop | Not mentioned |
Group-based | (Karimi et al., 2018) | Analyzing defrauding information in social networks using Apache Hadoop | | | Real test bed | Apache Hadoop, Gephi, Apache NiFi, Apache Solr | |
Group-based | (Leung and Zhang, 2016) | Proposing a method to represent and manage social big data | | | Real test bed | Apache Hadoop, Java | Stanford Network Analysis Project (SNAP) ego-Facebook and ego-Twitter datasets |
Group-based | (Sharma, 2018) | Presenting a real-time framework for analyzing Twitter data by applying graph analysis | | | Real test bed | Apache Spark | Twitter, Sina Weibo, Tencent Weibo |
Group-based | (Du, 2018) | Offering an SNA method that adds semantics to the nodes and edges of a weighted undirected graph | | | Real test bed | Not mentioned | Dow Jones Industrial Average (DJIA), stock exchange markets (NYSE and NASDAQ) |
Group-based | ( ) | Presenting a U-model for directed and undirected graphs based on similarities | | | Simulation | Not mentioned | Sina Weibo, Tencent Weibo, Twitter |
Group-based | ( ) | Offering a fuzzy logic and density-based clustering algorithm for big data analysis | | | Real test bed | Not mentioned | Facebook, YouTube |
Group-based | ( ) | Analyzing entrepreneurial social big data | | | Real test bed | MongoDB | |
An overview of the evaluation parameters in papers with community learning approaches.
Category | Ref. | Accuracy | Precision | Recall | Scalability | Time | Security | NMI | Cost | Centrality Measures | Clustering Coefficient |
---|---|---|---|---|---|---|---|---|---|---|---|
Node-based | ( ) | ✓ | ✓ | ||||||||
( ) | ✓ | ✓ | ✓ | ✓ | |||||||
( ) | ✓ | ✓ | ✓ | ||||||||
( ) | ✓ | ||||||||||
Group-based | ( ) | ✓ | ✓ | ✓ | |||||||
( ) | ✓ | ✓ | ✓ | ||||||||
( ) | ✓ | ✓ | |||||||||
( ) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
( ) | ✓ | ✓ | ✓ | ||||||||
( ) | ✓ | ✓ | ✓ | ✓ | |||||||
( ) | ✓ | ✓ | |||||||||
( ) | ✓ | ✓ | ✓ | ||||||||
( ) | ✓ | ||||||||||
( ) | ✓ | ✓ | |||||||||
( ) | ✓ | ✓ | ✓ | ||||||||
( ) | ✓ | ||||||||||
( ) | ✓ |
Aksu et al. (2013) presented a multi-K-core, multi-resolution solution for community detection in social networks. The authors offered a distributed and scalable algorithm running on Apache HBase that computes K-core subgraphs on both the client and server sides. Experimental results on dynamic networks indicated that, despite advantages such as robustness and parallel, distributed processing, the algorithm was very costly when edges were inserted or deleted. Wu et al. (2018) presented a hash-based approach combined with graph mining to discover interactions and communities among users in social media, guaranteeing a trade-off between the efficiency and effectiveness of incremental and time-slice-based approaches.
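A K-core is the maximal subgraph in which every node has at least K neighbours. The single-machine peeling sketch below illustrates the K-core notion for an undirected graph stored as an adjacency dictionary (an assumed input format); it is not Aksu et al.'s distributed HBase algorithm.

```python
def k_core(adj, k):
    """Node set of the k-core: repeatedly delete nodes whose remaining degree is below k.
    adj: node -> list of neighbours, with every edge listed in both directions."""
    deg = {v: len(ns) for v, ns in adj.items()}
    alive = set(adj)
    stack = [v for v in adj if deg[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:  # a node can be queued more than once
            continue
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
                if deg[w] < k:
                    stack.append(w)
    return alive
```

Removing one node can push its neighbours below K, which is why the deletions cascade; in a dynamic network, inserting or deleting an edge can likewise trigger such cascades.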
Since SNA results help managers make decisions about their markets, Dabas (2017) studied an electronic store with 98 employees responsible for selling, maintaining, and installing mobile phones, tablets, and similar devices. The experiments used Pajek and various SNA metrics, such as degree centrality, betweenness centrality, stress centrality, Power Centrality (PC), Information Centrality (IC), the reachability matrix, and the clustering coefficient. The analysis informed executive managers of customers' reactions in real time so they could respond quickly if necessary, but it suffered from inadequate security for sensitive and personal data. Yousfi et al. (2016), meanwhile, proposed a solution that constructs the graph of social big data to enhance semantic extraction through graph analysis.
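Two of the SNA metrics listed above, normalized degree centrality and the local clustering coefficient, can be computed as in the following sketch for an undirected graph stored as an adjacency dictionary (an assumed input format; the function names are illustrative).

```python
from itertools import combinations

def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(set(ns)) / (n - 1) for v, ns in adj.items()}

def clustering_coefficient(adj):
    """Local clustering coefficient: share of a node's neighbour pairs that are themselves linked."""
    nbrs = {v: set(ns) for v, ns in adj.items()}
    cc = {}
    for v, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            cc[v] = 0.0  # undefined for degree < 2; reported as 0 by convention
            continue
        links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        cc[v] = 2 * links / (k * (k - 1))
    return cc
```

In a triangle with one pendant node attached to vertex a, a has the highest degree centrality (1.0) but a clustering coefficient of only 1/3, because only one of its three neighbour pairs is linked.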
As finding the researcher with the right experience and knowledge is time-consuming and critical in research communities, Sun et al. (2015) presented an expert recommendation method for scientific communities based on topic relevance, expert quality, and researcher connectivity. The architecture of this expert finder system comprised three phases: profiling, modeling, and ranking. It supported large-scale computation with linear speedup and high accuracy. However, apart from the Analytic Hierarchy Process (AHP) in the ranking phase, the authors did not use other techniques as the rank aggregation model. In another study, to improve vehicle localization in vehicular networks, Lin et al. (2016) proposed an Overlapping and Hierarchical Social Clustering (OHSC) model. OHSC explored the social relations between vehicles and then classified the vehicles into social clusters. Building on OHSC, a Social-based Localization (SBL) algorithm was presented to support global localization by predicting vehicle locations even without GPS devices. Although SBL had high overall performance in vehicle localization, it had low stability and performed worst in location error.
As the numbers of active users and daily tweets increase, users face a severe information overload problem. To address the ranking and recommendation challenge, most micro-blogging services order tweets chronologically, placing newer tweets at the top, but not all of these tweets may be of interest to users. Kuang et al. (2016) proposed a new tweet ranking model that considered three main aspects: the popularity of the tweet itself, the intimacy between the user and the tweet publisher, and the user’s areas of interest. This model improved tweet ranking performance; however, it did not consider additional indicators of user behavior. To identify all hidden communities in social media networks, Jin et al. (2015) designed a framework for community structure mining that avoided network partitioning and ran the map equation process directly on MapReduce. Instead of PageRank, the authors used local information about nodes and their neighbors to calculate the distribution probability of each node. The framework outperformed earlier algorithms, such as Radetal and FastGN, in accuracy, speed, and scalability. However, the greedy search method used to find an appropriate node for merging had limitations that need to be addressed.
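The sketch below illustrates, under stated assumptions, how the three aspects considered by Kuang et al. (2016) could be combined into a single ranking score. The feature definitions (log-scaled retweet count for popularity, share of past interactions for intimacy, Jaccard overlap of topics for interest), the weights, and all names are our own hypothetical choices, not the authors' model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Tweet:
    tweet_id: str
    publisher: str
    retweets: int
    topics: set = field(default_factory=set)


def jaccard(a, b):
    """Overlap between two topic sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_tweets(tweets, interactions, user_topics, weights=(0.3, 0.3, 0.4)):
    """Order tweet ids by a weighted sum of popularity, intimacy, and interest.

    interactions: publisher -> number of past interactions between the
    reading user and that publisher (a hypothetical intimacy signal).
    """
    max_pop = max(math.log1p(t.retweets) for t in tweets) or 1.0
    total_inter = sum(interactions.values()) or 1
    w_pop, w_int, w_topic = weights

    def score(t):
        popularity = math.log1p(t.retweets) / max_pop  # log scale tames outliers
        intimacy = interactions.get(t.publisher, 0) / total_inter
        interest = jaccard(t.topics, user_topics)
        return w_pop * popularity + w_int * intimacy + w_topic * interest

    return [t.tweet_id for t in sorted(tweets, key=score, reverse=True)]


tweets = [Tweet("t1", "news", 1000, {"politics"}),
          Tweet("t2", "friend", 3, {"hiking", "photography"})]
print(rank_tweets(tweets, {"friend": 8, "news": 2}, {"hiking", "photography"}))
# ['t2', 't1']
```

In this toy case, the less popular tweet from a close contact on a matching topic outranks the viral one, which is the behavior a chronological or popularity-only ordering would miss.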
Additionally, Li et al. (2016) offered a distributed algorithm for data centers that handles social data while preserving privacy and improving prediction accuracy in real time. Paik et al. (2017) presented an effective service discovery approach through a graph-based algorithm built on MapReduce and parallel programming. In (Karimi et al., 2018), Twitter data were analysed and degree centrality was calculated with a parallel approach to investigate deceptive information. Leung and Zhang (2016) offered a novel method to represent and manage social big data, employing graph mining approaches on directed, bi-directed, undirected, and bipartite graphs to analyse and mine social big data in distributed settings. In (Sharma, 2018), researchers designed a framework to analyse real-time Twitter hashtags using a hashtag co-occurrence graph and a connected components algorithm. Moreover, Du (2018) developed a high-frequency pair trading algorithm for the stock market that performs semantic analysis on a weighted undirected graph by applying SNA approaches and calculating centrality parameters.
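The connected-components step mentioned for Sharma (2018) can be sketched as follows: hashtags that appear in the same tweet are linked, and union-find groups hashtags that are connected directly or transitively. This is a minimal single-machine illustration with a hypothetical function name (`hashtag_components`) and toy data, not the cited real-time framework.

```python
from itertools import combinations


def hashtag_components(tweets):
    """Group hashtags that co-occur, directly or transitively, in tweets.

    Each tweet is a list of hashtags; two hashtags share an edge in the
    co-occurrence graph if they appear in the same tweet. Connected
    components are found with union-find (disjoint sets).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for tags in tweets:
        tags = sorted({t.lower() for t in tags})  # normalise case, drop repeats
        for tag in tags:
            find(tag)  # registers hashtags that never co-occur with others
        for a, b in combinations(tags, 2):
            union(a, b)

    groups = {}
    for tag in list(parent):
        groups.setdefault(find(tag), set()).add(tag)
    return sorted(groups.values(), key=len, reverse=True)


tweets = [["#AI", "#ML"], ["#ml", "#BigData"], ["#covid"], ["#vaccine", "#covid"]]
print(hashtag_components(tweets))
```

Here `#AI` and `#BigData` never appear together, yet they end up in one component because both co-occur with `#ML`.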
Since similar nodes are usually placed in the same cluster, Wang et al. (2017) introduced a similarity-based U-model for directed and undirected graphs that could accurately define social big data characteristics, the clustering coefficient, degree, and distance distribution. To analyse conversations in a social network, Ghosh et al. (2016) offered a new algorithm that uses a fuzzy methodology and density-based clustering on social clouds. The algorithm was applied to examine users' rate of participation in order to determine the popularity of the subject under discussion; however, it could have been extended toward more heuristic-based graph mining and served as a benchmark for heuristic optimization. Further, to represent the structure of network communities, Wang et al. (2017) analysed Twitter data about diverse actors involved in entrepreneurial networks by applying the Clauset-Newman-Moore algorithm. Counties in the same cluster had stronger internal interactions than those in different clusters, but the study did not account for the low participation of Twitter users in sparsely populated regions of the country.
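The Clauset-Newman-Moore algorithm greedily merges communities so as to maximize Newman's modularity Q. The sketch below only evaluates Q for a given partition of an undirected, unweighted graph; it does not perform the greedy merging itself, and the function name and toy graph are hypothetical.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where L_c is
    the number of edges inside c, d_c the total degree of c's nodes, and m
    the total number of edges.
    """
    m = len(edges)
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    internal = [0] * len(communities)
    degree = [0] * len(communities)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(l_c / m - (d_c / (2 * m)) ** 2
               for l_c, d_c in zip(internal, degree))


# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
print(modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}]))  # 5/14, about 0.357
print(modularity(edges, [{"a", "b", "c", "d", "e", "f"}]))    # 0.0
```

A greedy method such as CNM starts from singleton communities and repeatedly applies the merge that increases Q the most; splitting the two triangles scores higher than keeping everything in one community, as the two calls show.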
The results of this systematic review are analysed in this section. Section 5.1 presents an overview of the selected papers. Since the goal of this review is to highlight the differences, advantages, and disadvantages of various big data analytic approaches in social networks, a discussion of the mentioned classification is outlined in Section 5.2 .
The following complementary questions are defined to explore the state-of-the-art on big data analytic approaches applied in social networks.
In this section, the distribution of the 74 papers reviewed in Section 4 is shown by publisher and year of publication (Fig. 5), by number of papers per year (Fig. 6), and by percentage of papers per publisher (Fig. 7). Fig. 5, which shows the papers over time, indicates that ScienceDirect and Inderscience have published papers in this field since 2015. IEEE, Springer, and ScienceDirect, in that order, have published the most papers in this area, while Emerald and Taylor&Francis have published the fewest. Fig. 6 shows that most papers on this subject were published in 2017 and 2019. Fig. 7 illustrates the distribution of papers among nine publishers, of which IEEE and Springer account for 37% and 27% of the papers, respectively. ScienceDirect accounts for 19% of the papers, while ACM, Inderscience, and SAGE account for 4% each. Wiley published 3% of the papers, and Taylor&Francis and Emerald published 1% each.
The number of the studied papers categorized by publishers and years.
The number of the studied papers by years.
Percentage of the studied papers categorized by the publishers.
Table 11 shows the distribution of the publication channels that published more than one of the 74 studied papers. According to Table 11, 23 papers were published in IEEE Access (IF = 3.745), TMM (IF = 5.452), IJIM (IF = 8.210), IMMGT (IF = 4.695), FGCS (IF = 6.125), MTAP (IF = 2.313), WPC (IF = 1.061), I4C, and IEEE Big Data.
Distribution of the studies per publication channel.
Publication channel | Number of papers |
---|---|
IEEE Access | 4 |
IEEE Transactions on Multimedia (TMM) | 2 |
International Journal of Information Management (IJIM) | 3 |
Industrial Marketing Management (IMMGT) | 2 |
Future Generation Computer Systems (FGCS) | 2 |
Multimedia Tools and Applications (MTAP) | 3 |
Wireless Personal Communications (WPC) | 2 |
IEEE International Conference on Big Data (Big Data Congress) (IEEE Big Data) | 3 |
International Conference on Circuits, Controls, Communications and Computing (I4C) | 2 |
The selected studies were examined and classified according to various characteristics to answer the research questions listed in Section 3.1, as explained below:
Big data analysis has many applications in social networks and is performed in various ways. As stated earlier, the selected papers were reviewed, and big data analytic approaches in social networks were divided into two main categories based on their analysis method: content-oriented approaches and network-oriented approaches. In content-oriented approaches, user-generated posts are analysed with the aid of lexical codes, linguistic codes, and statistical tools. Network-oriented approaches, in contrast, consider nodes (users) and their relations for social big data analysis; they also uncover the interactions among social group members and the relationships between group members and people outside the group. We divided content-oriented approaches into two groups, topical learning and opinion/sentiment learning, and network-oriented approaches into two groups, embedding learning and community learning.
Fig. 8 shows the percentage of social big data analytic techniques in the reviewed papers based on Fig. 4. Content-oriented approaches have the highest share (51%), with topical learning and opinion/sentiment learning accounting for 27% and 24% of the studied papers, respectively. The remaining 49% of the papers use network-oriented approaches, of which embedding learning and community learning account for 26% and 23%, respectively. The main properties of the selected papers are shown in Table 3, Table 5, Table 7, and Table 9. The selected papers were evaluated based on critical parameters such as accuracy, scalability, precision, recall, F-measure, cost, and time. The advantages and disadvantages of the discussed taxonomy are summarized in Table 12 based on Table 3, Table 5, Table 7, and Table 9. As Table 12 indicates, researchers working on content-oriented approaches focus mainly on parameters such as accuracy, precision, recall, and time. The table also shows that network-oriented approaches improve accuracy and scalability, but most researchers do not consider privacy and security. Moreover, community learning approaches have high costs: community-based features are difficult to manipulate and are not user-controlled, and extracting them requires an in-depth analysis of a large and complex social community, which is computationally complex and resource-intensive. Furthermore, according to Table 12, security and privacy preservation remain the main weaknesses of community learning approaches.
Percentage of social big data analytic techniques in the selected papers.
A summarization of the advantages and disadvantages of the discussed taxonomy.
Content-oriented approaches | |||
Network-oriented approaches | |||
In this study, the reviewed papers were evaluated using various evaluation parameters, which are presented in Table 4, Table 6, Table 8, and Table 10. Fig. 9 illustrates the parameters that researchers used to evaluate the techniques and methods applied in the reviewed papers. The comparison in Fig. 9 shows that the most frequently evaluated parameters were accuracy (20%), time (16%), and scalability (12%); recall, precision, F-measure, and cost were also important. The percentage of each parameter was computed using Eq. (1) (Hamzei and Navimipour, 2018): the number of occurrences of each parameter was counted, divided by the total number of occurrences of all parameters, and multiplied by 100:

$$P_i = \frac{n_i}{\sum_{j=1}^{k} n_j} \times 100, \qquad (1)$$

where $n_i$ is the number of occurrences of parameter $i$ and $k$ is the number of distinct parameters.
Percentage of evaluation parameters in the selected papers.
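Eq. (1) amounts to a frequency count. The sketch below applies it to a hypothetical list of per-paper evaluation parameters; the input data and the function name `parameter_percentages` are illustrative only and are not the data behind Fig. 9.

```python
from collections import Counter


def parameter_percentages(papers):
    """Eq. (1): each parameter's count divided by all counts, times 100.

    `papers` holds one list of evaluation parameters per reviewed paper.
    """
    counts = Counter(p for params in papers for p in params)
    total = sum(counts.values())
    return {param: 100 * n / total for param, n in counts.items()}


# Hypothetical input, not the review's data.
papers = [["accuracy", "time"],
          ["accuracy", "scalability"],
          ["accuracy", "precision", "recall"]]
print(round(parameter_percentages(papers)["accuracy"], 2))  # 42.86
```

Because the denominator is the total number of parameter occurrences rather than the number of papers, the percentages always sum to 100 even when a paper reports several parameters.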
Fig. 10 indicates that in topical learning approaches, researchers focused on accuracy (23%) and recall (15%), while in opinion/sentiment learning approaches, accuracy (31%) and F-measure (18%) were the most important parameters. The most significant parameters in embedding learning approaches were time and cost, at 23% and 16%, respectively, and in community learning approaches, scalability (20%) and time (18%). Overall, accuracy is essential in most approaches, whereas privacy, reliability, and security are somewhat neglected.
Percentage of evaluation parameters in each approach of the selected papers.
Some of the papers did not mention any tools for analyzing and implementing their approaches. According to the tool columns in Table 3, Table 5, Table 7, and Table 9, Hadoop, along with the Python programming language, was the most frequently used tool in the 74 social network analysis studies. Hadoop's frequent use can be attributed to its open-source libraries for distributed and parallel processing of large datasets, its cost-effectiveness, large storage capacity, reliability, and scalability, and its ability to handle unstructured and semi-structured data.
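Hadoop's MapReduce model splits work into a map phase, a shuffle that groups intermediate pairs by key, and a reduce phase. The in-process Python sketch below mimics these phases to compute node degrees from an edge list; it does not use Hadoop's API, and the function names are hypothetical. On a real cluster, equivalent mapper and reducer logic could be run through Hadoop Streaming, but that deployment is not shown here.

```python
from collections import defaultdict
from itertools import chain


def map_phase(edge):
    """Mapper: emit (node, 1) for both endpoints of an undirected edge."""
    u, v = edge
    return [(u, 1), (v, 1)]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the partial counts for one node."""
    return key, sum(values)


def degree_count(edges):
    intermediate = chain.from_iterable(map(map_phase, edges))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


print(degree_count([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]))
# {'a': 2, 'b': 2, 'c': 3, 'd': 1}
```

Because each mapper call looks at one edge and each reducer call at one key, both phases parallelize across machines without shared state, which is what makes degree-style statistics cheap to compute at scale.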
Fig. 11 shows the social big data analysis applications of the reviewed papers, along with their percentages. In the reviewed papers, business and decision making and parsing and sentiment analysis platforms were the most frequent applications, at 19% each, followed by health care (15%).
Percentage of social big data analysis applications in the studied papers.
The selected studies used various datasets to evaluate their approaches. As Fig. 12 shows, most researchers used Twitter data; after Twitter, the most frequently used datasets were Sina microblog and Facebook.
Repetition of used datasets and case studies in the selected papers.
Based on Table 3, Table 5, Table 7, and Table 9, which describe the evaluation methods applied in each approach, the reviewed papers used five evaluation methods: simulation, prototype, data sets, real testbed, and example application. As shown in Fig. 13, 42% of assessments used data sets, 35% used a real testbed, and 19% used simulation. Fig. 14 displays the frequency of evaluation methods in each learning approach. In topical and opinion/sentiment learning, most evaluations rely on data sets; ML algorithms and data sets have been widely used in semantic analysis and have brought many ideas and innovations to social networks, supporting their growth and attracting users. In community learning approaches, by contrast, real testbeds are used in most evaluations. Finally, real testbeds and simulation cover most of the evaluations of embedding learning approaches.
Percentage of evaluation methods in the selected papers.
Repetition of evaluation methods in each approach in the selected papers.
Given the vast quantity of live social media streams and their impact on society, many techniques have been proposed to collect and analyse live UGC to support various applications. The techniques studied in this paper help in gaining insights into social data via big data analytics, and this systematic literature review is a good starting point for revealing open challenges. Content-oriented and network-oriented approaches still face many vital challenges, as described below:
In this respect, public healthcare organizations can set up social health networks for diagnosing and preventing the spread of contagious diseases across geographical locations and over time by exploring public health posts on various social networks (Elkin et al., 2017). In addition, analyzing the graph of interactions between users and examining influential users makes it possible to identify nodes with many edges; the effect of restricting and quarantining these nodes on the transmission rate of a contagious disease can then be forecast, which supports better decision making for controlling infectious diseases. This would ultimately lead to a notable reduction in healthcare costs (Zadeh et al., 2019).
They can also track the origin of diseases, their transmission from generation to generation, the effects of drugs, and drug interactions in different diseases (Thorstad and Wolff, 2019). This helps the pharmaceutical industry as well as health promotion and the diagnosis of health disorders. One limitation of current work in this area is that nodes and their relations are considered static over time; analyzing the network in real time, including the dynamic interactions among nodes, remains an open issue that could yield more accurate predictions. Moreover, most researchers have studied social influence and information diffusion on a single platform; analyzing information diffusion and social influence across multiple platforms simultaneously is a further challenge for future work. Finally, few of the reviewed papers addressed political and e-commerce applications, so these are good topics for future research.
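The idea of identifying highly connected nodes and restricting them can be illustrated with a deliberately crude sketch: rank nodes by degree, remove the top hubs, and measure how many nodes remain reachable from a seed node by breadth-first search. Reachability is only a rough proxy for potential spread, not an epidemic model, and the function names (`reachable`, `top_hubs`) and toy contact graph are hypothetical.

```python
from collections import deque


def reachable(adj, seed, removed=frozenset()):
    """Count nodes reachable from `seed` by BFS, skipping removed nodes."""
    if seed in removed:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in seen and nbr not in removed:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen)


def top_hubs(adj, k):
    """The k highest-degree nodes (ties broken alphabetically)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:k]


# Hypothetical contact graph: hub "h" links four users; "a" also knows "x".
adj = {"h": {"a", "b", "c", "d"}, "a": {"h", "x"}, "b": {"h"},
       "c": {"h"}, "d": {"h"}, "x": {"a"}}
hubs = set(top_hubs(adj, 1))
print(reachable(adj, "x"), reachable(adj, "x", removed=hubs))  # 6 2
```

A realistic forecast would replace plain reachability with a transmission model (for example, probabilistic spread along edges over time) and would treat the network as dynamic, which is exactly the open issue noted above.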
Several researchers have tried to mitigate a limited number of these challenges (Sun et al., 2018, Kauffmann et al., 2019, Jimenez-Marquez et al., 2019), but they did not achieve high accuracy; therefore, most of these sentiment analysis challenges remain unresolved, and further research is needed.
This SLR presents a taxonomy and a comparison of big data analytics in social networks. Reviews of this type usually have limitations (Brereton et al., 2007), but the results of SLRs are generally reliable (Zhang and Babar, 2013). The major limitations and threats to the validity of this SLR are discussed below.
Nevertheless, because a review protocol was defined, a systematic procedure was followed, and several researchers were involved, this SLR has high validity.
This paper presents a systematic review of big data analytics in social networks. We explained the research methodology and the paper selection process, through which 74 papers published between 2013 and August 2020 were selected from the 785 papers returned by our search query. Most of the studied papers were published by IEEE, Springer, and ScienceDirect, with 37%, 27%, and 19%, respectively, while Taylor&Francis and Emerald, with 1% each, published the fewest. The 74 papers were categorized into two approaches: content-oriented approaches (51%) and network-oriented approaches (49%). The main ideas, advantages, disadvantages, evaluation methods, tools, and evaluation parameters of each studied paper were also discussed. The most widely considered evaluation parameters were accuracy (20%), time (16%), and scalability (12%), whereas privacy, reliability, and security were somewhat neglected. Regarding tools, Hadoop, along with the Python programming language, was used more than any other tool in the selected studies. Overall, the existing social big data analytic approaches cannot adequately guarantee privacy preservation and scalability, and they face several open issues, such as latency, real-time processing, and the high run-time of feature selection. Notably, the most significant unresolved challenges concern various aspects of opinion/sentiment analysis, such as domain dependency, low-resource languages, sarcasm and slang detection, subjectivity detection, and multiple data sources. We hope that the findings of this paper will help researchers propose novel contributions to overcome social big data challenges.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
We are grateful for the insightful and constructive comments offered by Dr. Mohammad Akbari, and we also thank the anonymous reviewers for their valuable comments, which improved the final version.
1 https://scholar.google.com
2 https://link.springer.com
3 https://ieeexplore.ieee.org
4 https://www.sciencedirect.com
5 https://online.sagepub.com
6 https://www.tandfonline.com
7 https://onlinelibrary.wiley.com
8 https://www.emeraldinsight.com
9 https://dl.acm.org
10 https://www.inderscienceonline.com
Evaluation parameter | Description and formula |
---|---|
Confusion matrix | For a binary classifier, the confusion matrix has four possible outcomes: true positives (TP), the number of correct positive predictions; true negatives (TN), the number of correct negative predictions; false positives (FP), the number of negative instances incorrectly labelled positive; and false negatives (FN), the number of positive instances incorrectly labelled negative. |
Accuracy | Accuracy in a social network refers to the degree of similarity between the actual structure of a relationship and individuals’ perceptions of the structure of that relationship in a particular social medium. In classification terms, it is the number of correctly predicted observations over the total number of observations: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$. |
Precision | Precision focuses on false positives: it is the number of correctly predicted positive observations over all predicted positive observations, $\text{Precision} = \frac{TP}{TP + FP}$. In other words, it reflects the model's ability not to label a negative sample as positive. |
Recall (Sensitivity or TPR) | Recall is the fraction of the positive observations in the actual class of the dataset that the model predicts correctly, $\text{Recall} = \frac{TP}{TP + FN}$. Intuitively, it is the ability of a classifier to find all positive samples. |
F-measure | F-measure is the harmonic mean of precision and recall, $F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$, and indicates whether a model achieves high precision and high recall at the same time. Because it takes both FP and FN into account, it can be used to measure model performance in many domains. |
Specificity (TNR) | Specificity is the proportion of negative observations that are predicted correctly, $\text{Specificity} = \frac{TN}{TN + FP}$. |
ROC (AUC) | The ROC curve plots the trade-off between sensitivity (TPR) on the y-axis and 1 - specificity (FPR) on the x-axis for every possible threshold value. The area under the ROC curve (AUC) measures a classifier's ability to distinguish between the positive and negative classes; the higher the AUC, the better the classifier performs. |
Kappa coefficient | Kappa is an inter-rater reliability measure that evaluates the agreement between two raters. In other words, it shows how closely the observations classified by a classifier agree with the data labelled as ground truth. It can be calculated as $\kappa = \frac{p_o - p_e}{1 - p_e}$, where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance. |
Matthews Correlation Coefficient (MCC) | MCC evaluates the correlation between the observed and predicted classifications of instances: $\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$. |
Clustering coefficient | The clustering coefficient indicates the degree to which nodes in a network tend to cluster together. There are two types: the local clustering coefficient and the global clustering coefficient. The local coefficient measures the embeddedness of a single node, while the global coefficient gives an overall indication of clustering in the network. For a node $i$ with degree $k_i$ whose neighbors share $e_i$ links, the local coefficient is $C_i = \frac{2 e_i}{k_i (k_i - 1)}$. A clustering coefficient is a real number between zero and one: it is zero when there is no clustering and one for a network of disjoint cliques, where clustering is maximal. |
Security | Security refers to the requirements a system must meet to protect against potential attacks, threats, unauthorized access, and privacy-preservation issues. |
Scalability | Scalability is the ability of a social network to expand as demand for processor, network, or file-system resources rises. It falls into two categories: horizontal scalability (scale-out), which adds new hardware (e.g., new nodes) instead of increasing the capability of the existing hardware; and vertical scalability (scale-up), which adds resources to (or removes resources from) a single system node, such as adding CPU or RAM to a single computer. |
Time | In this paper, all the factors related to time, such as execution time, average response time, statistical analysis time (starting time), delay, and running time are considered as the time factor. |
Normalized mutual information (NMI) | NMI is an information-theoretic measure that can be used to assess clustering quality and to compare community detection methods. It compares two partitions (clusterings) of the same nodes; a high value means that the two partitions are similar, and if partitions X and Y are identical, their NMI equals one. |
Cost | The price of acquiring, producing, performing, or maintaining the requested service. |
Influence diffusion | This measure shows how one person’s actions affect other people in a network; it indicates how many users are affected by the most influential users in the network. |
Centrality measures | In the context of web information retrieval, centrality measures are a vital tool in community analysis. By using centrality measures, researchers try to answer the question “who is the most important, impressive, or central person in the network?” Some of the popular centrality measures are discussed below. Degree centrality: the number of neighbors of a node, computed as the number of direct links to it. In an undirected graph, the more central a node is, the higher its degree. In a digraph, there are two variants: in-degree, the number of inbound links to a node, and out-degree, the number of outbound links from a node. Closeness centrality: based on shortest paths; for a node $v$ it is the inverse of the sum of its shortest-path distances to all other nodes, $C_C(v) = \frac{1}{\sum_{u \neq v} d(v, u)}$. In other words, closeness reflects how quickly information can be transferred from a node to all other nodes. Betweenness centrality: the number of times a node lies on the shortest paths between other nodes; after all shortest paths are identified, the number of paths on which the given node lies is counted. Eigenvector centrality: unlike in-degree centrality, it measures a node's importance by the importance of the nodes linked to it. A node with high in-degree centrality does not necessarily have high eigenvector centrality, and vice versa, so this measure identifies important nodes that influence the entire network. PageRank: computed to determine the importance of a node by considering the number and quality of the links pointing to it. It accounts for the centrality of linkers, link directions, and their weights. It is a recursive measure in which a node's value grows with the PageRank of its neighbors, each weighted by the reciprocal of that neighbor's out-degree. It can be interpreted as the probability of visiting a node under the random surfer model. |
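The confusion-matrix-based metrics in the table above can all be computed from the four counts TP, TN, FP, and FN. The following is a minimal sketch; the function name `classification_metrics` and the example counts are hypothetical, and mapping zero denominators to 0.0 is a convention chosen for this sketch rather than a standard.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics for a binary classifier.

    Zero denominators are mapped to 0.0 (a convention chosen for this sketch).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity, TPR
    specificity = tn / (tn + fp) if tn + fp else 0.0   # TNR
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Cohen's kappa: observed agreement (accuracy) vs. chance agreement p_e,
    # estimated from the predicted and actual class totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f_measure": f_measure,
            "kappa": kappa, "mcc": mcc}


m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 3) for name, value in m.items()})
```

ROC/AUC is omitted because it needs the classifier's scores across all thresholds, not a single confusion matrix.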
Social Network Analysis (SNA) offers deep insights into interconnected relationships and network structures, aiding decision-making processes across various domains.
Understanding data privacy concerns and ethical considerations is crucial in conducting responsible SNA research and analysis.
Effectively handling big data is a significant challenge in SNA, requiring advanced tools and strategies for accurate analysis and interpretation.
Ethical guidelines and transparency are paramount in navigating the complexities of SNA research, ensuring integrity and respect for individual privacy.
Mastering SNA involves striking a balance between technical proficiency and ethical considerations, unlocking its full potential for impactful insights and applications.
Learning Social Network Analysis (SNA) reveals a world of connections and data insights. This guide will teach you its basics, methods, and uses. Yet, a key question remains: How can we fully use SNA to better understand human interactions and decisions?
Social Network Analysis (SNA) uses networks and graph theory to study social structures. It maps and measures relationships and flows among people, groups, or organizations.
This reveals interaction patterns and network structures. SNA is key to understanding how information, resources, and influence flow, and it offers insights beyond traditional methods.
Modern research values Social Network Analysis (SNA). It uncovers social interaction dynamics and complexity. SNA is crucial in sociology, anthropology, epidemiology, and organizational studies. It reveals how relationships shape behavior.
For example, in public health, SNA tracks disease spread in communities. In business, it shows how informal networks impact effectiveness. Researchers, through SNA, gain insights into social issues. This leads to better interventions and strategies.
Social Network Analysis (SNA) focuses on two key elements: nodes and edges. Nodes stand for network members, like people, organizations, or computers. Meanwhile, edges are their direct connections, showing interactions. Knowledge of these elements is vital. They are the building blocks of social networks, allowing analysts to understand complex relationships.
Different types of networks are essential to grasp in Social Network Analysis. These include:
Network metrics are critical for quantifying the structure and properties of social networks. Key metrics in Social Network Analysis include:
In Social Network Analysis (SNA), data collection plays a pivotal role in extracting meaningful insights from social networks.
One of the primary techniques used is surveying, where individuals are asked to identify their connections and relationships within a network. This approach helps in mapping out the structure and dynamics of the network.
Another valuable technique is archival data analysis, which involves studying existing records such as communication logs, email threads, or organizational charts to uncover patterns and relationships within the network. This method provides a historical perspective and can reveal how networks evolve over time.
Several software tools are available for conducting Social Network Analysis (SNA), each offering unique features and functionalities.
Gephi is a popular open-source tool known for its interactive visualization capabilities and extensive network analysis algorithms. It allows users to explore and analyze large-scale networks with ease.
UCINET (UCI Network) is another widely used software package that provides a comprehensive suite of tools for network analysis, including centrality measures, clustering algorithms, and statistical tests. It is favored by researchers and analysts for its robustness and versatility in handling diverse network datasets.
NodeXL stands out for its integration with Microsoft Excel, making it accessible to users familiar with spreadsheet-based data manipulation. It offers a user-friendly interface and supports various network metrics and visualizations, making it suitable for both beginners and advanced analysts.
Visualization is a crucial aspect of Social Network Analysis (SNA) as it allows researchers and practitioners to interpret complex network structures and patterns visually.
Node-Link Diagrams represent nodes (individual entities) and edges (relationships) in a network graphically, providing a clear depiction of connections and clusters.
Heatmaps and matrix plots are employed to visualize network data in a matrix format, highlighting the strength and density of relationships between nodes. These visualizations aid in identifying key influencers, detecting communities, and understanding the flow of information or resources within the network.
Interactive visualizations enhance the exploration and analysis process by enabling users to interactively navigate and filter network data, zoom into specific regions, and extract detailed information on nodes and edges. This dynamic approach fosters deeper insights and facilitates communication of findings to stakeholders effectively.
Social Network Analysis (SNA) plays a pivotal role in understanding the dynamics of social media platforms. It helps in analyzing the relationships, interactions, and influence among individuals or entities within these digital networks.
By applying SNA techniques, businesses can gain insights into user behavior, identify key influencers, track information flow, and optimize their social media strategies for better engagement and ROI.
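PageRank is one common way to identify key influencers in a directed interaction graph (for example, an edge from A to B when A retweets or follows B). The sketch below computes PageRank by power iteration with the conventional damping factor of 0.85 and spreads the rank of dangling nodes uniformly; the function name and toy graph are hypothetical, and production tools would typically use an optimized graph library instead.

```python
def pagerank(out_links, damping=0.85, iterations=100, tol=1e-10):
    """PageRank by power iteration on a directed graph.

    out_links maps each node to the nodes it points to. Rank held by
    dangling nodes (no out-links) is spread uniformly over all nodes.
    """
    nodes = set(out_links) | {v for vs in out_links.values() for v in vs}
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in out_links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Edge A -> B: A retweets B (hypothetical toy data).
graph = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # hub
```

Unlike a raw follower count, PageRank also rewards being endorsed by accounts that are themselves well endorsed, which is why user "a" in the toy graph scores well above "b" and "c".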
In the realm of healthcare, Social Network Analysis (SNA) has emerged as a valuable tool for studying patient-provider relationships, healthcare collaborations, and disease transmission patterns.
By mapping out the social networks within healthcare settings, researchers and practitioners can identify central nodes, assess information dissemination, detect potential bottlenecks, and enhance care coordination for improved patient outcomes and organizational efficiency.
Social Network Analysis (SNA) offers profound insights into organizational behavior by examining the relationships, communication patterns, and knowledge sharing among employees, departments, and external stakeholders.
By leveraging SNA, organizations can identify informal leaders, enhance collaboration, streamline decision-making processes, foster innovation, and strengthen overall performance and productivity.
In the realm of political science, Social Network Analysis (SNA) provides a systematic approach to studying political actors, alliances, power dynamics, and information dissemination within political systems.
By employing SNA techniques, researchers can analyze political networks, assess influence flows, map out lobbying efforts, understand coalition formations, and gain a deeper understanding of the complex socio-political landscape for informed decision-making and policy development.
Social Network Analysis (SNA) delves into the dynamic nature of networks, exploring how they evolve and transform over time. This field investigates the intricate processes that drive changes within networks, encompassing both growth and decline phenomena. By studying these dynamics, analysts gain valuable insights into the underlying mechanisms that shape network structures.
In understanding Social Network Analysis, it’s essential to grasp the methodologies used to model network growth and decline. Researchers employ various mathematical and computational models to simulate these processes, allowing them to predict and analyze network changes over time. These models play a crucial role in forecasting network trends and anticipating potential shifts in connectivity patterns.
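The text does not name a specific growth model. As one well-known example, the sketch below simulates preferential attachment in the style of the Barabási–Albert model: each newcomer links to existing members with probability proportional to their current number of ties, which tends to produce a few highly connected hubs. It models growth only, since decline would need additional rules for removing nodes or edges. The function name and its parameters are illustrative assumptions.

```python
import random
from collections import Counter
from typing import List, Tuple

def preferential_attachment(n: int, m: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Grow an undirected network to n nodes; each new node links to m distinct
    existing nodes chosen with probability proportional to their degree."""
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Start from a small, fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears here once per incident edge, so sampling uniformly
    # from this list is the same as sampling nodes in proportion to degree.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges

edges = preferential_attachment(n=1000, m=2, seed=42)
degree = Counter(node for edge in edges for node in edge)
print(len(edges), "edges; largest hubs:", degree.most_common(3))
```

Re-running with different seeds and comparing degree distributions over time is one simple way to explore how such a mechanism shapes network structure.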
A fundamental aspect of Social Network Analysis involves community detection algorithms. These algorithms are designed to identify clusters and subgroups within a network, revealing distinct communities based on shared attributes or interactions. Different methods, such as modularity optimization and hierarchical clustering, are employed to uncover meaningful structures within complex networks.
Social Network Analysis encompasses a range of community detection methods, each offering unique advantages and applications. From traditional approaches like hierarchical clustering to advanced techniques like spectral clustering and the Louvain algorithm, analysts have a diverse toolkit to explore and analyze network communities. These methods facilitate a nuanced understanding of network dynamics and community structures.
To conduct in-depth analyses, researchers and practitioners rely on specialized Social Network Analysis tools and software. Popular options such as Gephi (a desktop application) and NetworkX (a Python library) provide comprehensive functionalities for visualizing, modeling, and analyzing networks.
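Modularity optimization scores a candidate partition by comparing the fraction of edges that fall inside communities with the fraction expected if edges were placed at random while each node kept its degree. Algorithms such as Louvain search for the partition that maximizes this score. The minimal Python sketch below does not implement Louvain. It only computes the score for a given partition of a simple undirected, unweighted graph, using the per-community form Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The function name and toy graph are hypothetical. For real workloads, NetworkX provides `community.modularity` and, in recent versions, `community.louvain_communities`.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, int]) -> float:
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single bridge edge (2, 3).
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(two_triangles, split), 3))                   # 0.357 (= 5/14)
print(modularity(two_triangles, dict.fromkeys(range(6), 0)))        # 0.0: one big community
```

The natural split into the two triangles scores well above zero, while lumping everything into one community scores 0. That contrast is what modularity-maximizing algorithms exploit.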
Additionally, online platforms and resources offer accessible tools for conducting SNA studies, enhancing collaboration and knowledge sharing within the field.
When delving into Social Network Analysis (SNA), one immediate challenge is navigating data privacy concerns. The intricate web of connections analyzed in SNA often involves personal information, raising questions about consent, confidentiality, and data protection.
Striking a balance between extracting valuable insights and respecting individuals’ privacy rights remains a critical consideration in SNA research and practice.
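One common safeguard, offered here as an illustration rather than a method prescribed by the text, is to pseudonymize identifiers before analysis. Each name or account ID is replaced by a keyed hash (HMAC-SHA256), so ties between the same people are preserved while the raw identifiers are not stored with the network. The key must be kept apart from the shared data. Pseudonymization alone is not anonymization, because the shape of a person's ego network can still re-identify them. The function and the sample identifiers are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, List, Tuple

def pseudonymize_edges(edges: Iterable[Tuple[str, str]], key: bytes) -> List[Tuple[str, str]]:
    """Replace identifiers with keyed pseudonyms; the same input always maps to the same alias."""
    def alias(identifier: str) -> str:
        digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        return digest[:16]  # truncated for readability; keep the full digest to lower collision risk
    return [(alias(u), alias(v)) for u, v in edges]

key = secrets.token_bytes(32)  # store separately from the shared data
raw_edges = [("alice@example.com", "bob@example.com"), ("bob@example.com", "carol@example.com")]
for u, v in pseudonymize_edges(raw_edges, key):
    print(u, "--", v)  # bob's alias appears in both rows, so the tie structure survives
```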
Another significant challenge in Social Network Analysis is effectively handling big data. With the exponential growth of digital interactions, SNA researchers often encounter vast amounts of data that require advanced tools and techniques for processing and analysis.
Scalability, computational resources, and data management strategies become paramount in ensuring the accuracy and reliability of SNA outcomes.
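A basic scalability tactic is to stream the data instead of loading it all at once. The hypothetical sketch below reads a whitespace-separated edge list one line at a time and accumulates node degrees. Memory therefore grows with the number of distinct nodes, not the number of edges. Larger workloads usually move to distributed frameworks, so treat this as a single-machine illustration. The edge-list format is an assumption.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, TextIO, Tuple

def read_edges(stream: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs lazily from a whitespace-separated edge list."""
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, target = line.split()[:2]  # extra columns (e.g., timestamps) are ignored
        yield source, target

def degree_counts(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Count ties per node in a single pass over the edges."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

sample = io.StringIO("# source target\nana ben\nben cho\nana cho\ncho dev\n")
print(degree_counts(read_edges(sample)).most_common(2))
```

With a real file, pass an open handle instead, for example `with open("edges.txt") as f: degree_counts(read_edges(f))`, where `edges.txt` is a hypothetical file. Python reads file lines lazily, so the whole file is never held in memory.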
Ethical considerations play a crucial role in Social Network Analysis research endeavors. Researchers must navigate ethical dilemmas concerning data collection methods, participant consent, and the potential impact of their findings on individuals and communities. Transparency, integrity, and adherence to ethical guidelines are fundamental pillars of ethically sound SNA studies.
In conclusion, Social Network Analysis is a powerful tool for understanding relationships and interactions within networks. By analyzing connections, nodes, and patterns, businesses can gain valuable insights into their audience, improve decision-making, and enhance network performance. Mastering these concepts can lead to more effective strategies and meaningful outcomes in various fields.
Social Network Analysis (SNA) is a methodology used to study relationships and interactions within a network of individuals, groups, or organizations. It involves mapping and measuring the relationships and flows between people, groups, organizations, computers, or other information/knowledge processing entities. By analyzing these networks, SNA can uncover patterns and insights that are not apparent through traditional analysis.
Social Network Analysis is crucial for understanding the complex dynamics of interactions within various networks, from social media platforms to organizational structures. It helps identify key influencers, understand information flow, and detect communities or clusters. This analysis is vital for strategic decision-making in marketing, public health, organizational management, and more.
Common tools for Social Network Analysis include Gephi, UCINET, and NodeXL, which provide powerful visualization and analysis capabilities. These tools help researchers and analysts map networks, calculate network metrics, and visualize relationships. Each tool offers unique features tailored to different types of network analysis, making them essential for both beginners and experts.
Key metrics in Social Network Analysis include degree centrality, betweenness centrality, and closeness centrality. Degree centrality counts the direct connections an entity has. Betweenness centrality measures how often an entity lies on the shortest paths between other entities, which captures its role as a bridge within the network. Closeness centrality reflects how quickly an entity can reach all others, based on its shortest-path distances to them. These metrics help identify influential nodes and understand the network’s structure.
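The three metrics can be computed for a small, simple undirected graph with breadth-first search. The sketch below is a plain-Python illustration, and its function names are hypothetical. Degree centrality is normalized by n − 1. Closeness is (reachable nodes − 1) divided by the sum of shortest-path distances. Betweenness uses Brandes' algorithm, normalized to [0, 1]. For connected graphs these normalizations match common definitions such as those in NetworkX. For disconnected graphs, tools follow different closeness conventions.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]  # undirected adjacency lists; every edge listed in both directions

def degree_centrality(g: Graph) -> Dict[str, float]:
    """Number of direct ties, divided by the n - 1 possible ties."""
    n = len(g)
    if n < 2:
        return {v: 0.0 for v in g}
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}

def bfs_distances(g: Graph, source: str) -> Dict[str, int]:
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(g: Graph) -> Dict[str, float]:
    """(reachable nodes - 1) / sum of distances to them; 0.0 for isolated nodes."""
    result = {}
    for v in g:
        dist = bfs_distances(g, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

def betweenness_centrality(g: Graph) -> Dict[str, float]:
    """Brandes' algorithm: share of shortest paths between other pairs passing through each node."""
    bc = dict.fromkeys(g, 0.0)
    for s in g:
        order: List[str] = []                          # nodes in order of discovery
        preds: Dict[str, List[str]] = {v: [] for v in g}
        sigma = dict.fromkeys(g, 0)                    # number of shortest s-v paths
        sigma[s] = 1
        dist = dict.fromkeys(g, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(g, 0.0)                  # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(g)
    # Each unordered pair is counted once from each endpoint, so dividing by
    # (n - 1)(n - 2) both halves the double count and normalizes to [0, 1].
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: value * scale for v, value in bc.items()}

network = {
    "ana": ["ben", "cho"],
    "ben": ["ana", "cho"],
    "cho": ["ana", "ben", "dev"],
    "dev": ["cho", "eli"],
    "eli": ["dev"],
}
deg, clo, btw = degree_centrality(network), closeness_centrality(network), betweenness_centrality(network)
for node in network:
    print(f"{node}: degree={deg[node]:.2f} closeness={clo[node]:.2f} betweenness={btw[node]:.2f}")
```

In this toy network, `cho` has the most ties, but `dev` also earns betweenness because it is the only bridge to `eli`. This illustrates why the metrics are usually read together rather than in isolation.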
Ethical considerations in Social Network Analysis include data privacy, consent, and the potential misuse of network data. Researchers must ensure that data is collected and used responsibly, protecting individuals’ privacy and obtaining necessary permissions. It’s also important to consider the impact of network analysis findings on individuals and groups, avoiding harm or exploitation.
Copyright © 2024 Mantarav Private Limited. All Rights Reserved.
Debates over who is Hispanic have often fueled conversations about identity among Americans who trace their heritage to Latin America or Spain.
So, who is considered Hispanic in the United States today? How exactly do the federal government and others count the Hispanic population? And what role does race play in deciding who counts as Hispanic?
We’ll answer these and other common questions here.
To answer the question of who is Hispanic, this analysis draws on about five decades of U.S. Census Bureau data and about two decades of Pew Research Center surveys of Hispanic adults in the United States.
National counts of the Latino population come from the Census Bureau’s decennial census (this includes P.L. 94-171 census data) and official population estimates. The bureau’s American Community Survey (ACS) provides demographic details such as race, country of origin and intermarriage rates. Some ACS data was accessed through IPUMS USA from the University of Minnesota.
Views of Hispanic identity draw on the Center’s National Survey of Latinos (NSL), which is fielded in English and Spanish. The survey has been conducted online since 2019, primarily through the Center’s American Trends Panel (ATP), which is recruited through national, random sampling of residential addresses. This way, nearly all adults have a chance of selection. The survey is weighted to be representative of the U.S. Hispanic adult population by gender, Hispanic origin, partisan affiliation, education and other categories. Read more about the ATP’s methodology. The NSL was conducted by phone from 2002 to 2018.
Read further details on how the Census Bureau asked about race and ethnicity and coded responses in the 2020 census. Here is a full list of origin groups that were coded as Hispanic in the 2020 census.
The Census Bureau estimates there were 65.2 million Hispanics in the U.S. as of July 1, 2023, a new high. They made up more than 19% of the nation’s population.
Before diving into the details, keep in mind that some surveys ask about Hispanic origin and race separately, following current Census Bureau practices – though these are soon to change.
One way to count Hispanics is to include those who say they are Hispanic, with no exceptions – that is, you are Hispanic if you say you are. Pew Research Center uses this approach in our surveys, as do other polling firms such as Gallup and voter exit polls.
The Census Bureau largely counts Hispanics this way, too, but with some exceptions. If respondents select only the “Other Hispanic” category and write in only non-Hispanic responses such as “Irish,” the Census Bureau recodes the response as non-Hispanic.
However, beginning in 2020, the bureau widened the lens to include a relatively small number of people who did not check a Hispanic box on the census form but answered the race question in a way that implied a Hispanic background. As a result, someone who answered the race question by saying that they are “Mexican” or “Argentinean” was counted as Hispanic, even if they did not check the Hispanic box.
From the available data, the exact number of respondents affected by this change is difficult to determine. But it appears to be about 1% of Hispanics or fewer, according to a Pew Research Center analysis of U.S. Census Bureau data.
In the eyes of the Census Bureau, Hispanics can be of any race, because “Hispanic” is an ethnicity and not a race. However, this distinction is subject to debate. A 2015 Center survey found that 17% of Hispanic adults said being Hispanic is mainly a matter of race, while 29% said it is mainly a matter of ancestry. Another 42% said it is mainly a matter of culture.
Nonetheless, the Census Bureau’s 2022 American Community Survey (ACS) provides the self-reported racial identity of Hispanics: 22.5 million single-race Hispanics identified only as “some other race.” This group mostly includes those who wrote in a Hispanic origin or nationality as their race. Another 10.7 million identified as White. Fewer Hispanics identified as American Indian (1.5 million), Black (1.0 million) or Asian (300,000).
Another roughly 27.5 million Hispanics identified as more than one race in 2022, up from just 3 million in 2010.
Growth in the number of multiracial Hispanics comes primarily from those who identify as White and “some other race.” That population grew from 1.6 million to 24.9 million between 2010 and 2022. The number of Hispanics who identify as White and no other race declined from 26.7 million to 10.7 million.
The sharp increase in multiracial Hispanics could be due to several factors, including changes to the census form introduced in 2020 that added more space for written responses to the race question and growing racial diversity among Hispanics. This explanation is supported by the fact that almost 25 million of the Hispanics who identified as two or more races in 2022 were coded as “some other race” (and wrote in a response) and one of the specific races (such as Black or White). About 2.6 million Hispanics identified with two or more of the five major races offered in the census.
The 2030 census will combine the race and ethnicity questions, a change that other federal surveys will implement in coming years. The new question will add checkboxes for “Hispanic or Latino” and “Middle Eastern or North African” among other race groups long captured in Census Bureau surveys.
Officials hope the changes will reduce the number of Americans who choose the “Some other race” category, especially among Hispanics. However, it’s worth noting that public feedback has raised a variety of concerns, including that combining the race and ethnicity questions could lead to an undercount of the nation’s Afro-Latino population.
In 1976, Congress passed a law that required the government to collect and analyze data for a specific ethnic group: “Americans of Spanish origin or descent.” That legislation defined this group as “Americans [who] identify themselves as being of Spanish-speaking background and trace their origin or descent from Mexico, Puerto Rico, Cuba, Central and South America, and other Spanish-speaking countries.” This includes around 20 Spanish-speaking nations from Latin America and Spain itself, but not Portugal or Portuguese-speaking Brazil.
To implement this law, the U.S. Office of Management and Budget (OMB) developed Statistical Policy Directive No. 15 (SPD 15) in 1977, then revised it in 1997 and again in March 2024. In the most recent revision, OMB updated racial and ethnic definitions when it announced the combined race and ethnicity question. The current definition of “Hispanic or Latino” is “individuals of Mexican, Puerto Rican, Salvadoran, Cuban, Dominican, Guatemalan, and other Central or South American or Spanish culture or origin.”
The Census Bureau first asked everybody in the U.S. about Hispanic ethnicity in 1980. But it made some efforts before then to count people who today would be considered Hispanic. The Census Bureau also has a long history of changing labels and shifting categories. In the 1930 census, for example, the race question had a category for “Mexican.”
The first major attempt to estimate the size of the nation’s Hispanic population came in 1970 and prompted widespread concerns among Hispanic organizations about an undercount. A portion of the U.S. population (5%) was asked if their origin or descent was from the following categories: “Mexican, Puerto Rican, Cuban, Central or South American, Other Spanish” or “No, none of these.”
This approach indeed undercounted about 1 million Hispanics. Many second-generation Hispanics did not select one of the Hispanic groups because the question did not include terms like “Mexican American.” The question wording also resulted in hundreds of thousands of people living in the Central or Southern regions of the U.S. being mistakenly included in the “Central or South American” category.
By 1980, the current approach – in which someone is asked if they are Hispanic – had taken hold, with some changes to the question and response categories since then. In 2000, for example, the term “Latino” was added to make the question read, “Is this person Spanish/Hispanic/Latino?”
“Hispanic” and “Latino” are pan-ethnic terms meant to describe – and summarize – the population of people of that ethnic background living in the U.S. In practice, the Census Bureau often uses the term “Hispanic” or “Hispanic or Latino.”
Some people have drawn sharp distinctions between these two terms. For example, some say that Hispanics are from Spain or from Spanish-speaking countries in Latin America, which matches the federal definition, and Latinos are people from Latin America, regardless of language. In this definition, Latinos would include people from Brazil (where Portuguese is the official language) but not Spain or Portugal.
Pan-ethnic labels like Hispanic and Latino, though widely used, are not universally embraced by the population being labeled. Our 2023 National Survey of Latinos shows a preference for other terms to describe identity: 52% of respondents most often described themselves by their family’s country of origin, while 30% used the terms Hispanic, Latino, Latinx or Latine, and 17% most often described themselves as American.
The 2023 survey also finds varying preferences for pan-ethnic labels: 52% of Hispanics prefer to describe themselves as Hispanic, 29% prefer Latino, 2% prefer Latinx, 1% prefer Latine and 15% have no preference.
Latinx is a pan-ethnic identity term that has emerged in recent years as an alternative to Hispanic and Latino. Some news and entertainment outlets, corporations, local governments and universities use it to describe the nation’s Hispanic population.
However, its popularity has brought increased scrutiny in the U.S. and abroad. Some critics say it ignores the gendered forms of the Spanish language, while others see Latinx as a gender- and LGBTQ+-inclusive term. Adding to the debate, some state lawmakers favor banning the use of the term entirely in government documents; Arkansas has done so already.
A 2023 survey found that awareness of Latinx has doubled among U.S. Hispanics since 2019, with growth across all major demographic subgroups. Still, the share of Hispanic adults who use Latinx to describe themselves is statistically unchanged: In 2023, 4% said they use it, compared with 3% in 2019.
Latinx is also broadly unpopular among Latinos who know the term. Three-in-four Latino adults who are aware of Latinx say the term should not be used to describe Hispanics or Latinos.
The emergence of Latinx coincides with a global movement to introduce gender-neutral nouns and pronouns into many languages that have traditionally used male or female constructions. In the U.S., Latinx first appeared more than a decade ago, and it was added to a widely used English dictionary in 2018.
Latine is another pan-ethnic term that has emerged in recent years. Our 2023 survey found that 18% of U.S. Hispanics have heard of the term.
Similar to familiarity with Latinx, awareness of Latine varies by age, education and sexual orientation. Among Latinos, awareness of Latine is highest among those ages 18 to 29 (22%), college graduates (24%) and lesbian, gay and bisexual adults (32%).
Many U.S. Hispanics have an inclusive view of what it means to be Hispanic:
Views of Hispanic identity may change in the coming decades as broad societal changes, such as rising intermarriage rates, produce an increasingly diverse and multiracial U.S. population.
Today, many Hispanic families include people who are not Hispanic:
Spouses: Among all married Hispanics in 2022, 22% had a spouse who is not Hispanic. And in a 2023 Center survey, 27% of Hispanics with a spouse or partner said their spouse or partner is not Hispanic.
Newlyweds: In 2022, 30% of Hispanic newlyweds married someone who is not Hispanic. Among them, 41% of those born in the U.S. married someone who is not Hispanic, compared with 11% of immigrant newlyweds, according to an analysis of ACS data.
Parents: Our 2015 survey found that 15% of U.S. Hispanic adults had at least one parent who is not Hispanic. This share rose to 29% among the U.S. born and 48% among the third or higher generation – those born in the U.S. to parents who were also U.S. born.
In surveys like those from the Census Bureau, skin color does not play a role in determining who is Hispanic or not. However, as with race, Latinos can have many different skin tones. A 2021 Center survey of Latino adults showed respondents a palette of 10 skin colors and asked them to choose which one most closely resembled their own.
Latinos reported having a variety of skin tones, reflecting the diversity within the group. Eight-in-ten Latinos selected one of the four lightest skin colors. By contrast, only 3% selected one of the four darkest skin colors.
A majority of Latino adults (57%) say skin color shapes their daily life experiences at least somewhat. Similar shares say having a lighter skin color helps Latinos get ahead in the U.S. (59%) and that having a darker skin color hurts Latinos’ ability to get ahead (62%).
Afro-Latino identity is distinct from and can exist alongside a person’s Hispanic identity. Afro-Latinos’ life experiences are shaped by race, skin tone and other factors in ways that differ from other Hispanics. While most Afro-Latinos identify as Hispanic or Latino, not all do, according to our estimates based on a survey of U.S. adults conducted in 2019 and 2020.
In 2020, about 6 million Afro-Latino adults lived in the U.S., making up about 2% of the U.S. adult population and 12% of the adult Latino population. About one-in-seven Afro-Latinos – an estimated 800,000 adults – do not identify as Hispanic.
Officially, Brazilians are not considered Hispanic or Latino because the federal government’s definition applies only to those of “Spanish culture or origin.” In most cases, people who report their Hispanic or Latino ethnicity as Brazilian in Census Bureau surveys are later recategorized – or “back coded” – as not Hispanic or Latino. The same is true for people with origins in Belize, the Philippines and Portugal.
An error in how the Census Bureau processed data from a 2020 national survey omitted some of this coding and provided a rare window into how Brazilians (and other groups) living in the U.S. view their identity.
In 2020, at least 416,000 Brazilians — more than two-thirds of Brazilians in the U.S. — described themselves as Hispanic or Latino on the ACS and were mistakenly counted that way. Only 14,000 Brazilians were counted as Hispanic in 2019, and 16,000 were in 2021.
The large number of Brazilians who self-identified as Hispanic or Latino highlights how their view of their own identity does not necessarily align with official government definitions. It also underscores that being Hispanic or Latino means different things to different people.
Of the 42.7 million adults with Hispanic ancestry living in the U.S. in 2015, an estimated 5 million people, or 11%, said they do not identify as Hispanic or Latino, according to a 2015-16 Center survey. These people aren’t counted as Hispanic in our surveys.
Notably, Hispanic self-identification varies across immigrant generations. Among immigrants from Latin America, nearly all identify as Hispanic. But by the fourth generation, only half of people with Hispanic heritage in the U.S. identify as Hispanic.
Note: This is an update of a post originally published on May 28, 2009.
Mark Hugo Lopez is director of race and ethnicity research at Pew Research Center.
Jens Manuel Krogstad is a senior writer and editor at Pew Research Center.
Jeffrey S. Passel is a senior demographer at Pew Research Center.
© 2024 Pew Research Center
Chiu DT, Hamlat EJ, Zhang J, Epel ES, Laraia BA. Essential Nutrients, Added Sugar Intake, and Epigenetic Age in Midlife Black and White Women: NIMHD Social Epigenomics Program. JAMA Netw Open. 2024;7(7):e2422749. doi:10.1001/jamanetworkopen.2024.22749
© 2024
Question Are dietary patterns, including essential nutrients and added sugar intakes, and scores of nutrient indices associated with epigenetic aging?
Findings In this cross-sectional study of 342 Black and White women at midlife, higher added sugar intake was associated with older epigenetic age, whereas higher essential, pro-epigenetic nutrient intake and higher Alternate Mediterranean Diet (aMED) and Alternate Healthy Eating Index (AHEI)–2010 scores (reflecting dietary alignment with Mediterranean diet and chronic disease prevention guidelines, respectively) were associated with younger epigenetic age.
Meaning The findings of this study suggest a tandem importance in both optimizing nutrient intake and reducing added sugar intake for epigenetic health.
Importance Nutritive compounds play critical roles in DNA replication, maintenance, and repair, and also serve as antioxidant and anti-inflammatory agents. Sufficient dietary intakes support genomic stability and preserve health.
Objective To investigate the associations of dietary patterns, including intakes of essential nutrients and added sugar, and diet quality scores of established and new nutrient indices with epigenetic age in a diverse cohort of Black and White women at midlife.
Design, Setting, and Participants This cross-sectional study included analyses (2021-2023) of past women participants of the 1987-1997 National Heart, Lung, and Blood Institute Growth and Health Study (NGHS), which examined cardiovascular health in a community cohort of Black and White females aged between 9 and 19 years. Of these participants who were recruited between 2015 and 2019 from NGHS’s California site, 342 females had valid completed diet and epigenetic assessments. The data were analyzed from October 2021 to November 2023.
Exposure Diet quality scores of established nutrient indices (Alternate Mediterranean Diet [aMED], Alternate Healthy Eating Index [AHEI]–2010); scores for a novel, a priori–developed Epigenetic Nutrient Index [ENI]; and mean added sugar intake amounts were derived from 3-day food records.
Main Outcomes and Measures GrimAge2, a second-generation epigenetic clock marker, was calculated from salivary DNA. Hypotheses were formulated after data collection. Healthier diet indicators were hypothesized to be associated with younger epigenetic age.
Results A total of 342 women composed the analytic sample (mean [SD] age, 39.2 [1.1] years; 171 [50.0%] Black and 171 [50.0%] White participants). In fully adjusted models, aMED (β, −0.41; 95% CI, −0.69 to −0.13), AHEI-2010 (β, −0.05; 95% CI, −0.08 to −0.01), and ENI (β, −0.17; 95% CI, −0.29 to −0.06) scores, and added sugar intake (β, 0.02; 95% CI, 0.01-0.04) were each significantly associated with GrimAge2 in expected directions. In combined analyses, the aforementioned results with GrimAge2 were preserved with the association estimates for aMED and added sugar intake retaining their statistical significance.
Conclusions and Relevance In this cross-sectional study, independent associations were observed for both healthy diet and added sugar intake with epigenetic age. To our knowledge, these are among the first findings to demonstrate associations between added sugar intake and epigenetic aging using second-generation epigenetic clocks and one of the first to extend analyses to a diverse population of Black and White women at midlife. Promoting diets aligned with chronic disease prevention recommendations and replete with antioxidant or anti-inflammatory and pro-epigenetic health nutrients while emphasizing low added sugar consumption may support slower cellular aging relative to chronological age, although longitudinal analyses are needed.
Epigenetic clocks powerfully predict biological age independent of chronological age. These clocks reflect altered gene and protein expression patterns, particularly those resulting from differential DNA methylation (DNAm) at CpG (5′-C-phosphate-G-3′) sites. DNAm that accumulates over time is a testament to the toll social, behavioral, and environmental forces can have on the body [1-3]. These alterations often result in pathogenic processes (eg, genomic instability, systemic inflammation, and oxidative stress) characteristic of aging and chronic disease [1, 4, 5]. As such, myriad clocks reflecting epigenetic age have been developed for a range of age- or disease-related targets [4, 6]. The GrimAge series contains second-generation markers of epigenetic aging that account for clinical and functional biomarkers, and is most notable for its robust associations with human mortality and morbidity risk, including time to death and comorbidity counts [6, 7]. The recently developed version 2 of the GrimAge clock (hereafter, GrimAge2) improved on the first’s predictive abilities and confirmed its applicability for people at midlife and of different racial and ethnic backgrounds [1, 6].
Epigenetic changes are modifiable, and efforts to counter epigenetic alteration in humans have centered on lifestyle factors including diet, inspiring concepts of an “epigenetic diet” and “nutriepigenetics” [8, 9]. So far, 2 epidemiological studies have found inverse associations between higher diet quality and slower epigenetic aging using clock measures related to mortality, including the first version of GrimAge [7, 10]. In those studies, diet measures were reflective of healthy dietary patterns (eg, the Dietary Approaches to Stop Hypertension [DASH] diet, the Alternate Mediterranean Diet [aMED] score) emphasizing consumption of fruits, vegetables, whole grains, nuts and seeds, and legumes [8, 11]. For example, the Mediterranean-style diet is largely plant-based with emphasis on extra virgin olive oil and seafood. This makes it replete with bioactive nutrients and phytotherapeutic compounds and low in highly processed, high fat, and nutrient-poor foods, a mixture hypothesized to be protective against low-grade chronic inflammation (“inflammaging”), oxidative stress, intracellular and extracellular waste accumulation, and disrupted intracellular signaling and protein-protein interactions. Thus, such a pattern is likely effective in preventing and reversing the epigenetic changes and pathogenic processes associated with aging, disease, and decline [4, 8, 12-14].
Dietary Reference Intakes (DRIs) are an established set of nutrient-specific reference values determined by experts that guide population intakes for adequacy and toxic effects [15]. Recent thinking, however, suggests that diets may not always adequately supply nutrients and other bioactives, particularly relative to the amounts necessary to fully condition gene expression or counteract epigenetic alterations to ensure optimal physiological metabolism [8]. Macronutrients and micronutrients play crucial roles in DNA replication, damage prevention, and repair, whereas nutrient deficiencies (and excesses) can cause genomic damage to the same degree as physical or chemical exposures [16]. Given that (1) progenome effects of some micronutrients have been observed at different and higher levels than the established DRIs and (2) determination of DRIs does not solely consider genomic stability (ie, lesser susceptibility to genomic alterations), experts have called for refining the DRIs to be better aligned for genomic health maintenance [14, 16-18]. Diet quality inventories, such as those for Mediterranean-style diets, have not generally incorporated DRIs, although such considerations could clarify how food-based indices compare against requirements for related nutrients (eg, those with epigenetic properties) and refine epidemiological and intervention efforts. Accordingly, for this study, a novel nutrient index theoretically associated with epigenetic health was created and its associations with epigenetic aging were tested alongside established diet quality indices.
To date, nutriepigenetic work has mostly involved older White populations and focused on healthy dietary aspects. It is therefore important to examine the associations between nutrition and epigenetic aging in more diverse samples and to better understand which specific dietary aspects could underlie the observed associations. Nutrients with established epigenetic action should be examined, especially intakes relative to the amounts set forth in the DRIs and nutritional recommendations. In addition, sugar is an established pro-inflammatory and oxidative agent that has been implicated in cancer as well as cardiometabolic diseases. 19 - 21 However, diet quality indices often studied in the epigenetic context (eg, the aMED) do not account for sugar, and sugar has yet to be examined on its own. Given the high global consumption of sugar and the demographic variation in intake, 22 - 24 elucidating this association could inform future dietary interventions and guidelines as well as health disparities research. This study sought to examine associations of diet with GrimAge2 in a midlife cohort of Black and White US women. The central hypothesis was that indicators of a healthier diet would be associated with decelerated epigenetic aging and that added sugar intake would be associated with accelerated aging.
This cross-sectional study used data from the original National Heart, Lung, and Blood Institute (NHLBI) Growth and Health Study (NGHS) (1987-1999) and its follow-up (2015-2019), which followed a cohort of Black and White females from age 9 or 10 years into midlife (age 36-43 years) to examine cardiometabolic health and related determinants. Participants were recruited based on biological female sex at age 9 or 10 years, and the follow-up study re-recruited women from the California site. 25 , 26 Participants (and/or their parent[s] or guardian[s]) provided demographic data and completed online or paper surveys and new assessments. Participants received remuneration and provided informed consent. The institutional review board of the University of California, Berkeley, approved all study protocols. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.
To be included in the current analyses, participants needed valid diet records and epigenetic data at midlife, along with age and self-reported race and ethnicity information; after 5 women with epigenetic data quality issues were excluded, 342 individuals were included in the analytic sample. Complete case analyses were performed. Among the 624 women who were followed up, those in the analytic sample were younger than women without complete diet and epigenetic data (39.2 vs 39.9 years; P < .001) and had a greater body mass index (BMI, calculated as weight in kilograms divided by height in meters squared) (32.5 vs 30.7; P = .02) ( Table 1 ). No other differences were observed.
Participants provided saliva samples for DNA methylation (DNAm) analyses, which were performed by the University of California, Los Angeles Neuroscience Genomics Core (UNGC) of the Semel Institute for Neuroscience and Human Behavior using the Infinium HumanMethylation450 BeadChip platform (Illumina, Inc). DNAm data were processed with Horvath’s online calculator, 27 which provided (1) estimates of epigenetic age based on GrimAge2 estimation methods and (2) assessments of data quality (as noted, 5 observations did not pass quality checks). GrimAge2 uses Cox proportional hazards regression models that regress time to death (due to all-cause mortality) on DNAm-based surrogates of plasma proteins, a DNAm-based estimator of smoking pack-years, age, and female sex. It updates the first version of GrimAge 6 by adding 2 new DNAm-based estimators of plasma proteins, high-sensitivity C-reactive protein (logCRP) and hemoglobin A1c (logA1c), to the original 7. A linear transformation of the results from these models allows GrimAge2 to be interpreted as an epigenetic age estimate (in years). Further details are available on DNA treatment and isolation and on the advanced analysis options used to generate output files, 28 as well as on GrimAge2. 1
At follow-up, NGHS study staff instructed participants to self-complete a 3-day food record covering 3 nonconsecutive days. 29 Data were entered into and analyzed with the Nutrition Data System for Research (NDSR) software, version 2018 (University of Minnesota Nutrition Coordinating Center).
Mean nutrient and food intakes were calculated across valid food records for each woman from the NDSR 2018 output. These values were used to calculate scores for 2 established diet quality indices (the aMED and the Alternate Healthy Eating Index [AHEI]–2010) and for a novel index, the Epigenetic Nutrient Index (ENI), as described below. The aMED (Mediterranean-style diet) score followed published scoring methodology 30 and reflects the degree of adherence to 9 components of an anti-inflammatory, antioxidant-rich diet. The AHEI-2010 score was assessed following published scoring instructions 31 and reflects the degree of adherence to 11 dietary components associated with decreased risk of chronic disease.
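To make the median-based logic of aMED scoring concrete, the Python sketch below scores a small sample. It assumes the commonly cited aMED rules: 1 point for intake above the sample median for 7 beneficial components, 1 point for red and processed meat intake below the median, and 1 point for alcohol intake of 5-15 g/day for women. The component names, the strict "above median" tie handling, and the function name `amed_scores` are illustrative assumptions, not the study's code.

```python
from statistics import median

# Assumed aMED components where higher intake earns a point.
BENEFICIAL = ["vegetables", "legumes", "fruits", "nuts",
              "whole_grains", "fish", "mufa_sfa_ratio"]
# Assumed component where lower intake earns a point.
DETRIMENTAL = ["red_processed_meat"]
# Assumed alcohol window for women, in grams per day.
ALCOHOL_RANGE_WOMEN = (5.0, 15.0)


def amed_scores(participants: list[dict[str, float]]) -> list[int]:
    """Return a 0-9 aMED-style score per participant using sample medians."""
    medians = {c: median(p[c] for p in participants)
               for c in BENEFICIAL + DETRIMENTAL}
    low, high = ALCOHOL_RANGE_WOMEN
    scores = []
    for p in participants:
        # 1 point per beneficial component strictly above the sample median.
        score = sum(1 for c in BENEFICIAL if p[c] > medians[c])
        # 1 point per detrimental component strictly below the sample median.
        score += sum(1 for c in DETRIMENTAL if p[c] < medians[c])
        # 1 point for alcohol intake inside the assumed window.
        score += 1 if low <= p["alcohol_g"] <= high else 0
        scores.append(score)
    return scores
```

Because points depend on sample medians, the same intake can earn different scores in different cohorts, which is a general property of median-based indices.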
This study developed a novel nutrient index (ENI) modeled on the Mediterranean-style diet but using a nutrient-based rather than a food-based approach. Nutrients were selected a priori based on their antioxidant and/or anti-inflammatory capacities and their roles in DNA maintenance and repair documented in the literature. 16 , 32 , 33 Scores can range from 0 to 24, with higher scores reflecting greater DRI adherence ( Table 2 ). 34 The internal consistency of the ENI was acceptable (Cronbach α = 0.79). The ENI also demonstrated convergent validity: it correlated with the aMED score (r = 0.51), and scores were higher among women from childhood households with higher annual incomes (13.9 vs 11.7 for ≥$40 000/y vs <$10 000/y, respectively) and higher parental educational attainment (14.7 vs 12.3 for ≥college graduate vs <high school graduate, respectively), consistent with the literature. 36 Pearson correlations among the ENI, the other diet scores, and added sugar intake were also calculated. The ENI score was moderately correlated with the AHEI-2010 score (r = 0.44) but was not correlated with added sugar intake. The aMED and AHEI-2010 scores were highly correlated (r = 0.73). Added sugar intake was moderately inversely correlated with the AHEI-2010 score (r = −0.44) and weakly inversely correlated with the aMED score (r = −0.28).
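Cronbach α, reported above for the ENI, compares the sum of the item variances with the variance of the total score: alpha = k/(k - 1) * (1 - sum of item variances / variance of total score), where k is the number of items. The minimal Python sketch below implements that standard formula; the function name and the participants-by-components input layout are assumptions for illustration, not the study's code.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a participants-by-items score matrix.

    rows[i][j] is participant i's score on item (index component) j.
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Three items that move together perfectly give alpha of (about) 1.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```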
Added sugar intake was calculated as the mean across valid food records using NDSR output. The NDSR defines added sugar intake as the total sugar added to foods (eg, as syrups and sugars) during food preparation and commercial food processing. Monosaccharides and disaccharides naturally occurring in foods are not included. 35
To maximize internal validity and minimize confounding, several covariates were included. Models controlled for age, sample batch, and naive CD8 and CD8pCD28nCD45Ran memory and effector T-cell counts, thus accounting for normal variation in cell counts. To account for baseline factors that could influence diet and epigenetic age over time, models were further adjusted for the following characteristics assessed at age 9 or 10 years (mostly parent or caregiver reported): annual household income, highest parental educational attainment, number of parents in the household, and number of siblings. Models also included self-reported race (Black or White) and the following current health and lifestyle factors: self-reported chronic conditions (ever diagnosed with any of the following: cancer, diabetes [including gestational diabetes and prediabetes], hypertension, or hypercholesterolemia); current medication use (for any of the following: diabetes, hypertension, hypercholesterolemia, or thyroid conditions); BMI (measured); ever smoking (yes or no); and mean daily total energy intake (because higher diet quality scores might result from higher energy intake). 37
Descriptive analyses provided summary statistics. Linear regression models estimated unadjusted and adjusted cross-sectional associations between each of the 4 dietary exposures and GrimAge2. Per expert recommendations, the unadjusted models still controlled for women’s current age, sample batch, and both naive CD8 and CD8pCD28nCD45Ran memory and effector T-cell counts. Adjusted models controlled for these variables in addition to the sociodemographic and health behavior–related covariates listed above. To examine healthy diet measures and added sugar intake together, the aMED, AHEI-2010, and ENI scores were each entered separately into a fully adjusted multivariable linear regression model that also included added sugar intake. Statistical significance was set at α = .05 (2-tailed), and all statistical analyses were conducted from October 2021 to November 2023 with Stata SE, version 15.1 (StataCorp LLC).
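The regression approach described above can be sketched in Python with NumPy: GrimAge2 is regressed on one dietary exposure plus covariates by ordinary least squares, and the exposure coefficient is returned with an approximate 95% CI. This is an illustrative sketch on synthetic data, not the authors' Stata code; the function name `exposure_effect`, the normal-approximation CI (z = 1.96 rather than a t critical value), and the simulated variables are all assumptions.

```python
import numpy as np


def exposure_effect(y, exposure, covariates, z=1.96):
    """Covariate-adjusted OLS coefficient for one exposure.

    Fits y = b0 + b1*exposure + covariates @ g and returns
    (b1, lower, upper) using a normal-approximation 95% CI.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), exposure, covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients
    se = np.sqrt(np.diag(cov))
    b1 = coef[1]                              # column 1 holds the exposure
    return b1, b1 - z * se[1], b1 + z * se[1]


# Synthetic illustration only (not NGHS data): GrimAge2 depends on age and a
# 0-24 diet score, with a true score coefficient of -0.17.
rng = np.random.default_rng(0)
n = 342
age = rng.normal(39.2, 1.1, n)
batch = rng.integers(0, 2, n)
score = rng.uniform(0, 24, n)
grimage2 = 10 + 0.9 * age - 0.17 * score + rng.normal(0, 2, n)
print(exposure_effect(grimage2, score, np.column_stack([age, batch])))
```

Adding covariate columns is all that separates a "minimally adjusted" from a "fully adjusted" model in this sketch; the joint models of Table 4 would simply pass added sugar intake as one more covariate.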
The analytic sample of this study comprised 342 women (mean [SD] age at follow-up, 39.2 [1.1] years; 171 [50.0%] Black and 171 [50.0%] White participants; mean [SD] BMI, 32.5 [10.0]; 150 [43.9%] ever smokers; 164 [48.0%] ever diagnosed with a chronic condition; and 58 [17.0%] currently taking medication) ( Table 1 ). Participants were well distributed across socioeconomic status categories at baseline (age 9-10 years). Diet quality was low to moderate: mean (SD) scores were 3.9 (1.9) (possible range, 0-9) on the anti-inflammatory, antioxidant Mediterranean-style pattern (aMED); 55.4 (14.7) (possible range, 0-110) on the AHEI-2010 for chronic disease risk; and 13.5 (5.0) (possible range, 0-24) on the ENI for intakes of epigenetically relevant nutrients relative to DRIs. Participants also reported a mean (SD) daily added sugar intake of 61.5 (44.6) g, although intakes varied widely (range, 2.7-316.5 g).
Table 3 presents the unadjusted and adjusted associations between each dietary exposure of interest and GrimAge2 from multivariable linear regression models. In both unadjusted and adjusted models, all dietary exposures were statistically significantly associated with GrimAge2 in the hypothesized direction, and associations were slightly attenuated in the adjusted models. Each 1-point increase in score was associated with the following change in GrimAge2 (in years): aMED (β, −0.41; 95% CI, −0.69 to −0.13), AHEI-2010 (β, −0.05; 95% CI, −0.08 to −0.01), and ENI (β, −0.17; 95% CI, −0.29 to −0.06), indicating that healthier diets were associated with decelerated epigenetic aging. Each 1-g increase in added sugar intake was associated with a 0.02-year (95% CI, 0.01 to 0.04) increase in GrimAge2, reflecting accelerated epigenetic aging.
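Because each β is a change per unit of exposure, multiplying it by a difference in exposure gives the implied difference in GrimAge2. As an illustrative calculation only, assuming linearity and using the point estimates above together with the sample SDs reported for the analytic sample (44.6 g/d for added sugar and 5.0 points for the ENI):

```latex
\begin{aligned}
\Delta\widehat{\mathrm{GrimAge2}} &= \hat{\beta}\,\Delta x \\
\text{Added sugar, } \Delta x = 44.6\ \text{g (1 SD)}: &\quad 0.02 \times 44.6 \approx 0.89\ \text{years} \\
\text{ENI, } \Delta x = 5.0\ \text{points (1 SD)}: &\quad -0.17 \times 5.0 = -0.85\ \text{years}
\end{aligned}
```

These conversions use point estimates only and ignore the uncertainty expressed by the 95% CIs.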
Table 4 presents the adjusted associations of each healthy diet measure (aMED, AHEI-2010, and ENI scores) and added sugar intake with GrimAge2 when both are included in the same model. In all instances, healthier diet measures and added sugar intake appeared to maintain independent associations with GrimAge2 in the expected directions. Associations remained statistically significant for added sugar intake in all models and for aMED scores, whereas estimates for AHEI-2010 and ENI scores were less precise, with wider 95% CIs.
To our knowledge, the findings of this cross-sectional study are among the first to demonstrate an association of added sugar intake with an epigenetic clock. Further, to our knowledge, this is the first study to examine associations of diet with GrimAge2 and to extend such results to a cohort of Black and White women at midlife. As hypothesized, measures of healthy dietary patterns (aMED and AHEI-2010 scores) and high intakes of nutrients theoretically related to epigenetics (ENI) were associated with younger epigenetic age, while higher added sugar intake was associated with older epigenetic age. Additionally, this study examined indicators of healthy and less healthy diets in the same model, allowing each to be evaluated in the presence of the other. Although the magnitudes of the associations were diminished and some 95% CIs widened, statistical significance generally persisted, supporting independent epigenetic associations of both healthy and less healthy diet measures. This approach is informative because dietary components are often examined singly or within indices, which can lead to erroneous conclusions if key contextual dietary components are not accounted for or are obscured. These findings suggest that, even in healthy dietary contexts, added sugar intake remains detrimentally associated with epigenetic age. Similarly, despite higher added sugar intake, healthier dietary intakes appear to remain generally associated with younger epigenetic age.
The number of published nutriepigenetic studies, particularly those examining second-generation epigenetic clock markers, is still relatively small. However, the results of the present study are consistent with the literature. Two other studies 7 , 10 have examined GrimAge1-associated outcomes and found that higher diet quality scores, including the DASH and aMED, were associated with slower epigenetic aging. However, those studies were limited to older (>50 years), White populations, which limits their demographic generalizability. Analyses of epigenetic aging and added sugar intake are new, but the present findings are consistent with the larger body of epidemiological work linking added sugar intake to cardiometabolic disease 19 , 20 and may point to a mechanism underlying such observations. Granted, the point estimates and 95% CIs for the added sugar–GrimAge2 associations were close to zero, suggesting a smaller role for added sugar than for healthy dietary measures; nevertheless, the associations remained statistically significant, and more studies are needed.
Nutrient-based indices can make useful epidemiological contributions to genomic health studies. The idea of epigenetically critical nutrients is important for 2 reasons. First, it supports the notion that intakes of epigenetically relevant nutrients above DRI levels could help preserve the epigenome, which could motivate updates to nutritional guidelines, as nutriepigenetic experts have advocated. 16 - 18 In the novel ENI constructed for the present study, points were awarded by comparing average daily intakes with (1) estimated average requirements, the intake considered adequate for half of the healthy individuals in a population, and (2) recommended dietary allowances or adequate intakes, the levels at which the requirements of 97% to 98% or essentially all of a population’s healthy individuals are met. 15 Future iterations could test varying ENI scoring parameters relative to DRIs for epigenetic benefit. Second, a nutrient-based approach suggests that any dietary pattern rich in vitamins, minerals, and other bioactives could help preserve epigenetic health. This is helpful because dietary patterns are socioculturally influenced, and a focus on nutrients rather than foods could help bridge culture, class, and geography. 9 The Okinawan diet, for example, is nutritionally similar to the Mediterranean-style diet but more aligned with Asian tastes. 38 In general, the sociodemographic determinants of diet should not be discounted. Across the US population, for instance, overall diet quality is relatively low while added sugar intake is high, as was also observed in the present sample; however, specific nutrient intakes will vary with the particulars of dietary patterns. 22 , 36 As dietetics and medicine progress into the era of personalized nutrition and personalized medicine, social factors including diet will be important to consider in epigenetic studies and could figure prominently in work on health disparities.
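The comparison of intakes with EARs and with RDAs or AIs described above can be sketched as a tiered point rule. The Python below is a hypothetical reconstruction, not the published ENI algorithm: it assumes 0 points below the EAR, 1 point at or above the EAR but below the RDA (or AI), and 2 points at or above the RDA (or AI), which would yield the reported 0-24 range if 12 nutrients were scored. The actual ENI nutrients and cut points are those given in Table 2 of the study; the function names and the 2 example nutrients (adult women's DRI values for vitamin C and folate) are illustrative only.

```python
from typing import Optional

# Hypothetical reference table: nutrient -> (EAR or None, RDA or AI).
# Adult women's DRIs for vitamin C (mg/d) and folate (ug DFE/d),
# used here only for illustration.
EXAMPLE_REFS: dict[str, tuple[Optional[float], float]] = {
    "vitamin_c_mg": (60.0, 75.0),
    "folate_ug_dfe": (320.0, 400.0),
}


def nutrient_points(intake: float, ear: Optional[float], target: float) -> int:
    """Assumed tiered rule: 2 at/above RDA or AI, 1 at/above EAR, else 0.

    ear is None for nutrients that only have an AI.
    """
    if intake >= target:
        return 2
    if ear is not None and intake >= ear:
        return 1
    return 0


def eni_score(intakes: dict[str, float],
              refs: dict[str, tuple[Optional[float], float]]) -> int:
    """Sum of per-nutrient points over every nutrient in the reference table."""
    return sum(nutrient_points(intakes[name], ear, target)
               for name, (ear, target) in refs.items())


# Vitamin C meets the RDA (2 points); folate is between EAR and RDA (1 point).
print(eni_score({"vitamin_c_mg": 80.0, "folate_ug_dfe": 350.0}, EXAMPLE_REFS))  # prints 3
```

Varying the cut points passed in the reference table is one simple way to test the alternative scoring parameters relative to DRIs that the text suggests for future iterations.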
Strengths of this study are its inclusion of a diverse group of women as well as use of robust measures of diet and DNAm. It was also possible to control for several potential sociodemographic confounders.
This study also has limitations. Because the study is cross-sectional, temporality cannot be established and causality cannot be inferred; longitudinal studies are therefore needed. Additionally, diet was self-reported via 3-day food records, which may underestimate or overestimate intakes depending on the nutrient; augmenting dietary assessment with food frequency questionnaires and/or biomarkers could be helpful. 39 Also, other nutrients with pro-epigenetic properties were not included in the current ENI. Still, the Cronbach α for this first version of the ENI was acceptable at 0.79, and the index demonstrated good convergent validity with customary socioeconomic and demographic characteristics. Finally, the tolerable upper intake levels of the DRIs were not considered in constructing the ENI; future work should assess the prevalence of intakes above these upper limits to determine whether toxicity could be a concern.
To our knowledge, the findings of this cross-sectional study are among the first to show associations of both healthy diet indicators and added sugar intake with second-generation epigenetic aging markers, and this study is one of the first to include a cohort of Black women. Higher diet quality and higher consumption of antioxidant or anti-inflammatory nutrients were associated with younger epigenetic age, whereas higher consumption of added sugar was associated with older epigenetic age. Promoting healthy diets aligned with chronic disease prevention and reducing added sugar consumption may support slower cellular aging relative to chronological age, although longitudinal analyses are needed.
Accepted for Publication: April 29, 2024.
Published: July 29, 2024. doi:10.1001/jamanetworkopen.2024.22749
Open Access: This is an open access article distributed under the terms of the CC-BY License . © 2024 Chiu DT et al. JAMA Network Open .
Corresponding Author: Dorothy T. Chiu, PhD, Osher Center for Integrative Health, University of California, San Francisco, 1545 Divisadero St, #301D, San Francisco, CA 94115; Barbara A. Laraia, PhD, MPH, RD, Community Health Sciences Division, School of Public Health, University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA 94720.
Author Contributions: Drs Chiu and Laraia had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Epel and Laraia share co–senior authorship on this article.
Concept and design: Chiu, Hamlat, Epel, Laraia.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Chiu, Hamlat, Laraia.
Critical review of the manuscript for important intellectual content: Hamlat, Zhang, Epel, Laraia.
Statistical analysis: Chiu, Hamlat, Zhang.
Obtained funding: Epel, Laraia.
Administrative, technical, or material support: Chiu, Laraia.
Supervision: Epel, Laraia.
Conflict of Interest Disclosures: Dr Chiu reported receiving support from grants from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD); the National Heart, Lung, and Blood Institute (NHLBI); the National Institute on Aging (NIA); and the National Center for Complementary and Integrative Health (NCCIH) during the conduct of the study. Dr Hamlat reported receiving grants from the National Institutes of Health (NIH) during the conduct of the study. Dr Laraia reported receiving grants from NIH NICHD during the conduct of the study. No other disclosures were reported.
Funding/Support: The research reported in this publication was supported by grant R01HD073568 from the Eunice Kennedy Shriver NICHD (Drs Laraia and Epel, principal investigators [PIs]); grant R56HL141878 from the NHLBI; and grants R56AG059677 and R01AG059677 from the NIA (both for Drs Epel and Laraia, PIs). The participation of Dr Chiu was supported by the University of California, San Francisco Osher Center research training fellowship program under grant T32AT003997 from NCCIH.
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Data Sharing Statement: See the Supplement .
Additional Contributions: We recognize the past and present NHLBI Growth and Health Study (NGHS) staff for their talents and dedication, without which the study and these analyses would not have been possible. We also thank the Nutrition Policy Institute for providing consultation and support with historical study data. Additionally, we express immense gratitude to Ake T. Lu, PhD, and Steve Horvath, PhD, now of Altos Labs, for their epigenetic clock expertise and consultation. Neither was financially compensated for their contributions beyond their usual salary. Of note, we thank the NGHS participants for their time and efforts over the years.
The expansion of social media has unlocked a real-time barometer of public opinion. This paper introduces a novel framework to analyze sentiment shifts in social network comment sections, which reflect broader public discourse over time. Leveraging a pre-trained uncased RoBERTa-large model, we predict emotional scores from user comments and map them to key sentiment trends such as Approval, Toxicity, Obscenity, Threat, Hate, Offensive, and Neutral. Our methodology uses machine learning techniques to train models on a dataset that links emotional scores to these trends, generating trend probability scores. We use a bottom-up recursive algorithm to aggregate emotional scores within comment threads, enabling the prediction of trend scores with three distinct aggregation methods. The results show that our emotion prediction model achieves an AUC of 0.92, and XGBoost stands out with an F1 score exceeding 0.40. Our research elucidates the temporal evolution of online public sentiment, enhancing understanding of digital social dynamics and offering insights for strategic online interaction, intervention, and content moderation.
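The abstract mentions a bottom-up recursive algorithm that aggregates emotional scores within comment threads but does not specify its three aggregation methods. The Python sketch below illustrates the general idea under stated assumptions: each comment carries a vector of emotion scores, replies are aggregated before their parent, and two hypothetical combine rules (element-wise mean and element-wise max) stand in for the paper's methods. The `Comment` class and `aggregate` function are illustrative names, not the authors' code.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    scores: list[float]  # per-emotion scores from a classifier (assumed input)
    replies: list["Comment"] = field(default_factory=list)


def aggregate(node: Comment, method: str = "mean") -> list[float]:
    """Bottom-up: aggregate every reply subtree first, then combine with this comment."""
    vectors = [aggregate(child, method) for child in node.replies]
    vectors.append(node.scores)
    if method == "mean":
        # Element-wise mean of this comment and each reply subtree's aggregate.
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    if method == "max":
        # Element-wise max: the strongest signal anywhere in the subtree.
        return [max(col) for col in zip(*vectors)]
    raise ValueError(f"unknown method: {method}")


leaf = Comment([1.0, 0.0])
reply_a = Comment([0.6, 0.4], [leaf])
reply_b = Comment([0.0, 1.0])
root = Comment([0.2, 0.8], [reply_a, reply_b])
print(aggregate(root, "max"))  # [1.0, 1.0]
```

With the mean rule, each direct reply subtree counts equally regardless of its size; a size-weighted mean is one alternative design. Very deep threads may exceed Python's default recursion limit, in which case an explicit stack would be needed.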
No datasets were generated or analysed during the current study.
This research was supported by NSF Grant CNS-2153482.
All authors have contributed equally to this work.
Department of Computer Science, The University of Texas at El Paso, 1801 Hawthorne St., El Paso, TX, 79902, USA
Ismail Hossain, Md. Jahangir Alam & Sajedul Talukder
School of Computing, Southern Illinois University Carbondale, 1230 Lincoln Dr., Carbondale, IL, 62901, USA
Sai Puppala
This research represents a collaborative effort in which each author contributed significantly to the development and execution of the work presented. Ismail Hossain (I.H.) and Sai Puppala (S.P.) contributed equally to this work. I.H. and S.P. were instrumental in the conceptualization and design of the study. They focused on developing the methodology and played a leading role in the analysis of emotional sentiment dynamics within social network commentaries. Both authors also contributed to writing and editing the manuscript, ensuring the clarity and coherence of the presentation. Md Jahangir Alam (M.J.A.) contributed to both data collection and the development of the analytical framework for sentiment analysis. M.J.A. was heavily involved in data preprocessing and conducted extensive tests on new datasets, contributing to a substantial expansion of the research findings. He also assisted in drafting and revising the manuscript, providing critical insights into the interpretation of results. Sajedul Talukder (S.T.), as the corresponding author, oversaw the entire project, ensuring that the research aligned with its objectives and that the manuscript met publication standards. He was responsible for project coordination and funding acquisition and provided guidance on the overall research direction. S.T. contributed to refining the research methodology and analyzing results, and played a pivotal role in manuscript revision, focusing on integrating feedback and enhancing the manuscript's overall quality. All authors have reviewed the manuscript, contributed to its critical revision for important intellectual content, and approved the final version to be published. Each author agrees to be accountable for all aspects of the work, ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Correspondence to Sajedul Talukder.
Competing interests.
The authors declare no competing interests.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Hossain, I., Puppala, S., Alam, M.J. et al. A visual approach to tracking emotional sentiment dynamics in social network commentaries. Soc. Netw. Anal. Min. 14, 182 (2024). https://doi.org/10.1007/s13278-024-01332-8
Received : 01 March 2024
Revised : 01 August 2024
Accepted : 05 August 2024
Published : 05 September 2024
DOI : https://doi.org/10.1007/s13278-024-01332-8
Written by Media Matters Staff
Published 09/13/24 12:57 PM EDT
Two claims from Family Research Council President Tony Perkins in this clip are misleading.
First, on IVF, Perkins denies that women have been forced “to travel out of state for in vitro fertilization treatments.” There have been reports to the contrary after Alabama's Supreme Court ruling that imperiled IVF. From CNN:
Yesterday, Goidel was days away from having her eggs retrieved at an Alabama fertility clinic, after three miscarriages and more than a $20,000 investment in a grueling in vitro fertilization journey. Now, she and her husband are packing for a flight to Texas tonight, in hopes of salvaging their shot at a successful pregnancy. After the Alabama Supreme Court ruled last week that frozen embryos are considered human beings and those who destroy them can be held liable for wrongful death, fertility clinics throughout the state began pausing IVF treatments out of fear of legal prosecution. Goidel said her provider, Alabama Fertility Specialists, called her Thursday morning and told her because she is so far along in the IVF process, the clinic was still willing to retrieve her eggs – but could not make any guarantees about whether they would be able to use them to make embryos, store or ship them.
At least one IVF clinic in Alabama has announced plans to shutter due to litigation concerns.
Second, Perkins claims that he is not aware “that pro-life laws in various states are prohibiting doctors for treating women who show up at a hospital because of a miscarriage.”
Perkins should read more news articles, because this has been widely reported. From The Associated Press:
More than 100 pregnant women in medical distress who sought help from emergency rooms were turned away or negligently treated since 2022, an Associated Press analysis of federal hospital investigations found. Two women — one in Florida and one in Texas — were left to miscarry in public restrooms. In Arkansas, a woman went into septic shock and her fetus died after an emergency room sent her home. At least four other women with ectopic pregnancies had trouble getting treatment, including one in California who needed a blood transfusion after she sat for nine hours in an emergency waiting room. ... In Texas, where doctors face up to 99 years of prison if convicted of performing an illegal abortion, medical and legal experts say the law is complicating decision-making around emergency pregnancy care. Although the state law says termination of ectopic pregnancies isn’t considered abortion, the draconian penalties scare Texas doctors from treating those patients, the Center for Reproductive Rights argues. “As fearful as hospitals and doctors are of running afoul of these state abortion bans, they also need to be concerned about running afoul of federal law,” said Marc Hearron, a center attorney. Hospitals face a federal investigation, hefty penalties and threats to their Medicare funding if they violate the federal law.
The Family Research Council is a Project 2025 partner.
Citation From the September 12, 2024, edition of Family Research Council’s Washington Watch
TONY PERKINS (HOST): In Tuesday night's presidential debate, Vice President Kamala Harris gave a rambling and incoherent statement that seemed to imply the Supreme Court's overturning of Roe v. Wade had forced women to travel out of state for in vitro fertilization treatments.
PERKINS: It remains unclear what the vice president meant by that political word salad. I mean, IVF is legal in all 50 states, and I'm not sure where the plane and the strangers came in.
Well, joining me now to discuss this is Dr. Marguerite Duane. She is a board-certified family physician and the executive director of FACTS, the Fertility Appreciation Collaborative to Teach — Collaboration to Teach the Science, an organization dedicated to educating health care professionals and students about the scientifically valid natural-based family planning methods. Dr. Duane, welcome to Washington Watch.
MARGUERITE DUANE: Thank you so much for having me.
PERKINS: Now let me ask you, are you aware of any place in the country where IVF is not allowed?
DUANE: No. I'm not aware that IVF is not available throughout the country. My understanding, again, I'm not a lawyer. I'm a physician. But to my knowledge, IVF is legal. But where women do need to travel extensively is to seek physicians who are trained to provide comprehensive reproductive health care through a restorative lens, one that's really designed to treat the underlying causes of infertility.
And in fact, I've had patients drive four to six hours to see me to receive the kind of care that I'm trained to provide, and we currently train physicians across the country to provide. Again, care that is real women's health care that seeks to identify underlying causes of infertility and treat those through a restorative reproductive approach.
PERKINS: And is also respectful of human life.
PERKINS: You mentioned the miscarriages and the vice president, her comments there suggesting that pro-life laws in various states are prohibiting doctors for treating women who show up at a hospital because of a miscarriage. Again, I'm not aware of that.
DUANE: And it's simply not true. And I can tell you, as a physician who cares for patients who regularly experience miscarriage, we are trained to provide both medical and surgical treatments to treat miscarriage. Now the difference between a miscarriage and an abortion is with miscarriage, the embryo has already passed. The heart has stopped beating. The child is no longer alive.
Fox News and MSNBC are united after the debate: Donald Trump lost and Kamala Harris won. Stretching well past the scheduled 90 minutes, the first and possibly only debate between the candidates, which aired on ABC, featured the Republican candidate almost immediately straying from his campaign’s playbook for the face-off, getting angry and increasingly personal — twice even hushing the first woman to serve as vice-president. If that weren’t enough, he repeated a bizarre lie about migrants eating pets in Ohio, among numerous other tangents. Democrats are ecstatic about Harris’s performance, while Republicans were left wincing at their man’s defensive posture and blaming the moderators for being too hard on him.
• Gabriel Debenedetti on the success of Kamala Harris’s debate strategy.
• Nia Prater on whether there will be a second debate.
• Jonathan Chait on how Trump was sabotaged by the online right.
• Photos and anonymous overheard comments from the New York Young Republican Club’s debate watch party.
• The Cut’s Laura Bassett on how Harris out-alphaed Trump.
• Jonathan Chait on the contrast Harris was able to draw with her Trump-baiting.
• Margaret Hartmann on Trump’s pet-eating tangent.
• Ed Kilgore on Trump’s torrent of denials.
Below is a reverse chronological account of what happened as it happened, including commentary and analysis from the entire Intelligencer team.
From my new report on how the Harris team’s debate strategy played out:
It took only a few minutes for Trump to grow flustered by Harris’s reference to a negative analysis of his economic plans by professors at Penn’s Wharton School, his alma mater. Minutes later, she directly quoted a tweet of his praising Chinese leader Xi Jinping over Beijing’s handling of COVID, and he once again spluttered. Soon after, he mixed up Virginia for West Virginia when he went on a tirade about Democrats and “after birth” abortion. He also praised the “genius and heart and strength” of the six conservative Supreme Court justices who overturned Roe v. Wade —a historically unpopular move. As Trump refused to make eye contact with Harris and grimaced into his notes, I was reminded of what Celinda Lake, a senior Democratic pollster who works with the Harris campaign, told me a few hours earlier: Research shows that 70 percent of what voters take away from debates is the theater aspect, only 30 percent is the actual policy difference. Harris’s campaign relished the chance to throw Trump off his game after he won the first debate against Biden by simply letting his opponent expose himself as just too old for the job. This time, the Harris team ran an ad on Fox News and stationed billboards around the city taunting Trump about his smaller crowd sizes, an obsession of his that voters find childish. When Trump accused her of busing in paid crowds to her own events, Harris looked like she almost couldn’t believe he took the bait instead of responding to her claim that he doesn’t care about everyday voters. She laughed as Trump insisted that undocumented migrants were eating family pets in Ohio, a far-right conspiracy that took his focus far from his straightforward attempts to blame her for the migrant surge at the southern border. 
One top Democratic operative, who’d been basically comatose at that point early in the first debate, started texting me “YES” “YES” “YES” every few moments as Trump preached to the far-right corners of X more than persuadable voters in swing states.
To be clear, these results should be taken with a grain of salt, but:
And this isn’t a poll, but still interesting:
Which begs the question:
As I argue in my review of the debate, Harris “baited Donald Trump into losing his temper, then used the visual contrast between them to establish herself as not only a plausible president but the only plausible president onstage.” Also:
The clearest success Harris registered was in performing the role of president. She repeatedly touted her economic plan, rebutting the charge she lacks ideas, which is intended to present her as a lightweight. She also did this by citing her foreign-policy experience, meeting with Volodymyr Zelenskyy and organizing a NATO response to Russia’s invasion. The importance of these validators might be overlooked, but many Americans have old-fashioned views of presidential qualifications, associating it with masculinity. Most important, she established herself as presidential by appearing calm and confident, in vivid contrast to the bellowing lunatic on the stage beside her.
From my new take on Trump’s bad night:
Trump has a devoted following of people who believe his revisionist take on reality, who don’t accept the experts or the statistics or logic or the evidence of their own eyes and ears. It’s hard to imagine, however, that many persuadable people watching this debate will find it so easy to accept that Trump is right about everything and everyone else is wrong. To the extent that Trump made his war on reality so sweeping and absolute and furious on the stage in Philadelphia, he lost not just the debate but his grip.
Doing your own stint in the spin room is rarely a good sign.
And now Taylor Swift has endorsed Kamala Harris.
From her cat-featuring announcement post on Instagram following the debate, which Swift said she watched:
Recently I was made aware that AI of ‘me’ falsely endorsing Donald Trump’s presidential run was posted to his site. It really conjured up my fears around AI, and the dangers of spreading misinformation. It brought me to the conclusion that I need to be very transparent about my actual plans for this election as a voter. The simplest way to combat misinformation is with the truth. I will be casting my vote for Kamala Harris and Tim Walz in the 2024 Presidential Election. I’m voting for @kamalaharris because she fights for the rights and causes I believe need a warrior to champion them. I think she is a steady-handed, gifted leader and I believe we can accomplish so much more in this country if we are led by calm and not chaos. I was so heartened and impressed by her selection of running mate @timwalz, who has been standing up for LGBTQ+ rights, IVF, and a woman’s right to her own body for decades.
Trump: ‘my best debate, ever’.
That’s what he claimed in a post-debate Truth Social message (while also attacking Harris and the moderators):
I thought that was my best Debate, EVER, especially since it was THREE ON ONE!
Minutes after the first presidential debate between Trump and Harris ended, Harris campaign chair Jen O’Malley Dillon issued a statement claiming victory and extending an offer for a second matchup between the two candidates. “Vice President Harris is ready for a second debate. Is Donald Trump?” she wrote.
The Washington Post reports that Trump’s team appears game, for now, with campaign adviser Chris LaCivita saying of the request, “Of course. They need clean up.”
Reports pooler Sara Cook on the second commercial break:
The second the stage hand said they were clear for a 4 minute break, Trump turned towards the exit, gave a big sigh through closed lips, and walked off stage without looking at Harris. From the time the moderators announced they were going to break, Harris began writing on her notepad. She wrote continuously for the entire first two minutes of the break, occasionally bringing one hand to her chin or brushing hair behind her ear. She then reviewed what she wrote for the next minute, making a few tweaks, before putting the pen down and looking out around the room with her hands folded in front of her. She took a sip of water from a glass placed under the lectern. Trump walked back onstage 30 seconds before the end of break. He did not look at Harris, she did not look at him. Harris made small adjustments to her collar. Both candidates looked straight ahead until the program restarted. Again, no words were spoken.
As it was throughout the debate, a clear contrast:
Julia Ioffe notes wryly that Hillary Clinton won her debates against Donald Trump, too. It’s an important reminder that what liberals and media critics consider a successful performance is not necessarily going to be persuasive to the small segment of swing voters that need to be persuaded in this election.
Donald Trump has promised a big beautiful Obamacare replacement for almost a decade without ever really articulating what it would look like. (It was even unclear when Republicans came close to repealing the Affordable Care Act in 2017.) But never fear: He finally unveiled some specifics on Tuesday night:
“I have concepts of a plan,” won’t rescue Trump from a disastrous performance, but to my weary brain, it’s gold. Not only does it sound like the title of a forgotten shoegaze album; it’s emblematic of Trump himself. He has nothing, really, just bigotry and a handful of vague positions. He specializes in vibes, and bad ones at that. That line was one of his more honest moments. It’ll rattle around in my mind palace for weeks to come.
After Trump accused Democrats of wanting to take away everyone’s firearms, Harris said something surprising: She is a gun owner. It’s not news, however. In 2019, according to CNN, her presidential campaign at the time said she purchased a handgun for personal protection and keeps it locked in a safe.
When Trump was asked about his past comments on Harris’s race, he started out by saying that he doesn’t care at all about how she identifies. And then he doubled down.
“All I can say is I read where she was not Black, that she put out — I’ll say that. And then I read that she was Black and that’s okay,” he said of Harris, who is Black and Indian. “Either one was okay with me. That’s up to her.”
In response, Harris raised Trump’s past examples of racism, spending a significant amount of time on his treatment of the Central Park 5, who were heavily featured at the DNC last month. “It’s a tragedy that we have someone who wants to be president who has consistently used race to try to divide us,” she said.
Yusef Salaam, a New York City councilman and member of the Central Park 5, is expected to be in the spin room after the debate.
And considering how he’s using that time, Harris is probably fine with that.
Kamala Harris’s response to Donald Trump on the Russia-Ukraine war is not focused on attacking Trump. Instead, she uses it to recount her foreign-policy work, meeting with Zelenskyy and NATO.
One of her most important obstacles to overcome is still that many voters question whether she, or any woman, is strong enough to serve as commander-in-chief.
Harris is taking Trump seriously.
I am surprised she’s not being more dismissive in her posture. What she is doing is effective, but I was anticipating her emphasizing, for instance, that the reason she had to introduce herself to Trump at the start of the debate is that he skipped the inauguration because he was a sore loser, etc. They are both taking each other extremely seriously in their exchanges.
They know it’s not over and that there’s plenty to do, but the overriding feeling I’ve gotten from Democrats close to the Harris campaign over the debate’s first hour is immense relief. So many were scarred by the last debate and downplayed what Harris had to do tonight. But they’re unanimous now that her obvious strategy of getting under Trump’s skin has worked wonders.
One top Democrat who was catatonic during the last debate has just been texting me “YES” “YES” “YES” every few minutes, peaking as Trump rambled about pets being eaten and when Harris started laughing at him.
Their deeper feeling isn’t quite so gleeful. They know her most important audience tonight is undecided voters, not just people who hate Trump, and that this is almost certainly the largest audience she’ll get all campaign long. There’s a half-hour left, and Trump keeps hammering her on the border, one of her biggest weaknesses.
But they’re happy with how she got through the economics section, thrilled with her answers on abortion — her campaign adviser David Plouffe said on X that the campaign’s internal numbers showed a 40-point gap among undecided voters while they were talking — and they clearly see a path to success in letting him ramble incoherently while she tries to present herself as a chance to break beyond the messy, unproductive politics of the last decade. Harris’s campaign says that its live-testing of battleground-state undecided voters hit its lowest point when Trump was going on about insisting he won the 2020 race.
At this point during the Trump-Biden debate, the president’s team was desperately hoping no one was watching. Right now, Harris’s team is praying that everyone’s tuning in and that this is, like ABC keeps saying, the most consequential debate in history.
In a comment that was even more startling than his description of her as a Marxist, Trump said of Harris that “Biden hates her.” Keep in mind that Biden hand-picked her as his vice-president, then made sure she rather than many other plausible Democrats was his successor when he withdrew from the 2024 race, and then spoke on her behalf at the convention and has been campaigning with her.
So who are you going to believe? Trump or your lying ears and eyes?
Congressional Republicans have tuned into the Trump-Harris debate and, so far, they’re not liking what they’re seeing. Several conceded to reporters that Harris successfully forced Trump off his game:
Senator Lindsey Graham, an ally of Trump, took to social media to complain about the moderators who have heavily fact-checked the former president:
Another response:
After Kamala Harris taunted Trump with the contempt with which world leaders held the former president, Trump had one shining example of a foreign fan who is his validator: Hungarian authoritarian Viktor Orban! Aside from the fact that few viewers likely knew who he was talking about, the few who did were probably horrified. Whether you consider Orban a new Franco, or a new Perón, or a new Mussolini, he’s hardly a role model for American leadership.
For what it’s worth:
In an amazing turn of his extraordinarily frequent oscillations on what happened on January 6, Trump now says he had nothing to do with what happened at the Capitol. He just made a speech, and Nancy Pelosi (!) was responsible for what happened. He’s not acknowledging all the steps he took that led up to January 6 or, as David Muir tried to remind him, that he (not Pelosi, not Harris, not Biden) was president that day.
Trump also mentioned the shooting of Ashli Babbitt, a Capitol rioter — then pivoted immediately to immigration yet again. “She is the border czar,” he said falsely of Harris. “What about those people?” he asked. “When are they going to be prosecuted?” He then repeated a line from earlier, saying that crime rates are going down in other countries because criminals are crossing the border. (Violent crime in the United States is down, I should note.) Trump has nothing of substance to say; his only real attack point is immigration, immigration, immigration. He won’t take responsibility for the Capitol Riot and, minutes later, would not admit he lost the election to President Joe Biden before he brought up — you guessed it — immigration. Again.
As completely expected, Donald Trump spouted a lot of lies and gross exaggerations during Tuesday’s debate. But unlike on some other debate nights, the network in charge is doing some effective real-time fact-checking. At least three times, one of the two ABC moderators, David Muir and Linsey Davis, has stepped in and corrected Trump after particularly egregious answers.
“There is no state in this country where it’s legal to kill a baby after it’s born,” Muir announced after Trump falsely claimed otherwise during an answer on abortion.
Muir also informed the audience at home that there is no evidence of immigrants eating dogs in Ohio, which Trump claimed, spinning off a popular conservative conspiracy theory that raced across the internet this week.
Trump tries to blame Harris for assassination attempt.
In a shocking moment, Trump seemed to directly accuse Harris of contributing to the attempt on his life, suggesting her rhetoric played a role. “I probably took a bullet to the head because of the things they said about me,” he said.
It appears to be part of a recent trend from Trump and his circle to raise the specter of conspiracy around the July assassination attempt ahead of the November election. On Monday, Trump’s wife, Melania, shared a video suggesting that there was “more to the story” of the shooting.
ABC asked Vice-President Harris about her previous positions calling for a fracking ban, a mandatory assault-weapons buyback, and decriminalizing immigration enforcement.
Harris in her reply says she supports fracking, and notes the Inflation Reduction Act, which she voted for, expanded fracking. But she doesn’t mention the other issues, and retreats into a generalized defense of her values. It’s her weakest response so far.
Trump raised a debunked racist hoax on the stage, claiming that immigrants in Springfield, Ohio, are eating and killing residents’ pets. The rumors have been shared by Republican allies of Trump and his running mate, J.D. Vance. “In Springfield, they’re eating the dogs, the people that came in. They’re eating the cats. They’re eating the pets of the people that live there and this is what’s happening in our country,” he said.
Debate moderator David Muir cut in to correct Trump on his claim, but Trump continued on, saying that people have said so on TV.
When it was her turn to speak, Harris seemed pleased with the exchange. “I mean, talk about extreme,” she said with a laugh.
Harris’s powerful abortion answer wasn’t just her strongest rhetorical moment; it was directly responsive to the contemporary hellscape of women sitting in parking lots bleeding out, patients being forced to cross state lines for their procedures, minors being forced to stay pregnant after assault, and IVF in the crosshairs. Before Roe v. Wade was overturned, this probably felt theoretical to a lot of Americans, and polls were all over the place. Now, all of the aforementioned stories are real.
Meanwhile, Trump’s trying to run an old playbook on abortion, the one the Susan B. Anthony List historically encouraged: trying to make Democrats squirm by bringing up later abortions, or as he put it to Hillary Clinton back in 2016, claiming that they support “the baby out of the womb of the mother just prior to the birth of the baby,” and lying about so-called post-birth abortions, which do not exist. (He was distorting comments made about newborn hospice and confusing West Virginia and Virginia in the process.) The problem for him is that he’s talking hypotheticals about a past that he can’t substantiate, while every day, Americans read headlines about real-life consequences that have profoundly affected public opinion.
I have good and bad news if you had “Trump shouts out the ‘late great Hannibal Lecter’” on your debate bingo card. Harris brought up Trump’s favorite fictional cannibal as an example of the unhinged things Trump says at his rallies rather than focusing on ways to help the American people.
On abortion, former president Donald Trump offered a rambling, if familiar, answer, saying falsely that liberal states “have abortion in the ninth month.” Then he misspoke in the process of lying: He claimed the governor of West Virginia wanted to execute babies after birth, a claim he usually attributes to Ralph Northam, the former governor of Virginia. “For 52 years they’ve been trying to get Roe v. Wade into the states,” he said and praised the “genius, heart and strength” of his chosen Supreme Court justices, who voted to overturn Roe. This is typical Trump: He wants to take credit for killing Roe, a decision that he claims, falsely, is popular, but he doesn’t want to answer direct questions about his own position on issues like Florida’s abortion referendum.
Then Vice-President Kamala Harris swiftly and decisively put him on the defensive, referring to “Trump abortion bans” in conservative-controlled states and describing the human consequences of those bans. Some don’t have exceptions for rape or incest, she said, adding, “That is immoral.” People do not have to “abandon their faith or their deeply held beliefs” in order to oppose the government — and Trump — making reproductive decisions for them.
It was an effective line of attack, and Trump didn’t credibly respond. All he could do was accuse Harris (again, falsely) of supporting abortion as late as the ninth month of pregnancy “and probably after birth.” If Trump thinks he can appeal to moderates and independents by claiming to support certain exceptions to abortion bans, he’s failing. His arsenal contains lies and not much else. As personal stories of harm emerge in states with bans on the books, it’s harder and harder for Trump to distance himself from the world he’s created — and would reinforce if reelected president.
Harris, baiting Trump, flags his rallies.
Trump dodges question about signing a national abortion ban.
“Will you veto a national abortion ban?” asks a moderator. “Well, I won’t have to,” Trump replies. Trump hems and haws about whether a national abortion law will pass Congress. He is told that J.D. Vance promised he would veto a national abortion ban. Trump replies that he didn’t talk to Vance.
That sure sounds like he wouldn’t veto a ban.
Trump’s advisers were preoccupied with two things ahead of the debate: (1) Prevent him from getting angered by whatever Harris says, which they worried would knock him off message. (2) Encourage him to hang back, in the hopes that Harris might be forced to talk more expansively, which they hoped would lead to her producing mangled sentences they could utilize in service of their argument that she speaks incoherently (as opposed to Trump, who is of course famously coherent).
Just a few minutes in, he’s already angry. It seemed to start with Harris mentioning the Wharton School, which was an artful way to trigger him and it worked instantly. Now he is speaking at a high volume and rather aggressively. He is still on message, but for how long? Harris meanwhile seems to be tailoring her facial expressions for memes. People looking in a befuddled way at Donald Trump is a robust genre already, and I expect she’ll make a meaningful contribution to that trove by the end of the night.
Trump seemed to briefly mix up the two Virginias during a winding response on abortion. While inaccurately claiming that states are performing abortions after nine months, Trump appeared to make a reference to former Virginia governor Ralph Northam, praising his successor and ally Glenn Youngkin, but said West Virginia instead. A mix-up that likely won’t endear him to the commonwealth.
After initially claiming that Harris was just a Biden rubber-stamp and then that she had no policies at all, Trump suddenly lurched into a flat assertion that Harris is a Marxist, alluding to the occasional description of her father as a “Marxist economist.” He offered no explanation of this claim, but I guess Harris is lucky he didn’t call her a “communist” as he often has.
The economy is one of Trump’s best issues, per polling. The race is basically tied, and Trump’s strength is the perception he is an economic mastermind — a perception that is winning over some voters who otherwise don’t like him. Trump needs to win a clear victory on the economy. I don’t think he did at all, but we’ll see what the viewers think.
Harris blames Trump for praising China’s COVID response.
Kamala Harris not only quoted Trump praising Xi Jinping’s handling of COVID; she noted China’s lack of transparency on the origins of the pandemic. That is an interesting position for her to take, and a correct one, in my view. But it’s also one conservatives have largely owned, because some progressives have treated the hypothesis that the pandemic emerged from a lab as a conspiracy theory. Harris seems to be taking the other side.
One of the Democrats’ most effective attack lines against Donald Trump and other Republicans this year has been Project 2025, the draconian playbook the Heritage Foundation and other conservatives cooked up for a second Trump presidency. Harris mentioned it early on even though it had little to do with the question she was asked, but it likely won’t be the last time she hits it tonight.
Trump, on the defensive, claimed ignorance. “I haven’t read it, I don’t want to read it,” he said.
When the two candidates came out, one question was answered when Kamala Harris approached Trump with a handshake that he awkwardly answered. The first question to Harris reprised the famous 1980 Reagan debate question: “Are [we] better off than four years ago?” She did not answer but instead went into her stock “opportunity economy” message, followed by a brisk denunciation of Trump’s economic agenda of tax cuts and tariffs. Following up, Trump introduced alleged uncontrolled immigration as wrecking the economy, and in a series of follow-ups, the two candidates hammered each other along the lines we expected, with Harris citing Project 2025 and Trump mocking Harris’s policy specifics.
Harris forces a handshake.
After much speculation it wouldn’t happen, there was indeed a handshake, but it didn’t come easily. Harris clearly insisted and had to walk all the way to Trump’s lectern to make it happen. “Kamala Harris. Let’s have a good debate,” she said. Trump replied, “Nice to see you, have fun.” Awkward!
Will that mass deportation involve barbed wire and cattle cars?
If my colleague Sarah Jones is right that Trump could “get nasty, and racist, fast” on immigration once the debate begins, then Kamala Harris will have a strategic decision to make on how to handle one of Trump’s signature issues. Up until now, she’s basically dealt with immigration by endorsing the bipartisan border-control bill that Trump killed earlier this year and then moving on to other issues. But should Trump really go wild, she might consider poking him a bit on the implications of his promise to launch the greatest “mass deportation” in American history, involving every undocumented immigrant. Because of their defensiveness on the issue, Democrats have not raised alarms about the details of this terrible-sounding plan or the implications for Latino citizens and legal immigrants, who may be hassled or even rounded up in such an effort. Trump needs to pay a price for this very un-American “America First” idea.
Staffer 1: The debate room looks like Avatar.
Staffer 2: rather aquatic
Staffer 1: the podiums look different height
Staffer 4: it’s making me feel a little insane
Staffer 1: Trump is going to lose it
Staffer 5: Shouldn’t the plural be “podia”? Yet it isn’t, strange
Staffer 1: I went to a state school
Four minutes later…
Staffer 5: FYI, according to this AI Overview, “The plural of the word ‘podium’ is ‘podiums’ or ‘podia’”
Six minutes later…
Staffer 6: Chiming in to note that these are neither podiums nor podia. They are lecterns. The podium is the thing you stand on, not the thing you stand at.
Far-right activist Laura Loomer was seen leaving Trump’s plane after it arrived in Philadelphia. Loomer’s presence is notable for her extremism: She has called Islam “a cancer” and celebrated the deaths of migrants who were crossing the Mediterranean. On an extremist podcast in 2017, Loomer, who is Jewish, said, “Someone asked me, ‘Are you pro-white nationalism?’ Yes. I’m pro-white nationalism.”
Nevertheless, Trump supported her failed 2020 congressional race in Florida, she has flown on his plane in the past, and he reportedly wanted to hire her for a campaign role until aides intervened, the Washington Post reported. With Loomer onboard, Trump may be in a pugnacious mood, especially on immigration. On Truth Social yesterday and today, he repeatedly boosted the viral lie that Haitian immigrants in Springfield, Ohio, are kidnapping and eating pets, and Republicans have tried to link Vice-President Harris to President Biden’s immigration policy through that rumor and other talking points. The debate has yet to start, but expect Trump to quickly get nasty, and racist, once it begins.
At the end of the June 27 debate between Joe Biden and Donald Trump, CNN moderators tried three times to get a clear answer from Trump as to whether he would accept defeat in November. Indeed, not that a single person noticed, but Biden’s last words before the candidates went to closing remarks trolled and mocked Trump for his refusal to answer what turned out to be the $64,000 question of the 2020 election:
You’re a whiner. When you lost the first time, you continued to appeal and appeal to courts all across the country. Not one single court in America said any of your claims had any merit, state or local, none. But you continue to promote this lie about somehow there’s all this misrepresentation, all the stealing. There’s no evidence of that at all. And I tell you what? I doubt whether you’ll accept it because you’re such a whiner.
The odds are very high that the same fraught question will come up tonight and that Trump will again hedge and change the subject. Unless the moderators can find a more precise way to elicit a clear answer, Harris may need to do so herself with a pledge of her own.
There are at least a couple of benchmarks moderators or Harris could suggest for a concrete agreement by the candidates not to let the contest drag on until another horrifying January in Washington. One would be to accept the results if the election is called by the Associated Press and all the major networks, including Fox News. Another is to accept the results as certified by governors (or the highest election official in each state), which federal law requires by December 11. If Trump rejects a moderator or Harris challenge to go along with any benchmark other than his subjective determination that the election is “fair,” it will be safe to conclude he’s planning another election coup.
Politico reports that Trump Force One arrived with a lot of extra passengers, including “Stephen Miller, Natalie Harp, Laura Loomer, Vince Haley, Ross Worthington, John Coale, Steve Witkoff, Lara Trump, Alina Habba, Chris LaCivita, Steven Cheung, Susie Wiles, Corey Lewandowski, Eric Trump, Taylor Budowich, Tulsi Gabbard, Rep. Matt Gaetz, Margo Martin, Jason Miller, Boris Epshteyn, Walt Nauta and Dan Scavino.”
ABC News seems rather noncommittal about fact-checking, per the New York Times:
Rick Klein, ABC News’s political director and a lead organizer of Tuesday’s debate, said in an interview that the moderators, David Muir and Linsey Davis, were “there to facilitate a discussion” and that “the debate belongs to the candidates.” Is there a role for the moderators to fact-check? “I don’t think it’s a ‘yes’ or ‘no’ proposition,” Mr. Klein said. “We’re not making a commitment to fact-check everything, or fact-check nothing, in either direction. We’re there to keep a conversation going, and to facilitate a good solid debate, and that entails a lot of things in terms of asking questions, moving the conversation along, making sure that it’s civilized.”
Greetings from the extremely air-conditioned press filing center–slash–spin room in Philadelphia, where I just settled in after almost running straight into a very busy-looking Marco Rubio at my hotel a few blocks away. (He must be here spinning for Donald Trump.)
I spent most of today checking in with Democrats inside of and close to Kamala Harris’s campaign to see how they’re feeling, what they expect, what they want to see, and what they’re nervous about. I got a lot of different answers, but one thing stuck out: Basically, all of them agreed that more pressure is on Harris tonight, if only because she’s the new character in the race and the one voters are still interested in hearing more from. (The consensus: Voters know exactly who Trump is and don’t need any new information about him, thank you very much.)
Harris knows this, obviously. As I reported over the weekend, she hasn’t been prepping to deliver some sort of devastating knockout blow to Trump but instead has been thinking about the best ways to present herself as representing a new political era. That’s probably going to mean talking plenty about Trump’s record, naturally, but just as much, if not more, about her vision for the economy.
Of course, we’ll see whether this all goes to plan, or rather how quickly it veers into unexpected territory. As one Democratic pollster told me this afternoon, reliable research about debate audiences shows that 70 percent of what matters to voters is the visuals and the performance rather than the substance of what the candidates say.
So yes, Harris will be eager to let Trump be Trump, to put it mildly. Her campaign has been trolling Trump on the airwaves and with billboards about, uh, crowd size here in Philly. If he goes unhinged early, they’ll consider it a win. One top Democrat I talked to didn’t disagree that the pressure was on her but said the bar was pretty low after Biden’s performance this summer. Instead, this person suggested, Harris’s job is just to be the normal adult onstage. Isn’t that what exhausted voters want?
Spin-room drama abounds:
At a campaign stop in Arizona, he told supporters that Harris would use the debate to introduce their ticket to more of the country — and the contrast with Trump would be obvious:
Tonight you’re going to watch Vice President Harris lay out a plan for this country, a new way forward. You’re going to hear her talk about an economy that is an opportunity economy where everybody matters. She’s going to talk about education being a path to a better future, not long term student loan debt. She’s going to talk about tackling some of the toughest problems like climate change and doing it in a way that grows our economy. Now if you did a split screen to that, on the other side of that screen, you’re going to see a nearly 80 year old man who’s in it for himself talk about revenge and talk about how bad this country is, and talk us down on everything he does. … Let’s not let a single person, make the case that there is not an absolutely crystal clear difference of a positive forward America, or one that is small, petty, backwards and we’re done with it.
Hours before the debate, Donald Trump added a surreal note to the event by pitching a fit on Truth Social and demanding that congressional Republicans shut down the federal government at the end of September if Democrats don’t accept a ridiculous and redundant proposal to federalize state election systems in order to address a completely made-up crisis over noncitizen voting:
If Republicans in the House, and Senate, don’t get absolute assurances on Election Security, THEY SHOULD, IN NO WAY, SHAPE, OR FORM, GO FORWARD WITH A CONTINUING RESOLUTION ON THE BUDGET. THE DEMOCRATS ARE TRYING TO “STUFF” VOTER REGISTRATIONS WITH ILLEGAL ALIENS. DON’T LET IT HAPPEN - CLOSE IT DOWN!!!
By way of background, House Republicans earlier this year pushed through the so-called SAVE Act, reflecting Trump’s 100 percent unsubstantiated claims that Democrats are planning to flood the polls with voting by noncitizens. Noncitizen voting is already illegal in all 50 states, with prison sentences and deportation the available penalties for the incredibly rare violation.
Congressional Republicans led by House Speaker Mike Johnson understood all along this was an empty “messaging” bill not designed to become law but to underline a MAGA campaign talking point. But now Trump has blown up that harmless if demagogic gesture by demanding that Johnson (and also Senate Republican Leader Mitch McConnell, who is likely to openly mock this gesture) refuse to go along with a stopgap spending plan at the end of the fiscal year that is necessary to keep the federal government operating. There is zero chance the Senate or the White House will go along with this demand, which would require all 50 states to completely redo their process for voter registration right before a national election for absolutely no good reason. Johnson agreed to prioritize this dumb legislation in the first place because he needed Trump’s protection from a potential coup by the House Freedom Caucus, which was angry at Johnson for not shutting down the government earlier this year. Now, the dispute could become very real for federal employees and beneficiaries of key federal programs and services.
This is a very old theme for Trump despite its fictional underpinnings. When he won in 2016, he complained that he would have won the national popular vote (which he lost by over 2-and-a-half million votes) if not for “millions of illegal votes.” He offered zero evidence for this claim. He’s brought back the phantom menace of noncitizen voting this year as part of a broader claim that Democrats have opened up the borders to bring in migrants who will immediately be marched to the polls to reelect their socialist benefactors. You can understand how this hoax appeals to Trump since it combines his signature immigration and “stolen election” themes. Either Harris or the debate moderators should consider demanding that Trump cite some actual evidence that any of this is happening, not that hard-core MAGA folk need any for this version of the Great Replacement Theory.
On the political betting site Polymarket, most bettors don’t think Harris and Trump will shake hands tonight:
The odds are probably even worse than that, since there hasn’t been a presidential debate handshake since the first Trump-Clinton debate in 2016.
As I wrote in my debate preview this morning, Harris has her work cut out for her:
Without question, [she] has the more complicated task: defining herself to viewers as an agent of change from the Biden-Trump era of politics, and a much safer option than an extremist second Trump administration. This means anticipating and rebutting Trump claims that she is responsible for Biden’s alleged policy failures and is more radical than Biden himself. And it also means casting some light through the fog of endless commentary about Trump to convincingly express concerns about what he will do if restored to power.
Trump, meanwhile, needs to focus on pigeonholing:
Trump’s biggest advantage is the extremely low standard he has set throughout his career for either coherence or civility. Almost anyone else would be afflicted with a dilemma as to whether to accuse Harris of being Biden 2.0 or a “communist,” since Biden is nobody’s idea of a dedicated Marxist-Leninist revolutionary. Trump can blithely pursue both angles of attack simultaneously, because that’s just who he is. Calling Harris a “radical” or a “Marxist” or a “communist” is what passes for a substantive comment from the former president, and he would be wise to stick with ideologically freighted criticism rather than slandering her personally (i.e., he should leave the blatantly racist and sexist patter to MAGA social media). Above all, the 45th president needs to do everything he can to fan doubts about Harris, making her out to be the “risky change” candidate and returning the election to a competition between highly motivated party bases with swing voters ultimately focused on their unhappiness with life as it is.
The debate will be broadcast live at 9 p.m. ET on ABC and simulcast on multiple other networks, including C-SPAN, PBS, MSNBC, and Fox News. It will also be streamed live on ABC.com and ABC News’ YouTube channel (for people without cable or streaming-service subscriptions), as well as on ABC News Live, Disney+, and Hulu.
This post has been updated a lot.
Definition of Social Network Analysis (SNA)
Social Network Analysis, or SNA, is a research method used to visualize and analyze relationships and connections between entities or individuals within a network. Imagine mapping the relationships between different departments in a corporation.
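The department example can be sketched in a few lines. This is a minimal illustration, not a method taken from any of the works quoted here: the department names and ties are hypothetical, relationships are assumed to be undirected and unweighted, and normalized degree centrality (a node's number of ties divided by n − 1) stands in for the many possible SNA measures.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map (node -> set of neighbours) from an edge list."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def degree_centrality(graph):
    """Normalized degree centrality: a node's number of ties divided by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

# Hypothetical working relationships between departments in one corporation.
ties = [
    ("Sales", "Marketing"),
    ("Sales", "Finance"),
    ("Sales", "Product"),
    ("Marketing", "Product"),
    ("Product", "Engineering"),
    ("Engineering", "Finance"),
]

graph = build_graph(ties)
centrality = degree_centrality(graph)
for dept, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{dept:<12} {score:.2f}")
```

In this toy map, Sales and Product each hold three of the four possible ties (0.75), while the other departments hold two (0.50).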
Due to the explosive rise of online social networks, social network analysis (SNA) has emerged as a significant academic field in recent years. Understanding and examining social relationships in networks through network analysis opens up numerous research avenues in sociology, literature, media, biology, computer science, sports, and more.
Abstract
A quantitative approach to social network analysis involves the application of mathematical and statistical techniques and the graphical presentation of results. Nonetheless, as with all sciences, subjectivity is an integral aspect of network analysis, manifested in the selection of measures to describe connection patterns and actors' positions (e.g., choosing a centrality indicator ...
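The point about choosing a centrality indicator can be made concrete. In the toy graph below (invented for illustration), two four-person cliques are joined only through a broker node M. Degree centrality ranks the clique members adjacent to M highest, while betweenness centrality, computed here with Brandes' algorithm for unweighted, undirected graphs, ranks M highest. The helper names are our own.

```python
from collections import deque

def undirected(edges):
    """Adjacency map (node -> set of neighbours) for an undirected edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def betweenness(graph):
    """Brandes' algorithm, unweighted and undirected; returns raw (unnormalized) scores."""
    scores = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        order, preds = [], {v: [] for v in graph}
        sigma, dist = {v: 0 for v in graph}, {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: x / 2 for v, x in scores.items()}

left, right = ["A", "B", "C", "D"], ["E", "F", "G", "H"]
edges = [(u, v) for grp in (left, right) for i, u in enumerate(grp) for v in grp[i + 1:]]
edges += [("D", "M"), ("M", "E")]  # M is the only bridge between the two cliques
g = undirected(edges)

degree = {v: len(nbrs) for v, nbrs in g.items()}
between = betweenness(g)
print("highest degree:", max(degree, key=degree.get))
print("highest betweenness:", max(between, key=between.get))
```

D and E have degree 4 against M's 2, yet M lies on the shortest path of all 16 cross-clique pairs (betweenness 16, versus 15 for D and E), so the "most central" actor depends on which indicator is chosen.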
1 Introduction
Social network analysis (SNA), in essence, is not a formal theory in social science but rather an approach for investigating social structures, which is why SNA is often referred to as structural analysis [1]. The most important difference between social network analysis and the traditional or classic social research approach ...
This review of social network analysis focuses on identifying recent trends in interpersonal social networks research in organizations, and generating new research directions, with an emphasis on conceptual foundations. It is organized around two broad social network topics: structural holes and brokerage and the nature of ties. New research directions include adding affect, behavior, and ...
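As one illustration of how structural holes are commonly quantified, the sketch below computes Burt's constraint on a toy graph of two cliques joined by a broker M (invented for illustration). It assumes unweighted, undirected ties with equal investment in each contact, p(i, j) = 1/deg(i), and sums over the ego's direct contacts; lower constraint means the ego's contacts are less connected to one another, which signals more brokerage opportunity. Function names are hypothetical.

```python
def undirected(edges):
    """Adjacency map (node -> set of neighbours) for an undirected edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def constraint(graph):
    """Burt's constraint for an unweighted, undirected graph.

    Assumes equal investment in every tie, p(i, j) = 1 / deg(i), and sums over i's
    direct contacts j:  c_i = sum_j (p_ij + sum_{q != i, j} p_iq * p_qj) ** 2
    """
    def p(i, j):
        return 1 / len(graph[i]) if j in graph[i] else 0.0

    scores = {}
    for i, contacts in graph.items():
        total = 0.0
        for j in contacts:
            # Indirect investment in j routed through i's other contacts q.
            indirect = sum(p(i, q) * p(q, j) for q in contacts if q != j)
            total += (p(i, j) + indirect) ** 2
        scores[i] = total
    return scores

left, right = ["A", "B", "C", "D"], ["E", "F", "G", "H"]
edges = [(u, v) for grp in (left, right) for i, u in enumerate(grp) for v in grp[i + 1:]]
edges += [("D", "M"), ("M", "E")]  # M spans the structural hole between the cliques
c = constraint(undirected(edges))
for node in ("M", "D", "A"):
    print(node, round(c[node], 3))
```

M, whose two contacts are not tied to each other, has the lowest constraint (0.5); D is more constrained (about 0.58), and A, embedded in a closed clique, most of all (about 0.87).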
Social Network Analysis refers to the study conducted with an awareness of social networks, including connections with other analysts in the field. It involves examining relationships between individuals or groups to understand patterns and dynamics within social structures. AI-generated definition based on: Social Networks, 2005.
Highlights
• Up-to-date literature review of basic research and application domains in social networks.
• Definition of a new set of metrics to measure the capacity of SNA frameworks and tools.
• Quantitative analysis of social network analysis tools and frameworks (SNA).
• Evaluation of 20 popular SNA software tools according to the new set of metrics.
• SNA software technology ...
Learn from the experts how to apply social network analysis to various fields and topics with this comprehensive handbook.
Social network analysis (SNA) is a core pursuit of analyzing social networks today. In addition to the usual statistical techniques of data analysis, these networks are investigated using SNA ...
Research design for social network analysis (SNA), as for any other type of research, is a process during which the research question and the set of methods that make it possible to answer that question are described. Social network analysis is a multidisciplinary research area, and consequently a wide range of approaches to analyzing network data exists.
Highlights
• New trends and application of advanced data science and artificial intelligence techniques for knowledge extraction from social networks.
• Selected papers related to the application of machine learning, soft computing, and computational intelligence to complex social media-based domains.
• Current contributions and challenges in social media analysis, social network ...
Introduction
Social networks can affect health beliefs, behaviours and outcomes through various mechanisms, including social support, social influence and information diffusion. Social network analysis (SNA), an approach which emerged from the relational perspective in social theory, has been increasingly used in health research. This paper outlines the protocol for a scoping review of ...
Social network analysis uses a variety of mathematical techniques, such as maximum likelihood estimation and p-models, for studying stochastic and dynamic events to investigate the structural features of social media usage, which could be reflective of real contexts with certain levels of chaos, or entropy.
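The p1 model and its relatives are too involved for a short example, so the sketch below illustrates maximum likelihood estimation on the simplest baseline instead: a Bernoulli (Erdős–Rényi) random graph in which every possible undirected tie occurs independently with the same probability p. Under that assumption the MLE of p is the observed density, which a grid search over the log-likelihood confirms. The edge list is invented.

```python
import math

def bernoulli_mle(n_nodes, edges):
    """MLE of p in a Bernoulli (Erdős–Rényi) graph: observed ties / possible ties."""
    possible = n_nodes * (n_nodes - 1) // 2
    # Count each undirected tie once; ignore duplicates and self-loops.
    observed = len({frozenset(e) for e in edges if e[0] != e[1]})
    return observed / possible

def log_likelihood(p, n_nodes, n_edges):
    """Log-likelihood of observing n_edges ties when each possible tie occurs with probability p."""
    possible = n_nodes * (n_nodes - 1) // 2
    return n_edges * math.log(p) + (possible - n_edges) * math.log(1 - p)

# Hypothetical undirected network on 6 actors (15 possible ties, 6 observed).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
p_hat = bernoulli_mle(6, edges)

# Numerical check: the log-likelihood peaks at the observed density.
grid = [i / 100 for i in range(1, 100)]
p_best = max(grid, key=lambda p: log_likelihood(p, 6, len(edges)))
print(p_hat, p_best)
```

Both values come out at 0.4 (6 of 15 possible ties). Models such as p1 replace the single p with actor-level and reciprocity parameters, but the estimation principle is the same.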
Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. [1] It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them.
Social Network Analysis
Social Network Analysis (SNA) is an analytical method used to study social structures through the use of networks and graph theory. It identifies the relationships between individuals, organizations, or other entities and examines the patterns and implications of these relationships.
Social network analysis is the process of investigating social structures through the use of networks and graph theory. This article introduces data scientists to the theory of social networks, with a short introduction to graph theory and information spread.
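For the information-spread side, one widely used model is the independent cascade: each newly activated node gets a single chance to activate each inactive neighbour, succeeding with probability p. The sketch below assumes a uniform p on every edge and a small invented graph; expected spread is estimated by averaging many seeded Monte Carlo runs.

```python
import random

def independent_cascade(graph, seeds, p, rng):
    """One run of the independent cascade model with a uniform activation probability p."""
    active = set(seeds)
    frontier = sorted(active)
    while frontier:
        newly_active = []
        for u in frontier:
            # Each newly active node tries each still-inactive neighbour exactly once.
            for v in sorted(graph.get(u, ())):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active

def expected_spread(graph, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    return sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs)) / runs

# Hypothetical follower graph: a chain a-b-c-d plus a separate pair e-f.
graph = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e"},
}
print(expected_spread(graph, {"a"}, p=0.5))
```

With p = 0.5 the expected spread from a along the chain is 1 + 0.5 + 0.25 + 0.125 = 1.875, so the printed estimate should land near that value; e and f are never reached.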
Previous research exploring the affinity space concept in online social networking spaces has focused primarily on health science networks. The research by Sharma et al. (2021) examined a much smaller network of 158 participants and 514 posts in a diabetes-focused affinity space. When examining the affinity space, Sharma and colleagues ...
Social Network analysis is the study of structure, and how it influences health, and it is based on theoretical constructs of sociology and mathematical foundations of graph theory. Structure refers to the regularities in the patterning of relationships among individuals, groups and/or organizations. When social network analysis is undertaken ...
Social Network Analysis and Mining is a multidisciplinary journal focusing on theoretical and experimental work related to social network analysis and mining. It serves a wide range of researchers from computer science, network science, social sciences, mathematical sciences, medical, biological, financial, management, and political sciences.
Hence, big data analytic techniques and frameworks are commonly exploited in Social Network Analysis (SNA). With the ever-increasing growth of social networks, the analysis of social data, to describe and find communication patterns among users and understand their behaviors, has attracted much attention.
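The map/shuffle/reduce pattern behind many big data frameworks can be illustrated on the task of counting communication between pairs of users. The sketch below runs in a single Python process; in a framework such as Hadoop MapReduce or Spark, the same map and reduce functions would be distributed across machines. The record fields `from` and `to` are hypothetical.

```python
from collections import defaultdict
from functools import reduce

def map_phase(record):
    """Emit ((user, user), 1) for one interaction, keyed by the unordered pair of users."""
    pair = tuple(sorted((record["from"], record["to"])))
    return pair, 1

def shuffle(mapped):
    """Group emitted values by key, as the framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(values):
    """Sum the counts emitted for one key."""
    return reduce(lambda a, b: a + b, values, 0)

def interaction_counts(records):
    return {key: reduce_phase(vals) for key, vals in shuffle(map(map_phase, records)).items()}

# Hypothetical message log.
log = [
    {"from": "ana", "to": "ben"},
    {"from": "ben", "to": "ana"},
    {"from": "ana", "to": "cho"},
    {"from": "ben", "to": "ana"},
]
print(interaction_counts(log))
```

This prints `{('ana', 'ben'): 3, ('ana', 'cho'): 1}`; the resulting pair counts can serve directly as weighted edges for the network measures discussed above.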
Professor Song Yang in the Department of Sociology and Criminology published a new book, Social Network Analysis in Action, by Springer. This edited volume includes six cutting-edge chapters on various aspects of social network analysis, starting with basics of social network theories, research designs, data analytics, moving to advanced topics in social network analysis, data mining from ...
Unravel the complexities of Social Network Analysis (SNA) with our complete guide. Explore methodologies, applications, challenges, and ethical considerations.
Research in social network analysis (SNA) faces unprecedented ethical challenges today due to both technological developments ('big' data) and a growi…
Based on resource-based view (RBV) and social network theory (SNT), we developed a model to examine the role of cross-organizational improvisation (COI) and social ties (specifically, business and political ties) in the relationship between DEO and green innovation (GI). ... theoretical analysis predominates over empirical research (Hains ...
This project delves into sentiment analysis on Twitter using Long Short-Term Memory (LSTM) neural networks in conjunction with Global Vectors for Word Representation (GloVe) to highlight the potential of LSTM networks for sentiment analysis on social media platforms like Twitter.
The Census Bureau first asked everybody in the U.S. about Hispanic ethnicity in 1980. But it made some efforts before then to count people who today would be considered Hispanic. The Census Bureau also has a long history of changing labels and shifting categories. In the 1930 census, for example, the race question had a category for "Mexican."
The National Heart, Lung, and Blood Institute Growth and Health Study Research Group. Obesity and cardiovascular disease risk factors in black and white girls: the NHLBI Growth and Health Study. Am J Public Health. 1992;82(12):1613-1620. doi: 10.2105/AJPH.82.12.1613
Research Objectives
In this article, we investigate the following research questions on sentiment analysis and trend prediction in social media comment sections: (RQ1): Can we design a tool to effectively track and analyze the evolution of sentiment trends in social media comment sections across different time frames, focusing on both short-term fluctuations and long-term shifts in public opinion?
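One simple way to separate short-term fluctuations from long-term shifts, assumed here rather than taken from the article, is to compare a short and a long trailing moving average of per-period sentiment scores. The sketch assumes sentiment has already been scored per period (for example, a daily mean in [-1, 1]); the window sizes and the sample scores are arbitrary.

```python
from collections import deque

def moving_average(values, window):
    """Trailing simple moving average; None until `window` values have been seen."""
    out, buf, total = [], deque(), 0.0
    for x in values:
        buf.append(x)
        total += x
        if len(buf) > window:
            total -= buf.popleft()
        out.append(total / window if len(buf) == window else None)
    return out

def trend_labels(scores, short=3, long=7):
    """Label each period 'up', 'down', or 'flat' by comparing short- and long-window averages."""
    labels = []
    for short_avg, long_avg in zip(moving_average(scores, short), moving_average(scores, long)):
        if short_avg is None or long_avg is None:
            labels.append(None)  # not enough history yet
        elif short_avg > long_avg:
            labels.append("up")
        elif short_avg < long_avg:
            labels.append("down")
        else:
            labels.append("flat")
    return labels

# Hypothetical daily mean sentiment scores for one comment section.
daily = [0.2, 0.3, 0.1, 0.2, 0.0, -0.2, -0.3, -0.4, -0.3, -0.1]
print(trend_labels(daily))
```

The short window reacts to recent swings while the long window tracks the slower baseline, so a change of label marks a point where short-term sentiment has crossed the longer-term trend.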
The Family Research Council is a Project 2025 partner. (From the September 12, 2024, edition of Family Research Council's Washington Watch.)
From my new report on how the Harris team's debate strategy played out: It took only a few minutes for Trump to grow flustered by Harris's reference to a negative analysis of his economic ...