
Social Network Analysis 101: Ultimate Guide

Comprehensive introduction for beginners.

Social network analysis is a powerful tool for visualizing, understanding, and harnessing the power of networks and relationships. At Visible Network Labs, we use our network science and mapping tools and expertise to track collaborative ecosystems and strengthen systems change initiatives. In this comprehensive guide, we’ll introduce key principles, theories, terms, and tools for practitioners, framed around social impact, systems change, and community health improvement. Let’s dig in!

Learn more and get started with the tools below in our complete Guide.

You can read this guide from start to finish or use the table of contents to jump to a topic or section of interest. The guide is yours to use as you see fit.

Introduction

Let’s start by reviewing the basics: a definition, why SNA is important, and the history of the practice. If you want a quick intro to this methodology, download our Social Network Analysis Brief.

Definition of Social Network Analysis (SNA)

Social Network Analysis, or SNA, is a research method used to visualize and analyze relationships and connections between entities or individuals within a network. Imagine mapping the relationships between different departments in a corporation. The outcome would be a vivid picture of how each department interacts with others, allowing us to see communication patterns, influential entities, and bottlenecks.

The Importance of SNA

SNA is a powerful tool. It allows us to explore the underlying structure of an organization or network, identifying both the formal and informal relationships that drive its processes and outcomes. This insight can enable better communication, facilitate change management, and inspire more efficient collaboration.

This methodology also helps demonstrate the impact of relationship-building and systems change efforts by documenting the changes in the quality and quantity of relationships before and after the initiative. The maps and visualizations produced by SNA are an engaging way to share your progress and impact with stakeholders, donors, and the community at large.

Brief Historical Overview of SNA

The concept of SNA emerged in the 1930s within the field of sociology. Its roots, however, trace back to graph theory in mathematics. It was not until the advent of computers and digital data in the 1980s and 1990s that SNA became widely used, revealing new insights about organizational dynamics, community structures, and social phenomena.

While it originated as an academic research tool, it is increasingly used to inform real-world practice. Today, it is used in a broad variety of industries, fields, and sectors, including business, web development, public health, foundations and philanthropy, telecommunications, law enforcement, academia, and systems change initiatives, to name a few.

Fundamentals of SNA

SNA is a broad topic, but these are some of the essential terms, concepts, and theories you need to know to understand how it works.

Nodes and Edges

In SNA, nodes represent individuals or entities while edges symbolize the relationships between them. For example, in an inter-organizational network, nodes might be companies, and edges could represent communication, collaboration, or competition.
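As a minimal sketch of the idea above, nodes and edges can be stored as a plain edge list and then turned into an adjacency mapping. The organization names, the `edges` list, and the `build_adjacency` helper are illustrative assumptions, not the data model of any particular SNA tool.

```python
from collections import defaultdict

# Hypothetical edge list: each pair is one undirected relationship.
edges = [
    ("Mayor's Office", "Local Hospital"),
    ("Public Health Dept.", "Primary Care Clinic"),
    ("Mayor's Office", "Public Health Dept."),
]


def build_adjacency(edge_list):
    """Map every node to the set of nodes it shares an edge with."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


adjacency = build_adjacency(edges)
nodes = sorted(adjacency)  # every entity that appears in at least one edge
```

Treating ties as undirected is a simplifying choice here; a directed network (who reported whom) would only add the edge in one direction.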

Network Types

Different types of networks serve different purposes. ‘Ego Networks’ focus on one node and its direct connections, revealing its immediate network. ‘Whole Networks’, on the other hand, capture a broader picture, encompassing an entire organization or system. Open networks are loosely connected, with many opportunities to build new connections, ideal for innovation and idea generation – while closed networks are densely interconnected, better for refining ideas amongst a group who all know each other.

Network Properties

Properties such as density (the proportion of potential connections that are actual connections), diameter (the longest of the shortest paths between any two nodes), and centrality (the importance of a node within the network) allow us to understand the network’s structure and function. Metrics can also measure relationship quality across the network, like our validated trust and value scores.
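To make density and diameter concrete, here is a small sketch that computes both for a toy undirected network. The four-node edge list and the helper names are assumptions made for illustration, and the diameter function assumes the network is connected.

```python
from collections import deque

# Hypothetical undirected network with four nodes and four ties.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]


def adjacency_from(edge_list):
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def density(adj):
    """Actual ties divided by the number of possible ties, n * (n - 1) / 2."""
    n = len(adj)
    ties = sum(len(neighbors) for neighbors in adj.values()) / 2
    return ties / (n * (n - 1) / 2)


def distances_from(adj, source):
    """Breadth-first search: shortest path length from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(adj):
    """Longest shortest path between any two nodes (connected networks only)."""
    return max(max(distances_from(adj, s).values()) for s in adj)


adj = adjacency_from(edges)
network_density = density(adj)    # 4 ties out of 6 possible
network_diameter = diameter(adj)  # A needs two steps to reach C or D
```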

Dyadic and Triadic Relationships

Dyadic relationships involve two nodes, like a partnership between two companies. Triadic relationships, involving three nodes, are more complex but can offer richer insights. For instance, it might show how a third company influences the relationship between two others, or which members of your network are the best at building new relationships between their peers.

Homophily and Heterophily

Homophily refers to the tendency of similar nodes to connect, while heterophily is the opposite. In a business context, we might see homophily between companies in the same industry and heterophily when seeking diversity in a supply chain. Many networks aim to be diverse but get stuck talking to the same, similar partners. These network concepts underlie many strategies promoting network innovation to avoid group-think among like-minded partners.
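One common way to quantify this tendency is the E-I index, which compares ties that cross group boundaries (external) with ties that stay inside a group (internal). The sketch below is illustrative only: the `sector` attribute, the organization names, and the edge list are hypothetical.

```python
# Hypothetical node attribute: the sector each organization belongs to.
sector = {
    "Clinic A": "health",
    "Clinic B": "health",
    "Hospital": "health",
    "School": "education",
    "Library": "education",
}

edges = [
    ("Clinic A", "Clinic B"),
    ("Clinic A", "Hospital"),
    ("Clinic B", "Hospital"),
    ("Hospital", "School"),
    ("School", "Library"),
]


def ei_index(edge_list, group):
    """(external - internal) / (external + internal).

    -1 means every tie is within a group (strong homophily),
    +1 means every tie crosses groups (strong heterophily).
    """
    internal = sum(1 for a, b in edge_list if group[a] == group[b])
    external = len(edge_list) - internal
    return (external - internal) / (external + internal)


score = ei_index(edges, sector)  # 1 external tie, 4 internal ties
```

A score close to -1, as in this toy coalition, would suggest partners are mostly talking to organizations like themselves.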

Network Topologies

Lastly, the layout or pattern of a network, its topology (also referred to as the structure of the network), can reveal much about its function. For instance, a centralized topology, where one node is connected to all others, may indicate a hierarchical organization, while a decentralized topology suggests a more collaborative and flexible environment.
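How centralized a topology is can be expressed as a single number. The sketch below uses degree centralization (Freeman's formulation for undirected networks), which is one of several possible measures. The star and ring networks are hypothetical examples, and nodes with no ties are ignored because they never appear in an edge list.

```python
def degree_centralization(edge_list):
    """Sum of (max degree - each degree), divided by the largest value that
    sum can take, (n - 1) * (n - 2), which occurs in a perfect star."""
    degree = {}
    for a, b in edge_list:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    n = len(degree)
    if n < 3:
        raise ValueError("centralization needs at least three nodes")
    top = max(degree.values())
    return sum(top - d for d in degree.values()) / ((n - 1) * (n - 2))


# Centralized: one hub tied to four partners.
star = [("Hub", f"Partner {i}") for i in range(1, 5)]
# Decentralized: five nodes in a ring, each with exactly two ties.
ring = [(i, (i + 1) % 5) for i in range(5)]

star_score = degree_centralization(star)  # 1.0, fully centralized
ring_score = degree_centralization(ring)  # 0.0, fully decentralized
```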

Theoretical Background of SNA

Many different theories have developed to explain how certain network properties, like their topology, centrality, or type, lead to different outcomes. Here are several key theories relevant to SNA.

Strength of Weak Ties Theory

This theory postulates that weak ties or connections often provide more novel information and resources compared to strong ties. These “weak” relationships, which may seem less important, can serve as important bridges between different clusters within a network.

Structural Hole Theory

This theory posits that individuals who span the structural holes, or gaps, in a network hold a strategic advantage because they act as a bridge between different groups. They can control and manipulate information and resources flowing between the groups, making their position more influential.
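A rough way to see who spans structural holes is effective size: the number of a node's contacts, discounted by how much those contacts are tied to one another. The sketch below uses a simplified form for unweighted, undirected ties (contacts minus twice the ties among contacts divided by the number of contacts). The node names and network are hypothetical.

```python
from itertools import combinations

edges = [
    # "Broker" links three contacts who have no ties to each other.
    ("Broker", "A"), ("Broker", "B"), ("Broker", "C"),
    # "Insider" links three contacts who all know each other.
    ("Insider", "X"), ("Insider", "Y"), ("Insider", "Z"),
    ("X", "Y"), ("Y", "Z"), ("X", "Z"),
]

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


def effective_size(adj, ego):
    """Contacts minus redundancy: n - 2t / n, where n is the number of
    contacts and t is the number of ties among those contacts."""
    alters = adj[ego]
    n = len(alters)
    if n == 0:
        return 0.0
    ties_among_alters = sum(1 for a, b in combinations(alters, 2) if b in adj[a])
    return n - 2 * ties_among_alters / n


broker_size = effective_size(adj, "Broker")    # 3.0, no redundant contacts
insider_size = effective_size(adj, "Insider")  # 1.0, contacts fully overlap
```

Both nodes have three contacts, but only the broker's contacts depend on it to reach one another, which is the advantage the theory describes.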

Small World Network Theory

This theory emphasizes the interconnectedness of nodes within a network. It suggests that most nodes can be reached from any other node through a relatively short path of connections. This property leads to the famous phenomenon of “six degrees of separation,” indicating efficient information transfer and connectivity in a network.

Barabási–Albert (Scale-Free Network) Model

This model suggests that networks evolve over time through the process of preferential attachment, where new nodes are more likely to connect to already well-connected nodes. This results in “scale-free” networks, where a few nodes (“hubs”) have many connections while the majority of nodes have few.
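Preferential attachment can be simulated in a few lines. This is a simplified sketch of the growth process described above, not a reference implementation of the published model. The function name, the starting core, and the parameters `n`, `m`, and `seed` are assumptions chosen for illustration.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=0):
    """Grow an undirected network to n nodes. Each new node links to m
    distinct existing nodes, chosen with probability proportional to
    their current degree."""
    rng = random.Random(seed)
    # Start from a small, fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in 'stubs' once per tie it has, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new_node, target))
            stubs.extend((new_node, target))
    return edges


edges = preferential_attachment(n=200, m=2)
degree = Counter(node for edge in edges for node in edge)
hubs = degree.most_common(5)  # the five best-connected nodes and their degrees
```

Inspecting `hubs` against the rest of `degree` typically shows a handful of nodes with far more ties than the majority, which is the hub pattern the model predicts.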

Data Collection and Preparation

Every network mapping project begins with collecting and preparing data for analysis. This data varies widely, but at a basic level, it must include data on nodes (the entities in the network) and data on edges (the lines between nodes representing a relationship or connection). Additional data on the attributes of the nodes or edges adds more levels of analysis and insight but is not strictly necessary.

Primary Methods for Collecting SNA Data

This can be as simple as conducting interviews or surveys within an organization. The more complex the network, the more difficult it is to collect good primary data: If you have more than 5-10 partners, interviews and surveys are hard to conduct by hand.

Network survey tools like PARTNER collect relational data by asking respondents who they are connected to, and then asking them about aspects of their relationships to provide trust, value, and network structure scores. Doing the same with general-purpose survey software like Google Forms typically requires hours of cleaning by hand.

Response rates are an important consideration if using surveys for data collection. Unlike a typical survey where a small sample is representative, a network survey requires a high response rate – 80% and above are considered the gold standard.

In an inter-organizational context where surveys are impossible, or you cannot achieve a valid response rate, one might gather data through business reports, contracts, or publicly available data on partnerships and affiliations. For example, you could visit an organization’s website to note who they list as a partner – and do the same for others – to generate a basic SNA map.

Secondary Sources of SNA Data

Secondary sources include data that was already collected but can be used again, often to complement your use of primary data you collect yourself. This might include academic databases, industry reports, or social media data. It’s important to ensure the accuracy and reliability of these sources.

You can also conduct interviews or focus groups with network members to add a qualitative perspective to your results. These mixed-method SNA projects provide a great deal more depth to their network maps through their conversations with numerous network representatives to explore deeper themes and perspectives.

Ethical Considerations in Data Collection

When collecting data, it’s crucial to ensure privacy, obtain necessary permissions, and anonymize data where necessary. Respecting these ethical boundaries is critical for maintaining trust and integrity in your work.

Consider also how your SNA results will be used. For example, network analysis can help assess how isolated an individual is to target them for interventions. Still, it could also be abused by insurance companies to charge these individuals a higher rate (loneliness increases your risk of death).

Lastly, consider ways to involve the communities with a stake in your SNA using approaches like community-based participatory research. Bring in representatives from target populations to help co-design your initiative or innovation as partners, rather than patients or research subjects.

Preparing Data for Analysis

Data needs to be formatted correctly for analysis, often as adjacency matrices or edgelists. Depending on the size and complexity of your network, this can be a complex process but is crucial for meaningful analysis.

If you are new to SNA, you can start by laying out your data in tables. For example, the table below shows a relational data set for a set of partners within a public health coalition. The first column shows the survey respondent (Partner 1), the second shows who they reported as a partner, the third shows their reported level of trust, and the fourth their reported level of collaboration intensity. This is just one of many ways to lay out and organize network data.

Depending on which analysis tool you choose, a varying degree of data preparation and cleaning will be required. Usually, free tools require the most work, while subscription software does much of it for you.

Partner 1	Partner 2	Trust (1-4)	Level of Collaboration
Mayor’s Office	Local Hospital	3	Coordination
Public Health Dept.	Primary Care Clinic	4	Cooperation
Mayor’s Office	Public Health Dept.	2	Awareness
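The table above is an edgelist with attributes. As a rough sketch, the same rows can be converted into an adjacency matrix, here weighted by the reported trust score. The variable names are assumptions, and the matrix is treated as directed (respondent to reported partner) because only the respondent's report is known.

```python
# Rows copied from the table: respondent, partner, trust (1-4), collaboration level.
rows = [
    ("Mayor's Office", "Local Hospital", 3, "Coordination"),
    ("Public Health Dept.", "Primary Care Clinic", 4, "Cooperation"),
    ("Mayor's Office", "Public Health Dept.", 2, "Awareness"),
]

# Give every organization a fixed row/column position.
nodes = sorted({name for row in rows for name in row[:2]})
index = {name: position for position, name in enumerate(nodes)}

# matrix[i][j] holds the trust that organization i reported for organization j;
# 0 means no reported relationship.
matrix = [[0] * len(nodes) for _ in nodes]
for respondent, partner, trust, _level in rows:
    matrix[index[respondent]][index[partner]] = trust

# Print the matrix with row labels for a quick visual check.
for name, row in zip(nodes, matrix):
    print(f"{name:<22}", row)
```

An edgelist stays compact for large, sparse networks, while a matrix makes it easy to spot missing or one-sided reports at a glance.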

Network Analysis Methods & Techniques

There are many ways to analyze a network or set of entities using SNA. Here are some basic and advanced techniques, along with info on network visualization – a major component and common output of SNA projects.

Basic Technique: Network Centrality

One of the most common ways to analyze a network is to look at the centrality of various nodes to identify key players, information hubs, and gatekeepers across the network. Three of the most common measures are Degree, Betweenness, and Closeness Centrality. Each captures a different aspect of a node’s connectivity and importance.

Degree Centrality

Degree centrality can be used to identify the most connected actors in the network. These actors are considered “popular” or “active” and they often have a strong influence within the network due to their numerous direct connections. In a coalition or network, these nodes could be the organizations or individuals that are most active in participating or the most engaged in the network activities. They may be the ‘go-to’ people for information or resources and have a significant impact on shaping the group’s agenda.

Betweenness Centrality

Betweenness centrality is useful for identifying the “brokers” or “gatekeepers” in the network. These actors have a unique position where they connect different parts of the network, facilitating or controlling the flow of information between others. In a coalition context, these could be the organizations or individuals who have influence over how information, resources, or support flow within the network, by virtue of their position between other key actors. These actors could play crucial roles in collaboration, negotiation, and conflict resolution within the network.

Closeness Centrality

Closeness centrality is a measure of how quickly a node can reach every other node in the network via the shortest paths. In a coalition, these nodes can disseminate information or exert influence quickly due to their close proximity to all other nodes. These ‘efficient connectors’ are beneficial for the rapid spread of information, resources, or innovations across the network. They could play a vital role during times of rapid change or when swift collective action is required.
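The three measures above can be sketched in plain Python for a small, connected, undirected network. The coalition below is hypothetical, the function names are illustrative, and betweenness is computed with Brandes' shortest-path counting approach and reported as a raw count of node pairs rather than a normalized score.

```python
from collections import deque

edges = [
    ("Health Dept.", "Hospital"), ("Health Dept.", "Clinic"),
    ("Health Dept.", "Mayor's Office"), ("Mayor's Office", "School"),
    ("Hospital", "Clinic"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


def degree_centrality(adj):
    """Share of the other nodes each node is directly tied to."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def closeness_centrality(adj):
    """(n - 1) divided by the sum of shortest path lengths to all others."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[source] = (len(adj) - 1) / sum(dist.values())
    return result


def betweenness_centrality(adj):
    """Number of node pairs whose shortest paths pass through each node,
    with pairs that have several shortest paths counted fractionally."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, dist = [], {s: 0}
        paths = {v: 0 for v in adj}   # number of shortest paths from s
        paths[s] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):     # farthest nodes first
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != s:
                score[w] += dependency[w]
    # Each undirected pair was counted from both ends, so halve the totals.
    return {v: total / 2 for v, total in score.items()}


degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
```

In this toy coalition the Health Dept. ranks first on all three measures, while the Mayor's Office has the same degree as the Hospital and Clinic but a much higher betweenness, because it is the only route to the School.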

Advanced Techniques: Clusters and Equivalence

Clustering Coefficients

The Clustering Coefficient provides insights into the “cliquishness” or local cohesion of the network around specific nodes. In a coalition or inter-organizational network, a high clustering coefficient may indicate that a node’s connections are also directly connected to each other, forming tight-knit groups or sub-communities within the larger network. These groups often share common interests or objectives, and they might collaborate or share resources more intensively. Understanding these clusters can be crucial for coalition management as it can highlight potential subgroups that may need to be engaged differently, or that might possess different levels of influence or commitment to the coalition’s overarching goals.
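As an illustration, the local clustering coefficient of a node is the share of pairs of its neighbors that are themselves connected. The sketch below uses a hypothetical coalition and assumes an undirected, unweighted network. Nodes with fewer than two neighbors are given a score of 0 by convention.

```python
from itertools import combinations

edges = [
    ("Health Dept.", "Hospital"), ("Health Dept.", "Clinic"),
    ("Health Dept.", "Mayor's Office"), ("Mayor's Office", "School"),
    ("Hospital", "Clinic"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


def local_clustering(adj, node):
    """Connected neighbor pairs divided by all possible neighbor pairs."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked_pairs = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return linked_pairs / (k * (k - 1) / 2)


clustering = {node: local_clustering(adj, node) for node in adj}
average_clustering = sum(clustering.values()) / len(clustering)
```

Here the Hospital and Clinic score 1.0 because their only two partners also work with each other, while the Mayor's Office scores 0.0 because its two partners have no tie between them.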

Structural Equivalence

Structural Equivalence is used to identify nodes that have similar patterns of connections, even if they do not share a direct link. In a coalition context, structurally equivalent organizations or individuals often occupy similar roles or positions within the network, and thus may have similar interests, influence, or responsibilities. They may be competing or collaborating entities within the same sectors or areas of work. Understanding structural equivalence can provide insights into the dynamics of the network, such as potential redundancies, competition, or opportunities for collaboration. It can also reveal how changes in one part of the network may impact other, structurally equivalent parts of the network.
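One simple way to approximate structural equivalence is to compare the sets of partners two nodes have. The sketch below uses Jaccard similarity of neighbor sets, which is only one of several possible measures (others compare rows of the adjacency matrix by distance or correlation). The network and names are hypothetical.

```python
edges = [
    ("Clinic A", "Hospital"), ("Clinic A", "Health Dept."),
    ("Clinic B", "Hospital"), ("Clinic B", "Health Dept."),
    ("Hospital", "Mayor's Office"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


def equivalence_score(adj, a, b):
    """Jaccard similarity of the two nodes' partners, ignoring any tie
    between the pair itself. 1.0 means identical tie patterns."""
    partners_a = adj[a] - {a, b}
    partners_b = adj[b] - {a, b}
    union = partners_a | partners_b
    if not union:
        return 1.0  # neither node has other partners: identical (empty) patterns
    return len(partners_a & partners_b) / len(union)


clinics = equivalence_score(adj, "Clinic A", "Clinic B")            # same two partners
clinic_vs_mayor = equivalence_score(adj, "Clinic A", "Mayor's Office")
```

The two clinics are not tied to each other, yet they score 1.0 because they occupy the same position relative to everyone else in the network.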

Visualizing Networks

Network visualization is a key tool in Social Network Analysis (SNA) that allows researchers and stakeholders to see the ‘big picture’ of the network structure, as well as discern patterns and details that may not be immediately evident from numerical data. Here are some key aspects and benefits of network visualization in the context of a coalition or inter-organizational network:

Overview of Network Structure: Visualizations provide a snapshot of the entire network structure, including nodes (individuals or organizations) and edges (relationships or interactions). This helps to comprehend the overall size, density, and complexity of the network. Seeing these relationships mapped out can often make the network’s structure more tangible and easier to understand.

Identification of Key Actors: Centrality measures can be represented visually, making it easier to identify key actors or organizations within the network. High degree nodes, gatekeepers, and efficient connectors will stand out visually, which can assist in identifying who holds influence or power within the network.

Detecting Subgroups and Communities: Visualization can also highlight clusters or subgroups within the network. These might be based on shared interests, common goals, or frequent interaction. Understanding these subgroups is crucial for coalition management and strategic planning, as different groups might have unique needs, concerns, or levels of engagement.

Identifying Outliers and Peripheral Nodes: Network visualizations can also help in identifying outliers or peripheral nodes – those who are less engaged or connected within the network. These actors might represent opportunities for further engagement or potential risks for network cohesion.

Highlighting Network Dynamics: Visualizations can be used to show changes in the network over time, such as the formation or dissolution of ties, the entry or exit of nodes, or changes in nodes’ centrality. These dynamics can provide valuable insights into the evolution of the coalition or network and the impact of various interventions or events.
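As a small illustration of the points above, a network can be written out in the Graphviz DOT text format, with each node's width scaled by its degree so well-connected actors stand out. This is a sketch under stated assumptions: the coalition is hypothetical, the `to_dot` helper and the sizing rule are illustrative choices, and rendering the result requires a separate tool that reads DOT files.

```python
from collections import Counter

edges = [
    ("Health Dept.", "Hospital"), ("Health Dept.", "Clinic"),
    ("Health Dept.", "Mayor's Office"), ("Mayor's Office", "School"),
    ("Hospital", "Clinic"),
]


def quote(name):
    """Wrap a node name in double quotes, escaping any it contains."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edge_list, graph_name="coalition"):
    """Return an undirected DOT graph; node width grows with degree."""
    degree = Counter(node for edge in edge_list for node in edge)
    lines = [f"graph {graph_name} {{", "  node [shape=circle];"]
    for node in sorted(degree):
        width = 0.5 + 0.25 * degree[node]  # arbitrary scaling for illustration
        lines.append(f"  {quote(node)} [width={width:.2f}];")
    for a, b in edge_list:
        lines.append(f"  {quote(a)} -- {quote(b)};")
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(edges)
print(dot_text)
```

Saving `dot_text` to a file gives a plain-text map that can be versioned and re-rendered as the network changes over time.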

Software and Tools for SNA

SNA software helps you collect, clean, analyze, and visualize network data to simplify the process of analyzing social networks. Some tools are free with limited functionality and support, while others require a subscription but are easier to use and come with support. Here are some popular tools used across many applications.

Introduction to Popular SNA Tools

Tools like UCINet, Gephi, and Pajek are popular for SNA. They offer a variety of functions for analyzing and visualizing networks, accommodating users of varying skill levels. Here are ten tools for use in different contexts and applications.

UCINet: A comprehensive software package for the analysis of social network data as well as other 1-mode and 2-mode data.
NetDraw: A tool usually used in tandem with UCINet to visualize networks.
Gephi: An open-source network analysis and visualization software package written in Java.
NodeXL: A free and open-source network analysis and visualization software package for Microsoft Excel.
Kumu: A powerful visualization platform for mapping systems and better understanding relationships.
Pajek: Software for the analysis and visualization of networks, particularly good at handling large datasets.
SocNetV (Social Networks Visualizer): A user-friendly, free and open-source tool.
Cytoscape: A bioinformatics software platform for visualizing molecular interaction networks.
Graph-tool: An efficient Python module for manipulation and statistical analysis of graphs.
Polinode: Tools for network analysis, both for analyzing your own network data and for collecting new network data.

Choosing the Right Tool for Your Analysis

The right tool depends on your needs. For beginners, a user-friendly interface might be a priority, while experienced analysts may prefer more advanced functions. The size and complexity of your network, as well as your budget, are also important considerations.

PARTNER CPRM: A Community Partner Relationship Management System for Network Mapping


For example, we created PARTNER CPRM, a Community Partner Relationship Management System, to replace the CRMs used by most organizations to manage their relationships with their network of strategic partners. Incorporating data collecting, analysis, and visualization features alongside CRM tools like contact management and email tracking, the result is a powerful and easy-to-use network mapping tool.

SNA Case Studies

Looking for a real-world example of a social network analysis project? Here are three examples from recent projects here at Visible Network Labs.

Case Study 1: Leveraging SNA for Program Evaluation

SNA is increasingly becoming a vital tool for program evaluation across various sectors including public health, psychology, early childhood, education, and philanthropy. Its potency is particularly pronounced in initiatives centered around network-building.

Take for instance the Networks for School Improvement Portfolio by the Gates Foundation. The Foundation employed PARTNER, an SNA tool, to assess the growth and development of their educator communities over time. The SNA revealed robust networks that offer valuable benefits to members by fostering information exchange and relationship development. By repeating the SNA process at different stages, they could verify their ongoing success and evaluate the effectiveness of their actions and adjustments.

Read the Complete Case Study Here

Case Study 2: Empowering Coalition-building

In the realm of policy change, building a coalition of partners who share a common goal can be pivotal in overturning the status quo. SNA serves as a strategic tool for developing a coalition structure and optimizing pre-existing relationships among the members.

The Fix CRUS Coalition in Colorado, formulated in response to the closure of five major peaks to public access, is a prime example of this. With the aim of strengthening state liability protections for landowners, the coalition employed PARTNER to evaluate their network and identify key players. Their future plans involve mapping connections to important legislators as their bill progresses through the state legislature. Additionally, their network maps and reports will prove instrumental in acquiring grants and funding.

Case Study 3: Boosting Employee Engagement

In the private sector, businesses are increasingly harnessing SNA to optimize their employee networks, both formal and informal, with the goal of enhancing engagement, productivity, and morale.

Consider the case of Acuity Insurance. In response to a transition to a hybrid model amid the COVID-19 pandemic, the company started using PARTNER to gather network data from their employees. Their aim was to maintain their organizational culture and keep employee engagement intact despite the model change. Their ongoing SNA will reveal the level of connectedness within their team, identify employees who are over-networked (and hence at risk of burnout), and pinpoint those who are under-networked and could be missing crucial information or opportunities.

Challenges and Future Directions in Network Analysis

Like all fields and practices, social network analysis faces certain limitations. Practitioners are constantly innovating to find better ways to conduct projects. Here are some barriers in the field and current trends and predictions about the future of SNA.

The Limitations of SNA

SNA is a powerful tool, but it’s not without limitations. It can be time-consuming and complex, particularly with larger networks. Response rates are important to ensure accuracy, which makes data collection more difficult and time-consuming. SNA also requires quality, validated data, and the interpretation of results can be subjective. Software that helps to address these problems requires a significant investment, but the results are often worth it.

Lastly, SNA is a skill that takes time and effort to learn. If you do not have someone in-house with network analysis skills, you may need to hire someone to carry out the analysis or spend time training an employee to build the capacity internally.

Current Trends and Future Predictions

One emerging trend is the increased application of SNA in mapping inter-organizational networks such as strategic partnerships, community health ecosystems, or policy change coalitions. Organizations are realizing the power of these networks and using SNA to navigate them more strategically. With SNA, they can identify key players, assess the strength of relationships, and strategize on how to optimize their network for maximum benefit.

In line with the rise of data science, another trend is the integration of advanced analytics and machine learning with SNA. This fusion allows for the prediction of network behaviors, identification of influential nodes, and discovery of previously unnoticed patterns, significantly boosting the value derived from network data.

The future of SNA is likely to see a greater emphasis on dynamic networks – those that change and evolve over time. With increasingly sophisticated tools and methods, analysts will be better equipped to track network changes and adapt strategies accordingly.

In addition, there is a growing focus on inter-organizational network resilience. As global challenges such as pandemics and climate change underscore the need for collaborative solutions, understanding how these networks can withstand shocks and adapt becomes crucial. SNA will play an instrumental role in identifying weak spots and strengthening the resilience of these networks.

Conclusion: Social Network Analysis 101

SNA offers a unique way to visualize and analyze relationships within a network, be it within an organization or between organizations. It provides valuable insights that can enhance communication, improve efficiency, and inform strategic decisions.

This guide provides an overview of SNA, but there is much more to learn. Whether you’re interested in the theoretical underpinnings, advanced techniques, or the latest developments, we encourage you to delve deeper into this fascinating field.

Resources and Further Reading

For those who want to build more SNA skills and learn more about network science, check out these recommendations for further reading and exploration from the Visible Network Labs team of network science experts.

Recommended Books on SNA

“Network Science” by Albert-László Barabási – A comprehensive introduction to the theory and applications of network science from a leading expert in the field.
“Analyzing Social Networks” by Steve Borgatti, Martin Everett, and Jeffrey Johnson – An accessible introduction, complete with software instructions for carrying out analyses.
“Social Network Analysis: Methods and Applications” by Stanley Wasserman and Katherine Faust – A more advanced, methodological book for those interested in a deep dive into the methods of SNA.
“Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives” by Nicholas Christakis and James Fowler – An engaging exploration of how social networks influence everything from our health to our political views.
“The Network Imperative: How to Survive and Grow in the Age of Digital Business Models” by Barry Libert, Megan Beck, and Jerry Wind – An excellent book for those interested in applying network science in a business context.
“Networks, Crowds, and Markets: Reasoning About a Highly Connected World” by David Easley and Jon Kleinberg – An interdisciplinary approach to understanding networks in social and economic systems. This book combines graph theory, game theory, and market models.

Online Resources and Courses

Here are some online learning opportunities, including online courses, communities, resources hubs, and other places to learn about social network analysis.

Social Network Analysis by Lada Adamic from the University of Michigan
Social and Economic Networks: Models and Analysis by Matthew O. Jackson from Stanford University
Introduction to Social Network Analysis by Dr. Jennifer Golbeck from the University of Maryland, College Park
Statistics.com: Offers a free online course called Introduction to SNA, taught by Dr. Jennifer Golbeck.
The Social Network Analysis Network: This website provides a directory of resources on network methods, including courses, books, articles, and software.
The SNA Society: This organization provides a forum for social network analysts to share ideas and collaborate on research. They also offer a number of resources on their website, including a list of online courses.

Journals and Research Papers on SNA

These are a few of the most influential cornerstone research papers in network science and analysis methods:

“The Strength of Weak Ties” by Mark Granovetter (1973)
“Structural Holes and Good Ideas” by Ronald Burt (2004)
“Collective Dynamics of ‘Small-World’ Networks” by Duncan Watts & Steven Strogatz (1998)
“The Structure and Function of Complex Networks” by M. E. J. Newman (2003)
“Emergence of Scaling in Random Networks” by Albert-László Barabási & Réka Albert (1999)

Check out these peer-reviewed journals for lots of network science content and information:

Social Networks: This is an interdisciplinary and international quarterly journal dedicated to the development and application of network analysis.
Network Science: A cross-disciplinary journal providing a unified platform for both theorists and practitioners working on network-centric problems.
Journal of Social Structure (JoSS): An electronic journal dedicated to the publication of network analysis research and theory.
Connections: Published by the International Network for Social Network Analysis (INSNA), this journal covers a wide range of social network topics.
Journal of Complex Networks: This journal covers theoretical and computational aspects of complex networks across diverse fields, including sociology.

Frequently Asked Questions about SNA

A: SNA is a research method used to visualize and analyze relationships and connections within a network. In an organizational context, SNA can be used to explore the structure and dynamics of an organization, such as the informal connections that drive formal processes. It can reveal patterns of communication, identify influential entities, and detect potential bottlenecks or gaps.

A: The primary purpose of SNA is to uncover and visualize the relationships between entities within a network. By doing so, it allows us to understand the network’s structure and dynamics. This insight can inform strategic decision-making, facilitate change management, and enhance overall efficiency within an organization.

A: SNA allows researchers to examine the relationships between entities, the overall structure of the network, and the roles and importance of individual entities within it. This can involve studying patterns of communication, collaboration, competition, or any other type of relationship that exists within the network.

A: SNA has a wide range of applications across various fields. In business, it’s used to analyze organizational structures, supply chains, and market dynamics. In public health, it can map the spread of diseases. In sociology and anthropology, SNA is used to study social structures and relationships. Online, SNA is used to study social media dynamics and digital marketing strategies.

A: Key concepts in SNA include nodes (entities) and edges (relationships), network properties like density and centrality, and theories such as the Strength of Weak Ties and Structural Hole Theory. It also encompasses concepts like homophily and heterophily, which describe the tendency for similar or dissimilar nodes to connect.

A: An example of SNA could be a study of communication within a corporation. By treating departments as nodes and communication channels as edges, analysts could visualize the communication network, identify key players, detect potential bottlenecks, and suggest improvements.

A: Social Network Analysis refers to the method of studying the relationships and interactions between entities within a network. It involves mapping out these relationships and applying various analytical techniques to understand the structure, dynamics, and implications of the network.

A: In psychology, SNA can be used to study the social relationships between individuals or groups. It might be used to understand the spread of information, the formation of social groups, the dynamics of social influence, or the impact of social networks on individual behavior and well-being.

A: SNA can be conducted at different levels, depending on the focus of the study. The individual level focuses on a single node and its direct connections (its ego network). The dyadic level examines relationships between pairs of nodes, and the triadic level examines sets of three nodes. The global (whole-network) level considers the entire network.
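
The levels described above can be illustrated with a short Python sketch. The five-person network and the helper names are hypothetical, and for simplicity the triadic level is represented only by closed triads (three mutually connected nodes), which is one of several triad types analysts study.

```python
from itertools import combinations

# Hypothetical undirected ties among five people.
ties = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def ego_network(adj, ego):
    """Individual level: ties among the ego and its direct contacts."""
    members = adj[ego] | {ego}
    return {
        tuple(sorted((u, v)))
        for u in members
        for v in adj[u]
        if v in members
    }


def connected_dyads(adj):
    """Dyadic level: every pair of nodes joined by a tie."""
    return {tuple(sorted((u, v))) for u in adj for v in adj[u]}


def closed_triads(adj):
    """Triadic level: every set of three mutually connected nodes."""
    return [
        trio
        for trio in combinations(sorted(adj), 3)
        if all(b in adj[a] for a, b in combinations(trio, 2))
    ]


adj = build_adjacency(ties)
ego_d = ego_network(adj, "D")   # D's ties to C and E
dyads = connected_dyads(adj)    # all five ties in the network
triads = closed_triads(adj)     # A, B, and C all know each other
whole_network_size = len(adj)   # global level: all five nodes
```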

A: There are several types of networks in SNA, including ego networks (focused on a single node), dyadic and triadic networks (focused on pairs or trios of nodes), and whole networks. Networks can also be categorized by their structure (like centralized or decentralized), by the type of relationships they represent, or by their application domain (such as organizational, social, or online networks).
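
One way to put a number on the centralized versus decentralized distinction is Freeman's degree centralization, sketched below in Python. This measure is an illustrative choice, not one named in the guide, and the two example networks are hypothetical. For an undirected network it ranges from 0 (every node has the same number of ties) to 1 (a perfect star around one hub).

```python
def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centralization(adj):
    """Freeman degree centralization for an undirected simple graph."""
    n = len(adj)
    if n < 3:
        return 0.0
    degrees = [len(nbrs) for nbrs in adj.values()]
    top = max(degrees)
    # Sum of gaps to the most connected node, divided by the largest
    # possible sum, which a star graph achieves: (n - 1) * (n - 2).
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical centralized network: one hub tied to four others.
star = build_adjacency([("Hub", x) for x in ("P1", "P2", "P3", "P4")])

# Hypothetical decentralized network: five nodes tied in a ring.
ring = build_adjacency(
    [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P5", "P1")]
)

star_score = degree_centralization(star)
ring_score = degree_centralization(ring)
```

The star scores 1.0 and the ring scores 0.0; real networks typically fall somewhere in between.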

A: SNA is used to visualize and analyze the relationships within a network. Its insights can inform strategic decisions, identify influential entities, detect potential weaknesses or vulnerabilities, and enhance the efficiency of communication or processes within an organization or system. It’s also an essential tool for research in fields like sociology, anthropology, business, public health, and digital marketing.

Social Network Analysis: A Survey on Process, Tools, and Application

Index Terms

Information systems

Information systems applications

Collaborative and social computing systems and tools

Social networking sites

World Wide Web

Online advertising

Social advertising

Web applications

Social networks

Information

Published in.

Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland

University of Bologna, Italy

Association for Computing Machinery

New York, NY, United States

Author Tags

Information diffusion
influence maximization
link prediction
community detection
social network analysis

Annual Review of Organizational Psychology and Organizational Behavior

Volume 9, 2022

Review Article

New Developments in Social Network Analysis

Daniel J. Brass 1
Affiliations: LINKS Center for Social Network Analysis, Department of Management, Gatton College of Business and Economics, University of Kentucky, Lexington, Kentucky, USA; email: [email protected]
Vol. 9:225-246 (Volume publication date January 2022) https://doi.org/10.1146/annurev-orgpsych-012420-090628
First published as a Review in Advance on October 26, 2021
Copyright © 2022 by Annual Reviews. All rights reserved

This review of social network analysis focuses on identifying recent trends in interpersonal social networks research in organizations, and generating new research directions, with an emphasis on conceptual foundations. It is organized around two broad social network topics: structural holes and brokerage and the nature of ties. New research directions include adding affect, behavior, and cognition to the traditional structural analysis of social networks, adopting an alter-centric perspective including a relational approach to ego and alters, moving beyond the triad in structural hole and brokerage research to consider alters as brokers, expanding the nature of ties to include negative, multiplex/dissonant, and dormant ties, and exploring the value of redundant ties. The challenge is to answer the question “What's next in social network analysis?”

Literature Cited

Aral S , Van Alstyne M. 2011 . The diversity-bandwidth trade-off. Am. J. Sociol. 117 : 90– 171 [Google Scholar]
Asch SE 1951 . Effects of group pressure upon the modification and distortion of judgements. Groups, Leadership, and Men H Guetzkow 171– 90 Pittsburgh, PA: Carnegie Press [Google Scholar]
Baker W. 2019 . Emotional energy, relational energy, and organizational energy: toward a multilevel model. Annu. Rev. Organ. Psychol. Organ. Behav. 6 : 373– 95 [Google Scholar]
Barley SR. 1990 . The alignment of technology and structure through roles and networks. Adm. Sci. Q. 35 : 61– 103 [Google Scholar]
Battilana J , Casciaro T. 2012 . Change agents, networks, and institutions: a contingency theory of organizational change. Acad. Manag. J. 55 : 381– 98 [Google Scholar]
Battilana J , Casciaro T. 2013 . Overcoming resistance to organizational change: strong ties and affective cooptation. Manag. Sci. 59 : 4 819– 36 [Google Scholar]
Bian Y. 1997 . Bringing strong ties back in: indirect ties, network bridges, and job searches in China. Am. Sociol. Rev. 62 : 366– 85 [Google Scholar]
Bizzi L. 2013 . The dark side of structural holes: a multilevel investigation. J. Manag. 39 : 1554– 78 [Google Scholar]
Borgatti SP , Brass DJ 2020 . Centrality: concepts and measures. Social Networks at Work DJ Brass, SP Borgatti 9– 22 New York: Routledge [Google Scholar]
Borgatti SP , Brass DJ , Halgin DS 2014 . Social network research: confusions, criticisms, and controversies. Research in the Sociology of Organizations , Vol. 40: Contemporary Perspectives on Organizational Social Networks DJ Brass, G Labianca, A Mehra, DS Halgin, SP Borgatti 1– 33 New York: Emerald Group Publ. [Google Scholar]
Borgatti SP , Cross R. 2003 . A relational view of information seeking and learning in social networks. Manag. Sci. 49 : 432– 45 [Google Scholar]
Borgatti SP , Everett MG , Freeman LC 2002 . UCINET 6 for Windows: Software for Social Network Analysis Harvard, MA: Analytic Technol. [Google Scholar]
Borgatti SP , Everett MG , Johnson JC. 2018 . Analyzing Social Networks Los Angeles: Sage [Google Scholar]
Borgatti SP , Halgin DS 2011a . Analyzing affiliation networks. The SAGE Handbook of Social Network Analysis J Scott, PJ Harrington 417– 33 London: Sage [Google Scholar]
Borgatti SP , Halgin DS. 2011b . On network theory. Organ. Sci. 22 : 168– 81 [Google Scholar]
Borgatti SP , Mehra A , Brass DJ , Labianca G. 2009 . Network analysis in the social sciences. Science 323 : 5916 892– 95 [Google Scholar]
Brands RA. 2013 . Cognitive social structures in social network research: a review. J. Organ. Behav. 34 : S1 82– 103 [Google Scholar]
Brands RA , Brady G , Shah N , Mehra A. 2021 . Alter-centric uncertainty and the willingness to accept referrals from would-be brokers Work. Pap., London Bus. Sch London: [Google Scholar]
Brands RA , Kilduff M. 2014 . Just like a woman? Effects of gender-biased perceptions of friendship network brokerage on attributions and performance. Organ. Sci. 25 : 1530– 48 [Google Scholar]
Brands RA , Mehra A. 2019 . Gender, brokerage, and performance: a construal approach. Acad. Manag. J. 62 : 196– 219 [Google Scholar]
Brass DJ. 1984 . Being in the right place: a structural analysis of individual influence in an organization. Adm. Sci. Q. 29 : 518– 39 [Google Scholar]
Brass DJ 2009 . Connecting to brokers: strategies for acquiring social capital. Social Capital: Reaching Out, Reaching In VO Bartkus, JH Davis 260– 74 Cheltenham, UK: Edward Elgar Publ. [Google Scholar]
Brass DJ 2012 . A social network perspective on organizational psychology. The Oxford Handbook of Organizational Psychology SWJ Kozlowski 667– 95 New York: Oxford Univ. Press [Google Scholar]
Brass DJ 2018 . A social network perspective on organizational citizenship behavior. The Oxford Handbook of Organizational Citizenship Behavior PM Podsakoff, SB MacKenzie, NP Podsakoff 317– 30 New York: Oxford Univ. Press [Google Scholar]
Brass DJ , Borgatti SP 2018 . Multilevel thoughts on social networks. The Handbook for Multilevel Theory, Measurement, and Analysis JM LeBurton, S Humphrey 187– 200 Washington, DC: Am. Psychol. Assoc. [Google Scholar]
Brass DJ , Borgatti SP 2020 . Social Networks at Work New York: Routledge [Google Scholar]
Brass DJ , Burkhardt ME. 1993 . Potential power and power use: an investigation of structure and behavior. Acad. Manag. J. 36 : 441– 70 [Google Scholar]
Brass DJ , Butterfield KD , Skaggs BC. 1998 . Relationships and unethical behavior: a social network perspective. Acad. Manag. Rev. 23 : 1 14– 31 [Google Scholar]
Brass DJ , Galaskiewicz J , Greve HR , Tsai W 2004 . Taking stock of networks and organizations: a multilevel perspective. Acad. Manag. J. 47 : : 795– 819 [Google Scholar]
Brass DJ , Krackhardt D 2012 . Power, politics, and social networks in organizations. Politics in Organizations: Theory and Research Considerations GR Ferris, DC Treadway 355– 75 New York: Routledge [Google Scholar]
Breiger RL. 1974 . The duality of persons and groups. Soc. Forces. 53 : 181– 90 [Google Scholar]
Brennecke J. 2020 . Dissonant ties in intraorganizational networks: why individuals seek problem-solving assistance from difficult colleagues. Acad. Manag. J. 63 : 743– 78 [Google Scholar]
Burkhardt ME , Brass DJ. 1990 . Changing patterns or patterns of change: the effects of a change in technology on social network structure and power. Adm. Sci. Q. 35 : 104– 27 [Google Scholar]
Burt RS. 1992 . Structural Holes: The Social Structure of Competition Cambridge, MA: Harvard Univ. Press [Google Scholar]
Burt RS. 2002 . Bridge decay. Soc. Netw 24 : 4 333– 63 [Google Scholar]
Burt RS. 2004 . Structural holes and good ideas. Am. J. Sociol. 110 : 2 349– 99 [Google Scholar]
Burt RS. 2007 . Second-hand brokerage: evidence on the importance of local structure on managers, bankers, and analysts. Acad. Manag. J. 50 : 110– 45 [Google Scholar]
Burt RS , Burzynska K. 2017 . Chinese entrepreneurs, social networks, and guanxi . Manag. Organ. Rev. 13 : 2 221– 60 [Google Scholar]
Burt RS , Hogarth RM , Michaud C. 2000 . The social capital of French and American managers. Organ. Sci. 11 : 123– 47 [Google Scholar]
Burt RS , Kilduff M , Tasselli S. 2013 . Social network analysis: foundations and frontiers on advantage. Annu. Rev. Psychol. 64 : 527– 47 [Google Scholar]
Burt RS , Merluzzi J 2014 . Embedded brokerage: hubs versus locals. Research in the Sociology of Organizations , Vol. 40: Contemporary Perspectives on Organizational Social Networks DJ Brass, G Labianca, A Mehra, DS Halgin, SP Borgatti 161– 78 Bingley, UK: Emerald Group Publ. [Google Scholar]
Burt RS , Merluzzi J. 2016 . Network oscillation. Acad. Manag. Discov. 2 : 368– 91 [Google Scholar]
Buskens V , van de Rijt A. 2008 . Dynamics of networks if everyone strives for structural holes. Am. J. Sociol. 114 : 2 371– 407 [Google Scholar]
Byron K , Landis B 2020 . Relational misperceptions in the workplace: new frontiers and challenges. Organ. Sci. 31 : 1 223– 42 [Google Scholar]
Cannon-Bowers JA , Salas E , Converse S 1993 . Shared mental models in expert team decision making. Individual and Group Decision Making NJ Castellan, pp 221– 46 Hillsdale, NJ: Lawrence Erlbaum Assoc. [Google Scholar]
Carnabuci G , Diószegi B. 2015 . Social networks, cognitive style, and innovative performance: a contingency perspective. Acad. Manag. J. 58 : 881– 905 [Google Scholar]
Casciaro T. 2020 . Networks and affect in the workplace. Social Networks at Work DJ Brass, SP Borgatti 21– 48 New York: Routledge [Google Scholar]
Casciaro T , Lobo MS. 2005 . Competent jerks, lovable fools and the formation of social networks. Harvard. Bus. Rev. 83 : 92– 99 [Google Scholar]
Casciaro T , Lobo MS. 2008 . When competence is irrelevant: the role of interpersonal affect in task-related ties. Adm. Sci. Q. 53 : 4 65– 84 [Google Scholar]
Centola D. 2010 . The spread of behavior in an online social network experiment. Science 329 : 1194– 97 [Google Scholar]
Coleman JS. 1990 . Foundations of Social Theory Cambridge, MA: Harvard. Univ. Press [Google Scholar]
Coleman J , Katz E , Menzel H. 1957 . The diffusion of innovation among physicians. Sociometry 20 : 253– 70 [Google Scholar]
Contractor NS , Wasserman S , Faust K. 2006 . Testing multitheoretical, multilevel hypotheses about organizational networks: an analytic framework and empirical results. Acad. Manag. Rev. 31 : 681– 703 [Google Scholar]
Cross R , Parker A. 2004 . The Hidden Power of Social Networks: Understanding How Work Really Gets Done in Organizations Boston: Harvard. Bus. Sch. Press [Google Scholar]
Cullen-Lester KL , Maupan CK , Carter DR 2017 . Incorporating social networks into leadership development: a conceptual model and evaluation of research and practice. Lead. Q. 28 : 130– 52 [Google Scholar]
Cummings JN , Cross R. 2003 . Structural properties of work groups and their consequences for performance. Soc. Netw. 25 : 197– 210 [Google Scholar]
Dahlander L , McFarland DA. 2013 . Ties that last: tie formation and persistence in research collaborations over time. Adm. Sci. Q. 58 : 69– 110 [Google Scholar]
Ellwardt L , Steglich C , Wittek R. 2012 . The co-evolution of gossip and friendship in workplace social networks. Soc. Netw. 34 : 623– 33 [Google Scholar]
Feld SL. 1981 . The focused organization of social ties. Am. J. Sociol. 86 : 5 1015– 35 [Google Scholar]
Fernandez RM , Gould RV. 1994 . A dilemma of state power: brokerage and influence in the national health policy domain. Am. J. Sociol. 99 : 1455– 91 [Google Scholar]
Freeman L. 1979 . Centrality in social networks: conceptual clarification. Soc. Netw. 1 : 215– 39 [Google Scholar]
Freeman L , Romney K , Freeman S 1987 . Cognitive structure and informant accuracy. Am. Anthropol. 89 : 310– 25 [Google Scholar]
Galunic C , Ertug G , Gargiulo M. 2012 . The positive externalities of social capital: benefiting from senior brokers. Acad. Manag. J. 55 : 5 1213– 31 [Google Scholar]
Granovetter MS. 1973 . The strength of weak ties. Am. J. Sociol. 6 : 1360– 80 [Google Scholar]
Granovetter MS. 1985 . Economic action and social structure: the problem of embeddedness. Am. J. Soc. 91 : 481510 [Google Scholar]
Grosser TJ , Obstfeld D , Choi EW , Woehler M , Lopez-Kidwell V et al. 2018 . A sociopolitical perspective on employee innovativeness and job performance: the role of political skill and network structure. Organ. Sci. 29 : 4 612– 32 [Google Scholar]
Grosser TJ , Obstfeld D , Labianca G , Borgatti S. 2019 . Measuring mediation and separation brokerage orientations: a further step toward studying the social network brokerage process. Acad. Manag. Discov. 5 : 114– 36 [Google Scholar]
Grosser TJ , Park S , Mathieu JE , Reobuck AA. 2020 . Network thinking in teams research. Social Networks at Work DJ Brass, SP Borgatti 309– 32 New York: Routledge [Google Scholar]
Hahl O , Kacperczyk A , Davis JP 2016 . Knowledge asymmetry and brokerage: linking network perception to position in structural holes. Strategic Organ 14 : 2 118– 43 [Google Scholar]
Halevy N , Halali E , Zlatev JJ 2019 . Brokerage and brokering: an integrative review and organizing framework for third party influence. Acad. Manag. Ann. 13 : 215– 39 [Google Scholar]
Halgin DS , Borgatti SP , Huang Z. 2020 . Prismatic effects of negative ties. Soc. Netw. 60 : 26– 33 [Google Scholar]
Hansen MT. 1999 . The search-transfer problem: the role of weak ties in sharing knowledge across organizational subunits. Adm. Sci. Q. 44 : 82– 111 [Google Scholar]
Harris JK. 2014 . An Introduction to Exponential Random Graph Modeling Los Angeles: Sage [Google Scholar]
Hasan S. 2020 . Social networks and careers. Social Networks at Work DJ Brass, SP Borgatti 228– 50 New York: Routledge [Google Scholar]
Iyengar R , Van den Bulte C , Valente TW. 2011 . Opinion leadership and social contagion in new product diffusion. Mark. Sci. 30 : 195– 212 [Google Scholar]
Kilduff M , Brass DJ. 2010 . Organizational social network research: core ideas and key debates. Acad. Manag. Ann. 4 : 1 317– 57 [Google Scholar]
Kilduff M , Buengerler C. 2020 . Self-monitoring: a personality theory for network research. Social Networks at Work DJ Brass, SP Borgatti 155– 77 New York: Routledge [Google Scholar]
Kilduff M , Crossland C , Tsai W , Krackhardt D. 2008 . Organizational network perceptions versus reality: A small world after all? Organ . Behav. Hum. Decis. Process. 107 : 1 15– 28 [Google Scholar]
Kilduff M , Krackhardt D. 1994 . Bringing the individual back in: a structural analysis of the internal market for reputation in organizations. Acad. Manag. J. 37 : 87– 108 [Google Scholar]
Kilduff M , Lee JW. 2020 . The integration of people and networks. Annu. Rev. Organ. Psychol. Organ. Behav. 7 : 155– 79 [Google Scholar]
Kilduff M , Tsai W. 2003 . Social Networks and Organizations London: Sage [Google Scholar]
Kleinbaum AM. 2012 . Organizational misfits and the origins of brokerage in intrafirm networks. Adm. Sci. Q. 57 : 3 407– 52 [Google Scholar]
Kleinbaum AM. 2018 . Reorganization and tie decay choices. Manag. Sci. 64 : 2219– 37 [Google Scholar]
Kleinbaum AM , Jordan AH , Audia PG. 2015 . An alter-centric perspective on the origins of brokerage in social networks: how perceived empathy moderates the self-monitoring effect. Organ. Sci. 26 : 4 1226– 42 [Google Scholar]
Krackhardt D. 1987 . Cognitive social structure. Soc. Netw. 9 : 109– 34 [Google Scholar]
Krackhardt D. 1990 . Assessing the political landscape: structure, cognition, and power in organizations. Adm. Sci. Q. 35 : 342– 69 [Google Scholar]
Krackhardt D 1994 . Constraints on the interactive organization as an ideal type. The Post-Bureaucratic Organization: New Perspectives on Organizational Change C Hecksher, A Donnellon 211– 22 Thousand Oaks, CA: Sage [Google Scholar]
Krackhardt D 1999 . Simmelian ties: super strong and sticky. Power and Influence in Organizations R Kramer, M Neale 21– 38 Thousand Oaks, CA: Sage [Google Scholar]
Krackhardt D , Kilduff M. 1999 . Whether close or far: social distance effects on perceived balance in friendship networks. J. Pers. Soc. Psychol. 76 : 5 770– 82 [Google Scholar]
Krackhardt D , Porter LW. 1985 . When friends leave: a structural analysis of the relationship between turnover and stayers’ attitudes. Adm. Sci. Q. 30 : 242– 61 [Google Scholar]
Kwon S , Rondi E , Levin DZ , DeMassis A , Brass DJ 2020 . Network brokerage: an integrative review and future research agenda. J. Manag. 46 : 1092– 1120 [Google Scholar]
Labianca G , Brass DJ. 2006 . Exploring the social ledger: negative relationships and negative asymmetry in social networks in organizations. Acad. Manag. Rev. 31 : 596– 614 [Google Scholar]
Landis B , Kilduff M , Menges JI , Kilduff GJ. 2018 . The paradox of agency: feeling powerful reduces brokerage opportunity recognition yet increases willingness to broker. J. Appl. Psychol. 103 : 929– 38 [Google Scholar]
Lazega E. 2020 . Bureaucracy, Collegiality and Social Change: Redefining Organizations with Multilevel Relational Infrastructures Cheltenham, UK: Edward Elgar Publ. [Google Scholar]
Lazega E , Snijders TAB 2015 . Multilevel Network Analysis for the Social Sciences: Theory, Methods and Applications New York: Springer [Google Scholar]
Lazer D. 2001 . The co-evolution of individual and network. J. Math. Sociol. 25 : 69– 108 [Google Scholar]
Levin DZ , Walter J , Appleyard MM , Cross R. 2016 . Relational enhancement: how the relational dimension of social capital unlocks the value of network-bridging ties. Group Organ. Manag. 41 : 415– 57 [Google Scholar]
Levin DZ , Walter J , Murnighan JK 2011 . Dormant ties: the value of reconnecting. Organ. Sci. 22 : 923– 39 [Google Scholar]
Lin N. 1999 . Social networks and status attainment. Annu. Rev. Sociol. 25 : 467– 87 [Google Scholar]
Mannucci PV , Perry-Smith JE. 2021 .. “ Who are you going to call?” Network activation in creative idea generation and elaboration. Acad. Manag. J. https://doi.org/10.5465/amj.2019.0333 . In press [Crossref] [Google Scholar]
Maoret M , Tortoriello M , Iubatti D. 2020 . Big fish, big pond? The joint effect of formal and informal core–periphery positions on innovation productivity. Organ. Sci. 31 : 6 1538– 59 [Google Scholar]
McFadyen MA , Semadeni M , Cannella AA Jr 2009 . Value of strong ties to disconnected others: examining knowledge creation in biomedicine. Organ. Sci. 20 : 3 552– 64 [Google Scholar]
Mehra A , Kilduff M , Brass DJ. 2001 . The social networks of high and low self-monitors: implications for workplace performance. Adm. Sci. Q. 46 : 1 121– 46 [Google Scholar]
Methot JR , Lepine JA , Podsakoff NP , Christian JS. 2016 . Are workplace friendships a mixed blessing? Exploring tradeoffs of multiplex relationships and their associations with job performance. Pers. Psychol. 69 : 2 311– 55 [Google Scholar]
Methot JR , Rosado-Solomon E. 2020 . Multiplex relationships in organizations: applying an ambivalence lens. Social Networks at Work DJ Brass, SP Borgatti 79– 103 New York: Routledge [Google Scholar]
Moliterno TP , Mahony DM. 2011 . Network theory of organization: a multilevel approach. J. Manag. 37 : 2 443– 67 [Google Scholar]
Moreno JL. 1934 . Who Shall Survive?: A New Approach to the Problem of Human Interrelations Washington, DC: Nerv. Ment. Disease Publ. [Google Scholar]
Nebus J. 2006 . Building collegial information networks: a theory of advice network generation. Acad. Manag. Rev. 31 : 615– 37 [Google Scholar]
Newcomb TM. 1961 . The Acquaintance Process New York: Holt [Google Scholar]
Obstfeld D. 2005 . Social networks, the tertius iungens orientation, and involvement in innovation. Adm. Sci. Q. 50 : 100– 30 [Google Scholar]
Obstfeld D , Borgatti SP , Davis J 2014 . Brokerage as a process: decoupling third party action from social network structure. Research in the Sociology of Organizations , Vol. 40: Contemporary Perspectives on Organizational Social Networks DJ Brass, G Labianca, A Mehra, DS Halgin, SP Borgatti 135– 59 Bingley, UK: Emerald Group Publ. [Google Scholar]
Oh H , Kilduff M. 2008 . The ripple effect of personality on social structure: self-monitoring origins of network brokerage. J. Appl. Psych. 93 : 1155– 64 [Google Scholar]
Parkinson C , Kleinbaum AM , Wheatly T. 2018 . Similar neural responses predict friendship. Nat. Commun. 9 : 332 [Google Scholar]
Paruchuri S. 2010 . Intraorganizational networks, interorganizational networks, and the impact of central inventors: a longitudinal study of pharmaceutical firms. Organ. Sci. 21 : 63– 80 [Google Scholar]
Paruchuri S , Goossen MC , Phelps C 2018 . Conceptual foundations of multilevel social networks. The Handbook for Multilevel Theory, Measurement, and Analysis SE Humphrey, JM LeBreton 201– 22 Washington, DC: Am. Psychol. Assoc. [Google Scholar]
Perry BL , Pescosolido BA , Borgatti SP. 2018 . Egocentric Network Analysis Cambridge, UK: Cambridge Univ. Press [Google Scholar]
Perry-Smith JE , Mannucci PV. 2017 . From creativity to innovation: the social network drivers of the four phases of the idea journey. Acad. Manag. Rev. 42 : 53– 79 [Google Scholar]
Podolny JM. 2001 . Networks as the pipes and prisms of the market. Am. J. Sociol. 107 : 33– 60 [Google Scholar]
Quintane E , Carnabuci G 2016 . How do brokers broker? Tertius gaudens, tertius iungens, and the temporality of structural holes. Organ. Sci 27 : 6 1343– 60 [Google Scholar]
Reagans R , Zuckerman E , McEvily B 2004 . How to make the team: social networks vs. demography as criteria for designing effective teams. Adm. Sci. Q. 49 : 101– 33 [Google Scholar]
Rider CI 2009 . Constraints on the control benefits of brokerage: a study of placement agents in U.S. venture capital fundraising. Adm. Sci. Q. 54 : 575– 601 [Google Scholar]
Rivera MT , Soderstrom SB , Uzzi B. 2010 . Dynamics of dyads in social networks: assortative, relational, and proximity mechanisms. Annu. Rev. Sociol. 36 : 1 91– 115 [Google Scholar]
Roethlisberger FJ , Dixon WJ. 1939 . Management and the Worker Cambridge, MA: Harvard Univ. Press [Google Scholar]
Sasidharan S , Santhanam R , Brass DJ , Sambamurthy V. 2012 . The effects of social network structure on enterprise system success: a longitudinal multilevel analysis. Info. Sys. Res. 23 : 658– 78 [Google Scholar]
Sasovova Z , Mehra A , Borgatti SP , Schippers MC. 2010 . Network churn: the effects of self-monitoring personality on brokerage dynamics. Adm. Sci. Q. 55 : 639– 68 [Google Scholar]
Seibert SE , Kraimer ML , Liden RC. 2001 . A social capital theory of career success. Acad. Manag. J. 44 : 219– 37 [Google Scholar]
Simmel G 1950 . The Sociology of Georg Simmel New York: Free Press [Google Scholar]
Smith EB , Menon T , Thompson L. 2012 . Status differences in the cognitive activation of social networks. Organ. Sci. 23 : 67– 82 [Google Scholar]
Snijders T , Koskinen J 2013 . Longitudinal models. Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications D Lusher, J Koskinen, G Robins 130– 40 New York: Cambridge Univ. Press [Google Scholar]
Soda G , Mannucci PV , Burt RS. 2021 . Networks, creativity, and time: staying creative through brokerage and network rejuvenation. Acad. Manag. J. 64 : 1164 – 90 [Google Scholar]
Soda G , Tortoriello M , Iorio A. 2018 . Harvesting value from brokerage: individual strategic orientation, structural holes, and performance. Acad. Manag. J 61 : 896– 918 [Google Scholar]
Soltis SM , Brass DJ , Lepak DM. 2018 . Social resource management: an integration of social networks and human resource management. Acad. Manag. Ann. 12 : 537– 73 [Google Scholar]
Sparrowe RT , Liden RC. 2005 . Two routes to influence: integrating leader-member exchange and network perspectives. Adm. Sci. Q. 50 : 505– 35 [Google Scholar]
Stovel K , Shaw L. 2012 . Brokerage. Annu. Rev. Sociol. 38 : 139– 58 [Google Scholar]
Tasselli S , Kilduff M. 2018 . When brokerage between friendship cliques endangers trust: a personality-network fit perspective. Acad. Manag. J. 61 : 3 802– 25 [Google Scholar]
Tasselli S , Kilduff M. 2021 . Network agency. Acad. Manag. Ann. 15 : 68 – 110 [Google Scholar]
Tasselli S , Kilduff M , Landis B. 2018 . Personality change: implications for organizational behavior. Acad. Manag. Ann. 12 : 467– 93 [Google Scholar]
Ter Wal A , Criscuolo P , McEvily B , Salter A 2020 . Dual networking: how collaborators network in their quest for innovation. Adm. Sci. Q. 65 : 4 887– 930 [Google Scholar]
Tortoriello M , Reagans R , McEvily B. 2012 . Bridging the knowledge gap: the influence of strong ties, network cohesion, and network range on the transfer of knowledge between organizational units. Organ. Sci. 23 : 4 1024– 39 [Google Scholar]
Travers J , Milgram S 1969 . An experimental study of the “small world” problem. Sociometry 32 : 425– 43 [Google Scholar]
Tröster C , Parker A , van Knippenberg D , Sahlmüller B. 2019 . The coevolution of social networks and thoughts of quitting. Acad. Manag. J. 62 : 1 22– 43 [Google Scholar]
Uzzi B. 1997 . Social structure and competition in interfirm networks: the paradox of embeddedness. Adm. Sci. Q. 42 : 35– 67 [Google Scholar]
Venkataramani V , Labianca G , Grosser T. 2013 . Positive and negative workplace relationships, social satisfaction, and organizational attachment. J. Appl. Psychol. 98 : 1028– 39 [Google Scholar]
Vedres B. 2017 . Forbidden triads and creative success in jazz: the Miles Davis factor. Appl. Netw. Sci. 2 : 31 [Google Scholar]
Wagner WG , Pfeffer J , O'Reilly CA. 1984 . Organizational demography and turnover in top-management teams. Adm. Sci. Q. 29 : 74– 92 [Google Scholar]
Walter J , Levin DZ , Murnighan JK 2015 . Reconnection choices: selecting the most valuable (versus most preferred) dormant ties. Organ. Sci. 26 : 5 1447– 65 [Google Scholar]
Watts DJ. 2003 . Six Degrees: The Science of a Connected Age New York: WW Norton [Google Scholar]
Xiao Z , Tsui AS 2007 . When brokers may not work: the cultural contingency of social capital in Chinese high-tech firms. Adm. Sci. Q. 52 : 1– 31 [Google Scholar]
Yang SW , Soltis SM , Ross JR , Labianca GJ 2021 . Dormant tie reactivation as an affiliative coping response to stressors during the COVID-19 crisis. J. Appl. Psychol. 106 : 4 489– 50 [Google Scholar]
Article Type: Review Article

Social network analysis: An overview

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8(5):e1256

Institute for Systems and Computer Engineering, Technology and Science (INESC TEC)

Universidade Federal de Uberlândia (UFU)

University of Porto



Eshleen Grewal 1,
Jenny Godley 2,3,4,
Justine Wheeler 5,
Karen L Tang 1,3,4 (http://orcid.org/0000-0001-9008-2289)
1 Department of Medicine, University of Calgary, Calgary, Alberta, Canada
2 Department of Sociology, University of Calgary, Calgary, Alberta, Canada
3 Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
4 O’Brien Institute for Public Health, University of Calgary, Calgary, Alberta, Canada
5 Libraries and Cultural Resources, University of Calgary, Calgary, Alberta, Canada
Correspondence to Dr Karen L Tang; klktang{at}ucalgary.ca

Introduction Social networks can affect health beliefs, behaviours and outcomes through various mechanisms, including social support, social influence and information diffusion. Social network analysis (SNA), an approach which emerged from the relational perspective in social theory, has been increasingly used in health research. This paper outlines the protocol for a scoping review of literature that uses social network analytical tools to examine the effects of social connections on individual non-communicable disease and health outcomes.

Methods and analysis This scoping review will be guided by Arksey and O’Malley’s framework for conducting scoping reviews. A search of the electronic databases, Ovid Medline, PsycINFO, EMBASE and CINAHL, will be conducted in April 2024 using terms related to SNA. Two reviewers will independently assess the titles and abstracts, then the full text, of identified studies to determine whether they meet inclusion criteria. Studies that use SNA as a tool to examine the effects of social networks on individual physical health, mental health, well-being, health behaviours, healthcare utilisation, or health-related engagement, knowledge, or trust will be included. Studies examining communicable disease prevention, transmission or outcomes will be excluded. Two reviewers will extract data from the included studies. Data will be presented in tables and figures, along with a narrative synthesis.

Ethics and dissemination This scoping review will synthesise data from articles published in peer-reviewed journals. The results of this review will map the ways in which SNA has been used in non-communicable disease health research. It will identify areas of health research where SNA has been heavily used and where future systematic reviews may be needed, as well as areas of opportunity where SNA remains a lesser-used method in exploring the relationship between social connections and health outcomes.

Protocols & guidelines
Social Interaction
Social Support

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjopen-2023-078872


STRENGTHS AND LIMITATIONS OF THIS STUDY

This is a novel scoping review that fills an important gap—how and where social network analysis (SNA) (as a data collection and analytical tool) has been used in health research has not been systematically documented despite its increasing use in the discipline.

The breadth of the scoping review allows for a comprehensive mapping of the use of SNA to examine social connections and non-communicable disease and health outcomes, without limiting to any one population group or setting.

The use of the Arksey and O’Malley framework as well as the Levac et al recommendations to guide our scoping review will ensure that a rigorous and transparent process is undertaken.

Due to the scope of the review and the large volume of anticipated studies, only published articles in the English language will be included.

Introduction

Social connections are known to influence health. 1 People with many supportive social connections tend to be healthier and live longer than people who have fewer supportive social connections, while social isolation, or the absence of supportive social connections, is associated with the deterioration of physical and psychological health, and even death. 2–5 These associations hold even when accounting for socioeconomic status and health practices. 6 Additionally, having a low quantity of supportive social connections is associated with the development or worsening of medical conditions, such as atherosclerosis, hypertension, cardiovascular disease and cancer, potentially through chronic inflammation and changes to autonomic regulation and immune responses. 7–13 Unsupportive social connections can also have adverse effects on health due to emotional stress, which can then lead to poor health habits, psychological distress and negative physiological responses (eg, increased heart rate and blood pressure), all of which are detrimental to health over time. 14 The health of individuals is therefore connected to the people around them. 15

Social networks can influence health via five pathways. 15 16 First, networks can provide social support, to meet the needs of the individual. Dyadic relationships can provide informational, instrumental (ie, aid and assistance with tangible needs), appraisal (ie, help with decision-making) and/or emotional support; this support can be enhanced or hindered by the overall network structure. 17 In addition to the tangible aid and resources that are provided, social support, either perceived or actual, also has direct effects on mental health, well-being and feelings of self-efficacy. 18–20 Social support may also act as a buffer to stress. 16 19 The second pathway by which social networks influence health, and in particular health behaviours such as alcohol and cigarette use, physical activity, food intake patterns and healthcare utilisation, is through social influence. 16 21 That is, the attitudes and behaviour of individuals are guided and altered in response to other network members. 22 23 Social influence is difficult to disentangle from social selection from an empirical standpoint. That is, similarities in behaviour may be due to influences within a network, or alternatively, they may reflect the known phenomenon where individuals tend to form close connections with others who are like them. 22 24 The third pathway is through the promotion of social engagement and participation. Individuals derive a sense of identity, value and meaning through the roles they play (eg, parental roles, community roles, professional roles, etc) in their networks, and the opportunities for participation in social contexts. 16 The fourth pathway by which networks affect health is through transmission of communicable diseases through person-to-person contact. Finally, social networks overlap, resulting in differential access to resources and opportunities (eg, finances, information and jobs). 15 16 An individual’s structural position can result in differential health outcomes, similar to the inequities that stem from differences in social status. 16

There has been an explosion of literature in the area of social networks and health. In their bibliometric analysis, Chapman et al found that the number of studies that examine social networks and health has sextupled since 2000. 25 Similarly, the value of grants and contracts in this topic area, as awarded by the National Science Foundation and the National Institutes of Health, has increased 10-fold. 25 A turning point in the field was the HIV epidemic, where there was an urgent need to better understand its spread. 25 The exponential rise in the number of studies since then that examine social networks and health appears to reflect a widespread understanding that an individual’s health cannot be isolated from his or her social networks and context. There is, however, significant heterogeneity in which aspects of social networks are studied, and how. For example, many health research studies use proxies for social connectedness such as marital status or living alone status (as these variables tend to be commonly included in health surveys), without considering the quality of those social connections, and without further exploring the broader social network and its characteristics. 16 26 These proxy measures do little to describe the structure, quantity, quality or characteristics of the social connections within which individuals are embedded. Another common approach in health research is to focus on social support measures and their effects on health. Individuals are asked about perceived, or received, social support (for example, through questions that ask about the availability of people who provide emotional support, informational support and/or assistance with daily tasks, with either binary or Likert scale responses). 27 28 While important, social support measures do not assess the structure of social networks and represent only one of many different mechanisms by which social networks influence health. 17 23

Social network analysis

Social network analysis (SNA) is a methodological tool, developed in the 1930s by social psychologists, used to study the structure and characteristics of the social networks within which individuals are embedded. 16 29 It has evolved over the past 100 years and has been used by researchers in many social science disciplines to analyse how structures of relationships impact social life. 29 30 SNA has the following key properties 3 30 31 : (1) it relies on empirical relational data (ie, data on actors (nodes) and the connections (ties) between them); (2) it uses mathematical models and graph theory to examine the structure of relationships within which individual actors are embedded; and (3) it models social action at both the group and the individual level arising from the opportunities and constraints determined by the system of relationships. The premise of SNA is that social ties are both drivers and consequences of human behaviour, and are therefore the object of study. 15 16 23 32 Social networks comprise nodes, representing the members within a network, connected by ties, representing relations among those members. 33 There are two types of SNA: egocentric network analysis and whole network analysis. Egocentric network analysis describes the characteristics of an individual’s (ie, the ‘ego’s’) personal network, while whole network analysis examines the structure of relationships among all the individuals in a bounded group, such as a school or classroom. 3

In egocentric network analysis, a list of ‘alters’ (ie, nodes) to whom the ego is connected, is obtained through a name generator. Name generator questions ask for a list of alters based on role relations (eg, friends or family), affect (eg, people to whom the ego feels close), interaction (eg, people with whom the ego has been in contact) or exchange (eg, people who provide social and/or financial support). 34 These are followed by name interpreters, where the ego is asked questions about the characteristics of each named alter. 35 Analyses of these data involve constructing measures that describe these egocentric networks. Such measures include network size, network density (ie, how tightly knit the network is), the strength of relationships (ie, the intensity and duration of relationships between ego and alter), network function (ie, the resources and/or support provided through the network) and the diversity of relations within the network (‘heterogeneity’). 23 36 In whole network analysis, the network boundary is determined a priori and network members are known, for example, through membership lists or rosters. 37 Each network member is surveyed, to identify the other network members with whom they are connected and/or affiliated; attributes of each member are obtained through surveying the network members themselves. Variables are constructed at the individual and network levels. Individual-level measures include the number of ties to other network members (‘degree’), types of relationships, and the strength and diversity of relationships. 
Network-level measures include but are not limited to: density (representing how tightly knit or ‘glued’ together the network is), reciprocity (ie, the proportion of network ties that are reciprocated), isolates (ie, nodes with no ties to other network members), centralisation (or the extent to which the network ties are focused on one node or a set of nodes), cliques and equivalence (ie, sets of nodes that have the same pattern of ties and therefore occupy the same position in the network). 33 38 The constructed measures can then be included in statistical models to explore associations between individual and/or network-level measures, and outcomes. 33 39
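As a rough illustration of the whole-network measures just described, the sketch below computes degree, density, reciprocity and isolates for a small directed network in plain Python. The five members and their ties are invented for the example, and the formulas are the common textbook definitions rather than anything specified by this protocol; centralisation, cliques and equivalence are left out for brevity.

```python
# Hypothetical directed nominations in a bounded group of five people.
# A tie (a, b) means "a named b"; every name and tie here is invented.
members = ["Ana", "Ben", "Cam", "Dee", "Eli"]
ties = {("Ana", "Ben"), ("Ben", "Ana"), ("Ana", "Cam"), ("Cam", "Ben"), ("Dee", "Ana")}


def in_degree(node):
    """Number of members who named this node."""
    return sum(1 for _, target in ties if target == node)


def out_degree(node):
    """Number of members this node named."""
    return sum(1 for source, _ in ties if source == node)


def density():
    """Observed ties divided by the n * (n - 1) possible directed ties."""
    n = len(members)
    return len(ties) / (n * (n - 1))


def reciprocity():
    """Proportion of ties whose reverse tie is also present."""
    return sum(1 for a, b in ties if (b, a) in ties) / len(ties)


def isolates():
    """Members with no incoming or outgoing ties."""
    connected = {node for tie in ties for node in tie}
    return [m for m in members if m not in connected]


print(density(), reciprocity(), isolates())
```

With these invented ties, density is 5/20 = 0.25, reciprocity is 2/5 = 0.4 and Eli is the only isolate. Measures like these are the kind of constructed variables that could then enter a statistical model alongside a health outcome.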

Study rationale

In medicine and health research, there has traditionally been a dichotomy between the individual and the context in which the individual is situated—such as in their relationships with others. 40 As such, epidemiology of diseases has historically focused on individual-level traditional risk and protective factors—such as biological markers, genetics, lifestyle and health behaviours, and psychological conditions. 41 While criticisms of this individualistic focus abound, attempts to develop and use different approaches in medicine and research have lagged behind. 42 The use and adoption of methods, like SNA, that frame issues of health and wellness differently, has the potential to offer new insights and solutions to clinical and healthcare delivery problems, 42 by more holistically considering ‘different levels of change’ beyond the individual. 41 We seek to examine the extent to which SNA has transcended the boundaries of its disciplines of origin in the social sciences, into health research. For example, while Chapman et al have clearly shown an explosion of publications at this intersection, 25 it remains unclear whether these studies use SNA tools (which were developed specifically to interrogate the nature and characteristics of social networks), or whether they suffer from the known problem of conflation of constructs like social support, social capital and social integration. 15 43 Many studies that report the impact of ‘social networks’ on health outcomes do not use SNA methods but rather use self-reported network size (without probing the network and its structure), 44 45 social support, 46 marital status 47 48 and/or household members 47 as proxies.

We will therefore undertake a scoping review to map the use of SNA as a data collection and analytical method in health research. More specifically, the scoping review will examine how SNA has been used to study associations across social networks and individual health and well-being (including both physical and psychological health), health knowledge, health engagement, health service use and health behaviours. Scoping reviews are a knowledge synthesis approach that aims to uncover the volume, range, reach and coverage of a body of literature on a specific topic. 49 They differ from systematic reviews, another type of knowledge synthesis, in their objectives. Systematic reviews seek to answer clinical or epidemiological questions and are conducted to fill gaps in knowledge. 50 Systematic reviews are used to establish the effectiveness of an intervention or associations between specific exposures and outcomes. On the other hand, scoping reviews do not seek to provide an answer to a question, but rather, aim to create a map of the existing literature. 49 They are used to provide clarity to the concepts and definitions used in literature, examine the way in which research is conducted in a specific field or on a specific topic, and uncover knowledge gaps. 49 A scoping review, therefore, is well suited as a research method to address our research question, of mapping the ways in which SNA has been used in health research. This scoping review can identify areas (eg, specific populations and specific health outcomes) where there has been a plethora of SNA research warranting future systematic reviews. It can also identify areas within health research where the use of SNA is scarce, highlighting topics, populations or outcomes for future study.

This scoping review will be limited to studies that use SNA in exploring network components and their associations with non-communicable diseases and health and well-being outcomes, for three reasons. The first is feasibility, given the large volume of studies anticipated, based on Chapman et al ’s bibliometric study on this topic. 25 Second, the use of SNA in understanding disease transmission of communicable diseases (such as sexually transmitted infections) is well established; its application to HIV was in fact one of the catalysts, as previously mentioned, to its broader uptake in health research. 25 Third, SNA in health research has shifted from focusing on communicable diseases to focusing on non-communicable diseases and their risk factors; SNA is now being applied much more frequently to the latter conditions than the former ones. 51

Methods and analysis

The scoping review will be informed by the framework developed by Arksey and O’Malley 52 for conducting scoping reviews, as well as the additional recommendations made by Levac et al . 53 Arksey and O’Malley’s framework recommends that the review process be organised into the following five steps: identifying the research question; identifying relevant studies; study selection; charting the data; and collating, summarising and reporting the results. 52 The reporting of this review will adhere to the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews. 54

Patient and public involvement

No patients will be involved.

Step 1: identifying the research question

A preliminary search of the literature identified a gap related to SNA and how it has been used to study the relationship between social networks and individual well-being and health outcomes. This led to the development of the research question that will guide this scoping review: how have social network analytical tools been used to study the associations between social networks and individual patient health? In this case, SNA is defined as a data analysis technique that uses either an egocentric or whole network analysis approach. For egocentric network analysis, we will include studies that involve peer nomination (ie, use of a name generator) and the collection of one or more characteristics of alters (ie, use of name interpreter(s)).
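To make the egocentric definition concrete, the following sketch shows one way the data from a name generator and a name interpreter might be stored and summarised. The class name, the field names and the `role` attribute are all hypothetical; measuring density among alters only (excluding the ego) and measuring heterogeneity with the Blau index are assumptions chosen for illustration, not choices made by this protocol.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EgoNetwork:
    ego: str
    # Name generator + name interpreter: alter -> one reported characteristic
    # (here a hypothetical "role" category).
    alters: dict = field(default_factory=dict)
    # Undirected alter-to-alter ties as reported by the ego.
    alter_ties: set = field(default_factory=set)

    def size(self):
        """Network size: number of alters named by the ego."""
        return len(self.alters)

    def density(self):
        """Ties among alters divided by possible alter pairs (ego excluded)."""
        k = self.size()
        if k < 2:
            return 0.0
        return len(self.alter_ties) / (k * (k - 1) / 2)

    def heterogeneity(self):
        """Blau index: 1 minus the sum of squared category proportions."""
        k = self.size()
        if k == 0:
            return 0.0
        counts = Counter(self.alters.values())
        return 1 - sum((c / k) ** 2 for c in counts.values())


# Invented respondent with four named alters.
net = EgoNetwork(
    ego="P01",
    alters={"A": "family", "B": "family", "C": "friend", "D": "coworker"},
    alter_ties={frozenset({"A", "B"}), frozenset({"A", "C"}), frozenset({"B", "C"})},
)
print(net.size(), net.density(), net.heterogeneity())
```

For this invented respondent the network size is 4, density is 3/6 = 0.5 and the Blau index is 0.625. A study reporting only the first number, with no alter characteristics, would fall outside the definition used here.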

Step 2: identifying relevant studies

A search strategy will be constructed through consultation with an academic librarian (JW). The main concepts from the research question will be used for a preliminary search in Google Scholar. Additionally, the lead authors will provide the librarian with key studies that will be text-mined for relevant terms. These key studies will include a variety of populations (across different countries and age groups) and health outcomes. 55–58 Key studies will be searched in Ovid MEDLINE for appropriate subject headings. In consultation with team members, the librarian (JW) will construct a pilot search strategy. A title/abstract/keyword search will be conducted in Ovid MEDLINE against the known seed/key studies. Table 1 lists example keywords and terms relating to social networks that will be used, with the full search strategy detailed in online supplemental appendix A .


Table 1. Search terms relating to social network analysis

Due to a significant number of irrelevant articles surrounding communicable diseases using this search strategy, we will exclude records with these terms in either the title or keyword fields. Table 2 lists the terms related to communicable diseases.

Table 2. Search terms relating to communicable diseases
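The exclusion rule described above (drop a record when a communicable disease term appears in its title or keyword fields) can be sketched outside a bibliographic database as follows. The three terms are hypothetical stand-ins, since the actual list is given in the protocol's Table 2 and supplemental appendix, and the record structure is invented; in practice this logic would be expressed in each database's own search syntax.

```python
import re

# Hypothetical stand-ins for the communicable disease terms; not the protocol's list.
EXCLUDE_TERMS = ("hiv", "sexually transmitted", "tuberculosis")
PATTERN = re.compile(r"\b(" + "|".join(re.escape(t) for t in EXCLUDE_TERMS) + r")\b")


def is_excluded(record):
    """True when an exclusion term appears in the title or keyword fields only."""
    searchable = " ".join([record.get("title", ""), *record.get("keywords", [])])
    return PATTERN.search(searchable.lower()) is not None


# Invented records; the abstract field is deliberately ignored by the filter.
records = [
    {"title": "Peer networks and adolescent smoking", "keywords": ["social network analysis"]},
    {"title": "Sexual networks and HIV transmission", "keywords": ["egocentric networks"]},
    {"title": "Friendship ties and physical activity", "keywords": ["obesity"],
     "abstract": "HIV is mentioned only in this abstract."},
]
kept = [r for r in records if not is_excluded(r)]
print([r["title"] for r in kept])
```

The third record is kept because the term occurs only in its abstract, which mirrors the restriction to title and keyword fields. Word-boundary matching is used so that a term such as "hiv" does not match inside an unrelated word like "archive".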

Of note, the search strategy will not include terms that relate to health-related outcomes of interest (outside of excluding communicable diseases). Prior literature has shown that the inclusion of outcome concepts in a search strategy reduces the recall and sensitivity of a search strategy. 59 60 This problem is further exacerbated when only generic health terms (for example, ‘morbidity’ or ‘health status’) or specific health terms (eg, specific diseases or conditions such as ‘diabetes mellitus’) are used. 61 Because the objective of this scoping review is to examine and map the use of SNA in health research, the outcomes of interest are very broad, including: physical health and well-being, psychological health and well-being, healthcare engagement, health knowledge, health behaviours, healthcare access and use, disease prevalence and outcomes (spanning every organ system), and mortality. It will be impossible for a search strategy to be sufficiently comprehensive, to capture all possible generic and specific terms relating to this broad range of outcomes. In keeping with recommendations to minimise the number of elements in a search strategy 62 —and in particular outcome elements 63 —our search strategy will entail searching for SNA terms in health databases without specifying health outcomes.

The search strategy will first be created in Medline (Ovid), then translated and adapted for the databases: (1) EMBASE (Ovid), (2) APA PsycInfo (Ovid) and (3) CINAHL (EBSCO). A search will be completed in April 2024. No date filters will be applied to the search. However, animal-only studies will be excluded. The current version of the search strategy including limits and filters, for all databases, is included in online supplemental appendix A .

Step 3: study selection

The criteria that will be used to determine which studies to include are as follows:

Studies that employ SNA as a data collection and/or analysis technique, as defined above. Of note, studies that elicit only the number of friends or other social contacts, without collection of any information about these social contacts, are not considered to be SNA and are therefore not included in the scoping review.

Studies that explore the social networks of individuals in whom the health outcome is measured.

Studies that explore non-communicable health outcomes. Examples include self-rated health or other global measures of health (including measures of physical health, mental health and well-being), health practices (eg, physical activity, dietary patterns, smoking, alcohol use, substance use), sexual and reproductive health, healthcare-seeking behaviours (eg, medication adherence, acute care use, attachment to a primary care provider), health knowledge, health beliefs, healthcare engagement, non-communicable disease prevalence and mortality.

The criteria that will be used to exclude studies are as follows:

Studies that explore the social networks of organisations or healthcare providers, rather than the social networks of the individual about whom the health outcome is measured or reported.

Studies that describe or use data analysis techniques other than SNA (eg, using proxies for social networks/social support that do not include peer nomination (such as marital status or living alone status), or studies where study participants report the number of social contacts but where no other information about each social contact is collected).

Studies that focus exclusively on online social networks (eg, social media, online forums, online support groups).

Studies related to prevention, transmission or outcomes of communicable diseases.

Non-English studies, for feasibility purposes.

We will not limit studies based on the study population or country in which the SNA is conducted. Studies in paediatric and adult populations will be included. The reasons for excluding SNA studies that focus solely on social media and online networks are twofold. First, we anticipate a very large number of articles, given the broad populations and outcomes of interest, and for feasibility purposes, we have needed to narrow the research objective to in-person and/or offline social networks only. Second, there are likely inherent differences between online and offline social networks. Individuals use health-related social networking sites and online networks primarily for seeking information, for connecting with others who share a similar lived experience while maintaining some emotional distance, and for interacting with health professionals 64 ; this differs from in-person networks, to which individuals turn more for emotional and tangible or instrumental support. Friends met on online networks differ from friends met in person in other important ways. They tend to have less similarity in terms of age, gender and place of residence, 65 and the network ties more commonly arise spontaneously, that is, without common acquaintances or affiliations. 66 The social patterns and interactions among individuals and their online network contacts are also different, with entire relationships built on text-based interactions. 66 Therefore, while online social networks are an important area of study, they appear to be inherently different from offline social networks, and are therefore excluded from this scoping review.

For the first step of the screening process, after removing duplicate articles, two reviewers will independently assess the titles and abstracts of the studies to determine whether they meet the inclusion criteria. Any studies that do not meet the inclusion criteria will be excluded from the review. Studies that either one of the two reviewers feels are potentially relevant will be included in the full-text review, to ensure that no article is prematurely excluded at this stage. During the second step of the screening process, two reviewers will independently review the full texts of the studies to ensure they meet the inclusion criteria. Conflicts will be resolved by third and fourth reviewers with expertise in SNA (JG) and health outcomes (KLT). The number of studies included in each step of the screening process will be reported using the Preferred Reporting Items for Systematic reviews and Meta-Analyses diagram. 67
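The two-stage screening logic can be written down as a pair of decision rules. This is only a sketch: the function names, status labels and reviewer votes are invented, and it assumes that conflicts needing the third and fourth reviewers arise at the full-text stage, since at the title and abstract stage a single reviewer's vote is enough to advance a study.

```python
from collections import Counter


def title_abstract_decision(reviewer_a, reviewer_b):
    """Stage 1: advance a study if either reviewer thinks it may be relevant."""
    return "full_text_review" if (reviewer_a or reviewer_b) else "excluded"


def full_text_decision(reviewer_a, reviewer_b):
    """Stage 2: agreement decides; disagreement goes to the additional reviewers."""
    if reviewer_a == reviewer_b:
        return "included" if reviewer_a else "excluded"
    return "refer_to_arbiters"


# Hypothetical votes: record id -> (stage 1 votes, stage 2 votes or None).
votes = {
    "r1": ((True, True), (True, True)),
    "r2": ((True, False), (False, False)),
    "r3": ((False, False), None),
    "r4": ((False, True), (True, False)),
}

# Tally the number of records at each step, as a flow diagram would report.
flow = Counter()
for record_id, (stage1, stage2) in votes.items():
    step1 = title_abstract_decision(*stage1)
    flow["stage1_" + step1] += 1
    if step1 == "full_text_review":
        flow["stage2_" + full_text_decision(*stage2)] += 1

print(dict(flow))
```

The asymmetric first rule is what keeps an article from being excluded prematurely: r2 and r4 each advance on one vote, and only r4 ends up needing arbitration.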

Step 4: charting the data

A data charting document ( online supplemental appendix B ) will be created to extract data from the studies in the review. This document will include information about the authors, year of publication, study location, study population characteristics, outcomes of interest to this scoping review, and the scales and measures used for each outcome. Data about the social network analytical method will also be extracted, including whether studies used egocentric or whole networks, the name generator used (in egocentric network studies) or the relationship being explored, the maximum number of peer nominations allowed, the lookback period used, whether (and which) alter attributes were collected, and whether alter-to-alter tie data were collected. Data extraction will be performed by at least one reviewer, with a second reviewer separately checking and confirming the inputted data. Disagreements in data extraction will be resolved through consensus and through the input of reviewers with content and methods expertise (KLT, JG).
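One possible shape for a single row of the charting document is sketched below, together with a helper that lists the fields on which a checking reviewer's entry differs from the extractor's. The field names are paraphrased from the paragraph above and the example study is invented; the actual form is the one in the protocol's supplemental appendix B.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass
class ChartedStudy:
    authors: str
    year: int
    location: str
    population: str
    outcomes: list
    outcome_measures: list
    network_type: str                  # "egocentric" or "whole"
    name_generator: Optional[str]      # or the relationship being explored
    max_nominations: Optional[int]     # None when nominations were unlimited
    lookback_period: Optional[str]
    alter_attributes: list             # empty when none were collected
    alter_alter_ties: bool

    def __post_init__(self):
        if self.network_type not in ("egocentric", "whole"):
            raise ValueError("network_type must be 'egocentric' or 'whole'")


def discrepancies(extracted, checked):
    """Names of fields where the second reviewer's entry differs from the first."""
    a, b = asdict(extracted), asdict(checked)
    return sorted(key for key in a if a[key] != b[key])


# Invented study used only to exercise the structure.
first_pass = ChartedStudy(
    authors="Example et al", year=2015, location="Canada",
    population="adults with type 2 diabetes", outcomes=["self-rated health"],
    outcome_measures=["single-item scale"], network_type="egocentric",
    name_generator="people you discuss health matters with", max_nominations=5,
    lookback_period="6 months", alter_attributes=["age", "relationship"],
    alter_alter_ties=True,
)
second_pass = replace(first_pass, max_nominations=10)
print(discrepancies(first_pass, second_pass))
```

Here the only flagged field is `max_nominations`, which is the kind of disagreement that would go to consensus discussion.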

Step 5: collating, summarising and reporting results

The results of the review will be presented in the form of figures and tables and will include descriptive numerical summaries. The numerical summary will include information about the number of studies included in the review, where the studies were conducted, when they were published and characteristics of the populations, such as the sample sizes and mean age. It will also include characteristics of the SNA conducted in these studies, including the number that are whole network studies versus egocentric network studies, the data sources used and the attributes of the social connections that are collected and analysed. Results will be synthesised in text, as well as through tables and figures.
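A descriptive numerical summary of this kind could be produced from the charted records with a few counts. The sketch below uses three invented records and hypothetical field names; reporting the median sample size is an illustrative choice, not something the protocol specifies.

```python
from collections import Counter
from statistics import median


def summarise(charted):
    """Descriptive counts of the kind a scoping review's summary tables report."""
    return {
        "n_studies": len(charted),
        "by_network_type": dict(Counter(s["network_type"] for s in charted)),
        "by_country": dict(Counter(s["country"] for s in charted)),
        "by_year": dict(sorted(Counter(s["year"] for s in charted).items())),
        "median_sample_size": median(s["sample_size"] for s in charted),
    }


# Invented records standing in for rows of the data charting document.
charted = [
    {"year": 2012, "country": "Canada", "network_type": "egocentric", "sample_size": 250},
    {"year": 2018, "country": "USA", "network_type": "whole", "sample_size": 1200},
    {"year": 2018, "country": "Canada", "network_type": "egocentric", "sample_size": 90},
]
summary = summarise(charted)
print(summary)
```

For these three records the summary reports two egocentric studies and one whole network study, two studies published in 2018 and a median sample size of 250.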

Ethics and dissemination

This review does not require ethics approval. Data will be extracted from published material. Once the scoping review is complete, an article will be written to convey the findings of this review, and it will be submitted for publication in a peer-reviewed journal. We anticipate the results of this review will map out the ways in which SNA has been used in health research. Specifically, this scoping review will identify areas of potential saturation where SNA has been heavily used, opportunities for future systematic reviews (where there is a large body of primary research studies requiring synthesis) and health research gaps (eg, the health outcomes where SNA has been minimally used). The scoping review will also shed light on characteristics of SNA that have been used (eg, whether egocentric networks vs whole networks are used and in what settings, and whether a broad range of social network characteristics are captured and analysed), which will serve to inform the conduct of future SNA studies in health research.

Ethics statements

Patient consent for publication.

Not applicable.

References

Contributors KLT and JG conceived of the study protocol. KLT, JG, EG and JW developed and revised the study protocol, the search strategy and the inclusion/exclusion criteria. EG and KLT drafted the protocol manuscript, and all authors provided critical revisions.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

The role of social network analysis in social media research.

1. Introduction
2. The Significance of This Study
3. Works Related to Social Media Usage
4. The Related Works of Social Network Analysis
5. Possible Hypotheses Regarding Structural Features of Social Media Usage
6. Conclusions
Author Contributions
Conflicts of Interest

Parameter	Structural Features	Estimate (Standard Error)
θ (Edge)		−3.12 (1.36)
σ2 (Two Stars)		0.06 (1.84)
σ3 (Three Stars)		−0.02 (0.13)
τ (Triangle)		1.06 (8.4)
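
The parameter names in the table suggest a Markov random graph (p*) model of the usual form. Under that assumption, the estimates would enter the model as sketched below. The statistic symbols L, S_2, S_3 and T (counts of edges, two-stars, three-stars and triangles in a network y) and the normalizing constant κ are notation introduced here for illustration, not taken from the source.

```latex
\Pr(Y = y) \;=\; \frac{1}{\kappa}\,
\exp\bigl(\theta\,L(y) + \sigma_2\,S_2(y) + \sigma_3\,S_3(y) + \tau\,T(y)\bigr)
```

Read at face value, only the edge estimate is larger than about twice its standard error in magnitude (3.12 / 1.36 ≈ 2.3). The corresponding ratios for the two-star, three-star and triangle terms are all below 0.2.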

Nie, Z.; Waheed, M.; Kasimon, D.; Wan Abas, W.A.B. The Role of Social Network Analysis in Social Media Research. Appl. Sci. 2023 , 13 , 9486. https://doi.org/10.3390/app13179486

Social Network Analysis – Types, Tools and Examples

Social Network Analysis

Social Network Analysis (SNA) is an analytical method used to study social structures through the use of networks and graph theory. It identifies the relationships between individuals, organizations, or other entities and examines the patterns and implications of these relationships.

The nodes in the network represent the actors within the networks and the ties or edges represent relationships between the actors. These might be, for example, friendship ties between people, business relationships between companies, or communication patterns between individuals.

By analyzing the network structure and the characteristics of the actors within the network, SNA can reveal properties such as the distribution of resources, the flow of information, or the overall connectivity of the network.

Here are a few key concepts in SNA:

Centrality : This measures the importance of a node in the network. Various centrality measures exist, each emphasizing a different aspect of a node’s position within the network, such as degree centrality (the number of direct connections a node has), betweenness centrality (how often a node lies on the shortest paths between pairs of other nodes, acting as a bridge), and eigenvector centrality (a score proportional to the sum of the centrality scores of the nodes it is connected to, so ties to well-connected nodes count for more).
Density : This is a measure of the proportion of possible connections in a network that are actual connections. A high density suggests that the network participants are highly interconnected.
Clusters or Communities : These are groups of nodes that are more densely connected with each other than with the rest of the network.
Structural Holes : These are gaps in the network where a node could potentially act as a bridge between two unconnected parts of the network.
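
To make two of the concepts above concrete, here is a minimal sketch that computes degree centrality and density for a small undirected network. The names and ties are made up for illustration, and the code is written from scratch rather than taken from any particular SNA package.

```python
# Hypothetical toy network: undirected friendship ties among five people.
edges = [("Ana", "Ben"), ("Ana", "Cy"), ("Ana", "Dee"), ("Ben", "Cy"), ("Dee", "Eli")]


def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the largest possible degree, n - 1."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def density(adjacency):
    """Actual ties divided by possible ties, n * (n - 1) / 2, in an undirected network."""
    n = len(adjacency)
    tie_count = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return tie_count / (n * (n - 1) / 2)


adjacency = build_adjacency(edges)
centrality = degree_centrality(adjacency)
network_density = density(adjacency)

print(centrality)
print(network_density)
```

In this toy network Ana has the highest degree centrality, and half of all possible ties are present.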

Types of Social Network Analysis

Social Network Analysis can be broadly categorized based on the type of networks being analyzed, the level of analysis, and the methodologies employed. Here are a few ways to categorize SNA:

Whole Network Analysis

This type of analysis focuses on the structure and properties of the network as a whole. This might include measures of network cohesion, centralization, and density. It also looks at the overall distribution of relationships and identifies key groups or clusters within the network.

Ego Network Analysis

In this type of analysis, the focus is on a single actor (the ‘ego’) and their immediate network (the ‘alters’). It’s often used when interest is in the personal networks of individuals. Measures can include the size of the network, the composition of the network in terms of the types of ties and nodes, and measures of network density or diversity.
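
As a rough illustration of these ego-network measures, the sketch below extracts the alters of one focal node and computes the density of ties among them. The node labels and ties are hypothetical.

```python
# Hypothetical ties; "Ego" is the focal actor.
edges = [("Ego", "A"), ("Ego", "B"), ("Ego", "C"), ("A", "B"), ("C", "D")]


def ego_network(edge_list, ego):
    """Return the ego's alters and the ties that exist among those alters."""
    alters = set()
    for u, v in edge_list:
        if u == ego:
            alters.add(v)
        elif v == ego:
            alters.add(u)
    alter_ties = [(u, v) for u, v in edge_list if u in alters and v in alters]
    return alters, alter_ties


def ego_density(alters, alter_ties):
    """Share of possible alter-alter ties that are actually present."""
    k = len(alters)
    if k < 2:
        return 0.0
    return len(alter_ties) / (k * (k - 1) / 2)


alters, alter_ties = ego_network(edges, "Ego")
network_size = len(alters)
alter_density = ego_density(alters, alter_ties)

print(network_size, alter_ties, alter_density)
```

Node D is tied to an alter but not to the ego, so it falls outside the ego network. Excluding the ego's own ties from the density is one common convention, and it is an assumption of this sketch.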

Two-mode (or Bipartite) Network Analysis

This type of SNA is used when there are two different types of nodes, and connections are only possible between nodes of different types (not within types). For example, authors and the books they write, or actors and the movies they appear in. In such a network, you can study the connections between nodes of one type, mediated by nodes of the other type.
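
One way to study connections between nodes of one type "mediated by" the other is a one-mode projection. The sketch below projects hypothetical author-book ties onto authors, where the weight of each tie is the number of books two authors share.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-mode data: (author, book) pairs.
authorship = [
    ("Ava", "Book1"), ("Bo", "Book1"),
    ("Bo", "Book2"), ("Cal", "Book2"),
    ("Ava", "Book3"), ("Bo", "Book3"),
]


def project_onto_authors(ties):
    """Link two authors once for every book they have in common."""
    authors_by_book = {}
    for author, book in ties:
        authors_by_book.setdefault(book, set()).add(author)
    weights = Counter()
    for authors in authors_by_book.values():
        for a, b in combinations(sorted(authors), 2):
            weights[(a, b)] += 1
    return dict(weights)


coauthor_weights = project_onto_authors(authorship)
print(coauthor_weights)
```

Ava and Cal never share a book, so they stay unconnected in the projection even though both are tied to Bo.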

Dynamic Network Analysis (DNA)

This is used to study how social networks evolve over time. This could involve studying how ties between actors develop or disappear, or how actors move around within the network. In addition to traditional network measures, DNA also considers measures that are dynamic in nature, such as change in centrality over time.
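
A simple way to look at change over time is to compare snapshots of the same network. The sketch below uses three hypothetical yearly snapshots, tracks one node's degree across them, and lists the ties that formed or dissolved between two periods.

```python
# Hypothetical yearly snapshots of an undirected network.
snapshots = {
    "2021": [("A", "B"), ("B", "C")],
    "2022": [("A", "B"), ("B", "C"), ("C", "D")],
    "2023": [("B", "C"), ("C", "D"), ("C", "A")],
}


def degrees(edge_list):
    """Count the ties attached to each node in one snapshot."""
    counts = {}
    for u, v in edge_list:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts


def tie_changes(old_edges, new_edges):
    """Return (formed, dissolved) ties, ignoring the direction of each pair."""
    old = {frozenset(edge) for edge in old_edges}
    new = {frozenset(edge) for edge in new_edges}
    return new - old, old - new


degree_over_time = {period: degrees(edge_list) for period, edge_list in snapshots.items()}
c_trajectory = [degree_over_time[period].get("C", 0) for period in sorted(snapshots)]
formed, dissolved = tie_changes(snapshots["2022"], snapshots["2023"])

print(c_trajectory, formed, dissolved)
```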

Semantic Network Analysis

This type of SNA focuses on the relationships between concepts or ideas, rather than individuals or organizations. For instance, semantic network analysis could map out how different scientific concepts are related to each other in the literature.

Social Media Network Analysis

A specialized form of SNA, this deals with the study of social relationships as expressed through social media platforms. It allows for the mapping and measuring of relationships and flows between people, groups, organizations, computers, URLs, and other connected information/knowledge entities.

Social Network Analysis Techniques

Social Network Analysis involves various techniques to understand the structure and patterns of relationships among actors (people, organizations, etc.) in a network. These techniques may be mathematical, visual, or computational, and often involve the use of specialized software. Here are several common SNA techniques:

Network Visualization

One of the most basic SNA techniques involves creating a visual representation of the network. This can help to reveal patterns and structures within the network that may not be immediately obvious from the raw data. There are various ways to create such visualizations, depending on the specifics of the network and the goals of the analysis. Software such as Gephi or Cytoscape can be used for network visualization.

Centrality Measures

These are techniques used to identify the most important nodes within a network. Various measures of centrality exist, each highlighting different aspects of a node’s position in the network. These include degree centrality (the number of connections a node has), betweenness centrality (how often a node appears on the shortest path between other nodes), closeness centrality (how short the paths are from a node to all other nodes in the network), and eigenvector centrality (how well connected a node’s neighbors are, so that ties to central nodes count for more).
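
The path-based measures can be sketched with breadth-first search. The code below is a minimal from-scratch version for small, connected, unweighted, undirected networks. The betweenness part follows the accumulation scheme usually attributed to Brandes. The five-node chain is a made-up example.

```python
from collections import deque

# Hypothetical chain of five actors: A - B - C - D - E.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def closeness_centrality(adjacency):
    """(n - 1) divided by the sum of shortest-path distances (assumes a connected network)."""
    n = len(adjacency)
    result = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (n - 1) / total if total else 0.0
    return result


def betweenness_centrality(adjacency):
    """Number of shortest paths between other pairs that pass through each node."""
    score = {node: 0.0 for node in adjacency}
    for s in adjacency:
        order, dist = [], {s: 0}
        sigma = {node: 0 for node in adjacency}  # count of shortest paths from s
        preds = {node: [] for node in adjacency}  # predecessors on those paths
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in adjacency}
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in score.items()}


closeness = closeness_centrality(adjacency)
betweenness = betweenness_centrality(adjacency)
print(closeness)
print(betweenness)
```

In the chain, the middle node C scores highest on both measures, while the end nodes A and E lie on no shortest path between others.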

Community Detection

Also known as clustering, this technique aims to identify groups of nodes that are more closely connected with each other than with the rest of the network. This can help to reveal sub-groups or communities within the network.
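
Many community detection methods score a candidate grouping with modularity, which compares the share of ties inside groups with the share expected if ties were placed at random given each node's degree. The sketch below only evaluates given partitions and does not search for the best one. The network, two triangles joined by a single bridge, is hypothetical.

```python
# Hypothetical network: two triangles (a-b-c and d-e-f) joined by one bridge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]


def modularity(edge_list, communities):
    """Sum over communities of (internal ties / m) - (degree sum / 2m) ** 2."""
    m = len(edge_list)
    membership = {node: idx for idx, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edge_list:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree_sum))


q_triangles = modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}])
q_mixed = modularity(edges, [{"a", "b", "d"}, {"c", "e", "f"}])
q_single = modularity(edges, [{"a", "b", "c", "d", "e", "f"}])

print(q_triangles, q_mixed, q_single)
```

Splitting along the bridge scores higher than either the mixed grouping or lumping everyone together, which matches the intuition of "more closely connected with each other than with the rest of the network".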

Structural Equivalence and Blockmodeling

Structural equivalence is a measure of how similarly two nodes are connected to the rest of the network. Nodes that are structurally equivalent often play similar roles in the network. Blockmodeling is a technique used to simplify a network by grouping together structurally equivalent nodes.
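
A minimal sketch of the idea: compare the neighbor sets of two nodes, ignoring any tie between the pair itself, and group nodes whose sets match exactly. Jaccard overlap is used here as the similarity score. That is one common choice among several, and the small workplace network is made up.

```python
# Hypothetical network: a boss, three workers, and one outside client.
edges = [("Boss", "W1"), ("Boss", "W2"), ("Boss", "W3"), ("W3", "Client")]


def neighbour_sets(edge_list):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def profile_similarity(adjacency, u, v):
    """Jaccard overlap of the two nodes' neighbour sets, leaving out u and v themselves."""
    nu = adjacency[u] - {v}
    nv = adjacency[v] - {u}
    union = nu | nv
    return 1.0 if not union else len(nu & nv) / len(union)


def equivalence_blocks(adjacency):
    """Group nodes whose ties to everyone else are identical (similarity of exactly 1)."""
    blocks = []
    for node in sorted(adjacency):
        for block in blocks:
            if profile_similarity(adjacency, node, block[0]) == 1.0:
                block.append(node)
                break
        else:
            blocks.append([node])
    return blocks


adjacency = neighbour_sets(edges)
blocks = equivalence_blocks(adjacency)
print(blocks)
```

W1 and W2 end up in the same block because both are tied only to the boss, while W3 differs through the extra tie to the client. A blockmodel would then describe the ties between blocks rather than between individual nodes.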

Dynamic Network Analysis

This involves studying how a network changes over time. This can help to reveal patterns of network evolution, including how relationships form and dissolve, how centrality measures change over time, and how communities evolve.

Network Correlation and Regression

These are statistical techniques used to identify and test for patterns within the network. For example, one might use these techniques to test whether nodes with certain characteristics are more likely to form connections with each other.
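
As a rough illustration of testing whether similar nodes tend to connect, the sketch below counts ties between nodes with the same attribute and compares that count with what random reshuffling of the attribute labels would produce. This is a simple node-label permutation test, used here as a simplified stand-in for the more elaborate procedures network researchers typically apply. The data, group labels, trial count and seed are all assumptions.

```python
import random

# Hypothetical network and a hypothetical group attribute for each node.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("c", "d")]
group = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y", "f": "Y"}


def same_group_ties(edge_list, attribute):
    """Count ties whose two endpoints share the same attribute value."""
    return sum(1 for u, v in edge_list if attribute[u] == attribute[v])


def permutation_p_value(edge_list, attribute, trials=2000, seed=0):
    """Share of random label shuffles with at least as many same-group ties as observed."""
    rng = random.Random(seed)
    observed = same_group_ties(edge_list, attribute)
    nodes = list(attribute)
    labels = [attribute[node] for node in nodes]
    at_least_as_many = 0
    for _ in range(trials):
        rng.shuffle(labels)
        shuffled = dict(zip(nodes, labels))
        if same_group_ties(edge_list, shuffled) >= observed:
            at_least_as_many += 1
    return observed, at_least_as_many / trials


observed, p_value = permutation_p_value(edges, group)
print(observed, p_value)
```

Five of the six ties join same-group nodes. Only 2 of the 20 possible ways to assign the labels reach that count, so the estimated p-value should come out near 0.1.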

Social Network Analysis Tools

There are several tools available that can be used to conduct Social Network Analysis (SNA). These range from open-source software to commercial offerings, each with their own strengths and weaknesses. Here are a few examples:

Gephi : Gephi is an open-source, interactive visualization and exploration platform for all kinds of networks and complex systems. It’s user-friendly and allows users to interactively manipulate the network visualization, perform network analysis, and export results in various formats.
UCINet : UCINet is a comprehensive package for the analysis of social network data as well as other 1-mode and 2-mode data. It’s widely used in social science research.
NetDraw : Often used in conjunction with UCINet, NetDraw is a free tool for visualizing networks. It supports the visualization of large networks and allows for various customization options.
Pajek : Pajek is a program for the analysis and visualization of large networks. It’s an extensive tool, offering a range of complex network metrics, and is free for non-commercial use.
NodeXL : NodeXL is a free, open-source template for Microsoft Excel that allows users to display and analyze network graphs. Its integration with Excel makes it user-friendly, particularly for those already familiar with Excel.
Cytoscape : Originally designed for biological research, Cytoscape is now a popular open-source software platform for visualizing complex networks and integrating these with any type of attribute data.
SocioViz : SocioViz is a social media analytics platform for Twitter data, focused on network analysis and visualization. It’s a powerful tool for researchers interested in online social networks.
NetworkX : NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It integrates well with other scientific Python tools like SciPy and Matplotlib.
igraph : igraph is a library available in R, Python, and C for creating, manipulating, and analyzing networks. It’s highly efficient and can handle large networks.
RSiena : RSiena is an R package dedicated to the statistical analysis of network data, with a particular focus on longitudinal social networks.

Social Network Analysis Examples

Some well-known examples of Social Network Analysis include:

Public Health – COVID-19 Pandemic : During the COVID-19 pandemic, SNA was used to model the spread of the virus. The interactions between individuals were mapped as a network, helping identify super-spreader events and informing public health interventions.
Business – Google’s “PageRank” Algorithm : Google’s PageRank algorithm, one of the signals originally used to order search engine results, applies a centrality measure closely related to eigenvector centrality. It treats web pages as nodes and hyperlinks as directed connections, and scores a page’s importance by the number and quality of the links pointing to it.
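Sociology – Stanley Milgram’s “Small World” Experiment : This is one of the most famous social network experiments. Milgram asked people in the United States to forward a letter toward a distant target person through personal acquaintances. The chains that reached the target needed only about five to six intermediaries on average, a result later popularized as “six degrees of separation.” Many chains were never completed, so the figure is an estimate rather than a proof about any two people.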
Online Social Networks – Facebook’s “People You May Know” Feature : Facebook uses SNA to suggest new friends. The platform analyzes your current network and suggests people you’re likely to know, typically friends of friends or people who share common networks.
Criminal Network Analysis – Locating Osama bin Laden : Network analysis was reportedly used in the search for Osama bin Laden. By mapping the social connections of known associates, intelligence agencies are reported to have traced the Al-Qaeda leader’s location.
Academic Research – Collaboration Networks : SNA is used in scientometrics to analyze collaboration networks among researchers. For example, a study on co-authorship networks in scientific articles can reveal patterns of collaboration and the flow of information in different disciplines.
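
The link-based scoring idea behind PageRank can be sketched as a short power iteration. This is a textbook-style simplification on a made-up four-page web, not Google's production system. The damping factor of 0.85 and the even redistribution of rank from pages without outgoing links are conventional assumptions.

```python
# Hypothetical web: each page maps to the pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}


def pagerank(links, damping=0.85, iterations=100):
    """Repeatedly pass each page's score along its outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


ranks = pagerank(links)
print(sorted(ranks.items(), key=lambda item: -item[1]))
```

Page C, which receives links from three pages, ends up with the highest score. Page D, which nothing links to, keeps only the baseline.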

When to use Social Network Analysis

Social Network Analysis is a powerful tool for studying the relationships between entities (like people, organizations, or even concepts) and the overall structure of these relationships. Here are several situations when SNA might be particularly useful:

Understanding Complex Systems : SNA is well-suited to studying complex, interconnected systems. If you’re interested in not just individual entities but also the relationships between them, SNA can provide valuable insights.
Identifying Key Actors : SNA can help identify the most important entities in a network based on their position and connections. These might be influential people within a social network, critical servers in a computer network, or key scholars in a field of study.
Studying Diffusion Processes : If you’re interested in how something (like information, behaviors, diseases) spreads through a network, SNA can be a valuable tool. It allows for the examination of diffusion pathways and identification of nodes that speed up or hinder diffusion.
Detecting Communities : SNA can be used to identify clusters or communities within a network. These might be groups of friends within a social network, clusters of companies in a business network, or research clusters in scientific collaboration networks.
Mapping Out Large Systems : In cases where you have a large system of many interconnected entities, SNA can provide a visual representation of the system, making it easier to understand and analyze.
Investigating Structural Roles : If you’re interested in the roles individuals or entities play within their network, SNA offers methods to classify these roles based on the pattern of their relationships.
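
For the diffusion case in the list above, a simple threshold model gives a feel for how network structure can speed up or block spread: a node adopts once a large enough share of its neighbors has adopted. The network, seeds and threshold values below are hypothetical.

```python
# Hypothetical network: two triangles (A-B-C and D-E-F) joined by the tie C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def threshold_cascade(adjacency, seeds, threshold):
    """Spread in rounds; a node adopts when its share of adopting neighbours reaches the threshold."""
    adopted = set(seeds)
    rounds = [set(seeds)]
    while True:
        newly_adopted = {
            node
            for node, nbrs in adjacency.items()
            if node not in adopted and len(nbrs & adopted) / len(nbrs) >= threshold
        }
        if not newly_adopted:
            return adopted, rounds
        adopted |= newly_adopted
        rounds.append(newly_adopted)


adopted_strict, rounds_strict = threshold_cascade(adjacency, {"A", "B"}, threshold=0.5)
adopted_easy, rounds_easy = threshold_cascade(adjacency, {"A", "B"}, threshold=0.3)

print(adopted_strict, rounds_strict)
print(adopted_easy, rounds_easy)
```

With the stricter threshold the cascade stalls at the bridge, because only one of D's three neighbors has adopted. With the lower threshold it crosses the bridge and reaches everyone.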

Purpose of Social Network Analysis

Social Network Analysis serves a wide range of purposes across different fields, given its versatile nature. Here are several key purposes:

Understanding Network Structure : One of the key purposes of SNA is to understand the structure of relationships between actors within a network. This includes understanding how the network is organized, the distribution of connections, and the patterns of interaction.
Identifying Key Actors or Nodes : SNA can identify crucial nodes within a network. These could be individuals with many connections, or nodes that serve as critical links between different parts of the network. In a business, for instance, such nodes might be key influencers or innovators.
Detecting Subgroups or Communities : SNA can identify clusters or communities within a network, i.e., groups of nodes that are more connected to each other than to the rest of the network. This can be valuable in numerous contexts, from identifying communities in social media networks to detecting collaboration clusters in scientific networks.
Analyzing Information or Disease Spread : In public health and communication studies, SNA is used to study the patterns and pathways of information or disease spread. Understanding these patterns can be critical for designing effective interventions or campaigns.
Analyzing Social Capital : SNA can help understand an individual or group’s social capital – the resources they can access through their network relationships. This analysis can offer insights into power dynamics, access to resources, and inequality within a network.
Studying Network Dynamics : SNA can examine how networks change over time. This could involve studying how relationships form or dissolve, how centrality measures change over time, or how communities evolve.
Predicting Future Interactions : SNA can be used to predict future interactions or relationships within a network, which can be useful in a variety of settings such as recommender systems, predicting disease spread, or forecasting emerging trends in social media.
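
One of the simplest ways to predict future ties is to rank unconnected pairs by how many neighbors they share, which is the "friends of friends" idea. The sketch below is a bare-bones illustration with made-up names, and real recommender systems use far richer signals.

```python
from itertools import combinations

# Hypothetical friendship ties.
edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Dan"), ("Cat", "Dan"), ("Dan", "Eve")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def common_neighbour_scores(adjacency):
    """Rank unconnected pairs by their number of shared neighbours, highest first."""
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already connected, nothing to predict
        shared = adjacency[u] & adjacency[v]
        if shared:
            scores[(u, v)] = len(shared)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


suggestions = common_neighbour_scores(adjacency)
print(suggestions)
```

Ann and Dan share two friends, as do Bob and Cat, so those pairs rank ahead of pairs that share only one.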

Applications of Social Network Analysis

Social Network Analysis has a wide range of applications across different disciplines due to its capacity to analyze relationships and interactions. Here are some common areas where it is applied:

Public Health : SNA can be used to understand the spread of infectious diseases within a community or globally. It helps identify “super spreaders” and optimizes strategies for vaccination or containment.
Business and Organizations : Companies use SNA to analyze communication and workflow patterns, enhance collaboration, boost efficiency, and detect key influencers within their organization. It can also be applied in understanding and leveraging informal networks within a business.
Social Media Analysis : On platforms like Facebook, Twitter, or Instagram, SNA helps analyze user behavior, track information dissemination, identify influencers, detect communities, and develop recommendation systems.
Criminal Justice : Law enforcement and intelligence agencies use SNA to understand the structure of criminal or terrorist networks, identify key figures, and predict future activities.
Internet Infrastructure : SNA helps in mapping the internet, identifying critical nodes, and developing strategies for robustness against cyberattacks or outages.
Marketing : In marketing, SNA can track the diffusion of advertising messages, identify influential consumers for targeted marketing, and understand consumer behavior and brand communities.
Scientometrics : SNA is used in academic research to map co-authorship networks or citation networks. It can uncover patterns of collaboration and the flow of knowledge in scientific fields.
Politics and Policy Making : SNA can help understand political alliances, lobby networks, or policy networks, which can be critical for strategic decision-making in politics.
Ecology : In ecological studies, SNA can help understand the relationships between different species in an ecosystem, providing valuable insights into ecological dynamics.

Advantages of Social Network Analysis

Social Network Analysis offers several advantages when studying complex systems and relationships. Here are a few key advantages:

Reveals Complex Relationships : SNA allows for the study of relationships between entities (be they people, organizations, computers, etc.) in a way that many other methodologies do not. It emphasizes the importance of these relationships and helps reveal complex interaction patterns.
Identifies Key Players : SNA can identify the most influential or important nodes in a network, whether they are individuals within a social network, key servers in an internet network, or central scholars in an academic field.
Unveils Network Structure and Communities : SNA can help visualize the overall structure of a network and can reveal communities or clusters of nodes within a network. This can provide valuable insights into the organization and division of a network.
Tracks Changes Over Time : Dynamic SNA allows the study of networks over time. This can help to track changes in the network structure, the role of specific nodes, or the flow of information or resources through the network.
Helps Predict Future Interactions : Based on the analysis of current and past relationships, SNA can be used to predict future interactions, which can be useful in many fields including public health, marketing, and national security.
Aids in Designing Effective Strategies : The insights gained from SNA can be used to design targeted strategies, whether that’s intervening in the spread of misinformation online, designing a targeted marketing campaign, disrupting a criminal network, or managing collaboration in an organization.
Versatility : SNA can be applied to a vast array of fields, from sociology to computer science, biology to business, making it a versatile tool.

Disadvantages of Social Network Analysis

While Social Network Analysis is a powerful tool with wide-ranging applications, it also has certain limitations and disadvantages that are important to consider:

Data Collection Challenges : Collecting complete and accurate network data can be a major challenge. For larger networks, it may be nearly impossible to collect data on all relevant relationships. There’s also a risk of response bias, as people may forget, overlook, or misinterpret their relationships when providing data.
Time and Resource Intensive : Collecting network data, especially from primary sources, can be extremely time-consuming and expensive. Additionally, analyzing network data can also require significant computational resources for larger networks.
Complexity : SNA involves complex concepts and measures, which can be difficult to understand without specialized knowledge. This complexity can make it difficult to communicate findings to a non-technical audience.
Privacy and Ethical Concerns : SNA often involves sensitive data about individuals’ relationships and interactions, raising important privacy and ethical concerns. It’s important to handle this data carefully to respect individuals’ privacy.
Static Snapshots : Traditional SNA often provides a static snapshot of a network at a particular point in time, which may not capture the dynamic nature of social relationships. While dynamic SNA does exist, it adds additional complexity and data demands.
Dependence on Quality of Data : The insights and conclusions drawn from SNA are only as good as the data used. Incomplete, inaccurate, or biased data can lead to misleading results.
Difficulties in Establishing Causality : While SNA can reveal patterns and associations in network data, it can be difficult to establish causal relationships. For instance, do strong connections between two individuals lead to similar behavior, or does similar behavior lead to strong connections?
Assumptions about Relationships : SNA often assumes that relationships are equally important, which might not always be the case. Different relationships might have different strengths or meanings, which can be challenging to represent in a network.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer


Social Network Analysis

Social network analysis is the study of structure and how it influences health. It is based on theoretical constructs from sociology and the mathematical foundations of graph theory. Structure refers to the regularities in the patterning of relationships among individuals, groups, and/or organizations. When social network analysis is undertaken, the underlying assumption is that network structure, and the properties of that structure, have significant implications for the outcome of interest.

Because it focuses on network structure rather than on the individual characteristics and/or behaviors of network members, social network analysis requires data that differ from what is typically collected in non-relational epidemiologic study designs. Study designs that focus on individual characteristics or behaviors, and on how those characteristics influence health, typically collect and analyze attribute data. Attribute data are data that reflect the attitudes, opinions, and behaviors of individuals or groups. Social network analysis, by contrast, requires not only attribute data but is built on the collection and analysis of relational data. Relational data refer to the contacts, ties, and connections that relate one agent in a network to another. Relational data cannot be reduced to properties of the individual agents themselves; they are properties of a system or collection of agents.

Description

The majority of social network studies use either whole (Socio-centric) networks or egocentric study designs. Whole network studies assess relationships between individuals or actors that for analytical purposes are regarded as bounded or closed, even though in actuality the boundaries of the network are in fact permeable and/or ambiguous. When whole network studies are conducted, the focus of the study is to measure the structural patterns of how individuals within the network interact and how those patterns explain specific health outcomes. The underlying assumption made when whole network analysis is conducted, is that individuals that make up a group or social network will interact more than would a randomly selected group of similar size.

In a socio-centric study, members of the network are usually known or are easily determined because the focus is usually on closed networks that are a priori defined. For this reason, data collection for socio-centric network analysis involves enumerating all network members, and administering saturation surveys to all network members. A saturation survey provides respondents with a roster of all network members, and respondents are asked to identify members with whom they are affiliated. From this data, actor-by-actor matrices can be constructed and social network analysis can be conducted.
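The step from saturation survey to actor-by-actor matrix can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular study: the roster, the responses, and the function names `build_adjacency` and `symmetrize` are all hypothetical, and the "either actor reported it" rule is only one possible convention for reconciling reports.

```python
# Hypothetical roster and survey responses (illustrative only, not study data).
roster = ["Ana", "Ben", "Cal", "Dee"]
responses = {
    "Ana": ["Ben", "Cal"],
    "Ben": ["Ana"],
    "Cal": ["Ana", "Dee"],
    "Dee": [],
}


def build_adjacency(roster, responses):
    """Return a binary actor-by-actor matrix: cell [i][j] is 1 if i named j."""
    index = {name: i for i, name in enumerate(roster)}
    matrix = [[0] * len(roster) for _ in roster]
    for respondent, named in responses.items():
        for alter in named:
            matrix[index[respondent]][index[alter]] = 1
    return matrix


def symmetrize(matrix):
    """Treat a tie as present if either actor reported it (an assumed convention)."""
    n = len(matrix)
    return [
        [1 if matrix[i][j] or matrix[j][i] else 0 for j in range(n)]
        for i in range(n)
    ]


adjacency = build_adjacency(roster, responses)   # directed: who named whom
undirected = symmetrize(adjacency)               # undirected version
```

The directed matrix keeps the asymmetry of the reports (Cal names Dee, but Dee names nobody), while the symmetrized version discards it.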

When the network of interest does not have clearly defined boundaries, socio-centric studies resort to snowball or respondent-driven sampling to generate the network and collect data to identify structural patterns. In respondent-driven sampling, a small number of network members are interviewed and asked to name other network members, and those named members are in turn interviewed and asked to name other network members. This iterative process continues until all network members are identified, or for a number of waves set a priori before study initiation. The assumption made when respondent-driven sampling is used is that the sampled network is representative of the segments of the network from which data have not been collected. Respondent-driven sampling uses name generator surveys to identify network members, followed by name interpreter questions to solicit information about the named actors, their characteristics, and their relations to the focal actors.
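The wave-by-wave logic described above can be illustrated with a short Python sketch. It assumes the nominations are already known as a dictionary, which is never true in the field, where each wave requires new interviews; the data, the function name `snowball_sample`, and the `max_waves` parameter are hypothetical.

```python
def snowball_sample(nominations, seeds, max_waves):
    """Follow nominations outward from the seeds for at most max_waves waves.

    Members named in the final wave are added to the sample even though
    their own nominations are not followed.
    """
    sampled = list(seeds)
    seen = set(seeds)
    current_wave = list(seeds)
    waves_run = 0
    while current_wave and waves_run < max_waves:
        next_wave = []
        for respondent in current_wave:
            for named in nominations.get(respondent, []):
                if named not in seen:
                    seen.add(named)
                    next_wave.append(named)
        sampled.extend(next_wave)
        current_wave = next_wave
        waves_run += 1
    return sampled


# Hypothetical nominations: who each member would name if interviewed.
nominations = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": [],
}

one_wave = snowball_sample(nominations, ["A"], max_waves=1)
all_waves = snowball_sample(nominations, ["A"], max_waves=10)
```

With a single wave the sample stops at the members the seed names directly; with a generous wave limit the loop ends on its own once a wave produces no new names.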

Egocentric network designs, on the other hand, focus on a focal actor, the ego, and the relationships between the ego and named actors or objects within the ego’s social network. These designs collect data on the relationships between the ego and the objects, called alters, to which the ego is linked. Egocentric study designs use either name generators or position generators to obtain both attribute and relational data, which can be used to construct actor-by-actor matrices for egocentric data analysis. Position generators are used to identify people who fill particular valued roles, such as lawyers, whereas name generators, as discussed above, are questionnaires that ask the ego about individuals to whom he or she is connected in a specific way. Unlike in socio-centric studies, however, resource constraints preclude subsequent interviews with the named alters, so the ego serves as the informant not only for their own relationships with the alters but also for the alters’ relationships with each other. As in socio-centric respondent-driven sampling, name generator questions are usually followed by name interpreter questionnaires.
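As a rough illustration of how an ego's report can be turned into a number, the sketch below computes the share of possible alter-to-alter ties that the ego reports. The report structure, the names, and the function `alter_tie_density` are hypothetical, and this is only one of many summaries that could be drawn from egocentric data.

```python
from itertools import combinations

# Hypothetical report from a single ego: the alters named, plus the
# alter-alter ties the ego believes exist (the ego is the only informant).
ego_report = {
    "ego": "E",
    "alters": ["a1", "a2", "a3", "a4"],
    "alter_ties": [("a1", "a2"), ("a2", "a3")],
}


def alter_tie_density(report):
    """Share of possible alter-alter pairs that the ego reports as tied."""
    possible = len(list(combinations(report["alters"], 2)))
    if possible == 0:
        return 0.0
    # frozenset makes ("a1", "a2") and ("a2", "a1") count as the same tie.
    reported = {frozenset(tie) for tie in report["alter_ties"]}
    return len(reported) / possible


ego_density = alter_tie_density(ego_report)
```

Because every tie comes from the ego's perception, a value like this describes the network as the ego sees it rather than as the alters would report it.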

Analysis of Social Network Data

Network data, though collected at the level of the individual, are analyzed at the structural level. Data are organized as an actor-by-actor matrix, as depicted in figure 1B. The data displayed in figure 1 depict the presence or absence of a tie. When the strength of a tie is also of interest (i.e., valued data), similarity or distance matrices can be used. Similarity matrices depict stronger ties with increasing numerical values, while increasing numerical values in distance matrices reflect weaker ties, because the greater the distance between two actors, the weaker the tie. Any actor-by-actor matrix can be converted into a graph and analyzed using social network analysis software such as UCINET. Graphs are visual representations of a network. Actors within a network are displayed as nodes, and the lines connecting nodes represent the ties between two actors. Graphs can be directed, indicating that the relationship runs from one agent to the other, or valued, indicating the strength of the tie. Although visualizing the data is informative, the crux of social network analysis lies in the calculation of descriptive measures that reveal important characteristics about 1) the position of network actors, 2) the properties of network subgroups, and 3) the characteristics of complete networks.
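The relationship between a valued matrix, its graph, and a distance matrix can be sketched as follows. The matrix values and labels are hypothetical, and the inversion rule used in `similarity_to_distance` (strongest tie becomes distance 1) is an assumption chosen for illustration, not a standard prescribed by the text or by UCINET.

```python
labels = ["A", "B", "C"]

# Hypothetical similarity matrix: larger values mean stronger ties, 0 means no tie.
similarity = [
    [0, 3, 1],
    [3, 0, 0],
    [1, 0, 0],
]


def to_edges(matrix, labels):
    """List (source, target, value) for every non-zero off-diagonal cell."""
    return [
        (labels[i], labels[j], value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j and value != 0
    ]


def similarity_to_distance(matrix):
    """Invert tie strengths so the strongest tie gets distance 1.

    Cells with no tie (including the zero diagonal used here) become None.
    The (max + 1 - value) rule is an assumption for illustration only.
    """
    strongest = max(max(row) for row in matrix)
    return [
        [None if value == 0 else strongest + 1 - value for value in row]
        for row in matrix
    ]


edges = to_edges(similarity, labels)
distance = similarity_to_distance(similarity)
```

The edge list is the form most graph software expects, and the distance matrix shows the reversal the text describes: the strong A-B tie gets the smallest distance.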

Position of network actors or the interconnectedness of network actors is often referred to as a measure of cohesion. There are two common measures of cohesion:

Distance = the length of the shortest path that connects two actors

(Howe et al.) Distance between points 15 and 11 is 5

Density = total number of relational ties divided by the total possible number of relational ties
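Both cohesion measures can be computed directly from an adjacency structure. The sketch below uses a small hypothetical undirected graph (not the network in the figure) stored as a dictionary of neighbor sets; distance is found with a breadth-first search, and density divides observed ties by the n(n-1)/2 possible ties among n actors.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Distance: number of ties on the shortest path, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def density(graph):
    """Observed ties divided by possible ties in an undirected graph."""
    n = len(graph)
    if n < 2:
        return 0.0
    # Each undirected tie appears in two neighbor sets, so halve the total.
    ties = sum(len(neighbors) for neighbors in graph.values()) / 2
    return ties / (n * (n - 1) / 2)


# Hypothetical undirected network: each node maps to its set of neighbors.
graph = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3, 5},
    5: {4},
}

d_1_5 = shortest_path_length(graph, 1, 5)
net_density = density(graph)
```

In this example the shortest route from node 1 to node 5 runs through nodes 3 and 4, and 5 of the 10 possible ties are present.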

Components and cliques measure properties of network subgroups

A component is a portion of the network in which all actors are connected, either directly or indirectly.
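Components can be found by repeatedly picking an unvisited actor and collecting everyone reachable from it. The following is a minimal sketch on a hypothetical undirected graph; the function name `connected_components` is illustrative.

```python
def connected_components(graph):
    """Return the components of an undirected graph as a list of node sets."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            # Visit every neighbor not yet assigned to this component.
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical network: two connected groups and one isolate.
graph = {
    1: {2},
    2: {1, 3},
    3: {2},
    4: {5},
    5: {4},
    6: set(),
}

components = connected_components(graph)
```

Nodes 1 and 3 share a component even though they are not directly tied, because node 2 connects them indirectly; node 6, an isolate, forms a component of its own.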

(Howe et al.)

Nodes 1, 6, and 7 form a clique

A clique is a subgroup of actors who are all directly connected to one another, such that no other member of the network is connected to all members of the subgroup. Clique analysis is the most common technique used to identify dense subgroups within a network. Characteristics of complete networks are defined in terms of centrality. Centrality measures identify the most prominent actors within a network. Centrality can be conceptualized as either local or global. Local centrality refers to the direct ties a particular node has, while global centrality refers to the number of direct and indirect ties of a particular node. Centrality is measured in terms of betweenness or degree. Betweenness refers to the number of times an actor connects different subgroups of a network that would otherwise not be connected. In figure 3 above, node 19 connects nodes 13, 8, 17, 12, 14, and 15 to the main network and serves as a prominent actor within the network. Its prominence is reinforced when degree centrality is considered. Degree centrality refers to the number of actors that are directly connected to an ego.
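The clique definition and degree centrality translate almost word for word into code. The sketch below uses a small hypothetical graph that reuses the node labels 1, 6, and 7 from the caption but is not the network shown in the figure; the function names are illustrative. Betweenness is left out because it needs a longer shortest-path computation.

```python
from itertools import combinations

# Hypothetical undirected network: each node maps to its set of neighbors.
graph = {
    1: {2, 6, 7},
    2: {1},
    6: {1, 7},
    7: {1, 6},
}


def is_clique(graph, members):
    """True if every pair of members is directly connected."""
    return all(b in graph[a] for a, b in combinations(members, 2))


def is_maximal_clique(graph, members):
    """True if members form a clique and no outsider is tied to all of them."""
    members = set(members)
    if not is_clique(graph, members):
        return False
    outsiders = set(graph) - members
    return not any(members <= graph[node] for node in outsiders)


def degree_centrality(graph):
    """Number of actors directly connected to each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


clique_167 = is_maximal_clique(graph, {1, 6, 7})
clique_16 = is_maximal_clique(graph, {1, 6})
degrees = degree_centrality(graph)
```

Nodes 1 and 6 are fully connected to each other, but they fail the second half of the definition because node 7 is tied to both, so only the larger group counts as a clique.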

Node number 19 has a degree centrality of 9, which is the highest in the sociograph. The overall centralization measure refers to how tightly a graph is organized around its most central point. The measures of network structure discussed above can then be used to parameterize predictive regression models that relate relational data to attribute data. For example, after generating measures of network structure using social network analysis methods, Lee et al. used multivariable regression to evaluate associations between centrality measures and hospital characteristics.
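One common way to put a number on centralization is Freeman's degree centralization, which sums the gaps between the highest degree and every other degree and divides by the largest sum possible, (n-1)(n-2), for an undirected graph of n nodes. The text does not say which centralization formula it has in mind, so the choice of Freeman's version here is an assumption, and both example graphs are hypothetical.

```python
def degree_centralization(graph):
    """Freeman degree centralization for an undirected graph without self-ties.

    Returns 1.0 for a star (one hub tied to everyone) and 0.0 when every
    node has the same degree.
    """
    n = len(graph)
    if n < 3:
        return 0.0
    degrees = [len(neighbors) for neighbors in graph.values()]
    highest = max(degrees)
    return sum(highest - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical star: everything is organized around one central point.
star = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}

# Hypothetical ring: no node is more central than any other.
ring = {
    1: {2, 4},
    2: {1, 3},
    3: {2, 4},
    4: {1, 3},
}

star_score = degree_centralization(star)
ring_score = degree_centralization(ring)
```

The two extremes bracket real networks: values near 1 indicate a graph organized tightly around a single actor, and values near 0 indicate ties spread evenly.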

Textbooks & Chapters

Scott J. Social network analysis: a handbook. Newbury Park: Sage, 2000. This book provides an introduction to social network analysis. It briefly reviews the theoretical basis of social network analysis, and discusses the key techniques required to conduct this type of analysis. Specifically, it discusses issues of study design, data collection, and measures of social network structure.

Carrington PJ, Scott J, Wasserman S. Models and methods in social network analysis. Cambridge: Cambridge University Press, 2005. This book provides a more detailed methodological approach to social network analysis. Chapter 2 provides a brief discussion of study designs, while chapter 3 focuses on methods of data collection and model fitting.

Wasserman S, Faust K. Social network Analysis: methods and applications. Cambridge: Cambridge University Press, 1994.

M.E.J Newman. Networks. An Introduction. 1st edition Oxford University Press, 2010 This book is an introductory text that discusses social networks and social network analysis.

Methodological Articles

Social Network Analysis: A Methodological Introduction Author(s): CT Butts Journal: Asian Journal of Social Psychology Year published: 2008

Survey Methods for Network Data

Author(s): PV Marsden Journal: The Sage Handbook of Social Network Analysis Year published: 2011

The Art and Science of Dynamic Network Visualization

Author(s): S Bender-deMoll, DA McFarland Journal: Journal of Social Structure Year published: 2006

Dynamics of Dyads in Social Networks: Assortative, Relational, and Proximity Mechanisms

Author(s): MT Rivera, SB Soderstrom, B Uzzi Journal: Annual Review of Sociology Year published: 2010

A glossary of terms for navigating the field of social network analysis

Author(s): P Hawe, C Webster, A Shiell Journal: J Epidemiol Community Health Year published: 2004

Network analysis in public health: history, methods, and applications

Author(s): DA Luke, JK Harris Journal: Annual Review of Public Health Year published: 2007

Application Articles

A (very) Short Introduction to R

Author(s): P Torfs, C Brauer Year published: 2012

A comparative study of social network analysis tools

Author(s): Combe et al Journal: France: Web Intelligence & Virtual Enterprises, Saint-Etienne Year published: 2010

Software for social network analysis

Author(s): M Huisman, MAJ van Duijn Journal: Models and methods in social network analysis Year published: 2005

The spread of obesity in a large social network over 32 years

Author(s): NA Christakis, JH Fowler Journal: New England journal of medicine Year published: 2007

Is obesity contagious? Social networks vs. environmental factors in the obesity epidemic

Author(s): E Cohen-Cole, JM Fletcher Journal: Journal of health economics Year published: 2008

Detecting implausible social network effects in acne, height, and headaches: longitudinal analysis

Author(s): E Cohen-Cole, JM Fletcher Journal: Bmj Year published: 2008

Structural characteristics of social networks and their relationship with social support in the elderly: Who provides support?

Author(s): TE Seeman, LF Berkman Journal: Social Science & Medicine Year published: 1988

Social Network Analysis of Patient Sharing Among Hospitals in Orange County, California

Author(s): BY Lee, SM McGlone, Y Song, TR Avery, S Eubank, CC Chang, RR Bailey, DK Wagener, DS Burke, R Platt, SS Huang Journal: American Journal of Public Health Year published: 2011

Transmission network analysis in tuberculosis contact investigations

Author(s): VJ Cook, SJ Sun, J Tapia, SQ Muth, DF Argüello, BL Lewis, RB Rothenberg, PD McElroy Journal: J Infect Dis Year published: 2007

Description: R contains several packages relevant for social network analysis: igraph is a generic network analysis package; sna performs sociometric analysis of networks; network manipulates and displays network objects; PAFit can analyse the evolution of complex networks by estimating preferential attachment and node fitness; tnet performs analysis of weighted networks, two-mode networks, and longitudinal networks; ergm is a set of tools to analyze and simulate networks based on exponential random graph models; Bergm provides tools for Bayesian analysis of exponential random graph models; hergm implements hierarchical exponential random graph models; RSiena allows the analysis of the evolution of social networks using dynamic actor-oriented models; latentnet has functions for network latent position and cluster models; degreenet provides tools for statistical modeling of network degree distributions; and networksis provides tools for simulating bipartite networks with fixed marginals. Price: Free

Description: statnet is a suite of software packages that implement a range of network modeling tools. Price: Free

https://www.insna.org/ International Network for Social Network Analysis (INSNA) is a professional association for researchers interested in network analysis. The website contains SNA software descriptions, news, scholarly articles, technical columns, abstracts and book reviews. The site features graduate programs, courses, discussion forums, I-Connect, bibliographies and publications related to SNA. INSNA also provides a Journal of Social Networks and holds an Annual International Social Networks Conference and other SNA events.

Combe et al. (2010). A comparative study of social network analysis tools. France: Web Intelligence & Virtual Enterprises, Saint-Etienne

This article aims to describe the functionalities of social network analysis. In addition, the article explains and compares several of the widely used software tools that are dedicated to social network analysis. The software packages discussed in detail include Pajek, Gephi, NetworkX and igraph.


Website overview: Statnet is a suite of software packages for network analysis that implement recent advances in the statistical modeling of networks. The analytic framework is based on Exponential family Random Graph Models (ergm). statnet provides a comprehensive framework for ergm-based network modeling, including tools for model estimation, model evaluation, model-based network simulation, and network visualization. This broad functionality is powered by a central Markov chain Monte Carlo (MCMC) algorithm. statnet has a different purpose than the excellent packages UCINET or Pajek; the focus is on statistical modeling of network data. The statistical modeling capabilities of statnet include ERGMs, latent space and latent cluster models. The packages are written in a combination of (the open-source statistical language) R and (ANSI standard) C, and are called from the R command line. And because it runs in the R package ( www.r-project.org ), you also have access to the full functionality of R, including the packages "network" and "sna" written by Carter Butts. statnet has a command line interface, not a GUI, with a syntax that resembles R.

Host/program: University of Michigan/Coursera Course format: Online Software used: Gephi, NetLogo, R


Big data analytics meets social media: A systematic review of techniques, open issues, and future directions

Sepideh Bazzaz Abkenar

a Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

Mostafa Haghi Kashani

b Department of Computer Engineering, Shahr-e-Qods Branch, Islamic Azad University, Tehran, Iran

Ebrahim Mahdipour

Seyed Mahdi Jameii

• A comprehensive systematic review on social big data analytic approaches is provided.
• The main methods, pros, cons, evaluation methods, and parameters are discussed.
• A scientific taxonomy of social big data analytic approaches is presented.
• A detailed list of challenges and future research directions is outlined.

Social Networking Services (SNSs) connect people worldwide, where they communicate by sharing content, photos, and videos, posting their first-hand opinions and comments, and following their friends. Social networks are characterized by velocity, volume, value, variety, and veracity, the 5 V’s of big data. Hence, big data analytic techniques and frameworks are commonly exploited in Social Network Analysis (SNA). With the ever-increasing growth of social networks, the analysis of social data, to describe and find communication patterns among users and understand their behaviors, has attracted much attention. In this paper, we demonstrate how big data analytics meets social media, and we provide a comprehensive review of big data analytic approaches in social networks, covering studies published between 2013 and August 2020, with 74 papers identified. The findings of this paper are presented in terms of main journals/conferences, yearly distributions, and the distribution of studies among publishers. Furthermore, the big data analytic approaches are classified into two main categories: content-oriented approaches and network-oriented approaches. The main ideas, evaluation parameters, tools, evaluation methods, advantages, and disadvantages are also discussed in detail. Finally, the open challenges and future directions that are worth further investigation are discussed.

1. Introduction

Social networking services (or social networking sites) are online platforms distributed across various computers over long distances. Millions of people all around the world use SNSs to upload photos and videos, update their current status, and post daily comments ( Arora et al., 2019 , Lai et al., 2020 , Alalwan et al., 2017 ). They can join social networks in two ways: people may sign up after searching for the network, or they may be invited by friends ( Kumar et al., 2010 ). After being accepted by contacts, the inviter often invites the invitee’s contacts, so the network expands in this way. The rapid growth of online social networking sites (e.g., Facebook, Myspace), media sharing networks (e.g., YouTube, Instagram), and microblogging services (e.g., Tumblr, Twitter) has encouraged researchers to investigate published content and analyse users’ behaviors ( Feng et al., 2018 , Heidemann et al., 2012 ). A social network refers to a structure among people or other social entities, with edges representing their associations ( Busalim, 2016 ). In this structure, nodes are the people (or things) in the network, and interactions are expressed via the edges or links among them. The concept of a social network originated in mathematical graph theory, where a network is defined as a graph G = (V, E), in which V is a set of vertices or nodes that refer to people or objects, and E denotes a set of edges or ties indicating the relationships that connect the respective people ( Bello-Orgaz et al., 2016 ).
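The definition G = (V, E) maps directly onto two sets. The sketch below builds them from a list of hypothetical interaction records; the user names and the function `build_graph` are illustrative, and ties are treated as undirected, which is an assumption since many online relationships (such as following) are directed.

```python
def build_graph(pairs):
    """Build G = (V, E) from interaction records, treating ties as undirected."""
    vertices = set()
    edges = set()
    for u, v in pairs:
        vertices.update((u, v))
        if u != v:
            # frozenset ignores direction and collapses repeated interactions.
            edges.add(frozenset((u, v)))
    return vertices, edges


# Hypothetical interaction records between users of a social platform.
interactions = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("bob", "alice"),  # repeat of the first tie in the other direction
]

V, E = build_graph(interactions)
```

Storing E as a set means repeated interactions between the same two users count as one tie; keeping a count per pair instead would turn this into a valued graph.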

Traditionally, data about users’ interests and behaviors were collected by questionnaires. While this is still a prominent method in social science, the emergence and popularity of social networks have made it possible to collect data on users’ behaviors in an unprecedented way, directly from users’ social platform accounts ( Jamali and Abolhassani, 2006 ). By collecting data from Online Social Networks (OSNs) and analyzing them, researchers can study different aspects of users’ behaviors and obtain valuable information ( Martinez-Rojas et al., 2018 , Cetto et al., 2018 , Go and You, 2016 ). Social science researchers now send queries to online social networks through an Application Programming Interface (API) to extract large amounts of user data ( Manovich, 2011 ). Most popular social networks provide APIs that allow researchers to gather and assess data from a given social media service ( Lomborg and Bechmann, 2014 ). When the data are not accessible through an API, researchers develop web crawlers that crawl the OSN website and collect and extract data by issuing HTTP requests and processing the responses ( Abdesslem et al., 2012 ).

Thus, SNA is a scientific approach to extracting data and analysing the structural characteristics of networks both quantitatively and qualitatively. In other words, the challenge of SNA is to study and extract the relationships among individuals, different organizations, and communities, which is essential for managing and reducing the complexities of social networks ( Otte and Rousseau, 2002 ). A social network is an efficient way to collaborate and share knowledge among the core groups of an organization and its research and development units ( Cross et al., 2002 , Parveen et al., 2015 ). In this respect, the emergence of new social networks and an increasing number of social media users have led to an explosion of user-generated content (UGC). Thus, it is crucial to know what this big data is and what insight can be gained from it ( Boyd and Crawford, 2012 ). Big data refers to a volume of data so massive and complicated that traditional tools are not able to manage and process it effectively ( Katal et al., 2013 , Terrazas et al., 2019 , Canito et al., 2018 ). Big data differs from “a large dataset” in that the former is complex and has unique attributes, while the latter is simply a dataset with many records ( di Bella et al., 2018 ).

In order to find out the role and influence of big data in social networks, the features of big data are described by using 5 V’s, volume, velocity, variety, veracity, and value ( Hadi et al., 2018 ). Volume means a vast amount of data that can be produced every second ( Gandomi and Haider, 2015 ). Velocity stands for rapid generation of data, often referred to as streaming data ( Kitchin, 2014 ). Variety represents various types of data, including structured, unstructured, and semi-structured data like images, videos, and texts ( Sagiroglu and Sinanc, 2013 , Pei et al., 2018 ). Veracity deals with the truthfulness of the data analysed and the accuracy behind any information ( Bello-Orgaz et al., 2016 ). The value refers to the valuable information extracted for business and real values of the data ( Peng et al., 2017 ). All these five features are available on social networks, so the most important application of big data is in the field of social media, which refers to big social data (or social big data) ; the data are obtained from social networks. Big data technologies have produced new and exciting challenges in social networks ( Duan et al., 2019 ).

To date, to the best of our knowledge, some surveys and Systematic Literature Reviews (SLRs) have been performed on social big data analytics, but no comprehensive SLR has been written on the topic, which complicates the precise identification and assessment of the existing approaches, challenges, and gaps. Moreover, given the importance of big data analytics in social networks, this study aims to provide a systematic and comprehensive review that identifies the challenges, potential future directions, merits, and demerits of this field. In addition, the association between SNA and big data analytic approaches is shown in particular, and a research plan is investigated. An SLR presents a comprehensive review of the state of the art to reveal existing methods, challenges, and potential future research directions for research communities ( Brereton et al., 2007 ). We conduct this SLR with the intention of identifying , classifying , and comparing social big data analytic approaches, evaluating the methods of existing papers systematically, and offering a reasonable taxonomy . To attain this intention and to answer the following research questions, this methodological review is conducted:

• Q1: What are the existing big data analytic approaches applied in social networks?
• Q2: What parameters do the researchers employ to evaluate the big data analytics in social networks?
• Q3: What are the tools used in social network analysis and big data areas?
• Q4: What are the social big data analysis applications in the studied papers?
• Q5: What are the datasets and case studies used in social big data analysis?
• Q6: What evaluation methods are applied to measure the big data analytic approaches in social networks?
• Q7: What are the challenges and future perspectives of big data analytic approaches in social networks?

We followed the guidelines in ( Brereton et al., 2007 , Kitchenham and Charters, 2007 , Jamshidi et al., 2013 , Jatoth et al., 2015 ) with the intention of systematically exploring and categorizing available social big data analytic approaches and presenting a precise comparative analysis of the approaches along with their potential challenges and limitations. This SLR presents a systematic review of the current studies on big data analytic approaches in social networks. For this purpose, 74 papers are chosen and compared to introduce a scientific taxonomy for the classification of big data analytic approaches in social networks. We summarize available methods, main ideas, applied tools, advantages, disadvantages, and evaluation parameters, and then provide statistical and analytical reports on them. Furthermore, this review identifies the motivation for presenting an SLR, outlines an up-to-date list of the primary challenges and open issues, and defines the significant areas where future research can improve the methods in the selected papers.

The remainder of this SLR is organized as shown in Fig. 1 . Section 2 discusses related works and motivation. The research questions, the details of the selection process, and the research methodology are documented in Section 3 . Section 4 then provides a classification and a detailed study of the selected papers and presents their main ideas, advantages, disadvantages, evaluation methods, tools, and evaluation parameters. Sections 5 and 5.2 , respectively, present the analysis of the results and the open issues and future directions. Threats to validity and limitations are presented in Section 7 . Finally, the conclusion is given in Section 8 .

Fig. 1. The structure of this SLR.

2. Related works and motivation

So far, there have been many reviews in the field of big data or social networks. However, the literature reviews conducted on this subject have some drawbacks. This section refers to several review studies that discussed social big data approaches.

2.1. The related studies on social big data analytic approaches

We explore the similarities and differences of the current reviews on this topic according to a systematic research. The related works are summarized as surveys and SLRs in Sections 2.1.1 and 2.1.2 , respectively. The weak points of these reviews are then outlined in Section 2.1.3 . Table 1 gives a summary of the related works, listing the main idea, the review type, the paper selection process, the taxonomy, open issues, evaluation parameters, applied tools, and the publication year of each study.

Table 1. Summary of the related works.

Type	Ref	Main idea	Pub. year	Paper selection process	Taxonomy	Open issue	Evaluation parameter	Applied tool	Covered year
Surveys	( )	Applications of information fusion (IF) in social big data	2016	Not clear	No	Clear	Not presented	Presented	Not mentioned
	( )	Big data and social indicators	2018	Not clear	No	Not clear	Not presented	Not presented	1957–2017
	( )	Big social media analytics	2018	Clear	Yes	Clear	Presented	Not presented	2011–2017
	( )	Employing social big data analytics in the economic field	2016	Not clear	No	Clear	Not presented	Not presented	Not mentioned
	( )	Analysis of social big data for GIScience	2019	Not clear	Yes	Not clear	Not presented	Not presented	Not mentioned
	( )	Relationship between social media and big data and the accounting function	2017	Not clear	Yes	Clear	Not presented	Not presented	Not mentioned
	( )	Social big data analysis	2016	Not clear	Yes	Clear	Not presented	Presented	Not mentioned
	( )	Influence analysis in social big data	2016	Not clear	No	Clear	Not presented	Not presented	Not mentioned
	( )	Social big data mining	2015	Not clear	Yes	Clear	Not presented	Not presented	2010–2015
	( )	Big data mining in social media	2015	Not clear	No	Not clear	Not presented	Presented	Not mentioned
	( )	Big data analytics in social media	2017	Not clear	No	Not clear	Not presented	Presented	Not mentioned

SLRs	( )	Social media analytics within big data context	2018	Clear	Yes	Clear	Not presented	Presented	2008–2018
	( )	Predicting cyber-attacks on social big data	2019	Clear	Yes	Clear	Not presented	Not presented	Not mentioned
	( )	Firm-level innovation based on social big data analysis	2019	Clear	Yes	Not clear	Not presented	Not presented	1970–2018
	This Study	Big data analytics in social networks	2020	Clear	Yes	Clear	Presented	Presented	2013–August 2020

2.1.1. Surveys

Yaqoob (2016) surveyed the possible applications of Information Fusion (IF) in social media. The authors also discussed social big data processing technologies and their similarities and differences based on relevant parameters, and presented the challenges of applying IF along with future research directions. They reviewed several potential applications of IF, such as advanced marketing, fraud detection, social context-based recommendation systems, advanced feasibility studies for new businesses, and optimal decision making. Findings showed that applying fusion increases accuracy, reliability, and confidence. However, business intelligence, integration, security, and data sharing were not addressed in the paper. In addition, the survey did not follow a systematic structure, and the paper selection process was not clearly indicated.

Furthermore, di Bella et al. (2018) analysed the metadata of papers in the Scopus database in the field of big data from 1957 to 2017. The authors found that, despite the massive volume of publications, current trends in the academic big data literature were not sufficient for building real-time indicators. The study was written in a non-systematic manner, and its discussion left gaps regarding big data quality measures, privacy, transparency, and big data diffusion. Moreover, papers published in 2018–2019 were not considered.

In another study, Ghani et al. (2018) provided a survey on social network analysis and classified the literature published between 2011 and 2017 based on data sources, characteristics, computational intelligence, techniques of analysis, and the quality of features. The characteristics of big data analysis were summarized into descriptive, diagnostic, predictive, and prescriptive analytics. The authors classified the big data analytic techniques into modeling, sentiment analysis, SNA, and text mining, and categorized the papers according to approaches, techniques, and qualitative features. Although the paper selection process was not mentioned, they provided a comprehensive perspective on big social media analytics research topics. However, several challenges, such as data quality, data locality, velocity, data availability, and natural language processing, remained unaddressed.

Many other researchers have surveyed social big data papers. Bukovina (2016) reviewed technical analysis of social media to examine the behavior of capital markets; Martin and Schuurman (2019) surveyed social media data for qualitative geographic analysis; Arnaboldi et al. (2017) surveyed the relationship between social big data analysis and the accounting function; Bello-Orgaz et al. (2016) reviewed big data analytic algorithms in social media and their applications; Peng et al. (2016) conducted a survey to explore the architecture of influence analysis in social big data; and (Guellil and Boukhalfa, 2015, Gole and Tidke, 2015, Paul et al., 2017) surveyed big data mining in social media.

2.1.2. SLRs

Sebei et al. (2018) presented an SLR that considered journal and conference papers published between 2008 and 2018 to provide a clear description of the social network analysis process applicable to big data technologies. In addition to suggesting solutions, the authors identified the challenges encountered during big data analysis. The social network analytic processes, challenges, solutions, and big data tools related to each step were studied, but the relevant parameters for comparing big data-related technologies were not specified.

Finally, other social big data SLRs have been conducted: Al-Garadi (2019) examined the detection of cyber-attacks on social media with the aid of Machine Learning (ML) approaches, and Lerena et al. (2019) reviewed firm-level innovation based on text mining and social network analysis.

2.1.3. Concluding remark

Considering the overviewed papers, some weaknesses have been noticed as described below:

• Some studies have not mentioned the periods of reviewed papers explicitly. In this paper, besides mentioning the scope of the study and the time range of articles, recently published articles have also been considered.
• The lack of a systematic structure in the related papers made their selection processes unclear.
• Some papers have not been properly classified or have not presented any taxonomies. However, this paper not only provides a lucid and visual classification, but also defines a subclass for each of them.
• Some studies have not analysed the assessment parameters and evaluation tools. This SLR presents applied tools, evaluation parameters, and evaluation methods of the studied papers.
• Some of the related papers have not addressed open issues explicitly, and future challenges have been enumerated only briefly and implicitly. This review is intended to highlight open issues clearly and precisely.

2.2. The motivation for an SLR on social big data analytic approaches

The need for an SLR is to identify, classify, and compare the existing research reviews on big data analytics in social networks. In order to show that a comprehensive SLR has not already been proposed, we searched Google Scholar with the following search string:

For the reasons mentioned in Section 2.1.3, and considering Table 1, most of the retrieved reviews were not conducted systematically, their paper selection processes were unclear, and they did not propose any clear classification. To the best of our knowledge, only three SLRs have been conducted on this topic (Sebei et al., 2018, Al-Garadi, 2019, Lerena et al., 2019), none of which provides a complete systematic review investigating SNA techniques, tools, strengths, weaknesses, open issues, evaluation parameters, and the application and critical role of big data in social networks. The two most similar efforts are (Ghani et al., 2018), which is a survey rather than an SLR, covers only journal papers between 2011 and 2017, and excludes conferences, and (Sebei et al., 2018), which is an SLR covering works between 2008 and 2018 but does not present the evaluation parameters used in each studied paper. In (Al-Garadi, 2019), the researchers examined only cyber-attacks and security issues in social big data, which differs from the focus of our paper, and the time range of the studied papers was not specified. Additionally, open issues were not specified in (Lerena et al., 2019), and the researchers in (Al-Garadi, 2019, Lerena et al., 2019) did not investigate evaluation parameters and applied tools. Therefore, an SLR that covers these weaknesses and precisely highlights open issues and future research directions is timely.

3. Research methodology

Researchers have conducted various studies on social networks and big data, their applications, and their challenges. In order to accomplish a comprehensive study of big data analytic approaches, this section presents the SLR method applied to big data analytic approaches in social networks. An SLR is a methodology to identify, classify, assess, and synthesize a comparative overview of the state of the art in a specific subject (Brereton et al., 2007, Kitchenham et al., 2009). In contrast to other types of review papers, an SLR presents a taxonomical review and performs a methodological analysis of the research literature to answer given research questions related to specific research topics. The SLR was first used in the medical field (Aznoli and Navimipour, 2017) and can be conducted in any field of study to gain an accurate understanding, reduce bias, and identify open issues and future directions (Rahimi et al., 2020, Haghi Kashani et al., 2020). Since most review articles on big data analytic approaches in social networks were written without a structured procedure, the purpose of this paper is to provide a rigorous process of methodological steps for researching the literature in this scope.

In this systematic process, a three-phase guideline consisting of planning, conducting, and documenting (Brereton et al., 2007) is adopted, as depicted in Fig. 2. The review is accompanied by an external evaluation of the outcome of each phase. In the planning phase, we first identify the questions and the needs that motivate this SLR. In the conducting phase, the articles on this subject are selected based on inclusion/exclusion criteria. Finally, in the documenting phase, the observations are documented, and the results are analysed, compared, and visualized, which yields the answers to the research questions; the final reports are then presented. The three phases of the research methodology followed in this SLR are discussed below:

Fig. 2. Overview of research methodology.

3.1. Planning phase

Planning begins with the determination of the research motivation for this SLR and ends with a review protocol, as follows:

Stage 1- Specifying the research motivation. At the first stage, the motivation is specified according to the contribution of this SLR, which is justified by the comparison of the available reviews in Section 2.2.

Stage 2- Defining research questions. In the second stage, the research questions are defined according to the motivation of this paper, which assists the development and validation of the review protocol. The research questions are stated below. Answering these questions reveals the available gaps on this subject, which can facilitate reaching new ideas in the documenting phase.

Q1: What are the existing big data analytic approaches applied in social networks?
Q2: What parameters do the researchers employ to evaluate the big data analytics in social networks?
Q3: What are the tools used in social network analysis and big data areas?
Q4: What are the social big data analysis applications in the studied papers?
Q5: What are the datasets and case studies used in social big data analysis?
Q6: What evaluation methods are applied to measure the big data analytic approaches in social networks?
Q7: What are the challenges and future perspectives of big data analytic approaches in social networks?

Stage 3- Determining the review protocol. In the previous stage, the research questions and the review scope were identified according to the goals of this SLR in order to adjust the search strings for literature extraction (Brereton et al., 2007). Moreover, a protocol was developed by following (Calero et al., 2013) and our previous experience with SLRs (Haghi Kashani et al., 2020, Rahimi et al., 2020). To evaluate the defined protocol before its execution, we asked an external specialist experienced in conducting SLRs in this area for feedback, which was incorporated into the upgraded protocol. A pilot study of approximately 25% of the included papers was performed to reduce bias between researchers and to improve the data extraction process. We also refined the review scope, search strategies, and inclusion/exclusion criteria during the pilot stage.

3.2. Conducting phase

The second phase of the research methodology is conducting, which starts with paper selection and culminates in data extraction. This section presents the process of searching for and selecting papers in the second phase of the SLR. The paper selection process follows a three-step guideline, as depicted in Fig. 3.

• First step. The first step of the research process was searching Google Scholar, as the dominant search engine, covering well-known academic publishers such as Springer, IEEE Xplore, ScienceDirect, SAGE, Taylor&Francis, Wiley, Emerald, ACM, and Inderscience, based on titles and keywords. The search strings were defined as follows:

Table 2. Inclusion/Exclusion criteria.

Inclusion	Studies that focus on social big data analytics	Having a clear picture of big data analytic approaches in social networks
	Papers published online from 2013 to August 2020	The results of classical and fundamental literature on this subject have been mentioned in recent papers

Exclusion	Short papers of fewer than six pages	These studies do not provide enough information to be used in our research.
	Surveys and review papers	These studies do not offer any reasonable, significant, or novel solutions and information.
	Non-peer-reviewed papers or papers that are not in English	The quality of non-peer-reviewed papers could not be trusted, and non-English papers could not be examined, so these papers were excluded.
	Book chapters and theses	The results of book chapters and theses are mentioned in journal and conference papers
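
The criteria in the table above can be read as a mechanical screening filter. The following is a minimal Python sketch under stated assumptions, not the procedure the authors actually ran: the `Paper` record, its field names, and the category labels are hypothetical, the topical-relevance flag stands in for a reviewer's manual judgment, and the window is taken as 1 January 2013 to 31 August 2020.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Paper:
    """Hypothetical record for one candidate paper (field names are assumptions)."""
    title: str
    published: date       # online publication date
    pages: int
    kind: str             # e.g. "journal", "conference", "survey", "book chapter", "thesis"
    peer_reviewed: bool
    language: str
    on_topic: bool        # reviewer's judgment: focuses on social big data analytics


# Assumed reading of "from 2013 to August 2020".
WINDOW_START = date(2013, 1, 1)
WINDOW_END = date(2020, 8, 31)
EXCLUDED_KINDS = {"survey", "review", "book chapter", "thesis"}


def exclusion_reasons(paper: Paper) -> List[str]:
    """Return every criterion the paper fails; an empty list means it is kept."""
    reasons = []
    if not paper.on_topic:
        reasons.append("not focused on social big data analytics")
    if not (WINDOW_START <= paper.published <= WINDOW_END):
        reasons.append("published outside 2013 to August 2020")
    if paper.pages < 6:
        reasons.append("short paper (fewer than six pages)")
    if paper.kind in EXCLUDED_KINDS:
        reasons.append("excluded publication type: " + paper.kind)
    if not paper.peer_reviewed:
        reasons.append("not peer reviewed")
    if paper.language != "English":
        reasons.append("not in English")
    return reasons


def is_included(paper: Paper) -> bool:
    return not exclusion_reasons(paper)
```

Returning the list of failed criteria, rather than a bare boolean, keeps a record of why each paper was dropped, which is the kind of traceability a review protocol asks for.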

Fig. 3. Paper selection process.

• Third step. Finally, in the third step, the full texts of all selected papers were reviewed, and 74 relevant papers that could answer our research questions and fully describe the methods and challenges were chosen for further detailed analysis. Investigating these 74 papers helps us propose a classification of social big data analysis approaches in Section 4 and reveal the pros and cons of these approaches.

3.3. Documenting phase

As shown in Fig. 2, in the documenting phase, after the observations are documented, threats to validity and limitations are explored, as presented in Section 7. Then the results are analysed, visualized, and reported in Section 5.

4. Classification of the selected papers

In this section, the 74 chosen papers are explored to examine social big data analysis objectives, techniques, and innovations; a review of the advantages and disadvantages of each approach is also presented. A taxonomy of the related literature is proposed, and its pictorial description is shown in Fig. 4. Offering a taxonomy for social big data analysis is not a trivial task: because researchers look at the problems in this area from various perspectives, each researcher performs this classification differently. With this categorization, the reader can easily refer to each of these papers as a categorical reference. The selected papers use big data analytic techniques for analyzing social networks. These techniques are categorized into two major groups: content-oriented approaches and network-oriented approaches.

Fig. 4. Taxonomy of social big data analysis.

Content-oriented approaches are classified into two subgroups, namely topical learning and opinion/sentiment learning. Topical learning can be performed in a single modal or a multimodal approach. Opinion/sentiment learning can be carried out in lexicon-based, learning-based, or hybrid approaches. Further, network-oriented approaches are classified into two groups: embedding learning and community learning. Embedding learning has graph-based, non-graph-based, and explanatory models, while community learning is node-based or group-based. The papers relevant to content-oriented approaches and network-oriented approaches are reviewed in Sections 4.1 and 4.2, respectively. In this study, the methods of big data analysis on social networks are examined and evaluated with a list of important evaluation parameters. Further, the definitions of the evaluation parameters of the reviewed papers, as well as their formulas, are presented in Appendix A.
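
The taxonomy described above can also be written down as a nested mapping, which makes the two groups, four subgroups, and ten leaf categories easy to enumerate. This is only an illustrative Python encoding of the prose; the variable and function names are assumptions.

```python
# Nested mapping: group -> subgroup -> leaf categories, as described in the text.
TAXONOMY = {
    "content-oriented": {
        "topical learning": ["single modal", "multimodal"],
        "opinion/sentiment learning": ["lexicon-based", "learning-based", "hybrid"],
    },
    "network-oriented": {
        "embedding learning": ["graph-based", "non-graph-based", "explanatory models"],
        "community learning": ["node-based", "group-based"],
    },
}


def leaf_paths(taxonomy):
    """Flatten the taxonomy into (group, subgroup, leaf) triples."""
    return [
        (group, subgroup, leaf)
        for group, subgroups in taxonomy.items()
        for subgroup, leaves in subgroups.items()
        for leaf in leaves
    ]


for path in leaf_paths(TAXONOMY):
    print(" > ".join(path))
```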

4.1. Content-oriented approaches

Nowadays, the explosion of data in social networks provides researchers with types of content that differ from traditional books and libraries, and it is essential to analyse this immense volume of data. The selected papers on topical learning and opinion/sentiment learning are reviewed in Sections 4.1.1 and 4.1.2, respectively. Each of these sections discusses the classification of techniques, the definition of methods, and the related papers.

4.1.1. Overview of the topical learning approaches

In content-oriented approaches, topical learning focuses on the communication contents of social networks and consists of text mining, video content analysis, and image analysis. It is the process of analyzing various types of unstructured data, such as images, audio and video files, or different types of text including Word and PDF files, PowerPoint slides, and posts on weblogs and social network sites, or semi-structured data such as XML, HTML, JSON, and CSV files, with the purpose of uncovering underlying similarities and hidden associations and transforming them into structured data for further analysis. Topical learning may be performed as either “single modal” or “multi-modal”: a “single modal” approach collects and analyses one modality (text OR audio OR image OR video), whereas a “multi-modal” approach analyses a combination of various types of datasets such as text, audio, image, and video. For the reviewed papers, the comparison of specifications and evaluation parameters is given in Tables 3 and 4. Table 3 summarizes the main ideas, advantages, disadvantages, evaluation methods, tools, and case studies along with their categories related to the papers in this approach. Table 4 presents a side-by-side comparison of the evaluation parameters in papers related to topical learning approaches.

Table 3. Reviewing and comparing papers with topical learning approaches.

Category	Ref.	Main ideas	Evaluation methods	Tools	Case studies
Single modal	( )	Creating a linear network autocorrelation model	Real test bed	MySQL database, RMySQL, RStudio, R programming language	The proed forum on www.reddit.com
	( )	Proposing a novel logic and flexible TS_u_Datalog	Example application	Not mentioned	Not mentioned
	( )	Presenting spatio-temporal big data analysis to detect real-time behavioral patterns during the flu season	Real test bed	Hadoop, Big R released by IBM, Sqoop, Apache Flume	Twitter, Cerner HealthFacts data warehouse
	( )	Presenting a password creation and validation system for social media platforms	Real test bed	C#, SQL Server 2014	LinkedIn
	( )	Proposing a new early warning system for adverse drug reactions	Data sets	Not mentioned	The online health community MedHelp
	( )	Proposing a recommendation system using big data of user-shared images in social media	Simulation	Matlab	Skyrock, Sina Weibo, Flickr
	( )	Presenting a multiclass classification to reveal mental disorders by investigating people’s posts on the Reddit website	Data sets	Python, Scikit-learn library	Reddit website
	( )	Presenting an algorithmic model employing social media analytics and statistical machine learning to predict cyber risks	Data sets	MySQL, RWeka package, RStudio (R statistical software)	Twitter
	( )	Applying artificial neural networks and deep learning to predict Facebook posts	Data sets	Not mentioned	Facebook
	( )	Analyzing Turkish news on Twitter with Apache Spark	Data sets	Python, Apache Spark	Twitter
	( )	Presenting a framework for trend detection in social networks	Real test bed	Hadoop, Apache Drill, Apache Storm	Twitter
	( )	A novel face recognition framework in social networks based on ML	Data sets	Apache Giraph, Apache Hive	Facebook
	( )	Presenting a hybrid content-based cyberbullying detection model based on the metaheuristic approach in social networks	Data sets	Python	Twitter, ASKfm, FormSpring
	( )	Applying genetic algorithm in clustering social big data	Data sets	Hadoop, Java, Mahout	Twitter
	( )	Offering a framework for analyzing the video transcoding based on cloud	Real test bed	Hadoop, NoSQL, Amazon S3 (Amazon cloud storage provider), CLEVER (Cloud-Enabled Virtual Environment)	Not mentioned
	( )	Presenting a traffic event detection tool	Data sets	Apache Spark, MongoDB, Python	Twitter
Multimodal	( )	Introducing a private video recommendation system based on cloud and online learning	Simulation	Not mentioned	Sina microblog, Youku (video sharing site)
	( )	Proposing a content-centric networking architecture based on Monte Carlo Tree Search	Simulation	Not mentioned	Sina Weibo
	( )	Presenting a Facebook fake profile detection framework	Data sets	Weka	Facebook
	( )	Presenting a multi-modal microblog emotion analyzer based on deep learning	Data sets	Not mentioned	Sina Weibo

Table 4. An overview of the evaluation parameters in papers with topical learning approaches.

Category	Ref.	Centrality Measures	Security	Accuracy	Precision	Recall	F-measure	Scalability	Time	Cost	ROC (AUC)	Specificity	Matthews correlation coefficient
Single modal	( )	✓
	( )								✓	✓
	( )			✓	✓	✓	✓	✓
	( )			✓				✓	✓	✓
	( )			✓	✓	✓	✓			✓
	( )			✓	✓	✓			✓
	( )			✓	✓	✓	✓
	( )			✓	✓	✓
	( )			✓					✓
	( )	✓						✓
	( )			✓		✓			✓
	( )			✓
	( )				✓	✓	✓				✓
	( )			✓			✓	✓	✓	✓
	( )								✓
	( )			✓	✓	✓	✓

Multimodal	( )		✓	✓
	( )		✓	✓				✓	✓	✓
	( )			✓	✓	✓	✓				✓	✓	✓
	( )			✓		✓	✓
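
Several of the parameters ticked in the table above (accuracy, precision, recall, F-measure, specificity, and the Matthews correlation coefficient) are all derived from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the paper's own formulas are given in Appendix A and may use different notation. The function name and the example counts are illustrative assumptions.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common evaluation parameters from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with a zero denominator are reported as 0.0.
    """
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Made-up counts for illustration: 100 posts, 45 of them truly positive.
for name, value in classification_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.3f}")
```

Reporting several of these together matters on skewed social media data: accuracy alone can look high when one class dominates, which is why many of the reviewed papers also report precision, recall, and F-measure.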

To investigate the effects of social media on Eating Disorders (ED), Moessner et al. (2018) applied text, linguistic, and lexical analysis with an unsupervised, bottom-up method to identify harmful posts. They did not investigate social media data in real time; doing so could have improved the safety of ED-related communication. Further, to execute balance policies in business applications of social networks, Huo et al. (2018) presented a new Datalog logic. TS_u_Datalog was presented as the most appropriate of the Datalog logics, and a new programming language combining Active_U_Datalog and Distributed Temporal Logic (DTL) was introduced to implement contractual policies in dynamic social media. The results for the time evaluation parameter of TS_u_Datalog could have been improved, and the approach could be used for blockchain systems, privacy preservation on smartphones, and as a fault tolerance technique for wireless sensor networks.

To enhance health monitoring systems so that they can detect infectious diseases and support preventive actions, Zadeh et al. (2019) presented a spatio-temporal platform to examine whether social posts could reveal flu outbreaks in a particular area during the flu season. Because some people do not activate their GPS or do not state their geographic location in their social network profile, the geographic analysis could not be carried out in greater depth or with greater accuracy. More efficient ML techniques were needed to perform more in-depth analysis and to identify noise and unrelated social network posts. To recognize all repetitive and non-repetitive substrings in passwords, Xylogiannopoulos et al. (2020) designed an efficient pattern detection system that can be embedded in social network platforms to generate more robust and valid passwords. The results indicated that, contrary to common belief, long passwords are not safe, but passwords that combine lowercase/uppercase letters, numbers, and symbols are stronger than the others. This methodology had no limitation on the length of passwords or the type of characters. However, the proposed system could have been tested on other datasets, which might have led to different results.
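
The idea of flagging repeated substrings in a password, and of checking which character classes it mixes, can be illustrated with a brute-force Python sketch. This is not the pattern detection algorithm of Xylogiannopoulos et al. (2020), which the text describes only at a high level; the function names and the `min_len` parameter are assumptions, and the quadratic enumeration is only suitable for short strings.

```python
from collections import defaultdict


def repeated_substrings(password: str, min_len: int = 2) -> dict:
    """Map each substring of length >= min_len that occurs more than once
    to the list of start positions where it occurs (overlaps allowed)."""
    positions = defaultdict(list)
    n = len(password)
    for length in range(min_len, n):
        for start in range(n - length + 1):
            positions[password[start:start + length]].append(start)
    return {sub: pos for sub, pos in positions.items() if len(pos) > 1}


def character_classes(password: str) -> dict:
    """Report which character classes appear in the password."""
    return {
        "lowercase": any(c.islower() for c in password),
        "uppercase": any(c.isupper() for c in password),
        "digit": any(c.isdigit() for c in password),
        "symbol": any(not c.isalnum() for c in password),
    }


# A long but repetitive password versus a shorter, mixed one.
print(repeated_substrings("abcabc1"))   # {'ab': [0, 3], 'bc': [1, 4], 'abc': [0, 3]}
print(character_classes("abcabc1"))
print(repeated_substrings("aB3$xQ"))    # {}
print(character_classes("aB3$xQ"))
```

The example mirrors the finding reported above: length by itself says little if the password is built from a repeated block, whereas mixing character classes is a separate signal.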

To prevent deaths caused by Adverse Drug Reactions (ADRs), Yang et al. (2015) used text classification to propose an automated framework for filtering ADR-related posts. A supervised learning method was applied to classify the extracted posts into positive and negative examples, and the classification results were used as input to build an early warning system to prevent future ADRs. Although the presented method generally achieved better precision, recall, and F-measure, the authors did not extend their framework to various types of drugs. Furthermore, Cheung et al. (2015) presented a connection discovery system for follower/followee recommendations based on user-shared images instead of user-generated tags and social graphs. They used Bag-of-Features Tagging (BoFT) to label user-shared images with BoFT labels, and a computer vision approach was employed to model the characteristics of user-shared images. In addition to identifying users’ gender, the proposed system achieved higher image classification performance than K-means, and there was no need to know K (the number of clusters) in advance. However, the runtime of clustering and feature extraction was high. Consequently, for more users and user-shared images, a big data system is required to manage and discover data.

Furthermore, to identify mental disorders in advance, Thorstad and Wolff (2019) scrutinized people’s everyday posts on mental health and non-mental health topics on the Reddit website. The accuracy assessment indicated that people’s posts on clinical and non-clinical subreddits were highly and moderately predictive of mental disease, respectively. It also revealed that the predictions were more precise for recent posts than for posts from the distant past. The limitation was that posting a clinical post may not be a significant criterion for early diagnosis of psychological disorders, as some people may be affected by mental illness before posting. In addition, to identify vulnerabilities, Subroto and Apriyana (2019) offered an algorithmic model applying social media analytics and ML algorithms to protect against cyber-attacks. Although the model built with artificial neural networks achieved the highest accuracy, it was not scalable, had hardware limitations, and was tested only on a small sample of a Twitter dataset; the authors claimed, however, that this did not affect the accuracy of the model.

Moreover, many other studies adopted clustering and ML algorithms for text mining and trending-topic detection on big data of social platforms (Straton et al., 2017, Makaroğlu et al., 2019, Vakali et al., 2016, Aa et al., 2015). The researchers in (Singh and Kaur, 2019, Sachar and Khullar, 2017) proposed hybrid models that apply a metaheuristic approach to enhance classification performance in the content analysis of social big data. As millions of users now produce and share videos on various social media, Panarello et al. (2020) developed a framework for processing video transcoding in a short time. They applied Hadoop in their cloud federation framework to transcode videos so that they can be shared among users with different hardware/software devices. The evaluation results on a real testbed demonstrated performance enhancement in terms of speed, scalability, and transcoding time, but security and privacy issues were neglected.

Alomari et al. (2020) developed a text mining methodology that uses big data technologies to detect road traffic events from Arabic tweets. The authors applied three machine learning algorithms, namely Logistic Regression, Support Vector Machine, and Naïve Bayes, to classify eight types of events. The evaluation results showed enhancement in text processing, leading to more accurate event detection with no prior knowledge about those events. This methodology could also be used to identify events other than those related to road transportation. However, the authors did not focus on improving the scalability and data management of the proposed method.
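
To make the classification step concrete, the following is a toy, pure-Python multinomial Naïve Bayes text classifier with Laplace smoothing. It is a sketch under stated assumptions rather than the pipeline of Alomari et al. (2020): the class name, the whitespace tokenizer, the two event labels, and the English example messages are all hypothetical (the study worked on Arabic tweets with eight event types and big data tooling).

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training messages and labels, for illustration only.
train_texts = [
    "crash on highway two cars collided",
    "accident near bridge cars damaged",
    "road closed for maintenance works",
    "lane closed due to construction works",
]
train_labels = ["accident", "accident", "roadwork", "roadwork"]

model = NaiveBayesText().fit(train_texts, train_labels)
print(model.predict("cars collided near highway"))   # accident
print(model.predict("lane closed for works"))        # roadwork
```

Words never seen in training are simply skipped at prediction time, a common simplification; a production system would also need language-specific tokenization and normalization.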

Zhou et al. (2016) proposed a private video recommendation system based on distributed online learning. Multimedia such as images, audio, and videos produced by users were sent to and stored in remote, decentralized data centers. The users’ context vectors were extracted by Bag-of-Features Tagging (BoFT) and converted into distributed video service servers. Finally, the recommended video was transferred to multimedia applications in online social networks. The evaluation results on real datasets from Sina microblog and Youku, a video sharing site (VSS) in China, achieved a sublinear regret bound and established a trade-off between performance loss and the privacy protection level. However, for simplicity, a small dataset was chosen from those social networks, so the system suffered from low scalability. In another study, Feng et al. (2018) proposed a Content-Centric Networking (CCN) architecture based on the Monte Carlo Tree Search (MCTS) algorithm. Since the volume and variety of both users and contents are growing rapidly, the MCTS algorithm was used to solve the accurate content push problem in big data. In experiments on an offline Sina Weibo dataset, their algorithm performed better in terms of push accuracy, scalability, and robustness to users’ arrivals. Although the proposed architecture could be evaluated in a real-world CCN-based social media setting, energy efficiency was neglected.

Sahoo and Gupta (2020) proposed a framework to detect fake profiles on Facebook. The authors applied various ML algorithms along with content analysis and account-based features to distinguish suspicious accounts from genuine ones. The evaluation results indicated that the presented framework gave the best outcome in terms of accuracy, precision, recall, F-measure, and Matthews correlation coefficient, but the authors did not evaluate the response time of the approach. Moreover, applying this approach to other platforms such as Twitter and Google+, or adding an aggregator module for comparing various account features and their activities, may lead to different results. Since many microblogs contain videos, emoticons, and pictures as well as text, Zhang et al. (2019) proposed a multi-modal emotion analyzer based on deep learning. The authors applied a two-way Long Short-Term Memory (LSTM) network model to integrate content and user features. The model attained higher accuracy, precision, and F-measure than previous models, but users’ personalities were not considered, and user-based emotions could not be classified.

4.1.2. Overview of the opinion/sentiment learning approaches

In this section, the selected papers with opinion/sentiment learning approaches are reviewed. Opinion/sentiment learning approaches rely on Natural Language Processing (NLP) to extract opinions from text and classify their polarity as positive, negative, or neutral, in order to determine what people are talking about and to identify the perception of the public. With the help of sentiment analysis, opinions about products, services, brands, politics, or any topic that people care about can be extracted. These data can be used in many applications, such as marketing analysis, product reviews and feedback, emotion detection, intent analysis, customer support and services, social media monitoring, and brand monitoring (Shirdastian et al., 2019).

By reviewing papers relevant to opinion/sentiment learning, we recognized three methods, namely lexicon-based, learning-based, and hybrid approaches, employed to extract and analyse opinion/sentiment in social media contents. In lexicon-based approaches, a set of predefined word lists, corpora, and dictionaries is used to extract the subjectivity, orientation, and polarity of opinions and sentiments. Learning-based approaches utilize various ML algorithms (supervised or unsupervised) to classify text into positive or negative classes. Moreover, some of the reviewed papers combine learning-based and lexicon-based approaches; these are referred to as hybrid approaches. Table 5 depicts a comparison of the selected papers with opinion/sentiment learning approaches. It includes main ideas, advantages, disadvantages, evaluation methods, tools, and case studies along with their categories. In some studies, the tools applied for analyzing and implementing the approaches have not been mentioned. Table 6 shows the parameters used by papers relevant to opinion/sentiment learning approaches to evaluate the intended methods.
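To make the lexicon-based category concrete, the sketch below scores a text by summing per-word polarity values from a word list and mapping the sum to a positive, negative, or neutral label. The lexicon `TOY_LEXICON`, its scores, and the function name are assumptions made purely for illustration; real studies use published lexicons such as Afinn.

```python
# Toy polarity lexicon: words and scores are illustrative assumptions,
# not entries from a published lexicon.
TOY_LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "terrible": -2, "hate": -2,
}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Classify text as positive, negative, or neutral by summed word scores."""
    score = 0
    for token in text.lower().split():
        # Strip simple trailing/leading punctuation before the lookup.
        score += lexicon.get(token.strip(".,!?"), 0)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A learning-based approach would instead fit a classifier on labelled examples, and a hybrid approach could, for instance, feed such lexicon scores into the classifier as additional features.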

Reviewing and comparing papers with opinion/sentiment learning approaches.

Category	Ref.	Main ideas	Disadvantages	Evaluation methods	Tools	Case studies
Lexicon-based	( )	Presenting a fake review detection framework for sentiment analysis of social networks		Data sets	R language, set of NLP tools such as Stanford CoreNLP, OpenNR, Tidytext, Afinn sentiment lexicon	Amazon website
Lexicon-based	( )	Introducing a sentiment computing method based on the social media big data		Data sets	Not mentioned	Sina microblog

Learning-based	( )	Introducing a data integration approach based on calibration		Data sets	GeNIe Software V 2.1, R package ROSE	San Francisco international airport passengers dataset, Skytrax dataset
	( )	Proposing a two-stage big data and ML framework to analyse social media content		Data sets	Spark, Python, MySQL, Natural Language Toolkit (NLTK), Pandas package	Tourism data from Yelp dataset (Yelp.com)
	( )	Analyzing the opinions of users on COVID-19 epidemic on microblog		Data sets	Python	Sina Weibo
	( )	Presenting an ML model to analyse the tweets of English national team fans during 2018 FIFA world cup		Data sets	Python	Twitter
	( )	Offering a framework to explore brand validity sentiments		Data sets	Python	Twitter
	( )	Presenting a real-time processing system to analyse stock market tweets		Data sets	Apache Spark, Apache Kafka	Twitter
	( )	Presenting a metaheuristic approach in sentiment analysis of tweets		Data sets	Apache Spark	Twitter
Hybrid	( )	Presenting a framework to examine the relationship between volatility in the stock markets and UGCs		Data sets	Matlab	Twitter
	( )	Offering a new methodology for sentiment analysis to explore the impact of social sensing on weather events		Data sets	Python	Twitter
	( )	Introducing a distributed and parallel parsing system on the MapReduce framework		Data sets	Hadoop, Java	KISTI, NDSL
	( )	Analyzing tweets regarding vaccination sentiments and their trends in Twitter		Real test bed	Hadoop, Mahout	Twitter
	( )	Mining tweets that contain reporting on drug side effects		Data sets	Apache Spark’s ML library (MLlib), Python	Twitter
	( )	Presenting a sentiment analysis framework by applying ML techniques		Data sets	Apache Spark’s ML library (MLlib)	Twitter
	( )	Designing a microblog abnormal emotion detection model based on the neural network and CNN-LSTM	(Improving the threshold selection which is a time-consuming process)	Data sets	Not mentioned	Sina Weibo
	( )	Presenting a mechanism to gather and to envision social media information for big data		Data sets	Apache Flume, Hadoop, Java platform	Twitter, Facebook, Amazon dataset, Kaggle dataset
	( )	Proposing a topic classification and sentiment analysis framework		Data sets	Hadoop platform, Apache Flume, Apache Hive	Twitter

An overview of the evaluation parameters in papers with opinion/sentiment learning approaches.

Category	Ref.	Accuracy	Precision	Recall	F-measure	Scalability	Time	Cost
Lexicon-based	( )	✓		✓	✓
Lexicon-based	( )	✓

Learning-based	( )	✓
	( )	✓	✓	✓	✓
	( )						✓
	( )	✓	✓	✓	✓
	( )	✓	✓
	( )	✓				✓
	( )	✓					✓

Hybrid	( )	✓
	( )	✓			✓
	( )	✓	✓	✓		✓	✓	✓
	( )					✓
	( )	✓			✓	✓	✓
	( )				✓
	( )	✓		✓	✓
	( )	✓	✓	✓	✓	✓	✓
	( )	✓	✓	✓	✓		✓

Kauffmann et al. ( 2019 ) offered a modular framework for qualitative interpretation of UGC by employing NLP techniques and applying cosine similarity measures to recognize fake reviews. Their Fake Review Detection Framework (FRDF) utilized NLP techniques to discover similarities between reviews and eliminate fake and unreliable reviews of a product. The major weakness was that FRDF set a fixed threshold on the cosine similarity measure to detect fake reviews; other thresholds, or sentiment analysis tools other than the Afinn lexicon, may produce different outcomes. Furthermore, Jiang et al. ( 2017 ) suggested a method for performing sentiment computing of news events in social big data. First, a Word Emotion Association Network (WEAN) was constructed to compute both word and text emotions at a specific time. After dividing emotions, a questionnaire was designed to collect ideas about the six-dimensional sentiment emotion of emoticons, and emoticons were used to calculate the emotions of each sentence. Second, based on WEAN, a word emotion computation algorithm was presented to get the primary word emotions. Then an emotional refinement algorithm was offered, employing the standard emotional thesaurus to improve the sentiment computation of news with high accuracy, but emotion distance and word emotion patterns were not considered in the text sentiment computations.
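The idea of flagging reviews whose cosine similarity exceeds a threshold can be sketched as follows. This is not the FRDF implementation: the bag-of-words representation, the function names, and the threshold value of 0.8 are assumptions chosen for illustration, and, as noted above, a different threshold may flag a different set of reviews.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_near_duplicates(reviews, threshold=0.8):
    """Return sorted indices of reviews that are suspiciously similar to another review.

    The threshold is a hypothetical value, not one reported in the source.
    """
    flagged = set()
    for i in range(len(reviews)):
        for j in range(i + 1, len(reviews)):
            if cosine_similarity(reviews[i], reviews[j]) >= threshold:
                flagged.update((i, j))
    return sorted(flagged)


reviews = [
    "great product works well",
    "great product works well",
    "arrived late and broken",
]
suspicious = flag_near_duplicates(reviews)
```

The pairwise loop is quadratic in the number of reviews, so a large-scale system would need an indexing or blocking step that is outside the scope of this sketch.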

Moreover, Dalla Valle and Kenett ( 2018 ) presented a new approach to integrate online review data with customer survey data. In their method, the sentiments of online users were calibrated with customer surveys by resampling and merging data via Bayesian networks. This approach was used in various areas, and the data integration between online blogs and customer satisfaction led to an enhancement in sentiment analysis. However, it did not consider methods for integrating vast data sources to enhance the accuracy of results. In addition, Jimenez-Marquez et al. ( 2019 ) presented a two-stage framework to analyse UGC in social media. The first stage, which aimed at managing big data and processing UGC, built a Machine Learning Model (MLM). The second stage, which took the MLM of stage one as input, involved a series of layers to build a big data architecture that analysed unstructured and heterogeneous data. The proposed framework was superior to its competitors in both quantitative and qualitative analysis. Despite its high accuracy, better results may be obtained by integrating advanced ML algorithms and applying them to different domains.

Despite advances in medical science, COVID-19 has been among the most perilous diseases of the 21st century worldwide and poses a critical threat to the physical and mental health of individuals. In this respect, Zhu et al. ( 2020 ) analysed the topics about COVID-19 on Weibo from January 24 to February 25, 2020. The authors tried to grasp the opinions of users about the epidemic from a temporal and spatial perspective in China. However, the study had some drawbacks. The spatial perspective of the opinion analysis was limited to the provincial level. The age and gender of Weibo users were not considered, so they were not reflected in the analysis results. Moreover, since some users did not use Sina Weibo to express their opinions, the results cannot be generalized. Thus, employing a higher volume of data may lead to more predictive and accurate opinion analysis for relevant organizations in emergency conditions.

Fan et al. ( 2020 ) introduced a novel method for exploring real-time sentiments, team identification, and national identification of tweets during the 2018 FIFA World Cup. The authors observed how the sentiments of fans’ tweets fluctuated during two matches (England vs. Croatia and England vs. Colombia). They applied Python and ensemble methods not only to design a model with high accuracy for sentiment analysis at different temporal points during the match, but also to analyse emojis and their valence. However, since 4% of the collected tweets were in Spanish and Croatian and the ML approaches cannot perform properly on multilingual datasets, their method had low reliability. Moreover, they analysed only two England matches; other international matches and other countries were not considered. Finally, none of the ML techniques was able to analyse the sarcasm present in tweets, so the results attained a low level of precision, recall, and F-measure.

Shirdastian et al. ( 2019 ) presented a framework to explore brand validity and its sentiment polarity both qualitatively and quantitatively. The authors explored opinion and sentiment polarity towards brand validity on a Twitter dataset in terms of uniqueness, heritage, quality commitment, and symbolism. The study results indicate that the proposed framework improved precision and accuracy in determining brand authenticity by exploring the related brand sentiments. The main drawback of this study was that the variation of sentiments over time was not explored and the sentiment mining of bot-created brands was not excluded. Sayed et al. ( 2020 ) presented a hybrid approach that applied a combination of ML and lexicon techniques for sentiment analysis of tweets. The authors suggested a new metaheuristic approach based on Particle Swarm Optimization (PSO) and K-means to optimize data clustering. They evaluated their approach on four Twitter datasets with various topics using Spark Streaming, which led to better accuracy in real-time analytics compared to previous approaches, but deep learning methods may lead to more accurate predictions.

To examine the relationship between volatility in the stock markets and UGCs, van Dieijen et al. ( 2020 ) presented a framework based on multivariate regression analysis and the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model. The results showed an asymmetric impact of UGC on volatility, which means that negative comments, compared to positive ones, increased volatility and had a significant effect on customers. For future research, scaling up may lead to practical implementation. Spruce et al. ( 2020 ) presented a new methodology for exploring the impact of social sensing and social data sentiment analysis of real-world events on named storms in the United Kingdom and Ireland. The authors collected tweets posted in the winters of 2017 and 2018. Then time zone, bot, and weather-related filters were applied to extract data related to weather incidents. By analyzing the sentiments of tweets during extreme climate events, the effects of weather incidents and their social impacts were revealed from physical, emotional, spatial, and temporal perspectives. The main limitation of this study was low scalability due to the small number of tweets retained after filtering for weather-related tweets in the collection phase. Further, the results were somewhat unreliable because the Python sentiment analysis package TextBlob was applied, whose training corpus is based on movie review datasets.

Um et al. ( 2013 ) introduced a distributed and parallel parsing system based on MapReduce to analyse users’ sentences in social sensor networks. To conduct the study, a Stanford parser with loose coupling was applied, which led to high scalability. Due to the parallel environment, the parsing time was low, and the proposed system had high precision and high portability. The main limitation was that actual data from social sensor networks like Twitter was not considered, and technical sentences were not analysed in the same way as ordinary users’ phrases. Moreover, researchers in ( Baltas et al., 2016 , Lee and Paik, 2017 , Moise, 2016 ) employed ML along with NLP for opinion and polarity mining of social big data in sentiment analysis, which was applied for various decision-making purposes including marketing and health care issues such as reporting drug side effects. In order to conduct sentiment analysis on a microblog big data platform, Sun et al. ( 2018 ) presented a model called Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM). Each type of emotion was modeled through a Single Gaussian Model (SGM). The authors used the CNN for extracting local attributes and the LSTM as a global attribute extractor. The findings indicated that sentiment analysis of social language performed through the CNN-LSTM model achieved high accuracy, but time was neglected in their model, and threshold selection still took too much time.

Also, BalaAnand et al. ( 2019 ) presented a mechanism to collect contents from social media by utilizing big sheets, big vision schemes, and sentiment assessment. In addition to the Deep Learning Modified Neural Network (DMNN), which was used to investigate sentiments, the Modified Threshold-based Cuckoo Search Algorithm (MTCSA) was applied as a heuristic search algorithm for weight optimization. The experimental results revealed that the proposed DMNN outperformed other algorithms in terms of reliability, robustness, scalability, accuracy, precision, recall, F-measure, and computational time, but the cost of the proposed method was not assessed. For topic classification and sentiment analysis of social big data, Rodrigues and Chiplunkar ( 2019 ) presented a distributed Hadoop framework. Additionally, the bag-of-words method was used to classify the relevant tweets into six different groups. Then various NLP methods, namely uni-gram Lexicon, bi-gram Lexicon, uni-gram NB, bi-gram NB, and the Hybrid Lexicon-Naive Bayesian Classifier (HL-NBC), were employed. HL-NBC was more effective and outperformed the other classifiers in terms of accuracy, execution, and response time. However, separating and classifying sarcastic sentences and cross-lingual opinions for sentiment analysis were still unsolved challenges.
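The uni-gram and bi-gram variants mentioned above differ only in the features they extract from a tweet before a lexicon or Naive Bayes classifier is applied. The following minimal sketch shows that feature-extraction step; the function names are hypothetical, whitespace tokenization is a simplifying assumption, and the classifiers themselves are not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n):
    """Bag-of-n-grams representation: counts of each n-gram in the text."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


unigram_bag = ngram_features("not good at all", 1)
bigram_bag = ngram_features("not good at all", 2)
```

In the uni-gram bag, "good" appears on its own and looks positive, whereas the bi-gram bag keeps the pair ("not", "good") together, which is one reason bi-gram features can capture simple negation that uni-grams miss.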

4.2. Network-oriented approaches

Network-oriented approaches analyse big social data based on nodes or entities and their relations within social networks. Network-oriented approaches are classified into two groups: embedding learning and community learning. We review the selected papers with embedding learning and community learning approaches in Sections 4.2.1 and 4.2.2, respectively. In each of these sections, the classification of techniques, the definition of methods, and the related papers are discussed.

4.2.1. Overview of the embedding learning approaches

Some of the reviewed papers presented embedding learning that focused on extracting valuable information about users and nodes inside a network for link prediction, influence analysis, and information diffusion in social networks. Social influence means an individual’s ability to influence another user in a network; the more influential a person is, the more followers they will have ( Kumaran and Chitrakala, 2017 ). The embedding learning approach aims to analyse a network based on users and their features and to model the process of information diffusion on online social networks by learning users’ characteristics and the dissemination of information among users. Embedding learning approaches try to find the influence of different nodes in a network by identifying the position of a node in a path, or the number of paths in which it occurs; the node that most often lies in the center of the network and occurs on more paths is more influential.
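One standard way to quantify "the number of paths in which a node occurs" is betweenness centrality, which counts the shortest paths between other node pairs that pass through a node. The source does not name a specific measure, so choosing betweenness here is an assumption. The sketch below follows Brandes' algorithm for an unweighted, undirected graph given as an adjacency dictionary; the function name and example graph are illustrative.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness centrality for an undirected, unweighted graph.

    adj maps each node to an iterable of its neighbours (both directions listed).
    """
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search counting shortest paths from the source.
        order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate path dependencies in reverse BFS order.
        dependency = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


# Star network: every shortest path between two leaves passes through "hub".
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = betweenness_centrality(star)
```

In the star example, the hub lies on all three leaf-to-leaf shortest paths, so it receives the highest score while the leaves receive zero.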

In the aspect of predicting the underlying diffusion process, three categories are distinguished in embedding learning approaches: graph-based, non-graph based, and explanatory. Graph-based and non-graph based approaches are predictive models in which, by investigating previous information propagation, information dissemination is predicted from a spatial and/or temporal point of view. Graph-based approaches focus on the static, graphical structure of the network over which information is transmitted and predict who influences whom. In these approaches, each node can be activated or deactivated, as in the Independent Cascade (IC) and Linear Threshold (LT) models. In non-graph based approaches, by contrast, the topology and structure of the network are not taken into account, and each node is assumed to be randomly connected to other nodes in the network with equal probability, as in epidemic models, the Linear Influence Model (LIM), and Partial Differential Equations (PDEs). The main goal of explanatory models is to infer the information propagation path and to show how information is propagated in social networks. Propagation characteristics such as pairwise transmission rate, pairwise transmission probability, and cascade properties are explored in these models, whereas the network in which information diffusion takes place is unknown.
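As an illustration of a graph-based diffusion model, the sketch below simulates one run of the Independent Cascade model: every newly activated node gets a single chance to activate each of its inactive out-neighbours, succeeding with probability p. Using one uniform probability for all edges, as well as the function name and example graph, are simplifying assumptions.

```python
import random


def independent_cascade(graph, seeds, p, rng=None):
    """Simulate one Independent Cascade run and return the set of activated nodes.

    graph maps a node to the list of nodes it can influence;
    p is the (uniform, assumed) activation probability per edge.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each active node tries each inactive neighbour exactly once.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Hypothetical follower graph: an edge u -> v means u can influence v.
follower_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
spread = independent_cascade(follower_graph, ["a"], p=0.5, rng=random.Random(42))
```

Because a single run is random, influence maximization studies typically average the spread size over many runs for each candidate seed set.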

This section reviews the selected papers that use embedding learning approaches in social big data analysis. They are compared and summarized in Tables 7 and 8. Table 7 compares them in terms of main ideas, advantages, disadvantages, evaluation methods, tools, and case studies along with their categories. In some studies, the tools applied for analyzing and implementing the intended approach were not mentioned. The evaluation parameters are specified in Table 8.

Reviewing and comparing papers with embedding learning approaches.

Category	Ref.	Main ideas	Evaluation methods	Tools	Case studies
Graph-based	( )	Introducing a social influence rank-based determination method on big data streams in online social networks	Data sets	Python, Hadoop, MongoDB	Twitter
	( )	Introducing an influence maximization and diffusion algorithm	Real test bed	Apache Storm, Apache Spark, Microsoft Azure HDInsight	Yahoo Flickr Creative Commons 100 Million (YFCC100M)
	( )	Presenting an information-dependent embedding based diffusion prediction model	Real test bed	Not mentioned	Digg, Meme tracker, Google+
	( )	Introducing a network-based model to predict disease activity across geographical locations	Real test bed	Not mentioned	Twitter
	( )	Presenting a heuristic approach to maximize influence in social networks	Simulation	Not mentioned	Political blogs, Netscience dataset
	( )	Proposing a heuristic model for minimizing viral marketing costs in social networks	Simulation	Python	Facebook, Epinions
	( )	Presenting a topic-aware influence maximization model based on cloud computing	Simulation	Not mentioned	NetHEPT, Epinions, DBLP, LiveJournal, Friendster

Non-graph based	( )	Offering a protection and recovery model, examining the influential users, and studying virus propagation	Simulation	Matlab	Undirected network BlogCatalog and directed network As-level network
	( )	Proposing an algorithm and calculation model for searching the relationship between nodes, big data, and small data	Simulation	Not mentioned	Population map of Beijing city in China
	( )	Presenting mobile nodes to explore and limit the spread of rumors in social networks	Simulation	C#	Facebook
	( )	Introducing an immunization framework for mobile social networks	Simulation	C# simulator	Largest Cellular Network in China
Explanatory	( )	Proposing a financial credit scoring model which uses mobile phone data and social network analytics	Data sets	Not mentioned	CDR data of cell phone numbers and the data bank of customers that both operate in the same country
	( )	Proposing mathematical models to compute the probability of staying in social network and FIAEC	Real test bed	Not mentioned	Facebook
	( )	Introducing a framework to deliver mobile social data over content-centric mobile social networks	Simulation	Not mentioned	Not mentioned
	( )	Recognizing the influential user on Twitter by applying the number of followers and friends	Real test bed	R, Hadoop, Python	Twitter
	( )	Analyzing real-world device-to-device datasets in mobile social networks	Real test bed	Apache Spark, Apache Kafka, Hadoop	Not mentioned
	( )	Analyzing the impact of various sampling approach on the influence diffusion on social big data	Real test bed	Not mentioned	Twitter
	( )	Suggesting a depression detection framework by applying ML techniques	Data sets	Apache Spark, R programming language	Facebook
	( )	Proposing two social network measures of communicative activities to characterize information diffusion	Real test bed	Gephi (Network analysis software)	Twitter discussion of TTIP in Europe

An overview of the evaluation parameters in papers with embedding learning approaches.

Category	Ref.	Accuracy	Precision	Recall	F-measure	Scalability	Time	Cost	Influence Diffusion	ROC (AUC)	Kappa	Security
Graph-based	( )	✓				✓	✓		✓
	( )	✓		✓		✓	✓	✓
	( )		✓				✓		✓
	( )	✓			✓
	( )					✓
	( )						✓	✓
	( )					✓	✓	✓

Non-graph based	( )						✓	✓
	( )						✓	✓
	( )						✓
	( )								✓			✓

Explanatory	( )	✓
	( )							✓
	( )						✓
	( )					✓
	( )					✓	✓
	( )	✓						✓	✓
	( )	✓	✓	✓	✓					✓	✓
	( )								✓

Kumaran and Chitrakala ( 2017 ) offered a social influence method based on a rank-sampling approach. After collecting Twitter data, parallel information diffusion modelling, which took the users’ queries as input, determined forwarding nodes and calculated the path of information flow. The next component was influential spreader ranking, which took a search query and applied topological and user attributes to calculate users’ feature scores. Finally, two solutions were provided for the influence maximization problem. Ranking-based sampling was applied to ensure accuracy, and MapReduce and parallel processing were applied to reduce time. Although the method was scalable, the sample size was fixed, so an approach that could determine the most appropriate sample size was still needed.

In another study, Persico et al. ( 2018 ) analysed the efficiency of two big data architectures, namely Lambda and Kappa. Although the size of the dataset affected performance, both architectures provided good scalability; when the input size increased, however, Lambda had higher performance than Kappa due to its in-memory computation. Findings indicated that deploying Kappa with the same number of executors was more expensive than deploying Lambda. Besides, in both architectures, performance improved when the algorithm was executed on larger clusters. When virtual machine (VM) characteristics were enhanced, i.e., with resource-richer nodes (vertical scaling), Kappa improved significantly in performance. In general, reports showed that Lambda performed better, and both architectures supported social network applications properly. To predict information diffusion in the context of social big data, Gao et al. ( 2017 ) offered an efficient Information-dependent Embedding Based Diffusion Prediction (IEDP) model. They also extended a typical margin-based optimization algorithm and presented an efficient learning algorithm based on Stochastic Gradient Descent (SGD). The complexity of the proposed model was significantly reduced, but the social structure was not considered in their proposed embedding model.

Additionally, for illness control and early prediction, Elkin et al. ( 2017 ) introduced a network-based approach for modeling illness activity and generated predictions of Influenza-Like Illness (ILI) activity across geographical locations. This prediction model could help with illness control and provided predictions one week in advance. However, it was unsuccessful in predicting ILI activity across geographies from airline traffic data and had a low level of scalability, and apart from geographical location, other factors such as weather patterns or low population density were not considered. Incorporating more factors could have made the model stronger. Moreover, a heuristic model called PRDiscount was proposed in ( Wang et al., 2014 ) to select the initial seeds for maximizing influence diffusion in social networks. In contrast, Talukder and Hong ( 2019 ) introduced a heuristic mixed approach to minimize and optimize viral marketing costs in social media.

Since nowadays social networks have a great impact on the dissemination of information and users’ comments and on individuals’ daily lives, Chen et al. ( 2020 ) suggested a topic-aware influence maximization model based on cloud computing. They employed a sketching technique along with a greedy algorithm to discover the optimal top-k seed users that maximize the influence of information being spread within a network. Compared with available influence maximization approaches, the proposed approach achieved low running time and low storage, but a limited number of evaluation parameters were applied to verify the accuracy of the model.

Moreover, to discover the influential users, Wu et al. ( 2020 ) offered a Protection and Recovery Strategy (PRS) model to study the propagation of viruses in social networks. In the proposed mechanism, the users were divided into five groups based on their reactions to the virus: Susceptible, Contagious, Doubt, Immune, and Recoverable (SCDIR). The PRS model made it possible to control viruses and to reduce the number of infected users. Despite the low running time and low cost of the model, a fixed number of nodes and connections was assumed; dynamic changes in the number of nodes and their connections may lead to different results. Wu et al. ( 2018 ) suggested a model to search for small data and to compute the effect of small data nodes in order to use them instead of big data. They argued that obtaining small data reduces the complexity of big data. Results showed that 1% of small data could connect 15% of communication nodes, and 20% of small data could broadcast 80% of data packets, so the other nodes were in waiting status. Although complexity was decreased and the delivery ratio was improved, a new algorithm was needed to establish a trade-off between reliability, delivery ratio, delay, and the use of limited network resources.

Wu et al. ( 2018 ) presented an extended model to recognize and restrict the process of rumor dissemination among users by considering all the users’ behaviors. A time threshold was assigned to each user to indicate the delay in that user’s reactions. The authors suggested a mobile node that propagates authorized information to decrease the penetration of rumors. They simulated the proposed model on a Facebook dataset to investigate the influence of the speed, arrival time, and strategies of the mobile node on rumors. The speed and the strategy of the mobile nodes could not bring forward the time point at which rumors spread, but in general they reduced the spread time of rumors; the authors therefore concluded that the best solution to detect rumors is to send mobile nodes to the neighbor nodes with the highest degree.

Furthermore, to prevent the spread of malware, Peng et al. ( 2017 ) presented a big data-based framework in which social interactions were transformed into a bidirectional weighted graph that represented people’s daily SMSs/MMSs. Moreover, social influence, involving direct and indirect influence, was measured. Then a set of immunization algorithms was designed, and the Susceptible-Infectious-Recovered (SIR) model was extended, because the top k influential nodes had more influence on malware propagation. Thus, based on the presented immunization strategy, the top k influential nodes were minimized; however, the framework did not detect social media malware in real time.
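The interplay between an SIR-style propagation process and immunizing the most influential nodes can be sketched with a small discrete-time simulation. This is not the framework of Peng et al.: node degree is used here as a stand-in for their direct and indirect influence measure, and the function names, parameters, and example network are assumptions made for illustration.

```python
import random


def immunize_top_k(graph, k):
    """Pick the k highest-degree nodes (degree is an assumed proxy for influence)."""
    ranked = sorted(graph, key=lambda n: (-len(graph[n]), n))
    return set(ranked[:k])


def simulate_sir(graph, infected, immune, beta, gamma, rng, max_steps=100):
    """Discrete-time SIR on a graph; immune nodes start in the recovered state.

    beta: per-contact infection probability; gamma: per-step recovery probability.
    Returns the final state ("S", "I", or "R") of every node.
    """
    state = {n: "S" for n in graph}
    for n in immune:
        state[n] = "R"
    for n in infected:
        if state[n] == "S":
            state[n] = "I"
    for _ in range(max_steps):
        if "I" not in state.values():
            break
        new_state = dict(state)
        for node, status in state.items():
            if status != "I":
                continue
            for neighbour in graph[node]:
                if state[neighbour] == "S" and rng.random() < beta:
                    new_state[neighbour] = "I"
            if rng.random() < gamma:
                new_state[node] = "R"
        state = new_state
    return state


# Hypothetical messaging network: "b" bridges "a" and "c".
contacts = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
shielded = immunize_top_k(contacts, 1)
with_immunization = simulate_sir(contacts, ["a"], shielded, 1.0, 1.0, random.Random(1))
without_immunization = simulate_sir(contacts, ["a"], set(), 1.0, 1.0, random.Random(1))
```

With the bridging node immunized, the infection started at "a" cannot reach "c", whereas without immunization it passes through "b" and infects the whole chain.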

In order to improve both the statistical and economic performance of credit scoring applications, Óskarsdóttir et al. ( 2019 ) employed personalized PageRank (PR) and SPreading Activation (SPA) methods on Call-Detail Records (CDR) and on credit and debit account information. The results showed that the features of calling behavior were the most effective, and the information extracted from CDR data in terms of “value” facilitated financial prediction. The major challenge was how to preserve the privacy of customers’ data. Moreover, only one type of credit was analysed; other types of credit may lead to different results.
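Personalized PageRank differs from ordinary PageRank in that the random walk restarts at a chosen source node rather than at a uniformly random node, so scores express proximity to that source. The sketch below uses plain power iteration on an adjacency dictionary; it is a generic textbook formulation rather than the authors' pipeline, and the function name, the damping value of 0.85, and the toy call graph are assumptions.

```python
def personalized_pagerank(graph, source, alpha=0.85, iterations=100):
    """Power-iteration personalized PageRank with restarts at a single source.

    graph maps each node to the list of nodes it links to (e.g. calls made).
    alpha is the probability of following a link; 1 - alpha is the restart
    probability. Mass at nodes without out-links is returned to the source.
    """
    rank = {n: (1.0 if n == source else 0.0) for n in graph}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in graph}
        for u, out_links in graph.items():
            if out_links:
                share = alpha * rank[u] / len(out_links)
                for v in out_links:
                    nxt[v] += share
            else:
                nxt[source] += alpha * rank[u]
        nxt[source] += 1.0 - alpha
        rank = nxt
    return rank


# Hypothetical call graph in which every customer calls the other two.
calls = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
proximity = personalized_pagerank(calls, "a")
```

In a credit scoring setting, such proximity scores relative to known defaulters could serve as network features, which matches the general idea described above without claiming to reproduce it.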

Furthermore, Raj and Babu ( 2015 ) proposed the Firefly Inspired Algorithm for Establishing Connections (FIAEC) and mathematical models for computing the probability of staying in social networks. The goal of this algorithm was to maximize the number of connections among n individuals on social network sites. By using the proposed algorithm, the number of connections was increased, and so was the interaction between connections. On the other hand, FIAEC was not scalable, and it was only tested for a sample size of 10,200 and 600.

Su et al. ( 2016 ) studied the characteristics of mobile big data and presented a new framework to spread these data over content-centric Mobile Social Networks (MSNs). To address the volume, variety, control, and management challenges of mobile big data, the framework was delivered over content-centric networks (CCNs). Findings showed that a low value of the weight coefficient for a data packet led to a low delay. As the proposed framework was based on static characteristics, it did not consider dynamic mobile social users, and it was tested on a limited number of users, so it was not scalable. Limited resources, such as bandwidth and buffer space, were not considered, and security was not maintained for data stored outside users’ own mobile devices. In addition, to recognize the influential users, Kumar et al. ( 2016 ) developed a methodology based on the number of friends and followers of accounts. In another study, Zhang et al. ( 2017 ) analysed an offline device-to-device dataset in mobile social big data and pushed interesting contents to the most influential users.

Besides, Xu et al. ( 2015 ) investigated the impact of various sampling approaches on the distribution of tweets and measured retweets to identify influence diffusion in social network analysis. Since a notable amount of data in social networks comes from people who declare their opinions and thoughts, Yang et al. ( 2020 ) offered a social big data analysis framework to diagnose depression efficiently. The authors applied a large Facebook dataset to evaluate the proposed framework by investigating the effect of both friendship influence and users’ intentions and interactions on users’ mental health. They evaluated the performance of the framework with various subsets of social and user-level features to show that users’ social interactions with their friends on social networks could indicate their mental states. Unlike other studies, both the indirect and direct neighbors of a user were investigated to analyse the influence of friendships; however, the topics of users’ posts were not considered, nor were gender, age group, and depression risk level.

Additionally, in order to investigate the diffusion structure of networks, Maireder et al. ( 2017 ) presented two new social network measures, namely the Audience Diversity Score (ADS) and the Communication Connector Bridging Score (CCBS). ADS identified the diversity of a particular actor’s followers, and CCBS highlighted the accounts that bridge and diffuse information throughout the entire network. The results demonstrated that the network was not divided by a single factor but by a set of influential ones, such as language, geo-identity, and political trends. Despite the advancement in understanding communication patterns, the contents and types of tweets broadcast across the network were not analysed. Moreover, the ADS and CCBS measures were not combined to detect the two-factor interaction in the spread of information.

4.2.2. Overview of the community learning approaches

As we stated earlier, social networks comprise a set of vertices or nodes, which stand for users and individuals and are associated with one another through numerous edges that represent their relations and interactions ( Leung and Zhang, 2016 ). A “community” is a group of individuals who have similar interests, attitudes, or common characteristics ( Wu et al., 2018 ). From the social aspect, detecting groups of individuals in a network based on structural and topological properties is known as community learning, which is crucial for various areas of society such as business and recommendation systems. Thus, it leads to innovative approaches for the identification of communities, which can be carried out on micro (micro-communities) or macro (macro-communities) structural features of the network. In community detection, the assumption is that people in one community interact more with one another than with people in other communities because of the similarity of their interests, so the network is divided into various communities.

In community learning, after identifying clusters of nodes, the number of clusters is determined. Each cluster is mapped to a community, then the probability distribution over interactions among users, and also within and among clusters, is estimated. Community learning approaches can be categorized into node-based or group-based approaches to recognize the communities. Node-based approaches are carried out based on the properties of network nodes. Since similar nodes belong to the same communities, node degree, node similarity, or node reachability are considered in this approach. Group-based approaches, in contrast, do not regard characteristics at the node level; they consider the characteristics and connections of the whole group and network by recognizing balanced, robust, modular, dense, or hierarchical communities.
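Two of the node-level properties mentioned above, node degree and node similarity, can be sketched as follows. This is a minimal illustration under stated assumptions: similarity is taken to be the Jaccard index over closed neighborhoods (a node plus its neighbors), which is one common choice rather than the measure used by any specific reviewed paper, and the helper names are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard_similarity(adj, u, v):
    """Jaccard index of the closed neighborhoods of u and v."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)

# Toy graph (hypothetical): two triangles joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
adj = build_adjacency(edges)

degree = {node: len(neighbors) for node, neighbors in adj.items()}
same_triangle = jaccard_similarity(adj, "a", "b")   # nodes sharing all neighbors
across = jaccard_similarity(adj, "a", "f")          # nodes in different triangles
```

Nodes with high pairwise similarity are candidates for the same community, which is the intuition behind node-based approaches.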

In this section, the selected papers with community learning approaches are reviewed. Table 9 depicts a comparison of the selected papers with community learning approaches. It includes the main ideas, advantages, disadvantages, evaluation methods, tools, and case studies along with their categories. Table 10 shows the parameters that these papers with community learning approaches have used to evaluate their methods.

Reviewing and comparing papers with community learning approaches.

Category	Ref.	Main ideas	Evaluation methods	Tools	Case studies
Node-based	( )	Presenting a multi-resolution community detection algorithm on the Hadoop platform	Real testbed	Java, Hadoop, Apache HBase	Orkut, LiveJournal, Flickr, Patents, Skitter, BerkStan, YouTube, WikiTalk, DBLP
	( )	Proposing an incremental community detection model	Real testbed	Not mentioned	DBLP (Digital Bibliography & Library Project Dataset)
	( )	Presenting big data analytics for exploratory social network analysis	Real testbed	Pajek	An electronic store with 98 employees
	( )	Presenting a cloud-based service to manage social big data	Prototype	MySQL, Apache Hadoop, GraphLab (Java), Apache Flume	Twitter
Group-based	( )	Introducing an expert finder system based on big data analytics	Prototype	Hadoop	Scholar Mate
	( )	Proposing a social based localization algorithm and OHSC model	Simulation	Java SE development	Not mentioned
	( )	Introducing a tweet ranking model	Real testbed	Not mentioned	Sina microblog
	( )	Designing a distributed community structure mining framework by using MapReduce	Real testbed	Hadoop	Large-scale artificial dataset, real-world social media networks
	( )	Presenting a cloud-based online learning algorithm for social big data analysis	Simulation	Hadoop	Not mentioned
	( )	Proposing a parallel approach for creating a graph network	Real testbed	Hadoop	Not mentioned
	( )	Analyzing defrauding information in social networks by employing Apache Hadoop	Real testbed	Apache Hadoop, Gephi, Apache Nifi, Apache Solr	Twitter
	( )	Proposing a method to represent and manage social big data	Real testbed	Apache Hadoop, Java	Stanford Network Analysis Project (SNAP) ego-Facebook and ego-Twitter datasets
	( )	Presenting a real-time framework for analyzing Twitter data by applying graph analysis	Real testbed	Apache Spark	Twitter, Sina Weibo, Tencent Weibo
	( )	Offering SNA method by adding semantics into nodes and edges in the weighted undirected graph	Real testbed	Not mentioned	Dow Jones Industrial Average (DJIA), stock exchange markets (NYSE and NASDAQ)
	( )	Presenting a U-model for directed and undirected graph based on similarities	Simulation	Not mentioned	Sina Weibo, Tencent Weibo, Twitter
	( )	Offering a fuzzy logic and density-based clustering algorithm for big data analysis	Real testbed	Not mentioned	Facebook, YouTube
	( )	Analyzing entrepreneurial social big data	Real testbed	MongoDB	Twitter

An overview of the evaluation parameters in papers with community learning approaches.

Category	Ref.	Accuracy	Precision	Recall	Scalability	Time	Security	NMI	Cost	Centrality Measures	Clustering Coefficient
Node-based	( )					✓			✓
	( )	✓				✓			✓		✓
	( )					✓				✓	✓
	( )				✓

Group-based	( )	✓			✓	✓
	( )	✓					✓	✓
	( )		✓			✓
	( )	✓	✓	✓	✓			✓	✓
	( )	✓			✓		✓
	( )		✓	✓	✓	✓
	( )				✓					✓
	( )				✓	✓				✓
	( )					✓
	( )									✓	✓
	( )				✓					✓	✓
	( )				✓
	( )									✓

Aksu et al. ( 2013 ) presented a multi K-core and multi-resolution solution for social network community detection. The authors offered a distributed and scalable algorithm that ran on Apache HBase to compute K-core subgraphs on both the client and the server side. The experimental results on dynamic networks indicated that, despite such advantages as robustness and parallel, distributed processing, the proposed algorithm was very costly when inserting and deleting edges. Wu et al. ( 2018 ) presented a hash-based approach along with graph mining to discover interactions and communities among users in social media, in which a trade-off between the efficiency and effectiveness of incremental and time slice-based approaches was guaranteed.
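For readers unfamiliar with the term, the k-core of a graph is the largest subgraph in which every node has at least k neighbors inside the subgraph. The sketch below shows the textbook single-machine pruning procedure; it is not the distributed HBase algorithm of Aksu et al. ( 2013 ), and the function name `k_core` and the toy graph are assumptions.

```python
def k_core(edges, k):
    """Return the node set of the k-core by repeatedly pruning nodes of degree < k."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    removed = set()
    pending = [n for n in adj if len(adj[n]) < k]   # nodes that cannot be in the k-core
    while pending:
        node = pending.pop()
        if node in removed:
            continue
        removed.add(node)
        for neighbor in adj[node]:
            if neighbor in removed:
                continue
            adj[neighbor].discard(node)             # neighbor loses one supporting edge
            if len(adj[neighbor]) < k:
                pending.append(neighbor)
    return set(adj) - removed

# Toy graph (hypothetical): a triangle a-b-c with a tail c-d-e.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
core2 = k_core(edges, 2)   # the tail is pruned, the triangle survives
core3 = k_core(edges, 3)   # no node keeps three neighbors
```

Because removing one node can push its neighbors below the threshold, a single edge deletion may cascade through the graph, which gives some intuition for why maintaining k-cores under edge insertions and deletions can be costly.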

Since the results of SNA help managers make decisions about their markets, Dabas ( 2017 ) considered an electronic store with 98 employees, who were responsible for selling, maintaining, and installing mobile phones, tablets, and so on. For the experiments, Pajek and different SNA metrics such as degree centrality, betweenness centrality, stress centrality, Power Centrality (PC), Information Centrality (IC), reachability matrix, and clustering coefficient were used. The social analysis informed executive managers of customers’ reactions in real-time so that they could respond quickly if necessary, but it suffered from inadequate security of sensitive and personal data. Meanwhile, Yousfi et al. ( 2016 ) proposed a solution to construct the graph of social big data to enhance semantic extraction by graph analysis.
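Two of the metrics named above, degree centrality and the clustering coefficient, have simple standard definitions that can be sketched directly. The code is a plain-Python illustration with hypothetical helper names and a toy graph; it does not reproduce Pajek or the experiments in Dabas ( 2017 ).

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}

def clustering_coefficient(adj, node):
    """Fraction of pairs of the node's neighbors that are themselves connected."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbors[j] in adj[neighbors[i]])
    return 2 * links / (k * (k - 1))

# Toy graph (hypothetical): a triangle a-b-c with a tail c-d-e.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")])
centrality = degree_centrality(adj)
cc_a = clustering_coefficient(adj, "a")
cc_c = clustering_coefficient(adj, "c")
```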

As finding the right researcher with the best experience and knowledge is time-consuming and critical in research communities, Sun et al. ( 2015 ) presented an expert recommendation method based on topic relevance, expert quality, and researcher connectivity for experts in scientific communities. The architecture of this expert finder system contained three phases (profiling, modeling, and ranking). The system supported large-scale computation tasks and achieved linear speed-up and high accuracy. However, apart from AHP in the ranking phase, the authors did not use other techniques as the rank aggregation model. In another study, to enhance the quality of vehicle localization in vehicular networks, Lin et al. ( 2016 ) proposed an Overlapping and Hierarchical Social Clustering (OHSC) model. The OHSC model explored the social relations between vehicles, and then classified the vehicles into different social clusters. Building on OHSC, a Social based Localization Algorithm (SBL) was presented to support global localization through vehicle location prediction even without GPS devices. Although SBL had a high overall performance in vehicle localization, it had low stability and the worst performance in location error.

With the increase in active users and daily tweets, users face a severe problem of information overload. To address the ranking and recommendation challenge, most micro-blogging services organize tweets in chronological order, placing newer tweets at the top, but not all of these tweets may be attractive to users. Kuang et al. ( 2016 ) proposed a new tweet ranking model that considered three main aspects: the popularity of a tweet itself, the intimacy between the user and the tweet publisher, and the user’s interest areas. This ranking model improved tweet ranking performance; however, further ranking indicators based on the analysis of users’ behaviors were not considered. In order to identify all hidden communities in social media networks, Jin et al. ( 2015 ) designed a framework for community structure mining in which the network partitioning process was avoided and the map equation process ran directly on MapReduce. Instead of PageRank, the authors employed local information of nodes and their neighbors to calculate the distribution probability related to each node. The framework outperformed previous algorithms, such as Radetal and FastGN, in accuracy, velocity, and scalability. However, the greedy search method that was applied to find an appropriate node for combining had some limitations that needed to be improved.

Additionally, Li et al. ( 2016 ) offered a distributed algorithm for data centers to handle social data while ensuring privacy and improving prediction accuracy in real-time. Further, Paik et al. ( 2017 ) presented an effective service discovery method by creating a graph-based algorithm built on MapReduce and parallel programming. In ( Karimi et al., 2018 ), Twitter data were analysed, and the degree centrality was calculated to investigate deceptive information based on a parallel approach. Leung and Zhang ( 2016 ) offered a novel method to represent and manage social big data. They employed graph mining approaches on directed, bi-directed, undirected, and bipartite graphs for analyzing and mining social big data in distributed settings. In ( Sharma, 2018 ), researchers designed a framework to analyse real-time Twitter hashtags by employing a hashtag co-occurrence graph and a connected components algorithm. Moreover, Du ( 2018 ) developed a high-frequency pair trading algorithm that performs semantic analysis on a weighted undirected graph by employing SNA approaches along with the calculation of centrality parameters in a stock market.
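The idea of a hashtag co-occurrence graph combined with a connected components algorithm can be sketched as follows. This is a small in-memory illustration, not the real-time framework described in ( Sharma, 2018 ); the function names and the sample hashtags are hypothetical.

```python
from itertools import combinations

def cooccurrence_graph(tweets_hashtags):
    """Link two hashtags whenever they appear together in the same tweet."""
    adj = {}
    for tags in tweets_hashtags:
        unique = set(tags)
        for tag in unique:
            adj.setdefault(tag, set())
        for a, b in combinations(sorted(unique), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def connected_components(adj):
    """Group nodes into components with an iterative depth-first search."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components

# Hypothetical sample: each inner list holds the hashtags of one tweet.
tweets = [["#ai", "#bigdata"], ["#bigdata", "#hadoop"],
          ["#football", "#worldcup"], ["#music"]]
topics = connected_components(cooccurrence_graph(tweets))
```

Each component groups hashtags that are linked, directly or transitively, by appearing in the same tweets, so a component can be read as a rough topic cluster.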

Since similar nodes are usually placed in the same cluster, in ( Wang et al., 2017 ), a U-model was introduced for directed and undirected graphs based on similarity, which could accurately define social big data characteristics such as clustering coefficient, degree, and distance distribution. In order to analyse the conversation in a social network, Ghosh et al. ( 2016 ) offered a new algorithm utilizing fuzzy methodology and density-based clustering on social clouds. This study examined the rate of users’ participation to find the popularity of the subject under discussion. However, the algorithm could have been developed further towards more heuristic-based graph mining and benchmarked against heuristic optimization. Further, to represent the structures of network communities, Wang et al. ( 2017 ) digitally analysed Twitter data about diverse actors involved in entrepreneurial networks by applying the Clauset-Newman-Moore algorithm. The counties that were in the same cluster had stronger internal interactions than those in different clusters, but this research did not address the lack of user participation in low-population regions of the country.

5. Analysis of results

The results of this systematic review are analysed in this section. Section 5.1 presents an overview of the selected papers. Since the goal of this review is to highlight the differences, advantages, and disadvantages of various big data analytic approaches in social networks, a discussion of the mentioned classification is outlined in Section 5.2 .

5.1. Overview of the selected studies

The following complementary questions are defined to explore the state-of-the-art on big data analytic approaches applied in social networks.

• Which publishers have published most papers on big data analytic approaches applied in social networks?
• How was the distribution of publishers and studies per year on big data analytic approaches applied in social networks?
• How was the distribution of studies per publication channel on big data analytic approaches applied in social networks?

In this section, the distribution of the 74 papers reviewed in Section 4 is shown by publisher and year of publication in Fig. 5 , by number of papers per year in Fig. 6 , and by percentage of papers per publisher in Fig. 7 . Fig. 5 , which shows the papers over time, indicates that ScienceDirect and Inderscience have published papers in this field since 2015. IEEE, Springer, and ScienceDirect have provided the highest number of papers in this area, in that order. Emerald and Taylor&Francis have presented the fewest papers. Fig. 6 shows that most papers on this subject were published in 2017 and 2019. Fig. 7 illustrates the classification of papers among nine publishers, out of which IEEE and Springer have provided 37% and 27% of the papers, respectively. ScienceDirect accounted for 19% of the total papers, while ACM, Inderscience, and SAGE had 4% each. Also, 3% of the papers were published by Wiley. Additionally, Taylor&Francis and Emerald had 1% of the reviewed papers each.

Fig. 5. The number of the studied papers categorized by publishers and years.

Fig. 6. The number of the studied papers by years.

Fig. 7. Percentage of the studied papers categorized by the publishers.

Table 11 shows the distribution of the publication channels that published more than one of the 74 studied papers. In total, 23 papers were published in IEEE Access (IF = 3.745), TMM (IF = 5.452), IJIM (IF = 8.210), IMMGT (IF = 4.695), FGCS (IF = 6.125), MTAP (IF = 2.313), WPC (IF = 1.061), I4C, and IEEE Big Data.

Distribution of the studies per publication channel.


Publication channel	Number of papers
IEEE Access	4
IEEE Transactions on Multimedia (TMM)	2
International Journal of Information Management (IJIM)	3
Industrial Marketing Management (IMMGT)	2
Future Generation Computer Systems (FGCS)	2
Multimedia Tools and Applications (MTAP)	3
Wireless Personal Communications (WPC)	2
IEEE International Conference on Big Data (Big Data Congress) (IEEE Big Data)	3
International Conference on Circuits, Controls, Communications and Computing (I4C)	2

5.2. Research objectives, approaches, and evaluation parameters

The reviewed studies were examined and classified according to various characteristics to answer some of the research questions listed in Section 3.1 , as explained below:

Big data analysis has many applications in social networks and is performed in various ways. As stated earlier, the selected papers were reviewed, and big data analytic approaches in social networks were described in two main categories based on their analysis method: content-oriented approaches and network-oriented approaches. In content-oriented approaches, user-generated posts are analysed with the aid of lexical codes, linguistic codes, and statistical tools. Network-oriented approaches, meanwhile, consider nodes or users and their relations for social big data analysis. They also uncover the interaction between social group members and the relationship between group members and people outside the group. We categorized content-oriented approaches into two groups, topical learning and opinion/sentiment learning, and network-oriented approaches into two groups, embedding learning and community learning.

Fig. 8 represents the percentage of social big data analytic techniques in the reviewed papers based on Fig. 4 . Fig. 8 shows that the content-oriented approaches have the highest percentage (51%), in which topical learning and opinion/sentiment learning comprise 27% and 24% of the studied papers, respectively. Further, 49% of the papers are network-oriented approaches, out of which 26% and 23% of the papers are related to embedding learning and community learning, respectively. The main properties of the selected papers were shown in Table 3 , Table 5 , Table 7 , Table 9 . The selected papers were evaluated based on critical parameters such as accuracy, scalability, precision, recall, F-measure, cost, and time. The advantages and disadvantages of the discussed taxonomy are summarized in Table 12 based on Table 3 , Table 5 , Table 7 , Table 9 . As specified in Table 12 , the main focus of researchers in content-oriented approaches is on parameters such as accuracy, precision, recall, and time. This table also illustrates that accuracy and scalability are enhanced in network-oriented approaches, but privacy and security are not considered by most researchers. Moreover, the findings show that community learning approaches have high costs: manipulating community-based features is challenging and not user-controlled, and extracting these features requires an in-depth analysis of a large and complex social community, which has high complexity and requires plenty of resources. Besides, according to Table 12 , security and privacy preservation are still the main drawbacks of community learning approaches.

Fig. 8. Percentage of social big data analytic techniques in the selected papers.

A summarization of the advantages and disadvantages of the discussed taxonomy.


Content-oriented approaches

Network-oriented approaches

In this study, the reviewed papers have been evaluated by various evaluation parameters, which were presented in Table 4 , Table 6 , Table 8 , Table 10 . Fig. 9 illustrates the parameters used by researchers to evaluate the techniques and methods applied in the reviewed papers. The results of the comparison in Fig. 9 show that 20% of the studies have enhanced accuracy, 16% of them have reduced time, and 12% of the studies have assessed scalability. Recall, precision, F-measure, and cost were also important parameters. The percentage of each parameter was computed using Eq. (1) ( Hamzei and Navimipour, 2018 ): the number of occurrences of each parameter was counted and divided by the total number of occurrences of all parameters, and the result was multiplied by 100.
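Following the verbal description above, Eq. (1) can be written as below. The symbols are notation assumed here for illustration: $n_i$ is the number of occurrences of evaluation parameter $i$ across the reviewed papers, and $P_i$ is its percentage.

```latex
P_i = \frac{n_i}{\sum_{j} n_j} \times 100 \tag{1}
```

As a purely hypothetical example, a parameter counted 10 times out of 50 occurrences in total would receive $P_i = 10 / 50 \times 100 = 20\%$.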

Fig. 9. Percentage of evaluation parameters in the selected papers.

Fig. 10 indicates that in topical learning approaches, researchers focused on accuracy (23%) and recall (15%), while in opinion/sentiment learning approaches, accuracy (31%) and F-measure (18%) are the crucial ones. The significant parameters in embedding learning approaches were time and cost, at 23% and 16%, respectively. Furthermore, 20% of the papers with community learning approaches have optimized scalability and 18% of them have reduced time. Overall, the results showed that accuracy is essential in most approaches, whereas privacy, reliability, and security are somewhat neglected.

Fig. 10. Percentage of evaluation parameters in each approach of the selected papers.

Some of the papers did not mention any tools for analyzing and implementing the intended approaches. According to the tool columns in Table 3 , Table 5 , Table 7 , Table 9 , Hadoop, along with the Python programming language, was the most used tool in the 74 research studies of social network analysis. The frequent use of Hadoop is due to its open-source libraries for distributed and parallel processing of large datasets, its cost-effectiveness, large storage capacity, reliability, and scalability, and its handling of unstructured and semi-structured data.

Fig. 11 shows the social big data analysis applications of the reviewed papers, along with the percentage of papers in which each occurs. The results showed that, in the reviewed papers, business and decision making and the parsing and sentiment analysis platform were the most common applications, with 19% each. Along with these two applications, health care (15%) was a significant application of social big data analysis in the studied papers.

Fig. 11. Percentage of social big data analysis applications in the studied papers.

The selected studies have used various datasets to evaluate their approaches and analyse the results of experiments. Based on the findings shown in Fig. 12 , most of the researchers used Twitter. After Twitter, the most frequently used datasets were Sina microblog and Facebook.

Fig. 12. Repetition of used datasets and case studies in the selected papers.

Based on Table 3 , Table 5 , Table 7 , Table 9 , which have depicted the evaluation methods applied in each approach, there were five evaluation methods in the reviewed papers: simulation, prototype, data sets, real testbed, and example application. As shown in Fig. 13 , 42% of assessments were related to data sets, while 35% of them were associated with real testbeds. Simulation accounted for 19%. Fig. 14 displays the repetition of evaluation methods in each learning approach. The comparative results illustrate that in topical and opinion/sentiment learning, most evaluations rely on data sets. ML algorithms and data sets were widely used in semantic analysis and incorporated many ideas and innovations into social networks, welcoming virtual world users and social network growth. In community learning approaches, however, the real testbed is the most frequently used evaluation method. Finally, real testbed and simulation cover most of the evaluations for embedding learning approaches.

Fig. 13. Percentage of evaluation methods in the selected papers.

Fig. 14. Repetition of evaluation methods in each approach in the selected papers.

6. Open issues and future directions

Given the vast quantity of live social media streams and their impact on society, many techniques have been proposed to collect and analyse live UGC to support various applications. The techniques studied in this paper assist us in gaining insights into social data via big data analytics. The presented systematic literature review is a good starting point to reveal open challenges. Content-oriented and network-oriented approaches still face many vital challenges, as mentioned below:

• The extensive usage of social media has resulted in the advancement of many disciplines and industries, among which healthcare is one vertical application that has attracted much attention. Fig. 11 demonstrates an increasing tendency towards healthcare systems along with other domains. Patients join different social media groups, sharing experiences and describing their illness and the treatment process. Social platforms provide patients with emotional support from peers with similar conditions. The first-hand experiences and comments from other members in the network are invaluable sources for making informed decisions, especially for those with chronic conditions ( Akbari et al., 2019 , Akbari et al., 2018 ). Further, healthcare professionals also utilize social media to share healthcare, psychology, and medical information and to interact with their peers as well as patients ( Nie et al., 2014 ).

In this respect, public care organizations can set up social health networks for diagnosing and preventing the spread of contagious disease in various geographical locations at different times by exploring public health posts in various social networks ( Elkin et al., 2017 ). In addition, by analyzing the graph of interactions between users on social networks and examining influential users, nodes with many edges can be identified; by limiting and quarantining them, the transmission rate of contagious disease can be forecast, which supports better decision making to control infectious ailments. This would ultimately lead to a notable reduction in healthcare costs ( Zadeh et al., 2019 ).

They can also track the origin of diseases, the transmission of diseases from generation to generation, the effects of drugs, and their interactions in different diseases ( Thorstad and Wolff, 2019 ). This helps the pharmaceutical industry as well as healthcare promotion and health disorder diagnosis. One of the limitations of the current work in this area is that the nodes and their relations were considered static over time. Analyzing the network in real-time and capturing the dynamic interaction among nodes are still open issues whose resolution could yield more accurate predictions. Most researchers have also studied social influence and information diffusion on a particular platform; analyzing information diffusion and social influence across multiple platforms simultaneously remains a challenge for the future. Moreover, among the reviewed literature, there were few papers on political and e-commerce applications, so these two areas are good topics for future research.

• Given a vast number of data sources, another challenge is enhancing accuracy to improve services and predictions in various social network applications. For example, in social networking services, users frequently publish information about themselves via status updates, photos, videos, self-descriptions, and interests. Some recommendation and prediction systems predict users’ personalities by considering their profile data. However, some people keep part of their personal information private, and some users deliberately create fake accounts or provide fake information such as birth date, location, occupation, and status to increase their number of followers or get more likes. The available data may therefore be fake or unobtainable due to privacy concerns, so the result of prediction is not accurate. Further, user profiling would be an essential aspect of social networking services to ensure accurate prediction and recommendation ( Akbari and Chua, 2017 , Akbari et al., 2017 ).
• The ever-increasing volume of social media data has led to the distribution of files in various physical locations. A key future direction is to investigate factors such as network traffic, data locality, latency, high-level runtime of feature extraction, and clustering users. Although enhancing the speed of feature extraction has been considered by a limited number of papers ( Hsu et al., 2017 ), other challenges are still unsolved.
• Due to the high volume of data and the rapid growth of content produced on social platforms, scalability is still a key factor in determining the effectiveness of social network analysis frameworks. The scalability issue includes handling an immense number of users, updating users’ profiles and status, internal network traffic, as well as data storage and database management, so expanding infrastructure, infrastructure management, and operational costs can affect the scalability challenge. Although some papers have proposed algorithms or methods to increase scalability in their approaches ( Sachar and Khullar, 2017 , Feng et al., 2018 , Aa et al., 2015 ), others implement their approaches on small-scale datasets; hence, it is still a significant challenge.
• The Internet has accelerated the growth of social networks, which connect people and make it easier for them to find friends and share multimedia information, such as photos and videos, which are considered big data in social networks. With the increasing likelihood of cyber-attacks or malicious users, there is a risk of personal data being misused. A limited number of studies made efforts to solve this issue ( Zhou et al., 2016 ); hence, offering novel approaches that preserve the privacy of social network users and secure photos, videos, sensitive personal data, and profiles, without crippling the utility of social media data, is a crucial challenge for future research.
• Due to the streaming nature of social data, collecting and analyzing real-time data from various sources can help organizations handle customers’ tweets, blog posts, and status updates. It allows organizations to track and answer customers’ updates and comments as soon as possible. Some papers discussed this challenge ( Sayed et al., 2020 , Lee and Paik, 2017 , Rodrigues and Chiplunkar, 2019 ), but, unlike the Twitter Streaming API, Facebook’s Graph API does not provide any real-time streaming access. The analytic approach should be able to investigate social media platforms in real-time, which leads to real-time results, so the real-time nature of social data is still an appealing challenge.
• Predictive analytics is another interesting direction that remains an open and challenging task. The key challenges focused on in ( Yang et al., 2015 ) are aggregating data, extracting high-dimensional features, and building a model that can predict future events. The considerable amounts of data produced by users in social networks represent the views, suggestions, and thoughts of users in the form of texts, images, and videos, which may be of high or low quality. As these data come from grassroots users in informal and unstructured formats, social data are known as noisy sources of information. Low-quality, out-of-date, or incorrect data can lead to wrong or inaccurate analytics results; therefore, in addition to extracting high-quality information from a variety of sources, it is also essential to prevent the flow of misinformation. Although ML and data mining allow us to reduce the impact of low-quality data, they still cannot assure the proper quality of data. Besides, the modeling process should be repeatable to ensure that meaningful relationships among data are extracted. Without a useful model, the predictive system cannot produce satisfactory results; therefore, data quality and modeling are two engrossing directions for future work in predictive analysis.
• From the sentiment analysis aspect, the following key challenges are still open to be addressed:
o Domain dependency : Sentiment analysis is a domain-dependent task in which the polarity of some words and phrases varies from one domain to another; thus, a classifier trained for a specific domain may fail to perform well on other domains.
o The rare-resource languages : Most sentiment analysis resources are built only for the English language. There are no sufficient corpora for such languages as Chinese, French, Hindi, Spanish, and so on. The bottleneck of performing opinion/sentiment analysis is the scarcity of predefined dictionaries and tools for various languages.
o Detecting sarcasm : Since sentiment analysis classifies texts as positive, negative, or neutral, another challenging issue is detecting sarcasm, which refers to sentences that have negative meanings despite the use of positive sentiment-bearing words. In other words, the meaning is just the opposite. It is a challenging task for a system to identify sarcastic sentences. Researchers should devote attention to finding innovative approaches to analyse sarcasm in the sentiment analysis of social big data.
o Detecting slang : Many people use slang to express their feelings and, as slang words contain extreme sentiments, detecting slang words is a serious problem.
o Heterogeneous nature of data : The sentiment classifiers should work effectively and handle the diverse types of data from various data sources.
o Unreliable and incomplete data : Users usually use abbreviations in a social network. Social network data may contain a lot of noise and misspellings, so the sentiment classification of these data is not accurate; therefore, sentiment classifiers should be able to infer incomplete information to produce a more accurate prediction.
o Semantic relations in multiple data sources : Different social networks such as Twitter, Facebook, Instagram, and YouTube may discuss the same topic. Researchers in the studied papers investigated data on only a single social media platform, so the analysis of an event across various social media is a challenge that can offer better insights for the task of sentiment analysis and its model creation.
o Subjectivity detection : Depending on the personality of a user or their political views, a text may be neutral to one person but not to another, so a sentence may have different interpretations.
o Spam detection : Spammers or fake users try to post fake reviews to mislead other readers, so detecting such spam among posts is a significant challenge.

Many researchers have tried to mitigate a limited number of these challenges ( Sun et al., 2018 , Kauffmann et al., 2019 , Jimenez-Marquez et al., 2019 ), but they failed to achieve high accuracy, so most of these challenges in sentiment analysis have not yet been resolved, and further research is needed.
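
The domain-dependency and sarcasm issues listed above can be illustrated with a toy lexicon-based polarity scorer. This is only a minimal sketch: the word lists, the domain names, and the function `polarity` are hypothetical and are not taken from any of the reviewed papers.

```python
# Toy lexicon-based polarity scorer (illustrative only; all lexicons are made up).
GENERAL_LEXICON = {"great": 1, "love": 1, "good": 1, "terrible": -1, "bad": -1}

# Hypothetical domain-specific overrides: the same word can flip polarity
# depending on the domain the text comes from.
DOMAIN_OVERRIDES = {
    "movies": {"unpredictable": 1},   # praise for a film plot
    "cars": {"unpredictable": -1},    # complaint about steering
}


def polarity(text, domain=None):
    """Sum word scores from the lexicon and map the total to a label."""
    lexicon = dict(GENERAL_LEXICON)
    lexicon.update(DOMAIN_OVERRIDES.get(domain, {}))
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(lexicon.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Domain dependency: one word, two polarities.
print(polarity("The plot was unpredictable", domain="movies"))   # positive
print(polarity("The steering is unpredictable", domain="cars"))  # negative

# Sarcasm: positive words, negative intent; the scorer is fooled.
print(polarity("Great, my flight got cancelled again. I love waiting."))  # positive
```

The last call shows why sarcasm is hard for word-level methods: the sentence contains only positive sentiment-bearing words, so the scorer labels it positive even though the intended meaning is negative.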

• Finally, a few of the studied papers did not test their approaches on real social network datasets. Unlike users’ typical sentences, specialized sentences were not technically analysed. In addition, each platform has its own vocabulary, e.g., slang terms, which makes the analysis very specific to that platform. This is therefore another research direction, and further studies may test their approaches on various social networks and real social network datasets. More experiments can be performed to increase the performance of social big data analytic approaches in the future.

7. Threats to validity and limitations

This SLR presents a taxonomy and a comparison of big data analytics in social networks. These types of review papers usually have constraints ( Brereton et al., 2007 ), but the results of SLRs are mainly reliable ( Zhang and Babar, 2013 ). The major limitations and threats to the validity of this SLR are discussed below.

• The scope of the research: In the paper selection process, only academic journals and conferences were considered. Furthermore, national conferences and journals, non-English papers, book chapters, and review papers were excluded.
• Study and publication bias: Although the nine electronic publishers searched offer the most relevant and valid papers, some papers may have been missed during the paper selection process; therefore, the inclusion of all related papers cannot be guaranteed.
• Study queries: This review is organized around seven predefined research questions. Other researchers may pose additional questions.
• Taxonomy: The reviewed papers were classified into two main categories based on analysis methods, content-oriented and network-oriented approaches, but they could be categorized otherwise.
• Simulation: The reviewed papers were not simulated.
• Time range: Only papers from 2013 to August 2020 were reviewed, and those before 2013 were not considered.

Nevertheless, by defining a review protocol, following a systematic procedure, and involving several researchers, this SLR has high validity.

8. Conclusion

This paper presents a systematic review of big data analytics in social networks. We explained the research methodology and the paper selection process, and selected 74 papers published between 2013 and August 2020 from among the 785 papers returned by our search query. A significant share of the studied papers came from IEEE, Springer, and ScienceDirect, with 37%, 27%, and 19%, respectively. On the other hand, Taylor & Francis and Emerald, with 1% each, had the lowest number of published papers. The 74 papers were categorized into two approaches: content-oriented approaches (51%) and network-oriented approaches (49%). In addition, the main ideas, advantages, disadvantages, evaluation methods, tools, and evaluation parameters of each studied paper were discussed. The most widely considered evaluation parameters were accuracy (20%), time (16%), and scalability (12%), whereas privacy, reliability, and security measures were somewhat neglected. Regarding the applied tools, Hadoop, along with the Python programming language, was used more than other tools in the selected studies. According to the outcome of this SLR, the existing social big data analytic approaches have inadequate capability to guarantee privacy preservation and scalability, and they face several open issues such as latency, real-time processing, and the high run-time of feature selection. Clearly, the most unresolved challenges concern various aspects of opinion/sentiment analysis, such as domain dependency, rare-resource languages, detecting sarcasm and slang, subjectivity detection, and multiple data sources. We hope that the findings of this paper will help researchers propose novel contributions to overcome social big data challenges.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

We are grateful for the insightful and constructive comments offered by Dr. Mohammad Akbari, and we also thank the anonymous reviewers for their valuable comments, which improved the final version.

1 https://scholar.google.com

2 https://link.springer.com

3 https://ieeexplore.ieee.org

4 https://www.sciencedirect.com

5 https://online.sagepub.com

6 https://www.tandfonline.com

7 https://onlinelibrary.wiley.com

8 https://www.emeraldinsight.com

9 https://dl.acm.org

10 https://www.inderscienceonline.com

Appendix A. List of evaluation parameters and their description

Evaluation parameter	Description and formula
Confusion matrix	For a binary classifier, the four possible outcomes in the confusion matrix are defined as follows: true positives (TP), the number of correct positive predictions; true negatives (TN), the number of correct negative predictions; false positives (FP), the number of predictions incorrectly labelled positive; and false negatives (FN), the number of predictions incorrectly labelled negative.
Accuracy	Accuracy in the social network refers to the degree of similarity between the actual structure of a relationship and the individuals’ perceptions of the structure of the same relationship in a particular social media. In other words, it is the number of correctly predicted observations over the total number of observations: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
Precision	Precision focuses on false positives and is the number of correctly predicted positive observations over the total number of predicted positive observations. In effect, it is the ability of the model not to label a negative sample as positive: $\text{Precision} = \frac{TP}{TP + FP}$
Recall (Sensitivity or TPR)	Recall is the fraction of positive observations in the actual class of the dataset that a proposed model predicts correctly. Intuitively, it is the ability of a classifier to discover all positive samples: $\text{Recall} = \frac{TP}{TP + FN}$
F-measure	F-measure is the harmonic mean of precision and recall and indicates whether a presented model achieves high precision and high recall at the same time. Since it is a weighted average of precision and recall and takes both FP and FN into account, it can be applied for measuring the efficiency of the model in many domains: $F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
Specificity (TNR)	Specificity shows what proportion of negative observations is predicted correctly: $\text{Specificity} = \frac{TN}{TN + FP}$
ROC (AUC)	The ROC curve graphically shows the trade-off between sensitivity (TPR) on the Y-axis and 1 − specificity (FPR) on the X-axis for every possible threshold value. The area under the curve (AUC) is used to measure the ability of a classifier to distinguish between positive and negative classes. The higher the AUC, the better the performance of the classifier.
Kappa coefficient	Kappa is an inter-rater reliability measure that evaluates the agreement between two raters. In other words, it shows how closely the observations classified by a classifier agree with the data labeled as ground truth. It can be calculated by this formula: $\kappa = \frac{p_o - p_e}{1 - p_e}$, where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance.
Matthews Correlation Coefficient (MCC)	It evaluates the correlation between the observed and predicted classifications of an instance. The formula of the MCC is: $\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
Clustering coefficient	The clustering coefficient indicates how strongly the nodes in a network tend to form clusters. There are two types of clustering coefficients: local and global. The local coefficient refers to the embeddedness of every single node, while the global coefficient gives an overall indication of clustering in the network. A clustering coefficient is a real number between zero and one. When there are no clusters, this coefficient is equal to zero, and in the case of disjoint cliques, in which maximal clustering occurs, it is equal to one.
Security	Security refers to the requirements that the system needs in order to protect against potential attacks, threats, unauthorized access, and privacy issues.
Scalability	Scalability means the ability of a social network to expand in case of rising demand for processors, networks, or file system resources. Scalability consists of two categories, as follows. Horizontal scalability: It refers to the addition of new hardware instead of increasing the capability of the existing hardware. Vertical scalability: It can be performed by adding resources or more powerful hardware to (or removing resources from) a system, such as adding CPU or RAM to a single system node or a single computer.
Time	In this paper, all the factors related to time, such as execution time, average response time, statistical analysis time (starting time), delay, and running time are considered as the time factor.
Normalized mutual information (NMI)	NMI is an information-theoretic measure that can be used to assess the quality of clustering and to compare community detection methods. This measure compares two clusterings, and a high value means that the two clusterings are similar. If clusterings X and Y are exactly the same, their NMI is equal to one.
Cost	The price of acquiring, producing, performing, or maintaining the requested service
Influence diffusion	This measure shows how one person’s actions affect other people in a network; it shows how many users are affected by the most influential users in the network.
Centrality measures	In the context of web information retrieval, using centrality measures is a vital task in community analysis. By using centrality measures, researchers try to answer the question “who is the most important, impressive, or central person in the network?” Some of the popular centrality measures are discussed below. Degree centrality: It is the number of neighbors of a node and is computed as the number of direct links to the node. In an undirected graph, the more central the node is, the higher its degree will be. In a digraph, there are two types of this measure: in-degree, which refers to the number of inbound links to a node, and out-degree, which is the number of outbound links of a node. Closeness centrality: It is based on the shortest paths between a node and all other nodes and is defined for a node $v$ as the inverse of its total distance to the other nodes, $C(v) = \frac{1}{\sum_{u \neq v} d(v, u)}$, where $d(v, u)$ is the shortest-path distance between $v$ and $u$. In other words, closeness reflects the length of time it takes to transfer information from one node to all other nodes. Betweenness centrality: It refers to the number of times a node lies on the shortest paths between other nodes; that is, after identifying all the shortest paths, the number of paths on which a given node is located is counted. Eigenvector centrality: It differs from in-degree centrality and refers to the importance of each node of the graph. A node with high in-degree centrality does not necessarily have a high eigenvector centrality and vice versa, so this parameter shows the important nodes that influence the entire network. PageRank: It is calculated to determine the importance of a node by considering the degree and quality of the nodes. It focuses on the centrality of linkers, link directions, and their weights. It is a recursive measure where the value for one node grows with the PageRank of its neighbors weighted by the reciprocal of their degrees. It can be thought of as the probability of visiting a node under the random surfer model.
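
The confusion-matrix-based parameters in the table above (accuracy, precision, recall, F-measure, specificity, kappa, and MCC) can be computed directly from the four counts TP, TN, FP, and FN. The sketch below uses the standard textbook definitions of these measures, which is an assumption wherever the table does not spell out a formula. The function name `classification_metrics` and the example counts are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class occurs and is predicted
    at least once); a production version would need to guard against that.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity, true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    f_measure = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (p_o - p_e) / (1 - p_e)

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts for a sentiment classifier evaluated on 100 posts.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these example counts the accuracy is 0.85 while kappa is 0.70, which illustrates how chance-corrected measures give a more conservative picture than raw accuracy.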

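The network-oriented parameters in the table above, such as degree centrality, closeness centrality, and the local clustering coefficient, can be sketched in a few lines for a small undirected graph. This is a minimal illustration under stated assumptions: the graph is stored as a dictionary of neighbor sets, closeness is taken as the unnormalized inverse of the summed shortest-path distances, and the function names and example graph are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Number of direct links of every node in an undirected graph."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def closeness_centrality(adj, source):
    """Inverse of the summed shortest-path distances from `source` (unnormalized)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:                        # breadth-first search on an unweighted graph
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    return 1 / total if total else 0.0


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2 * links / (k * (k - 1))


# Hypothetical four-user network: A, B, C form a triangle; D is linked only to C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

print(degree_centrality(graph))
print({n: round(closeness_centrality(graph, n), 3) for n in graph})
print({n: round(local_clustering(graph, n), 3) for n in graph})
```

In this example, node C has the highest degree and closeness, yet its local clustering coefficient is lower than that of A or B because its neighbor D is not connected to the others.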
Arora A., Bansal S., Kandpal C., Aswani R., Dwivedi Y. Measuring social media influencer index-insights from facebook, Twitter and Instagram. J. Retail. Cons. Serv. 2019; 49 :86–101. [ Google Scholar ]
Lai W.K., Chen Y.U., Wu T.-Y. Analysis and evaluation of random-based message propagation models on the social networks. Comput. Netw. 2020; 170 [ Google Scholar ]
Alalwan A.A., Rana N.P., Dwivedi Y.K., Algharabat R. Social media in marketing: A review and analysis of the existing literature. Telematics Inform. 2017; 34 (7):1177–1190. [ Google Scholar ]
R. Kumar, J. Novak, and A. Tomkins, Structure and evolution of online social networks. In Link mining: models, algorithms, and applications: Springer, 2010, pp. 337–357.
Feng Y., Zhou P., Wu D., Hu Y. Accurate content push for content-centric social networks: A big data support online learning approach. IEEE Trans. Emerg. Top. Comput. Intell. 2018; 99 :1–13. [ Google Scholar ]
Heidemann J., Klier M., Probst F. Online social networks: A survey of a global phenomenon. Comput. Netw. 2012; 56 (18):3866–3878. [ Google Scholar ]
Busalim A.H. Understanding social commerce: A systematic literature review and directions for further research. Int. J. Inf. Manage. 2016; 36 (6):1075–1088. [ Google Scholar ]
Bello-Orgaz G., Jung J.J., Camacho D. Social big data: Recent achievements and new challenges. Inf. Fusion. 2016; 28 :45–59. [ PMC free article ] [ PubMed ] [ Google Scholar ]
M. Jamali and H. Abolhassani, Different aspects of social network analysis. In Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on, 2006, pp. 66–72: IEEE.
Martinez-Rojas M., del Carmen Pardo-Ferreira M., Rubio-Romero J.C. Twitter as a tool for the management and analysis of emergency situations: A systematic literature review. Int. J. Inf. Manage. 2018; 43 :196–208. [ Google Scholar ]
Cetto A., Klier M., Richter A., Zolitschka J.F. “Thanks for sharing”—Identifying users’ roles based on knowledge contribution in Enterprise Social Networks. Comput. Netw. 2018; 135 :275–288. [ Google Scholar ]
Go E., You K.H. But not all social media are the same: Analyzing organizations’ social media usage patterns. Telematics Inform. 2016; 33 (1):176–186. [ Google Scholar ]
L. Manovich, Trending: The promises and the challenges of big social data. In Debates in the digital humanities, vol. 2, pp. 460–475, 2011.
Lomborg S., Bechmann A. Using APIs for data collection on social media. Inf. Soc. 2014; 30 (4):256–265. [ Google Scholar ]
F. B. Abdesslem, I. Parris, and T. Henderson, Reliable online social network data collection. In Computational Social Networks: Springer, 2012, pp. 183–210.
Otte E., Rousseau R. Social network analysis: a powerful strategy, also for the information sciences. J. Inf. Sci. 2002; 28 (6):441–453. [ Google Scholar ]
Cross R., Borgatti S.P., Parker A. Making invisible work visible: Using social network analysis to support strategic collaboration. Calif. Manage. Rev. 2002; 44 (2):25–46. [ Google Scholar ]
Parveen F., Jaafar N.I., Ainin S. Social media usage and organizational performance: Reflections of Malaysian social media managers. Telematics Inform. 2015; 32 (1):67–78. [ Google Scholar ]
Boyd D., Crawford K. Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Inf. Commun. Soc. 2012; 15 (5):662–679. [ Google Scholar ]
A. Katal, M. Wazid, and R. Goudar, Big data: Issues, challenges, tools and good practices. In Contemporary Computing (IC3), 2013 Sixth International Conference on, 2013, pp. 404–409: IEEE.
Terrazas G., Ferry N., Ratchev S. A cloud-based framework for shop floor big data management and elastic computing analytics. Comput. Ind. 2019; 109 :204–214. [ Google Scholar ]
Canito J., Ramos P., Moro S., Rita P. Unfolding the relations between companies and technologies under the Big Data umbrella. Comput. Ind. 2018; 99 :1–8. [ Google Scholar ]
di Bella E., Leporatti L., Maggino F. Big data and social indicators: Actual trends and new perspectives. Soc. Indic. Res. 2018; 135 (3):869–878. [ Google Scholar ]
Hadi M.S., Lawey A.Q., El-Gorashi T.E., Elmirghani J.M. Big data analytics for wireless and wired network design: A survey. Comput. Netw. 2018; 132 :180–199. [ Google Scholar ]
Gandomi A., Haider M. Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manage. 2015; 35 (2):137–144. [ Google Scholar ]
Kitchin R. The real-time city? Big data and smart urbanism. GeoJournal. 2014; 79 (1):1–14. [ Google Scholar ]
S. Sagiroglu and D. Sinanc, Big data: A review. In Collaboration Technologies and Systems (CTS), 2013 International Conference on, 2013, pp. 42–47: IEEE.
Pei F.-Q., Li D.-B., Tong Y.-F. Double-layered big data analytics architecture for solar cells series welding machine. Comput. Ind. 2018; 97 :17–23. [ Google Scholar ]
Peng S., Wang G., Zhou Y., Wan C., Wang C., Yu S. An immunization framework for social networks through big data based influence modeling. IEEE Trans. Dependable Secure Comput. 2017 [ Google Scholar ]
Duan Y., Edwards J.S., Dwivedi Y.K. Artificial intelligence for decision making in the era of Big Data–Evolution, challenges and research agenda. Int. J. Inf. Manage. 2019; 48 :63–71. [ Google Scholar ]
Brereton P., Kitchenham B.A., Budgen D., Turner M., Khalil M. Lessons from applying the systematic literature review process within the software engineering domain. J. Syst. Softw. 2007; 80 (4):571–583. [ Google Scholar ]
B. Kitchenham and S. Charters, Guidelines for performing systematic literature reviews in software engineering, 2007.
Jamshidi P., Ahmad A., Pahl C. Cloud migration research: A systematic review. IEEE Trans. Cloud Comput. 2013; 1 (2):142–157. [ Google Scholar ]
Jatoth C., Gangadharan G., Buyya R. Computational intelligence based QoS-aware web service composition: A systematic literature review. IEEE Trans. Serv. Comput. 2015; 10 (3):475–492. [ Google Scholar ]
Yaqoob I. TEMPORARY REMOVAL: Information fusion in social big data: Foundations, state-of-the-art, applications, challenges, and future research directions. Int. J. Inf. Manage. 2016 [ Google Scholar ]
Ghani N.A., Hamid S., Hashem I.A.T., Ahmed E. Social media big data analytics: A survey. Comput. Hum. Behav. 2018 [ Google Scholar ]
Bukovina J. Social media big data and capital markets—An overview. J. Behav. Exp. Finance. 2016; 11 :18–26. [ Google Scholar ]
M. E. Martin and N. Schuurman, Social media big data acquisition and analysis for qualitative GIScience: challenges and opportunities. Ann. Am. Assoc. Geogr., pp. 1–18, 2019.
M. Arnaboldi, C. Busco, and S. Cuganesan, Accounting, accountability, social media and big data: revolution or hype? Acc. Audit. Account. J., 2017.
Peng S., Wang G., Xie D. Social influence analysis in social networking big data: Opportunities and challenges. IEEE Netw. 2016; 31 (1):11–17. [ Google Scholar ]
I. Guellil and K. Boukhalfa, Social big data mining: A survey focused on opinion mining and sentiments analysis. In 2015 12th International Symposium on Programming and Systems (ISPS), 2015, pp. 1–10: IEEE.
S. Gole and B. Tidke, A survey of big data in social media using data mining techniques. In 2015 International Conference on Advanced Computing and Communication Systems, 2015, pp. 1–6: IEEE.
P. V. Paul, K. Monica, and M. Trishanka, A survey on big data analytics using social media data. In 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), 2017, pp. 1–4: IEEE.
Sebei H., Taieb M.A.H., Aouicha M.B. Review of social media analytics process and Big Data pipeline. Social Netw. Anal. Min. 2018; 8 (1):30. [ Google Scholar ]
Al-Garadi M.A. Predicting cyberbullying on social media in the big data era using machine learning algorithms: Review of literature and open challenges. IEEE Access. 2019; 7 :70701–70718. [ Google Scholar ]
O. Lerena, F. Barletta, F. Fiorentin, D. Suárez, and G. Yoguel, Big data of innovation literature at the firm level: a review based on social network and text mining techniques. Econ. Innov. New Technol., pp. 1–17, 2019.
Kitchenham B., Brereton O.P., Budgen D., Turner M., Bailey J., Linkman S. Systematic literature reviews in software engineering–A systematic literature review. Inf. Softw. Technol. 2009; 51 (1):7–15. [ Google Scholar ]
Rahimi M., Songhorabadi M., Kashani M.H. Fog-based smart homes: A systematic review. J. Netw. Comput. Appl. 2020 [ Google Scholar ]
Haghi Kashani M., Rahmani A.M., Jafari Navimipour N. Quality of service-aware approaches in fog computing. Int. J. Commun. Syst. 2020 [ Google Scholar ]
C. Calero, M. F. Bertoa, and M. Á. Moraga, A systematic literature review for software sustainability measures. In 2013 2nd international workshop on green and sustainable software (GREENS), 2013, pp. 46–53: IEEE.
Aznoli F., Navimipour N.J. Deployment strategies in the wireless sensor networks: systematic literature review, classification, and current trends. Wireless Pers. Commun. 2017; 95 (2):819–846. [ Google Scholar ]
Yang M., Kiang M., Shang W. Filtering big data from social media–Building an early warning system for adverse drug reactions. J. Biomed. Inform. 2015; 54 :230–240. [ PubMed ] [ Google Scholar ]
Aa V., Shekhara V.S., Jb R., Aggrawalb T., Balasubramanya K., Murthya S.N. Cloud based big data analytics framework for face recognition in social networks using machine learning. Procedia Comput. Sci. 2015; 50 :623–630. [ Google Scholar ]
Moessner M., Feldhege J., Wolf M., Bauer S. Analyzing big data in social media: Text and network analyses of an eating disorder forum. Int. J. Eat. Disord. 2018 [ PubMed ] [ Google Scholar ]
Cheung M., She J., Jie Z. Connection discovery using big data of user-shared images in social media. IEEE Trans. Multimedia. 2015; 17 (9):1417–1428. [ Google Scholar ]
N. Straton, R. R. Mukkamala, and R. Vatrapu, Big social data analytics for public health: Predicting facebook post performance using artificial neural networks and deep learning. In 2017 IEEE International Congress on Big Data (BigData Congress), 2017, pp. 89–96: IEEE.
P. Sachar and V. Khullar, Social media generated big data clustering using genetic algorithm. In 2017 International Conference on Computer Communication and Informatics (ICCCI), 2017, pp. 1–6: IEEE.
A. Vakali, N. Kitmeridis, and M. Panourgia, A distributed framework for early trending topics detection on big social networks data threads. In INNS Conference on Big Data, 2016, pp. 186–194: Springer.
Huo Y., Ma L., Zhong Y. A Big Data privacy respecting dissemination method for social network. J. Signal Process. Syst. 2018; 90 (4):467–475. [ Google Scholar ]
A. H. Zadeh, H. M. Zolbanin, R. Sharda, and D. Delen, Social media for nowcasting flu activity: Spatio-temporal big data analysis. Inf. Syst. Front., pp. 1–18, 2019.
Xylogiannopoulos K.F., Karampelas P., Alhajj R. A password creation and validation system for social media platforms based on big data analytics. J. Ambient Intell. Hum. Comput. 2020; 11 (1):53–73. [ Google Scholar ]
Subroto A., Apriyana A. Cyber risk prediction through social media big data analytics and statistical machine learning. J. Big Data. 2019; 6 (1):50. [ Google Scholar ]
D. Makaroğlu, A. Çakır, and K. Kocabaş, Social Media and Clickstream Analysis in Turkish News with Apache Spark. In International Conference on Intelligent and Fuzzy Systems, 2019, pp. 221–228: Springer.
Singh A., Kaur M. Intelligent content-based cybercrime detection in online social networks using cuckoo search metaheuristic approach. J. Supercomput. 2019:1–23. [ Google Scholar ]
R. Thorstad and P. Wolff, Predicting future mental illness from social media: A big-data approach. Behav. Res. Methods, pp. 1–15, 2019. [ PubMed ]
E. Alomari, I. Katib, and R. Mehmood, Iktishaf: A Big Data road-traffic event detection tool using twitter and spark machine learning. Mob. Netw. Appl., pp. 1–16, 2020.
Panarello A., Celesti A., Fazio M., Puliafito A., Villari M. A big video data transcoding service for social media over federated clouds. Multimedia Tools Appl. 2020; 79 (13):9037–9061. [ Google Scholar ]
Sahoo S.R., Gupta B. Fake profile detection in multimedia big data on online social networks. Int. J. Inf. Comput. Secur. 2020; 12 (2–3):303–331. [ Google Scholar ]
Zhou P., Zhou Y., Wu D., Jin H. Differentially private online learning for cloud-based video recommendation with multimedia big data in social networks. IEEE Trans. Multimedia. 2016; 18 (6):1217–1229. [ Google Scholar ]
Zhang C., Xie L., Aizezi Y., Gu X. User multi-modal emotional intelligence analysis method based on deep learning in social network Big Data environment. IEEE Access. 2019; 7 :181758–181766. [ Google Scholar ]
Kauffmann E., Peral J., Gil D., Ferrández A., Sellers R., Mora H. A framework for big data analytics in commercial social networks: A case study on sentiment analysis and fake review detection for marketing decision-making. Ind. Mark. Manage. 2019 [ Google Scholar ]
Jiang D., Luo X., Xuan J., Xu Z. Sentiment computing for the news event based on the social media big data. IEEE Access. 2017; 5 :2373–2382. [ Google Scholar ]
Dalla Valle L., Kenett R. Social media big data integration: A new approach based on calibration. Expert Syst. Appl. 2018; 111 :76–90. [ Google Scholar ]
Jimenez-Marquez J.L., Gonzalez-Carrasco I., Lopez-Cuadrado J.L., Ruiz-Mezcua B. Towards a big data framework for analyzing social media content. Int. J. Inf. Manage. 2019; 44 :1–12. [ Google Scholar ]
Shirdastian H., Laroche M., Richard M.-O. Using big data analytics to study brand authenticity sentiments: The case of Starbucks on Twitter. Int. J. Inf. Manage. 2019; 48 :291–307. [ Google Scholar ]
Zhu B., Zheng X., Liu H., Li J., Wang P. Analysis of spatiotemporal characteristics of big data on social media sentiment with COVID-19 epidemic topics. Chaos, Solitons Fractals. 2020; 140 [ PMC free article ] [ PubMed ] [ Google Scholar ]
Fan M., Billings A., Zhu X., Yu P. Twitter-based BIRGing: Big Data analysis of English national team fans during the 2018 FIFA World Cup. Commun. Sport. 2020; 8 (3):317–345. [ Google Scholar ]
C. Lee and I. Paik, Stock market analysis from Twitter and news based on streaming big data infrastructure. In 2017 IEEE 8th International Conference on Awareness Science and Technology (iCAST), 2017, pp. 312–317: IEEE.
A. A. Sayed, M. M. Abdallah, A. M. Zaki, and A. A. Ahmed, Big Data analysis using a metaheuristic algorithm: Twitter as Case Study. In 2020 International Conference on Innovative Trends in Communication and Computer Engineering (ITCE), 2020, pp. 20–26: IEEE.
van Dieijen M., Borah A., Tellis G.J., Franses P.H. Big data analysis of volatility spillovers of brands across social media and stock markets. Ind. Mark. Manage. 2020; 88 :465–484. [ Google Scholar ]
Spruce M., Arthur R., Williams H. Using social media to measure impacts of named storm events in the United Kingdom and Ireland. Meteorol. Appl. 2020; 27 (1) [ Google Scholar ]
Um J.-H., Jeong C.-H., Choi S.-P., Lee S., Kim H.-M., Jung H. Distributed and parallel big textual data parsing for social sensor network. Int. J. Distrib. Sens. Netw. 2013; 9 (12) [ Google Scholar ]
I. Moise, The technical hashtag in Twitter data: A hadoop experience. In 2016 IEEE International Conference on Big Data (Big Data), 2016, pp. 3519–3528: IEEE.
D. Hsu, M. Moh, and T.-S. Moh, Mining frequency of drug side effects over a large twitter dataset using apache spark. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017, pp. 915–924.
A. Baltas, A. Kanavos, and A. K. Tsakalidis, An apache spark implementation for sentiment analysis on twitter data. In International Workshop of Algorithmic Aspects of Cloud Computing, 2016, pp. 15–25: Springer.
X. Sun, C. Zhang, S. Ding, and C. Quan, Detecting anomalous emotion through big data from social networks based on a deep learning method. Multimedia Tools Appl., pp. 1–22, 2018.
BalaAnand M., Karthikeyan N., Karthik S. Envisioning social media information for big data using big vision schemes in wireless environment. Wireless Pers. Commun. 2019:1–20. [ Google Scholar ]
A. P. Rodrigues and N. N. Chiplunkar, A new big data approach for topic classification and sentiment analysis of Twitter data. Evol. Intell., pp. 1–11, 2019.
Persico V., Pescapé A., Picariello A., Sperlí G. Benchmarking big data architectures for social networks data processing using public cloud platforms. Future Gener. Comput. Syst. 2018; 89 :98–109. [ Google Scholar ]
Elkin L.S., Topal K., Bebek G. Network based model of social media big data predicts contagious disease diffusion. Inf. Disc. Del. 2017; 45 (3):110–120. [ PMC free article ] [ PubMed ] [ Google Scholar ]
Gao S., Pang H., Gallinari P., Guo J., Kato N. A novel embedding method for information diffusion prediction in social network big data. IEEE Trans. Ind. Inf. 2017; 13 (4):2097–2105. [ Google Scholar ]
A. Talukder and C. S. Hong, A heuristic mixed model for viral marketing cost minimization in social networks. In 2019 International Conference on Information Networking (ICOIN), 2019, pp. 141–146: IEEE.
Chen S., Yin X., Cao Q., Li Q., Long H. Targeted influence maximization based on cloud computing over big data in social networks. IEEE Access. 2020; 8 :45512–45522. [ Google Scholar ]
Y. Wang, B. Zhang, A. V. Vasilakos, and J. Ma, PRDiscount: A heuristic scheme of initial seeds selection for diffusion maximization in social networks. In International Conference on Intelligent Computing, 2014, pp. 149–161: Springer.
Kumaran P., Chitrakala S. Social influence determination on big data streams in an online social network. Multimedia Tools Appl. 2017; 76 (21):22133–22167. [ Google Scholar ]
Wu Y., Huang H., Wu N., Wang Y., Bhuiyan M.Z.A., Wang T. An incentive-based protection and recovery strategy for secure big data in social networks. Inf. Sci. 2020; 508 :79–91. [ Google Scholar ]
Wu Y., Huang H., Zhao J., Wang C., Wang T. Using mobile nodes to control rumors in big data based on a new rumor propagation model in vehicular social networks. IEEE Access. 2018; 6 :62612–62621. [ Google Scholar ]
Wu J., Zhao M., Chen Z. Small data: Effective data based on big communication research in social networks. Wireless Pers. Commun. 2018; 99 (3):1391–1404. [ Google Scholar ]
Óskarsdóttir M., Bravo C., Sarraute C., Vanthienen J., Baesens B. The value of big data for credit scoring: Enhancing financial inclusion using mobile phone data and social network analytics. Appl. Soft Comput. 2019; 74 :26–39. [ Google Scholar ]
Yang X., McEwen R., Ong L.R., Zihayat M. A big data analytics framework for detecting user-level depression from social networks. Int. J. Inf. Manage. 2020; 54 [ Google Scholar ]
Raj E.D., Babu L.D. A firefly swarm approach for establishing new connections in social networks based on big data analytics. Int. J. Commun. Netw. Distrib.Syst. 2015; 15 (2–3):130–148. [ Google Scholar ]
K. Xu, F. Wang, X. Jia, and H. Wang, The impact of sampling on big data analysis of social media: A case study on flu and ebola. In 2015 IEEE Global Communications Conference (GLOBECOM), 2015, pp. 1–6: IEEE.
Su Z., Xu Q., Qi Q. Big data in mobile social networks: A QoE-oriented framework. IEEE Network. 2016; 30 (1):52–57. [ Google Scholar ]
K. S. Kumar, D. E. Geetha, N. Nagesh, and T. S. Manoj, Identify the influential user in online social networks using R, Hadoop and Python. In 2016 International Conference on Circuits, Controls, Communications and Computing (I4C), 2016, pp. 1–6: IEEE.
Y. Zhang, Z. Huang, S. Wang, X. Wang, and T. Jiang, Spark-based measurement and analysis on offline mobile application market over device-to-device sharing in mobile social networks. In 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS), 2017, pp. 545–552: IEEE.
Maireder A., Weeks B.E., Gil de Zúñiga H., Schlögl S. Big data and political social Networks: Introducing audience diversity and communication connector bridging measures in social network theory. Social Sci. Comput. Rev. 2017; 35 (1):126–141. [ Google Scholar ]
Dabas C. Big data analytics for exploratory social network analysis. Int. J. Inf. Technol. Manage. 2017; 16 (4):348–359. [ Google Scholar ]
H. Aksu, M. Canim, Y.-C. Chang, I. Korpeoglu, and Ö. Ulusoy, Multi-resolution social network community identification and maintenance on big data platform. In Big Data (BigData Congress), 2013 IEEE International Congress on, 2013, pp. 102–109: IEEE.
Z. Wu, J. Chen, and Y. Zhang, An incremental community detection method in social big data. In 2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT), 2018, pp. 136–141: IEEE.
S. Yousfi, D. Chiadmi, F. Nafis, Toward a Big Data-as-a-service for social networks graphs analysis. In Proceedings of the Mediterranean Conference on Information & Communication Technologies 2015, 2016, pp. 593–598: Springer.
Sun J., Xu W., Ma J., Sun J. Leverage RAF to find domain experts on research social network services: A big data analytics methodology with MapReduce framework. Int. J. Prod. Econ. 2015; 165 :185–193. [ Google Scholar ]
Ghosh G., Banerjee S., Yen N.Y. State transition in communication under social network: An analysis using fuzzy logic and density based clustering towards big data paradigm. Future Gener. Comput. Syst. 2016; 65 :207–220. [ Google Scholar ]
Wang F., Mack E.A., Maciewjewski R. Analyzing entrepreneurial social networks with big data. Ann. Am. Assoc. Geogr. 2017; 107 (1):130–150. [ Google Scholar ]
K. Lin, J. Luo, L. Hu, M. S. Hossain, and A. Ghoneim, Localization based on social big data analysis in the vehicular networks. IEEE Trans. Ind. Inform, 99(1), 2016.
C. Li, P. Zhou, Y. Zhou, K. Bian, T. Jiang, and S. Rahardja, Distributed private online learning for social big data computing over data center networks. In 2016 IEEE International Conference on Communications (ICC), 2016, pp. 1–6: IEEE.
I. Paik, Y. Koshiba, and T. A. S. Siriweera, Efficient service discovery using social service network based on big data infrastructure. In 2017 IEEE 11th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 2017, pp. 166–173: IEEE.
J. Wang, C. Jiang, S. Guan, L. Xu, and Y. Ren, Big data driven similarity based U-model for online social networks. In GLOBECOM 2017-2017 IEEE Global Communications Conference, 2017, pp. 1–6: IEEE.
S. Sharma, Building Real-time knowledge in Social Media on Focus Point: An Apache Spark Streaming Implementation. In 2018 IEEE Punecon, pp. 1–6: IEEE.
H. F. Karimi, S. U. Masruroh, F. Mintarsih, The influence of iteration calculation manipulation on social network analysis toward twitter's users against hoax in Indonesia with single cluster multi-node method using apache Hadoop Hortonworkstm distribution. In 2018 6th International Conference on Cyber and IT Service Management (CITSM), 2018, pp. 1–6: IEEE.
W. Du, Toward semantic social network analysis for business big data. In 2018 14th International Conference on Semantics, Knowledge and Grids (SKG), 2018, pp. 1–8: IEEE.
C. K. Leung and H. Zhang, Management of distributed big data for social networks. In 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016, pp. 639–648: IEEE.
Jin S., Lin W., Yin H., Yang S., Li A., Deng B. Community structure mining in big data social media networks with MapReduce. Cluster computing. 2015; 18 (3):999–1010. [ Google Scholar ]
Kuang L., Tang X., Yu M., Huang Y., Guo K. A comprehensive ranking model for tweets big data in online social network. EURASIP J. Wire. Commun. Netw. 2016; 2016 (1):46. [ Google Scholar ]
Hamzei M., Navimipour N.J. Toward efficient service composition techniques in the Internet of things. IEEE Internet Things J. 2018; 5 (5):3774–3787. [ Google Scholar ]
M. Akbari, X. Hu, and T.-S. Chua, Learning wellness profiles of users on social networks: The case of diabetes. In Social Web and Health Research: Springer, 2019, pp. 139–169.
M. Akbari, K. Relia, A. Elghafari, R. Chunara, From the user to the medium: Neural profiling across web communities. In Twelfth International AAAI Conference on Web and Social Media, 2018.
Nie L., Zhao Y.-L., Akbari M., Shen J., Chua T.-S. Bridging the vocabulary gap between health seekers and healthcare knowledge. IEEE Trans. Knowl. Data Eng. 2014; 27 (2):396–409. [ Google Scholar ]
M. Akbari and T.-S. Chua, Leveraging behavioral factorization and prior knowledge for community discovery and profiling. Presented at the Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, Cambridge, United Kingdom, 2017.
Akbari M., Hu X., Wang F., Chua T. Wellness representation of users in social media: Towards joint modelling of heterogeneity and temporality. IEEE Trans. Knowl. Data Eng. 2017; 29 (10):2360–2373. [ Google Scholar ]
Zhang H., Babar M.A. Systematic reviews in software engineering: An empirical investigation. Inf. Softw. Technol. 2013; 55 (7):1341–1354. [ Google Scholar ]
Casciaro T., Carley K.M., Krackhardt D. Positive affectivity and accuracy in social network perception. Motiv. Emotion. 1999; 23 (4):285–306. [ Google Scholar ]
Kalna G., Higham D.J. A clustering coefficient for weighted networks, with application to gene expression data. AI Commun. 2007; 20 (4):263–271. [ Google Scholar ]
Zhang P., Wang J., Li X., Li M., Di Z., Fan Y. Clustering coefficient and community structure of bipartite networks. Physica A. 2008; 387 (27):6869–6875. [ Google Scholar ]
Holland P.W., Leinhardt S. Transitivity in structural models of small groups. Comp. Group Stud. 1971; 2 (2):107–124. [ Google Scholar ]
Watts D.J., Strogatz S.H. Collective dynamics of ‘small-world’networks. Nature. 1998; 393 (6684):440. [ PubMed ] [ Google Scholar ]
L. A. Cutillo, M. Manulis, T. Strufe, Security and privacy in online social networks. In Handbook of Social Network Technologies and Applications .Springer, 2010, pp. 497–522.
Amelio A., Pizzuti C. Correction for closeness: Adjusting normalized mutual information measure for clustering comparison. Comput. Intell. 2017; 33 (3):579–601. [ Google Scholar ]
X. Wang, L. Tang, H. Gao, H. Liu. Discovering overlapping groups in social media. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, 2010, pp. 569–578: IEEE.
V. Junquero-Trabado, N. Trench-Ribes, M. A. Aguila-Lorente, D. Dominguez-Sal, Comparison of influence metrics in information diffusion networks. In Computational Aspects of Social Networks (CASoN), 2011 International Conference on, 2011, pp. 31–36: IEEE.
Getoor L., Diehl C.P. Link mining: A survey. Acm Sigkdd Explor. News. 2005; 7 (2):3–12. [ Google Scholar ]
Abbasi A., Altmann J., Hossain L. Identifying the effects of co-authorship networks on the performance of scholars: A correlation and regression analysis of performance measures and social network analysis measures. J. Inf. 2011; 5 (4):594–607. [ Google Scholar ]
Everett M.G. Centrality and the dual-projection approach for two-mode social network data. Methodol. Innovations. 2016; 9 [ Google Scholar ]
Kim Y., Choi T.Y., Yan T., Dooley K. Structural investigation of supply networks: A social network analysis approach. J. Oper. Manage. 2011; 29 (3):194–211. [ Google Scholar ]
D. G. Luenberger, Introduction to Dynamic Systems: Theory, Models, and Applications. Wiley New York, 1979.
Newman M.E. Analysis of weighted networks. Phys. Rev. E. 2004; 70 (5) [ PubMed ] [ Google Scholar ]
S. A. Catanese, P. De Meo, E. Ferrara, G. Fiumara, A. Provetti, Crawling facebook for social network analysis purposes. In Proceedings of the International Conference on Web Intelligence, Mining and Semantics, 2011, p. 52: ACM.
L. Page, S. Brin, R. Motwani, T. Winograd, The pagerank citation ranking: Bringing order to the web. Stanford InfoLab1999.


Understanding Social Network Analysis: A Complete Guide

Key Takeaways

Social Network Analysis (SNA) offers deep insights into interconnected relationships and network structures, aiding decision-making processes across various domains.

Understanding data privacy concerns and ethical considerations is crucial in conducting responsible SNA research and analysis.

Effectively handling big data is a significant challenge in SNA, requiring advanced tools and strategies for accurate analysis and interpretation.

Ethical guidelines and transparency are paramount in navigating the complexities of SNA research, ensuring integrity and respect for individual privacy.

Mastering SNA involves striking a balance between technical proficiency and ethical considerations, unlocking its full potential for impactful insights and applications.

Social Network Analysis (SNA) reveals how people, groups, and organizations are connected and what those connections imply. This guide covers its basics, methods, and uses, and keeps one question in view: how can SNA help us better understand human interactions and decisions?

Introduction To Social Network Analysis

What is Social Network Analysis (SNA)?

Social Network Analysis (SNA) uses networks and graph theory to study social structures. It maps and measures relationships and flows among people, groups, or organizations.

This mapping reveals interaction patterns and network structures. SNA is key to understanding how information, resources, and influence flow, and it offers insights that methods focused on individuals in isolation can miss.

Importance of SNA in Modern Research

Modern research relies on Social Network Analysis (SNA) because it uncovers the dynamics and complexity of social interaction. It is widely used in sociology, anthropology, epidemiology, and organizational studies, where it reveals how relationships shape behavior.

For example, in public health, SNA tracks how disease spreads through communities. In business, it shows how informal networks affect organizational effectiveness. These insights into social issues help researchers design better interventions and strategies.

Key Concepts in Social Network Analysis

Nodes and Edges

Social Network Analysis (SNA) focuses on two key elements: nodes and edges. Nodes stand for network members, such as people, organizations, or computers. Edges are the direct connections between nodes and represent their interactions. Together they are the building blocks of social networks, and every other concept in this guide is defined in terms of them.

Types of Networks

Different types of networks are essential to grasp in Social Network Analysis. These include:

Undirected Networks: Here, connections between nodes have no direction, indicating a mutual relationship, such as friendships.
Directed Networks: In these networks, edges have a direction, showing a one-way relationship, like followers on Twitter.
Weighted Networks: These networks assign weights to edges, representing the strength or frequency of the connection, such as the number of emails exchanged between individuals.
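The three network types above can be sketched with plain Python dictionaries. This is a minimal illustration, not a prescribed data model: the names `friends`, `follows`, `emails`, and `is_symmetric` are hypothetical, and the people and counts are made up.

```python
# Undirected network: a friendship is mutual, so each tie is stored in both directions.
friends = {"Ana": {"Ben"}, "Ben": {"Ana", "Cy"}, "Cy": {"Ben"}}

# Directed network: "follows" is one-way, so only source -> target is stored.
follows = {"Ana": {"Ben"}, "Ben": set(), "Cy": {"Ben"}}

# Weighted network: each edge carries a number, here emails exchanged by a pair.
emails = {("Ana", "Ben"): 12, ("Ben", "Cy"): 3}


def is_symmetric(adjacency):
    """True when every tie u -> v is matched by a tie v -> u."""
    return all(u in adjacency[v] for u in adjacency for v in adjacency[u])


strongest_tie = max(emails, key=emails.get)

print(is_symmetric(friends))   # mutual ties
print(is_symmetric(follows))   # one-way ties
print(strongest_tie)
```

The symmetry check is one simple way to tell whether a dataset should be treated as directed or undirected before computing any metrics.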

Network Metrics

Network metrics are critical for quantifying the structure and properties of social networks. Key metrics in Social Network Analysis include:

Degree Centrality: This measures the number of direct connections a node has, indicating its activity level within the network.
Betweenness Centrality: This metric shows the extent to which a node lies on the shortest paths between other nodes, highlighting its role as a bridge or mediator.
Closeness Centrality: This measures how close a node is to all other nodes in the network, reflecting its ability to spread information efficiently.
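To make the three metrics concrete, here is a small sketch that computes them on a five-node example using only the Python standard library. It assumes an undirected, connected, unweighted network. The function names `bfs` and `centralities` and the toy graph are illustrative. Degree and closeness are normalized by n - 1, and betweenness is left as a raw count of node pairs.

```python
from collections import deque
from itertools import combinations


def bfs(graph, source):
    """Return hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:     # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma


def centralities(graph):
    n = len(graph)
    info = {s: bfs(graph, s) for s in graph}
    degree = {v: len(graph[v]) / (n - 1) for v in graph}
    # Closeness: (n - 1) divided by the total distance to all other nodes.
    closeness = {v: (n - 1) / sum(info[v][0].values()) for v in graph}
    betweenness = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        for v in graph:
            if v in (s, t):
                continue
            # v is on a shortest s-t path when the two distances add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                betweenness[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return degree, closeness, betweenness


# Toy network: A - B - C, with D and E both attached to C.
graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"C"},
    "E": {"C"},
}
degree, closeness, betweenness = centralities(graph)
for node in sorted(graph):
    print(node, round(degree[node], 2), round(closeness[node], 2), betweenness[node])
```

In this toy network C scores highest on all three metrics, while B has a modest degree but still bridges A to the rest of the network.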

Methodologies for Social Network Analysis

Data Collection Techniques

In Social Network Analysis (SNA), data collection plays a pivotal role in extracting meaningful insights from social networks.

One of the primary techniques is surveying, in which individuals are asked to identify their connections and relationships within a network. This approach helps map the structure and dynamics of the network.

Another valuable technique is archival data analysis, which involves studying existing records such as communication logs, email threads, or organizational charts to uncover patterns and relationships within the network. This method provides a historical perspective and can reveal how networks evolve over time.
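As a rough sketch of the archival approach, the snippet below turns a hypothetical email log into a weighted edge list. The log rows, the addresses, and the function name `weighted_edges` are invented for illustration. Real logs would need cleaning and, typically, consent or anonymization before analysis.

```python
from collections import Counter

# Hypothetical email log rows: (sender, recipient).
log = [
    ("ana@example.org", "ben@example.org"),
    ("ben@example.org", "ana@example.org"),
    ("ana@example.org", "cy@example.org"),
    ("ana@example.org", "ben@example.org"),
    ("ana@example.org", "ana@example.org"),  # self-addressed, dropped below
]


def weighted_edges(rows):
    """Count messages per unordered pair of people, ignoring self-loops."""
    counts = Counter()
    for sender, recipient in rows:
        if sender == recipient:
            continue
        pair = tuple(sorted((sender, recipient)))  # treat A->B and B->A as one tie
        counts[pair] += 1
    return counts


edges = weighted_edges(log)
for (a, b), weight in edges.most_common():
    print(a, b, weight)
```

Counting both directions as one tie is a design choice. Keeping sender and recipient order instead would yield a directed network.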

Commonly Used SNA Software

Several software tools are available for conducting Social Network Analysis (SNA), each offering unique features and functionalities.

Gephi is a popular open-source tool known for its interactive visualization capabilities and extensive network analysis algorithms. It allows users to explore and analyze large-scale networks with ease.

UCINET (UCI Network) is another widely used software package that provides a comprehensive suite of tools for network analysis, including centrality measures, clustering algorithms, and statistical tests. It is favored by researchers and analysts for its robustness and versatility in handling diverse network datasets.

NodeXL stands out for its integration with Microsoft Excel, making it accessible to users familiar with spreadsheet-based data manipulation. It offers a user-friendly interface and supports various network metrics and visualizations, making it suitable for both beginners and advanced analysts.

Visualization Techniques in SNA

Visualization is a crucial aspect of Social Network Analysis (SNA) as it allows researchers and practitioners to interpret complex network structures and patterns visually.

Node-Link Diagrams represent nodes (individual entities) and edges (relationships) in a network graphically, providing a clear depiction of connections and clusters.

Heatmaps and matrix plots are employed to visualize network data in a matrix format, highlighting the strength and density of relationships between nodes. These visualizations aid in identifying key influencers, detecting communities, and understanding the flow of information or resources within the network.
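The matrix format behind such plots can be sketched as follows. This is a minimal example under stated assumptions: an undirected weighted network, hypothetical names (`adjacency_matrix`, `density`), and made-up tie strengths. A plotting library would then color each cell by its value to produce the heatmap.

```python
def adjacency_matrix(nodes, weighted_edges):
    """Build a symmetric matrix whose cell [i][j] holds the tie strength."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (a, b), weight in weighted_edges.items():
        i, j = index[a], index[b]
        matrix[i][j] = weight
        matrix[j][i] = weight
    return matrix


def density(matrix):
    """Share of possible undirected ties that are actually present."""
    n = len(matrix)
    present = sum(1 for i in range(n) for j in range(i + 1, n) if matrix[i][j] > 0)
    return present / (n * (n - 1) / 2)


nodes = ["Ana", "Ben", "Cy", "Dee"]
ties = {("Ana", "Ben"): 5, ("Ben", "Cy"): 2, ("Cy", "Dee"): 1}
matrix = adjacency_matrix(nodes, ties)

for name, row in zip(nodes, matrix):
    print(f"{name:>4}", row)
print("density:", density(matrix))
```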

Interactive visualizations support exploration by letting users navigate and filter network data, zoom into specific regions, and pull up details on individual nodes and edges. This dynamic approach fosters deeper insights and makes it easier to communicate findings to stakeholders.

Applications of Social Network Analysis

SNA in Social Media Analytics

Social Network Analysis (SNA) plays a pivotal role in understanding the dynamics of social media platforms. It helps in analyzing the relationships, interactions, and influence among individuals or entities within these digital networks.

By applying SNA techniques, businesses can gain insights into user behavior, identify key influencers, track information flow, and optimize their social media strategies for better engagement and ROI.

SNA in Healthcare

In the realm of healthcare, Social Network Analysis (SNA) has emerged as a valuable tool for studying patient-provider relationships, healthcare collaborations, and disease transmission patterns.

By mapping out the social networks within healthcare settings, researchers and practitioners can identify central nodes, assess information dissemination, detect potential bottlenecks, and enhance care coordination for improved patient outcomes and organizational efficiency.

SNA in Organizational Behavior

Social Network Analysis (SNA) offers profound insights into organizational behavior by examining the relationships, communication patterns, and knowledge sharing among employees, departments, and external stakeholders.

By leveraging SNA, organizations can identify informal leaders, enhance collaboration, streamline decision-making processes, foster innovation, and strengthen overall performance and productivity.

SNA in Political Science

In the realm of political science, Social Network Analysis (SNA) provides a systematic approach to studying political actors, alliances, power dynamics, and information dissemination within political systems.

By employing SNA techniques, researchers can analyze political networks, assess influence flows, map out lobbying efforts, understand coalition formations, and gain a deeper understanding of the complex socio-political landscape for informed decision-making and policy development.

Advanced Topics in SNA

Network Dynamics and Evolution

Social Network Analysis (SNA) delves into the dynamic nature of networks, exploring how they evolve and transform over time. This field investigates the intricate processes that drive changes within networks, encompassing both growth and decline phenomena. By studying these dynamics, analysts gain valuable insights into the underlying mechanisms that shape network structures.

Modeling Network Growth and Decline

In understanding Social Network Analysis, it’s essential to grasp the methodologies used to model network growth and decline. Researchers employ various mathematical and computational models to simulate these processes, allowing them to predict and analyze network changes over time. These models play a crucial role in forecasting network trends and anticipating potential shifts in connectivity patterns.
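The article does not name a specific growth model. As one commonly used example, the sketch below simulates preferential attachment in the style of the Barabási-Albert model, where newcomers are more likely to link to nodes that already have many ties. The function name `grow_network` and its parameters are illustrative assumptions.

```python
import random
from collections import Counter


def grow_network(n_nodes, links_per_node=2, seed=0):
    """Grow a network one node at a time using preferential attachment."""
    rng = random.Random(seed)
    edges = [(0, 1)]       # start from a single tie between nodes 0 and 1
    endpoints = [0, 1]     # each node appears once per tie it holds
    for new in range(2, n_nodes):
        wanted = min(links_per_node, new)
        targets = set()
        while len(targets) < wanted:
            # Sampling from endpoints picks a node with probability
            # proportional to its current degree.
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


edges = grow_network(50, links_per_node=2, seed=42)
degree = Counter(node for edge in edges for node in edge)

print("ties:", len(edges))
print("five best-connected nodes:", degree.most_common(5))
```

Running the simulation with different seeds or sizes gives a feel for how a few early nodes tend to accumulate a disproportionate share of ties. Modeling decline would require an additional rule for removing nodes or edges, which this sketch omits.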

Community Detection Algorithms

A fundamental aspect of Social Network Analysis involves community detection algorithms. These algorithms are designed to identify clusters and subgroups within a network, revealing distinct communities based on shared attributes or interactions. Different methods, such as modularity optimization and hierarchical clustering, are employed to uncover meaningful structures within complex networks.

Different Community Detection Methods

Social Network Analysis encompasses a range of community detection methods, each offering unique advantages and applications. From traditional approaches like hierarchical clustering to more advanced techniques like spectral clustering and the Louvain algorithm, analysts have a diverse toolkit for exploring network communities. These methods facilitate a nuanced understanding of network dynamics and community structures.
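Modularity optimization methods, including the Louvain algorithm, search for the partition that maximizes a modularity score Q. The sketch below does not implement that search. It only computes Q for a given partition of an undirected, unweighted network, so that two candidate groupings can be compared. The function name `modularity` and the example network are illustrative.

```python
def modularity(edges, communities):
    """Modularity Q: sum over communities of L_c / m - (d_c / (2m))^2,
    where L_c counts internal ties, d_c sums member degrees, and m counts all ties."""
    m = len(edges)
    membership = {node: c for c, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Two triangles (0-1-2 and 3-4-5) joined by a single bridge between 2 and 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

q_two_groups = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_one_group = modularity(edges, [{0, 1, 2, 3, 4, 5}])

print(round(q_two_groups, 3))  # 5/14, about 0.357
print(round(q_one_group, 3))   # a single all-inclusive group scores 0
```

The higher score for the two-triangle split matches the intuition that the bridge separates two tightly knit groups.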

Social Network Analysis Tools and Software

To conduct in-depth analyses, researchers and practitioners rely on specialized Social Network Analysis tools and software. Popular packages like Gephi and NetworkX provide comprehensive functionalities for visualizing, modeling, and analyzing networks.

Additionally, online platforms and resources offer accessible tools for conducting SNA studies, enhancing collaboration and knowledge sharing within the field.

Challenges and Limitations of Social Network Analysis

1. Data Privacy Concerns

When delving into Social Network Analysis (SNA), one immediate challenge is navigating data privacy concerns. The intricate web of connections analyzed in SNA often involves personal information, raising questions about consent, confidentiality, and data protection.

Striking a balance between extracting valuable insights and respecting individuals’ privacy rights remains a critical consideration in SNA research and practice.

2. Handling Big Data

Another significant challenge in Social Network Analysis is effectively handling big data. With the exponential growth of digital interactions, SNA researchers often encounter vast amounts of data that require advanced tools and techniques for processing and analysis.

Scalability, computational resources, and data management strategies become paramount in ensuring the accuracy and reliability of SNA outcomes.

3. Ethical Considerations in SNA Research

Ethical considerations play a crucial role in Social Network Analysis research endeavors. Researchers must navigate ethical dilemmas concerning data collection methods, participant consent, and the potential impact of their findings on individuals and communities. Maintaining transparency, integrity, and adherence to ethical guidelines are fundamental pillars in conducting ethically sound SNA studies.

In conclusion, Social Network Analysis is a powerful tool for understanding relationships and interactions within networks. By analyzing connections, nodes, and patterns, businesses can gain valuable insights into their audience, improve decision-making, and enhance network performance. Mastering these concepts can lead to more effective strategies and meaningful outcomes in various fields.

What is Social Network Analysis?

Social Network Analysis (SNA) is a methodology used to study relationships and interactions within a network of individuals, groups, or organizations. It involves mapping and measuring the relationships and flows between people, groups, organizations, computers, or other information/knowledge processing entities. By analyzing these networks, SNA can uncover patterns and insights that are not apparent through traditional analysis.

Why is Social Network Analysis important?

Social Network Analysis is crucial for understanding the complex dynamics of interactions within various networks, from social media platforms to organizational structures. It helps identify key influencers, understand information flow, and detect communities or clusters. This analysis is vital for strategic decision-making in marketing, public health, organizational management, and more.

What tools are commonly used in Social Network Analysis?

Common tools for Social Network Analysis include Gephi, UCINET, and NodeXL, which provide powerful visualization and analysis capabilities. These tools help researchers and analysts map networks, calculate network metrics, and visualize relationships. Each tool offers unique features tailored to different types of network analysis, making them essential for both beginners and experts.

What are the key metrics used in Social Network Analysis?

Key metrics in Social Network Analysis include degree centrality, betweenness centrality, and closeness centrality. Degree centrality measures the number of direct connections an entity has, betweenness centrality indicates the entity’s role as a bridge within the network, and closeness centrality measures how quickly an entity can access others in the network. These metrics help identify influential nodes and understand the network’s structure.

What are the ethical considerations in Social Network Analysis?

Ethical considerations in Social Network Analysis include data privacy, consent, and the potential misuse of network data. Researchers must ensure that data is collected and used responsibly, protecting individuals’ privacy and obtaining necessary permissions. It’s also important to consider the impact of network analysis findings on individuals and groups, avoiding harm or exploitation.


DOI: 10.46632/jdaai/3/3/11
Corpus ID: 272451697

Twitter Sentiment Analysis with LSTM Neural Networks

Jayanth Kande
Published in REST Journal on Data… 6 September 2024
Computer Science
REST Journal on Data Analytics and Artificial Intelligence

4 References

GloVe: Global Vectors for Word Representation
Long Short-Term Memory
Mining and Summarizing Customer Reviews
Learning Long-Term Dependencies with Gradient Descent Is Difficult

Who is Hispanic?

Beauty pageant contestants at the Junta Hispana Hispanic cultural festival in Miami.

Debates over who is Hispanic have often fueled conversations about identity among Americans who trace their heritage to Latin America or Spain.

So, who is considered Hispanic in the United States today? How exactly do the federal government and others count the Hispanic population? And what role does race play in deciding who counts as Hispanic?

We’ll answer these and other common questions here.

To answer the question of who is Hispanic, this analysis draws on about five decades of U.S. Census Bureau data and about two decades of Pew Research Center surveys of Hispanic adults in the United States.

National counts of the Latino population come from the Census Bureau’s decennial census (this includes P.L. 94-171 census data) and official population estimates. The bureau’s American Community Survey (ACS) provides demographic details such as race, country of origin and intermarriage rates. Some ACS data was accessed through IPUMS USA from the University of Minnesota.

Views of Hispanic identity draw on the Center’s National Survey of Latinos (NSL), which is fielded in English and Spanish. The survey has been conducted online since 2019, primarily through the Center’s American Trends Panel (ATP), which is recruited through national, random sampling of residential addresses. This way nearly all adults have a chance of selection. The survey is weighted to be representative of the U.S. Hispanic adult population by gender, Hispanic origin, partisan affiliation, education and other categories. Read more about the ATP’s methodology. The NSL was conducted by phone from 2002 to 2018.

Read further details on how the Census Bureau asked about race and ethnicity and coded responses in the 2020 census. Here is a full list of origin groups that were coded as Hispanic in the 2020 census.

How many Hispanics are in the U.S. today?

The Census Bureau estimates there were 65.2 million Hispanics in the U.S. as of July 1, 2023, a new high. They made up more than 19% of the nation’s population.

How are Hispanics identified and counted in government surveys, public opinion polls and other studies?

Before diving into the details, keep in mind that some surveys ask about Hispanic origin and race separately, following current Census Bureau practices – though these are soon to change.

One way to count Hispanics is to include those who say they are Hispanic, with no exceptions – that is, you are Hispanic if you say you are. Pew Research Center uses this approach in our surveys, as do other polling firms such as Gallup and voter exit polls.

The Census Bureau largely counts Hispanics this way, too, but with some exceptions. If respondents select only the “Other Hispanic” category and write in only non-Hispanic responses such as “Irish,” the Census Bureau recodes the response as non-Hispanic.

However, beginning in 2020, the bureau widened the lens to include a relatively small number of people who did not check a Hispanic box on the census form but answered the race question in a way that implied a Hispanic background. As a result, someone who answered the race question by saying that they are “Mexican” or “Argentinean” was counted as Hispanic, even if they did not check the Hispanic box.

From the available data, the exact number of respondents affected by this change is difficult to determine. But it appears to be about 1% of Hispanics or fewer, according to a Pew Research Center analysis of U.S. Census Bureau data.

An image showing how the U.S. Census Bureau determines who is Hispanic in government surveys.

How do Hispanics identify their race in Census Bureau surveys?

In the eyes of the Census Bureau, Hispanics can be of any race, because “Hispanic” is an ethnicity and not a race. However, this distinction is subject to debate. A 2015 Center survey found that 17% of Hispanic adults said being Hispanic is mainly a matter of race, while 29% said it is mainly a matter of ancestry. Another 42% said it is mainly a matter of culture.

A bar chart showing that most Hispanics do not identify their race only as White, Black or Asian.

Nonetheless, the Census Bureau’s 2022 American Community Survey (ACS) provides the self-reported racial identity of Hispanics: 22.5 million single-race Hispanics identified only as “some other race.” This group mostly includes those who wrote in a Hispanic origin or nationality as their race. Another 10.7 million identified as White. Fewer Hispanics identified as American Indian (1.5 million), Black (1.0 million) or Asian (300,000).

Multiracial Hispanics

Another roughly 27.5 million Hispanics identified as more than one race in 2022, up from just 3 million in 2010.

Growth in the number of multiracial Hispanics comes primarily from those who identify as White and “some other race.” That population grew from 1.6 million to 24.9 million between 2010 and 2022. The number of Hispanics who identify as White and no other race declined from 26.7 million to 10.7 million.

The sharp increase in multiracial Hispanics could be due to several factors, including changes to the census form introduced in 2020 that added more space for written responses to the race question and growing racial diversity among Hispanics. This explanation is supported by the fact that almost 25 million of the Hispanics who identified as two or more races in 2022 were coded as “some other race” (and wrote in a response) and one of the specific races (such as Black or White). About 2.6 million Hispanics identified with two or more of the five major races offered in the census.

Changes for the 2030 census

The 2030 census will combine the race and ethnicity questions, a change that other federal surveys will implement in coming years. The new question will add checkboxes for “Hispanic or Latino” and “Middle Eastern or North African” among other race groups long captured in Census Bureau surveys.

Officials hope the changes will reduce the number of Americans who choose the “Some other race” category, especially among Hispanics. However, it’s worth noting that public feedback has raised a variety of concerns, including that combining the race and ethnicity questions could lead to an undercount of the nation’s Afro-Latino population.

Is there an official definition of Hispanic or Latino?

In 1976, Congress passed a law that required the government to collect and analyze data for a specific ethnic group: “Americans of Spanish origin or descent.” That legislation defined this group as “Americans [who] identify themselves as being of Spanish-speaking background and trace their origin or descent from Mexico, Puerto Rico, Cuba, Central and South America, and other Spanish-speaking countries.” This includes around 20 Spanish-speaking nations from Latin America and Spain itself, but not Portugal or Portuguese-speaking Brazil.

To implement this law, the U.S. Office of Management and Budget (OMB) developed Statistical Policy Directive No. 15 (SPD 15) in 1977, then revised it in 1997 and again in March 2024. In the most recent revision, OMB updated racial and ethnic definitions when it announced the combined race and ethnicity question. The current definition of “Hispanic or Latino” is “individuals of Mexican, Puerto Rican, Salvadoran, Cuban, Dominican, Guatemalan, and other Central or South American or Spanish culture or origin.”

The Census Bureau first asked everybody in the U.S. about Hispanic ethnicity in 1980. But it made some efforts before then to count people who today would be considered Hispanic. The Census Bureau also has a long history of changing labels and shifting categories. In the 1930 census, for example, the race question had a category for “Mexican.”

The first major attempt to estimate the size of the nation’s Hispanic population came in 1970 and prompted widespread concerns among Hispanic organizations about an undercount. A portion of the U.S. population (5%) was asked if their origin or descent was from the following categories: “Mexican, Puerto Rican, Cuban, Central or South American, Other Spanish” or “No, none of these.”

This approach indeed undercounted about 1 million Hispanics. Many second-generation Hispanics did not select one of the Hispanic groups because the question did not include terms like “Mexican American.” The question wording also resulted in hundreds of thousands of people living in the Central or Southern regions of the U.S. being mistakenly included in the “Central or South American” category.

By 1980, the current approach – in which someone is asked if they are Hispanic – had taken hold, with some changes to the question and response categories since then. In 2000, for example, the term “Latino” was added to make the question read, “Is this person Spanish/Hispanic/Latino?”

What’s the difference between Hispanic and Latino?

“Hispanic” and “Latino” are pan-ethnic terms meant to describe – and summarize – the population of people of that ethnic background living in the U.S. In practice, the Census Bureau often uses the term “Hispanic” or “Hispanic or Latino.”

Some people have drawn sharp distinctions between these two terms. For example, some say that Hispanics are from Spain or from Spanish-speaking countries in Latin America, which matches the federal definition, and Latinos are people from Latin America, regardless of language. In this definition, Latinos would include people from Brazil (where Portuguese is the official language) but not Spain or Portugal.

A stacked bar chart showing that Hispanics describe their identity in different ways.

Pan-ethnic labels like Hispanic and Latino, though widely used, are not universally embraced by the population being labeled. Our 2023 National Survey of Latinos shows a preference for other terms to describe identity: 52% of respondents most often described themselves by their family’s country of origin, while 30% used the terms Hispanic, Latino, Latinx or Latine, and 17% most often described themselves as American.

The 2023 survey also finds varying preferences for pan-ethnic labels: 52% of Hispanics prefer to describe themselves as Hispanic, 29% prefer Latino, 2% prefer Latinx, 1% prefer Latine and 15% have no preference.

What is ‘Latinx’ and who uses it?

A line chart showing that awareness of ‘Latinx’ has doubled since 2019, but use remains low.

Latinx is a pan-ethnic identity term that has emerged in recent years as an alternative to Hispanic and Latino. Some news and entertainment outlets, corporations, local governments and universities use it to describe the nation’s Hispanic population.

However, its popularity has brought increased scrutiny in the U.S. and abroad. Some critics say it ignores the gendered forms of Spanish language, while others see Latinx as a gender- and LGBTQ+-inclusive term. Adding to the debate, some state lawmakers favor banning the use of the term entirely in government documents; Arkansas has done so already.

A 2023 survey found that awareness of Latinx has doubled among U.S. Hispanics since 2019, with growth across all major demographic subgroups. Still, the share of Hispanic adults who use Latinx to describe themselves is statistically unchanged: In 2023, 4% said they use it, compared with 3% in 2019.

Latinx is also broadly unpopular among Latinos who know the term. Three-in-four Latino adults who are aware of Latinx say the term should not be used to describe Hispanics or Latinos.

The emergence of Latinx coincides with a global movement to introduce gender-neutral nouns and pronouns into many languages that have traditionally used male or female constructions. In the U.S., Latinx first appeared more than a decade ago, and it was added to a widely used English dictionary in 2018.

What is ‘Latine’ and who uses it?

A pie chart showing that about 1 in 5 Hispanics have heard of ‘Latine.’

Latine is another pan-ethnic term that has emerged in recent years. Our 2023 survey found that 18% of U.S. Hispanics have heard of the term.

Similar to familiarity with Latinx, awareness of Latine varies by age, education and sexual orientation. Among Latinos, awareness of Latine is highest among those ages 18 to 29 (22%), college graduates (24%) and lesbian, gay and bisexual adults (32%).

How do factors like language, parental background and last name affect whether someone is considered Hispanic?

Many U.S. Hispanics have an inclusive view of what it means to be Hispanic:

78% of Hispanic adults said in a 2022 Center survey that speaking Spanish is not required to be considered Hispanic. English-dominant Hispanics were more likely than Spanish-dominant Hispanics to say so (93% vs. 64%).
33% of Hispanic adults said in a 2019 survey that having two Hispanic parents is not an essential part of what being Hispanic means to them. Another 34% said it was important but not essential and 32% said it was essential.
84% of Hispanic adults said in a 2015 survey that having a Spanish last name is not required.

Views of Hispanic identity may change in the coming decades as broad societal changes, such as rising intermarriage rates, produce an increasingly diverse and multiracial U.S. population.

Today, many Hispanic families include people who are not Hispanic:

A chart showing that, in 2022, 3 in 10 Hispanic newlyweds in the U.S. married someone who is not Hispanic.

Spouses: Among all married Hispanics in 2022, 22% had a spouse who is not Hispanic. And in a 2023 Center survey, 27% of Hispanics with a spouse or partner said their spouse or partner is not Hispanic.

Newlyweds: In 2022, 30% of Hispanic newlyweds married someone who is not Hispanic. Among Hispanic newlyweds, 41% of those born in the U.S. married someone who is not Hispanic, compared with 11% of immigrant newlyweds, according to an analysis of ACS data.

Parents: Our 2015 survey found that 15% of U.S. Hispanic adults had at least one parent who is not Hispanic. This share rose to 29% among the U.S. born and 48% among the third or higher generation – those born in the U.S. to parents who were also U.S. born.

What role does skin color play in whether someone is Hispanic?

In surveys like those from the Census Bureau, skin color does not play a role in determining who is Hispanic or not. However, as with race, Latinos can have many different skin tones. A 2021 Center survey of Latino adults showed respondents a palette of 10 skin colors and asked them to choose which one most closely resembled their own.

Latinos reported having a variety of skin tones, reflecting the diversity within the group. Eight-in-ten Latinos selected one of the four lightest skin colors. By contrast, only 3% selected one of the four darkest skin colors.

A bar chart showing that Afro-Latinos are about 2% of U.S. adult population and 12% of Latino adults but almost one-in-seven do not identify as Hispanic or Latino.

A majority of Latino adults (57%) say skin color shapes their daily life experiences at least somewhat. Similar shares say having a lighter skin color helps Latinos get ahead in the U.S. (59%) and that having a darker skin color hurts Latinos’ ability to get ahead (62%).

Are Afro-Latinos Hispanic?

Afro-Latino identity is distinct from and can exist alongside a person’s Hispanic identity. Afro-Latinos’ life experiences are shaped by race, skin tone and other factors in ways that differ from other Hispanics. While most Afro-Latinos identify as Hispanic or Latino, not all do, according to our estimates based on a survey of U.S. adults conducted in 2019 and 2020.

In 2020, about 6 million Afro-Latino adults lived in the U.S., making up about 2% of the U.S. adult population and 12% of the adult Latino population. About one-in-seven Afro-Latinos – an estimated 800,000 adults – do not identify as Hispanic.

Are Brazilians, Portuguese, Belizeans and Filipinos considered Hispanic?

Officially, Brazilians are not considered Hispanic or Latino because the federal government’s definition applies only to those of “Spanish culture or origin.” In most cases, people who report their Hispanic or Latino ethnicity as Brazilian in Census Bureau surveys are later recategorized – or “back coded” – as not Hispanic or Latino. The same is true for people with origins in Belize, the Philippines and Portugal.

An error in how the Census Bureau processed data from a 2020 national survey omitted some of this coding and provided a rare window into how Brazilians (and other groups) living in the U.S. view their identity.

In 2020, at least 416,000 Brazilians — more than two-thirds of Brazilians in the U.S. — described themselves as Hispanic or Latino on the ACS and were mistakenly counted that way. Only 14,000 Brazilians were counted as Hispanic in 2019, and 16,000 were in 2021.

The large number of Brazilians who self-identified as Hispanic or Latino highlights how their view of their own identity does not necessarily align with official government definitions. It also underscores that being Hispanic or Latino means different things to different people.

How many people with Hispanic ancestry do not identify as Hispanic?

Of the 42.7 million adults with Hispanic ancestry living in the U.S. in 2015, an estimated 5 million people, or 11%, said they do not identify as Hispanic or Latino, according to a 2015-16 Center survey. These people aren’t counted as Hispanic in our surveys.

Notably, Hispanic self-identification varies across immigrant generations. Among immigrants from Latin America, nearly all identify as Hispanic. But by the fourth generation, only half of people with Hispanic heritage in the U.S. identify as Hispanic.

Note: This is an update of a post originally published on May 28, 2009.

Mark Hugo Lopez is director of race and ethnicity research at Pew Research Center.

Jens Manuel Krogstad is a senior writer and editor at Pew Research Center.

Jeffrey S. Passel is a senior demographer at Pew Research Center.

Sociodemographic and Lifestyle Factors and Epigenetic Aging in US Young Adults JAMA Network Open Original Investigation July 29, 2024 This cohort study investigates the association of sociodemographic and lifestyle factors with biological age as measured by epigenetic clocks among younger adults. Kathleen Mullan Harris, PhD; Brandt Levitt, PhD; Lauren Gaydosh, PhD; Chantel Martin, PhD; Jess M. Meyer, PhD; Aura Ankita Mishra, PhD; Audrey L. Kelly, PhD; Allison E. Aiello, PhD
Telehealth Parenting Program and Epigenetic Biomarkers in Children With Developmental Delay JAMA Network Open Original Investigation July 29, 2024 This secondary analysis of a randomized clinical trial assesses the association of a telehealth parent-child interaction training program with biomarkers associated with aging and chronic inflammation among preschool-aged children with developmental delay. Sarah M. Merrill, PhD; Christina Hogan, MS; Anne K. Bozack, PhD; Andres Cardenas, PhD; Jonathan S. Comer, PhD; Daniel M. Bagner, PhD; April Highlander, PhD; Justin Parent, PhD
Socioeconomic Status, Lifestyle, and DNA Methylation Age JAMA Network Open Original Investigation July 29, 2024 This cohort study explores whether the rate of biological aging estimated by an epigenetic clock is associated with social determinants of health in a racially and ethnically diverse population. Alika K. Maunakea, PhD; Krit Phankitnirundorn, PhD; Rafael Peres, PhD; Christian Dye, PhD; Ruben Juarez, PhD; Catherine Walsh, PhD; Connor Slavens, BSc; S. Lani Park, PhD; Lynne R. Wilkens, DrPH; Loïc Le Marchand, MD, PhD
Epigenetic Age Acceleration and Disparities in Posttraumatic Stress in Women JAMA Network Open Original Investigation July 29, 2024 This cohort study examines the association of epigenetic age acceleration with probable posttraumatic stress disorder and symptom severity in US women exposed to disaster. Alicia K. Smith, PhD; Seyma Katrinli, PhD; Dawayland O. Cobb, MS; Evan G. Goff, BS; Michael Simmond, BS; Grace M. Christensen, PhD, MPH; Tyler Prusisz, BS; Sierra N. Garth, MPH; Meghan Brashear, MPH; Anke Hüls, PhD, MSc; Erika J. Wolf, PhD; Edward J. Trapido, ScD; Ariane L. Rung, PhD, MPH; Nicole R. Nugent, PhD; Edward S. Peters, DMD, SM, ScD
Childhood Maltreatment and Longitudinal Epigenetic Aging JAMA Network Open Original Investigation July 29, 2024 This cohort study examines whether childhood exposure to physical and emotional abuse and neglect is associated with the rate of epigenetic aging. Olivia D. Chang, MSW; Helen C. S. Meier, PhD; Kathryn Maguire-Jack, PhD; Pamela Davis-Kean, PhD; Colter Mitchell, PhD
Familial Loss of a Loved One and Biological Aging JAMA Network Open Original Investigation July 29, 2024 This cohort study evaluates associations between losing a loved one and accelerated biological aging. Allison E. Aiello, PhD, MS; Aura Ankita Mishra, PhD; Chantel L. Martin, PhD; Brandt Levitt, PhD; Lauren Gaydosh, PhD; Daniel W. Belsky, PhD; Robert A. Hummer, PhD; Debra J. Umberson, PhD; Kathleen Mullan Harris, PhD
Obesity and Early-Onset Breast Cancer in Black and White Women JAMA Network Open Original Investigation July 29, 2024 This cohort study of patients with breast cancer examines whether a race-specific association exists between obesity and early-onset breast cancer or the diagnosis of specific molecular subtypes. Sarabjeet Kour Sudan, PhD; Amod Sharma, PhD; Kunwar Somesh Vikramdeo, PhD; Wade Davis, BS; Sachin K. Deshmukh, PhD; Teja Poosarla, MD; Nicolette P. Holliday, MD; Pranitha Prodduturvar, MD; Cindy Nelson, BS; Karan P. Singh, PhD; Ajay P. Singh, PhD; Seema Singh, PhD
Psychosocial Disadvantage During Childhood and Midlife Health JAMA Network Open Original Investigation July 29, 2024 This cohort study examines independent and additive associations of low childhood socioeconomic status and perceived stress in childhood with insulin resistance and epigenetic aging among women followed up from 10 to 40 years of age. Ryan L. Brown, PhD; Katie E. Alegria, PhD; Elissa Hamlat, PhD; A. Janet Tomiyama, PhD; Barbara Laraia, PhD; Eileen M. Crimmins, PhD; Terrie E. Moffitt, PhD; Elissa S. Epel, PhD
Epigenetic Aging and Racialized, Economic, and Environmental Injustice JAMA Network Open Original Investigation July 29, 2024 This cross-sectional study assesses whether socially structured adversity is associated with increased epigenetic accelerated aging among US-born Black non-Hispanic, Hispanic, and White non-Hispanic adults. Nancy Krieger, PhD; Christian Testa, BS; Jarvis T. Chen, ScD; Nykesha Johnson, MPH; Sarah Holmes Watkins, PhD; Matthew Suderman, PhD; Andrew J. Simpkin, PhD; Kate Tilling, BSc, MSc, PhD; Pamela D. Waterman, MPH; Brent A. Coull, PhD; Immaculata De Vivo, PhD; George Davey Smith, MA(Oxon), MD, BChir(Cantab), MSc(Lond); Ana V. Diez Roux, MD, PhD, MPH; Caroline Relton, PhD
Prenatal Maternal Occupation and Child Epigenetic Age Acceleration JAMA Network Open Original Investigation July 29, 2024 This cohort study of mother-infant pairs examines the association between prenatal maternal occupation and epigenetic aging among children in a Latino agricultural community in California. Saher Daredia, MPH; Anne K. Bozack, PhD; Corinne A. Riddell, PhD; Robert Gunier, PhD; Kim G. Harley, PhD; Asa Bradman, PhD; Brenda Eskenazi, PhD; Nina Holland, PhD; Julianna Deardorff, PhD; Andres Cardenas, PhD
Advancing Health Disparities Science Through Social Epigenomics Research JAMA Network Open Special Communication July 29, 2024 This special communication introduces the studies included in this special issue as part of the National Institutes of Health National Institute on Minority Health and Health Disparities Social Epigenomics Program. Arielle S. Gillman, PhD, MPH; Eliseo J. Pérez-Stable, MD; Rina Das, PhD

Chiu DT, Hamlat EJ, Zhang J, Epel ES, Laraia BA. Essential Nutrients, Added Sugar Intake, and Epigenetic Age in Midlife Black and White Women: NIMHD Social Epigenomics Program. JAMA Netw Open. 2024;7(7):e2422749. doi:10.1001/jamanetworkopen.2024.22749

Essential Nutrients, Added Sugar Intake, and Epigenetic Age in Midlife Black and White Women: NIMHD Social Epigenomics Program

1 Community Health Sciences Division, School of Public Health, University of California, Berkeley
2 Osher Center for Integrative Health, University of California, San Francisco
3 Department of Psychiatry and Behavioral Sciences, University of California, San Francisco
4 Department of Human Genetics, University of California, Los Angeles

Question: Are dietary patterns, including essential nutrients and added sugar intakes, and scores of nutrient indices associated with epigenetic aging?

Findings: In this cross-sectional study of 342 Black and White women at midlife, higher added sugar intake was associated with older epigenetic age, whereas higher essential, pro-epigenetic nutrient intake and higher Alternate Mediterranean Diet (aMED) and Alternate Healthy Eating Index (AHEI)–2010 scores (reflecting dietary alignment with Mediterranean diet and chronic disease prevention guidelines, respectively) were associated with younger epigenetic age.

Meaning: The findings of this study suggest a tandem importance in both optimizing nutrient intake and reducing added sugar intake for epigenetic health.

Importance: Nutritive compounds play critical roles in DNA replication, maintenance, and repair, and also serve as antioxidant and anti-inflammatory agents. Sufficient dietary intakes support genomic stability and preserve health.

Objective: To investigate the associations of dietary patterns, including intakes of essential nutrients and added sugar, and diet quality scores of established and new nutrient indices with epigenetic age in a diverse cohort of Black and White women at midlife.

Design, Setting, and Participants: This cross-sectional study included analyses (2021-2023) of past women participants of the 1987-1997 National Heart, Lung, and Blood Institute Growth and Health Study (NGHS), which examined cardiovascular health in a community cohort of Black and White females aged between 9 and 19 years. Of these participants who were recruited between 2015 and 2019 from NGHS’s California site, 342 females had valid completed diet and epigenetic assessments. The data were analyzed from October 2021 to November 2023.

Exposure: Diet quality scores of established nutrient indices (Alternate Mediterranean Diet [aMED], Alternate Healthy Eating Index [AHEI]–2010); scores for a novel, a priori–developed Epigenetic Nutrient Index [ENI]; and mean added sugar intake amounts were derived from 3-day food records.

Main Outcomes and Measures: GrimAge2, a second-generation epigenetic clock marker, was calculated from salivary DNA. Hypotheses were formulated after data collection. Healthier diet indicators were hypothesized to be associated with younger epigenetic age.

Results: A total of 342 women composed the analytic sample (mean [SD] age, 39.2 [1.1] years; 171 [50.0%] Black and 171 [50.0%] White participants). In fully adjusted models, aMED (β, −0.41; 95% CI, −0.69 to −0.13), AHEI-2010 (β, −0.05; 95% CI, −0.08 to −0.01), and ENI (β, −0.17; 95% CI, −0.29 to −0.06) scores, and added sugar intake (β, 0.02; 95% CI, 0.01-0.04) were each significantly associated with GrimAge2 in expected directions. In combined analyses, the aforementioned results with GrimAge2 were preserved with the association estimates for aMED and added sugar intake retaining their statistical significance.

Conclusions and Relevance: In this cross-sectional study, independent associations were observed for both healthy diet and added sugar intake with epigenetic age. To our knowledge, these are among the first findings to demonstrate associations between added sugar intake and epigenetic aging using second-generation epigenetic clocks and one of the first to extend analyses to a diverse population of Black and White women at midlife. Promoting diets aligned with chronic disease prevention recommendations and replete with antioxidant or anti-inflammatory and pro-epigenetic health nutrients while emphasizing low added sugar consumption may support slower cellular aging relative to chronological age, although longitudinal analyses are needed.
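As a reading aid, the sketch below restates the fully adjusted estimates quoted in the Results and checks two things: the direction of each coefficient and whether its 95% CI excludes zero. It also backs out an approximate standard error from each interval, assuming a symmetric normal-approximation interval (estimate ± 1.96 SE). That assumption and the names `estimates` and `summarize` are illustrative and are not the study's own method or code.

```python
# Fully adjusted estimates quoted in the Results: (beta, CI lower, CI upper).
estimates = {
    "aMED": (-0.41, -0.69, -0.13),
    "AHEI-2010": (-0.05, -0.08, -0.01),
    "ENI": (-0.17, -0.29, -0.06),
    "added sugar": (0.02, 0.01, 0.04),
}

def summarize(beta, lower, upper):
    """Return (direction of association, CI excludes zero, approximate SE)."""
    # A positive beta means higher exposure goes with older GrimAge2.
    direction = "older" if beta > 0 else "younger"
    excludes_zero = lower > 0 or upper < 0
    # Assumption: the interval is roughly beta +/- 1.96 * SE.
    approx_se = (upper - lower) / (2 * 1.96)
    return direction, excludes_zero, approx_se

summary = {name: summarize(*values) for name, values in estimates.items()}

for name, (direction, excludes_zero, approx_se) in summary.items():
    print(f"{name}: {direction} epigenetic age, "
          f"CI excludes zero: {excludes_zero}, approx SE: {approx_se:.3f}")
```

All four intervals exclude zero, which is consistent with the statement that each association was statistically significant in the fully adjusted models. The published intervals are rounded, so the backed-out standard errors are rough.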

Epigenetic clocks powerfully predict biological age independent of chronological age. These clocks reflect altered gene and protein expression patterns, particularly those resulting from differential DNA methylation (DNAm) at CpG (5′-C-phosphate-G-3′) sites. DNAm that accumulates over time is a testament to the toll social, behavioral, and environmental forces can have on the body. 1-3 These alterations often result in pathogenic processes (eg, genomic instability, systemic inflammation, and oxidative stress) characteristic of aging and chronic disease. 1,4,5 As such, myriad clocks reflecting epigenetic age have been developed for a range of age- or disease-related targets. 4,6 The GrimAge series contains second-generation markers of epigenetic aging that account for clinical and functional biomarkers, and is most notable for its robust associations with human mortality and morbidity risk, including time to death and comorbidity counts. 6,7 The recently developed version 2 of the GrimAge clock (hereafter, GrimAge2) improved on the first’s predictive abilities and confirmed its applicability for people at midlife and of different racial and ethnic backgrounds. 1,6

Epigenetic changes are modifiable, and efforts to counter epigenetic alteration in humans have centered on lifestyle factors including diet, inspiring concepts of an “epigenetic diet” and “nutriepigenetics.” 8,9 So far, 2 epidemiological studies have found that higher diet quality is associated with slower epigenetic aging using clock measures related to mortality, including the first version of GrimAge. 7,10 In those studies, diet measures were reflective of healthy dietary patterns (eg, the Dietary Approaches to Stop Hypertension [DASH] diet, the Alternate Mediterranean Diet [aMED] score) emphasizing consumption of fruits, vegetables, whole grains, nuts and seeds, and legumes. 8,11 For example, the Mediterranean-style diet is largely plant-based with emphasis on extra virgin olive oil and seafood. This makes it replete with bioactive nutrients and phytotherapeutic compounds and low in highly processed, high fat, and nutrient-poor foods, a mixture hypothesized to be protective against low-grade chronic inflammation (“inflammaging”), oxidative stress, intracellular and extracellular waste accumulation, and disrupted intracellular signaling and protein-protein interactions. Thus, such a pattern is likely effective in preventing and reversing the epigenetic changes and pathogenic processes associated with aging, disease, and decline. 4,8,12-14

Dietary Reference Intakes (DRIs) are an established set of nutrient-specific reference values determined by experts that guide population intakes for adequacy and toxic effects. 15 Recent thinking, however, suggests that diets may not always adequately supply nutrients and other bioactives, particularly relative to the amounts necessary to fully condition gene expression or counteract epigenetic alterations to ensure optimal physiological metabolism. 8 Macronutrients and micronutrients play crucial roles in DNA replication, damage prevention, and repair, whereas nutrient deficiencies (and excesses) can cause genomic damage to the same degree as physical or chemical exposures. 16 Given that (1) progenome effects of some micronutrients have been observed at different and higher levels than the established DRIs and (2) determination of DRIs does not solely consider genomic stability (ie, lesser susceptibility to genomic alterations), experts have called for refining the DRIs to be better aligned for genomic health maintenance. 14,16-18 Diet quality inventories, such as those for Mediterranean-style diets, have not generally incorporated DRIs, although such considerations could clarify how food-based indices compare against requirements for related nutrients (eg, those with epigenetic properties) and refine epidemiological and intervention efforts. Accordingly, for this study, a novel nutrient index theoretically associated with epigenetic health was created and its associations with epigenetic aging were tested alongside established diet quality indices.

To date, nutriepigenetic work has mostly involved older White populations and focused on healthy dietary aspects. It is therefore important to examine the associations between nutrition and epigenetic aging in more diverse samples and to better understand what specific dietary aspects could be underlying the observed associations. Nutrients with established epigenetic action should be examined, especially considering intakes relative to amounts set forth in the DRIs and nutritional recommendations. Similarly, sugar is an established pro-inflammatory and oxidative agent that has been implicated in cancer as well as cardiometabolic diseases. 19 - 21 However, in diet quality indices often studied in the epigenetic context (eg, the aMED), sugar is noticeably unaccounted for, and it has also yet to be examined alone. Given the high consumption of sugar globally and the demographic variations within, 22 - 24 elucidating this association could motivate future dietary interventions and guidelines as well as health disparities research. This study sought to examine associations of diet with GrimAge2 in a midlife cohort comprising Black and White US women. The central hypothesis was that indicators of a healthier diet may be associated with decelerated epigenetic aging, and added sugar intake with accelerated aging.

This cross-sectional study used data from the original National Heart, Lung, and Blood Institute (NHLBI) Growth and Health Study (NGHS) (1987-1999) and its follow-up (2015-2019), which studied a cohort of Black and White females aged from 9 or 10 years into midlife (age 36-43 years), examining cardiometabolic health and related determinants. The participants were recruited based on biological female sex at age 9 or 10 years. The follow-up study re-recruited women from the California site. 25 , 26 Participants (and/or their parent[s] or guardian[s]) provided demographic data and completed online or paper surveys and new assessments. Participants received remuneration and provided informed consent. The institutional review board of the University of California, Berkeley, approved all study protocols. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology ( STROBE ) reporting guideline.

For inclusion in current analyses, the participants needed valid diet records and epigenetic data at midlife along with age and race and ethnicity information (participant self-reported); after excluding 5 women with epigenetic data quality issues, 342 individuals were included in the analytic sample. Complete case analyses were done. Among the 624 women who were followed up, the women composing the analytic sample were younger (39.2 years vs 39.9 years; P < .001) and had greater body mass index (BMI, calculated as weight in kilograms divided by height in meters squared) compared with women without complete diet and epigenetic data (32.5 vs 30.7; P = .02) ( Table 1 ). No differences were otherwise observed.

Participants provided saliva samples used for DNAm analyses performed by the University of California, Los Angeles Neuroscience Genomics Core (UNGC) of the Semel Institute for Neuroscience and Human Behavior using the Infinium HumanMethylation450 BeadChip platform (Illumina, Inc). DNAm profiles were generated by Horvath’s online calculator, 27 which provided (1) estimates of epigenetic age based on GrimAge2 estimation methods; and (2) assessments of data quality (again, 5 observations did not pass quality checks). GrimAge2 uses Cox proportional hazards regression models that regress time to death (due to all-cause mortality) on DNAm-based surrogates of plasma proteins, a DNAm-based estimator of smoking pack-years, age, and female sex. It was updated from GrimAge, version 1 6 by including 2 new DNAm-based estimators of plasma proteins—high-sensitivity C-reactive protein (logCRP) and hemoglobin A 1c (log A 1c )—beyond the original 7. Linear transformation of results from these models allows GrimAge2 to be taken as an epigenetic age estimate (in years). Further information can be accessed from studies on DNA treatment and isolation and advanced analysis options for generating output files 28 or GrimAge2. 1

The participants were instructed by the NGHS study staff to self-complete a 3-day food record at follow-up for 3 nonconsecutive days. 29 Data were entered into and analyzed by the Nutrition Data System for Research (NDSR) software, version 2018 (University of Minnesota Nutrition Coordinating Center).

Mean nutrient and food intakes were calculated across valid food records for each woman based on the NDSR 2018 output. These values were used to calculate the scores of 2 overall diet quality nutrient indices (aMED and the Alternate Healthy Eating Index [AHEI]–2010) and a novel index (Epigenetic Nutrient Index [ENI]) score as described below. The aMED (Mediterranean-style diet) followed published scoring methodology 30 reflecting the degree of adherence to 9 components of an anti-inflammatory, antioxidant-rich diet. The AHEI-2010 was assessed following published scoring instructions 31 and reflects the degree of adherence to 11 dietary components associated with decreased risk for chronic disease.

This study developed a novel nutrient index (ENI) after the Mediterranean-style diet, but via a nutrient-based approach rather than a food-based one. Nutrient selection was done a priori based on antioxidant and/or anti-inflammatory capacities as well as roles in DNA maintenance and repair documented in the literature. 16 , 32 , 33 Scores can range from 0 to 24, with higher scores reflecting higher DRI adherence ( Table 2 ). 34 The internal consistency of the ENI was acceptable (Cronbach α = 0.79). The ENI also demonstrated convergent validity with r = 0.51 ENI-aMED correlation as well as higher ENI scores in women from childhood households with higher annual incomes (13.9 vs 11.7, for ≥$40 000/y vs <$10 000/y, respectively) and parental educational attainment (14.7 vs 12.3, for ≥college graduate vs < high school graduate, respectively), corresponding to the literature. 36 Pearson correlations between the ENI and diet scores and added sugar intake were also calculated. The ENI score was moderately correlated with the AHEI-2010 score ( r = 0.44) but not correlated with added sugar intake. The aMED and AHEI-2010 scores were highly correlated at r = 0.73. Added sugar intake had moderate correlation with the AHEI-2010 score ( r = −0.44) and low correlation with the aMED score ( r = −0.28).
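The internal-consistency statistic reported above can be illustrated with a minimal sketch of the standard Cronbach α formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the toy item scores are assumptions made for illustration only; they are not study data, and the study's own computation may have differed in detail.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of per-person item-score lists (equal lengths)."""
    k = len(rows[0])                      # number of items in the index
    items = list(zip(*rows))              # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy scores (hypothetical): three people, three items that move in lockstep,
# so the items are perfectly consistent with one another.
toy_rows = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
alpha = cronbach_alpha(toy_rows)
```

Values near 1 indicate that the items vary together; the 0.79 reported for the ENI falls in the range conventionally described as acceptable.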

Added sugar intake was calculated as the mean across valid food records using NDSR output. The NDSR defines added sugar intake as the total sugar added to foods (eg, as syrups and sugars) during food preparation and commercial food processing. Monosaccharides and disaccharides naturally occurring in foods are not included. 35

To maximize internal validity and minimize confounding, several covariates were included. Age and sample batch were controlled for as well as naive CD8 and CD8pCD28nCD45Ran memory and effector T-cell counts, thus accounting for normal cell count variation. To control for baseline factors and their potential influence on diet and epigenetic age over time, the following parameters assessed at age 9 or 10 years (mostly parent or caregiver reported) were further adjusted for: annual household income, highest parental educational attainment, number of parents in household, and number of siblings. Additionally, self-reported race (Black or White) as well as the current health and lifestyle factors of self-reported chronic conditions (yes to any of the following ever: cancer, diabetes [including gestational, prediabetes], hypertension, or hypercholesterolemia) or medication use (currently yes for any of the following conditions: diabetes, hypertension, hypercholesterolemia, or thyroid), BMI (measured), having ever smoked (yes or no), and mean daily total energy intake (as higher diet quality scores might result from higher energy intake) 37 were also included.

Descriptive analyses provided summary statistics. Linear regression models estimated unadjusted and adjusted cross-sectional associations of each of the 4 dietary exposures with GrimAge2. Per expert recommendations, unadjusted models controlled for women’s current age, sample batch, and both naive CD8 and CD8pCD28nCD45Ran memory and effector T-cell counts. Adjusted models controlled for those variables in addition to the relevant sociodemographic and health behavior–related covariates already listed. To examine the associations of healthy diet measures and added sugar intake with GrimAge2 in the context of each other, aMED, AHEI-2010, and ENI scores were each entered separately, together with added sugar intake, into the same fully adjusted multivariable linear regression model. Statistical significance was assessed with 2-tailed tests at α = .05, and all statistical analyses were conducted from October 2021 to November 2023 with Stata SE, version 15.1 (StataCorp LLC).
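As a reading aid, the models described in this paragraph can be written in generic form. The notation is an assumption introduced here, not the authors' own: D_i is one dietary exposure (aMED, AHEI-2010, ENI, or added sugar intake), H_i is one healthy diet score, S_i is added sugar intake, and c_i collects whichever covariates the model in question controls for.

```latex
\begin{aligned}
\text{Single exposure:}\quad
  & \mathrm{GrimAge2}_i = \beta_0 + \beta_D D_i
    + \boldsymbol{\gamma}^{\top}\mathbf{c}_i + \varepsilon_i \\
\text{Joint model:}\quad
  & \mathrm{GrimAge2}_i = \beta_0 + \beta_H H_i + \beta_S S_i
    + \boldsymbol{\gamma}^{\top}\mathbf{c}_i + \varepsilon_i
\end{aligned}
```

Under this sketch, each β is the difference in GrimAge2 (in years) per 1-unit difference in the corresponding exposure with the other terms held fixed, and the joint model is fitted once for each of the 3 healthy diet scores.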

The analytic sample of this study comprised 342 women (mean [SD] age at follow-up, 39.2 [1.1] years; 171 [50.0%] Black and 171 [50.0%] White participants; mean [SD] BMI, 32.5 [10.0]; 150 [43.9%] ever smokers; 164 [48.0%] ever diagnosed with a chronic condition; and 58 [17.0%] currently taking medication) ( Table 1 ). The participants were well distributed across socioeconomic status categories at baseline (9-10 years old). The participants presented with low to moderate levels of diet quality; the mean (SD) scores were 3.9 (1.9) (possible range, 0-9) on the anti-inflammatory, antioxidant Mediterranean-style pattern (aMED); 55.4 (14.7) (possible range, 0-110) on the AHEI-2010 for chronic disease risk; and 13.5 (5.0) (possible range, 0-24) on the ENI for intakes of epigenetic-relevant nutrients relative to DRIs. The participants also reported mean (SD) daily added sugar intake of 61.5 (44.6) g, although the score range was large (2.7-316.5 g).

Table 3 provides the overall unadjusted and adjusted associations between each dietary exposure of interest and GrimAge2 from multivariable linear regression models. In both unadjusted and adjusted models, all dietary exposures were statistically significantly associated with GrimAge2 in the hypothesized direction. In adjusted models, the associations observed for each dietary exposure were slightly attenuated. Each 1-unit increase in a diet score was associated with the following change in GrimAge2 (in years): aMED (β, −0.41; 95% CI, −0.69 to −0.13), AHEI-2010 (β, −0.05; 95% CI, −0.08 to −0.01), and ENI (β, −0.17; 95% CI, −0.29 to −0.06), indicating that healthier diets were associated with decelerated epigenetic aging. Each gram increase in added sugar intake was associated with a 0.02-year (95% CI, 0.01 to 0.04) increase in GrimAge2, reflecting accelerated epigenetic aging.
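To make the per-unit coefficients easier to read, the short sketch below scales two of the point estimates reported above to larger exposure differences. The function name and the chosen differences (2 aMED points, 10 g of added sugar) are illustrative assumptions; the arithmetic presumes linearity, ignores the confidence intervals, and describes cross-sectional association rather than any causal effect.

```python
def implied_difference(beta_per_unit, exposure_difference):
    """Difference in GrimAge2 (years) implied by a linear per-unit coefficient."""
    return beta_per_unit * exposure_difference


# Point estimates taken from the text; the exposure differences are hypothetical.
amed_two_points = implied_difference(-0.41, 2)    # 2-point higher aMED score
sugar_ten_grams = implied_difference(0.02, 10)    # 10 g/d more added sugar

print(f"aMED +2 points: {amed_two_points:+.2f} years")
print(f"Added sugar +10 g: {sugar_ten_grams:+.2f} years")
```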

Table 4 gives the adjusted associations of each healthy diet measure (aMED, AHEI-2010, and ENI scores) and added sugar intake with GrimAge2 when both are included in the same model. In all instances, healthier diet measures and added sugar intake appeared to maintain their independent associations with GrimAge2 in the expected directions. Associations were statistically significant for added sugar intake in all models as well as for aMED scores; 95% CIs were less precise for AHEI-2010 and ENI scores.

The findings of this cross-sectional study are among the first, to our knowledge, to demonstrate the association of added sugar intake with an epigenetic clock. Further, to our knowledge, it is the first study to examine the associations of diet with GrimAge2 and extend the applicability of such results to a cohort of Black and White women at midlife. As hypothesized, measures of healthy dietary patterns (aMED, AHEI-2010 scores), and high intakes of nutrients theoretically related to epigenetics (ENI) were associated with younger epigenetic age, while a higher intake of added sugar was associated with older epigenetic age. Additionally, this study examined indicators of healthy and less healthy diets in the same model, allowing simultaneous evaluation of each in the presence of the other. Although the magnitudes of associations were diminished and some 95% CIs became wider, their statistical significance generally persisted, supporting the existence of independent epigenetic associations of both healthy and less healthy diet measures. This approach is informative, as dietary components are often examined singularly or in indices, which can lead to erroneous conclusions if key contextual dietary components are not accounted for or are obscured. From these findings, even in healthy dietary contexts, added sugar still has detrimental associations with epigenetic age. Similarly, despite higher added sugar intake, healthier dietary intakes appear to remain generally associated with younger epigenetic age.

The number of published nutriepigenetic studies, particularly those examining second-generation epigenetic clock markers, is still relatively small. However, the results of the present study are consistent with the literature. Two other studies 7 , 10 have examined GrimAge1-associated outcomes and found that higher diet quality scores, including the DASH and aMED, were associated with slower epigenetic aging. However, those studies were limited to older (>50 years) and White populations, which limits their demographic generalizability. Analyses of epigenetic aging and added sugar intake are new, but the findings are consistent with the larger body of epidemiological work that has drawn connections between added sugar intake and cardiometabolic disease, 19 , 20 perhaps suggesting a potential mechanism underlying such observations. Granted, point and 95% CI estimates for the added sugar–GrimAge2 associations were close to zero, suggesting a smaller role for added sugar compared with healthy dietary measures. Nevertheless, their statistical significance was persistent, and more studies are needed.

Nutrient-based inventories can provide epidemiological contributions for genomic health studies. The idea of epigenetically critical nutrients is important for 2 reasons. First, it supports the notion that epigenetic nutrient intakes above DRI levels could boost epigenetic preservation and potentially motivate updates to nutritional guidelines, an outcome advocated for by nutriepigenetic experts. 16 - 18 In the novel ENI constructed for the present study, points were awarded based on comparisons of average daily intakes with: (1) estimated average requirements, or the requirement considered adequate for half of the healthy individuals in a population, and (2) recommended dietary allowances or adequate intakes, or where 97% to 98% or essentially all of a population’s healthy individuals’ requirements for a nutrient are met. 15 Future iterations could test varying ENI scoring parameters relative to DRIs for epigenetic benefit. Second, taking a nutrient approach suggests that any dietary pattern rich in vitamins, minerals, and other bioactives could be useful for preserving epigenetic health. This is helpful because dietary patterns are socioculturally influenced, but a nutrient focus rather than a focus on foods could help bridge cultures, class, and geography. 9 The Okinawan diet, for example, is nutritionally similar to the Mediterranean-style diet but more aligned to Asian tastes. 38 In general, the sociodemographic determinants of diet should not be discounted. Across the US population, for instance, it is known that overall diet quality is relatively low while added sugar intake is high, as also observed in the sample of the present study. However, specific nutrient intakes will vary based on the particulars of dietary patterns. 22 , 36 As dietetics and medicine progress into the era of personalized nutrition and personalized medicine, the role of social factors including diet will be important to consider in epigenetic studies and could figure prominently in work on health disparities.
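The threshold-based scoring described in this discussion can be sketched as follows. This is a hypothetical illustration only: the 0/1/2 point rule, the function names, the nutrient names, and every threshold value are assumptions, and the actual scoring rules belong to the study's Table 2, which is not reproduced here. With 12 nutrients, a rule of this shape would yield the 0 to 24 range mentioned in the text, but that correspondence is not confirmed by the source.

```python
def nutrient_points(intake, ear, rda):
    """Hypothetical rule: 0 below the EAR, 1 from the EAR up to the RDA/AI, 2 at or above it."""
    if intake >= rda:
        return 2
    if intake >= ear:
        return 1
    return 0


def index_score(intakes, references):
    """Sum the points over every nutrient listed in `references`."""
    return sum(nutrient_points(intakes[name], *references[name]) for name in references)


# Placeholder nutrients and (EAR, RDA/AI) thresholds; these are NOT actual DRI values.
references = {
    "nutrient_a": (10.0, 15.0),
    "nutrient_b": (200.0, 300.0),
    "nutrient_c": (1.0, 1.5),
}
intakes = {"nutrient_a": 16.0, "nutrient_b": 250.0, "nutrient_c": 0.5}

score = index_score(intakes, references)  # 2 + 1 + 0 points
```

Varying the thresholds or the points per tier in a sketch like this is one way to explore the alternative scoring parameters that the text proposes for future iterations.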

Strengths of this study are its inclusion of a diverse group of women as well as use of robust measures of diet and DNAm. It was also possible to control for several potential sociodemographic confounders.

This study also has limitations. Because the study is cross-sectional, causality cannot be inferred without temporality, and longitudinal studies are therefore needed. Additionally, diet was self-reported via 3-day food records, which may lead to underestimates or overestimates of intakes depending on the nutrient. Therefore, augmenting dietary assessment with food frequency questionnaires and/or biomarkers could be helpful. 39 Also, other nutrients with pro-epigenetic properties were not included in the current ENI. Still, the Cronbach α for this first ENI version was acceptable at 0.79 and it demonstrated good convergent validity with customary socioeconomic and demographic characteristics. The tolerable upper intake levels of the DRIs were not considered in constructing the ENI. Future work should assess the prevalence of intakes beyond upper limits to determine whether toxicity could be a concern.

To our knowledge, the findings of this cross-sectional study are among the first to find associations between indicators of healthy diet as well as added sugar intake and second-generation epigenetic aging markers and one of the first to include a cohort of Black women. Higher diet quality and higher consumption of antioxidants or anti-inflammatory nutrients were associated with younger epigenetic age, whereas higher consumption of added sugar was associated with older epigenetic age. Promotion of healthy diets aligned with chronic disease prevention and decreased added sugar consumption may support slower cellular aging relative to chronological age, although longitudinal analyses are needed.

Accepted for Publication: April 29, 2024.

Published: July 29, 2024. doi:10.1001/jamanetworkopen.2024.22749

Corresponding Authors: Dorothy T. Chiu, PhD, Osher Center for Integrative Health, University of California, San Francisco, 1545 Divisadero St, #301D, San Francisco, CA 94115; Barbara A. Laraia, PhD, MPH, RD, Community Health Sciences Division, School of Public Health, University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA 94720.

Author Contributions: Drs Chiu and Laraia had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Epel and Laraia share co–senior authorship on this article.

Concept and design: Chiu, Hamlat, Epel, Laraia.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Chiu, Hamlat, Laraia.

Critical review of the manuscript for important intellectual content: Hamlat, Zhang, Epel, Laraia.

Statistical analysis: Chiu, Hamlat, Zhang.

Obtained funding: Epel, Laraia.

Administrative, technical, or material support: Chiu, Laraia.

Supervision: Epel, Laraia.

Conflict of Interest Disclosures: Dr Chiu reported receiving support from grants from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD); the National Heart, Lung, and Blood Institute (NHLBI); the National Institute on Aging (NIA); and the National Center for Complementary and Integrative Health (NCCIH) during the conduct of the study. Dr Hamlat reported receiving grants from the National Institutes of Health (NIH) during the conduct of the study. Dr Laraia reported receiving grants from NIH NICHD during the conduct of the study. No other disclosures were reported.

Funding/Support: The research reported in this publication was supported by grant R01HD073568 from the Eunice Kennedy Shriver NICHD (Drs Laraia and Epel, principal investigators [PIs]); grant R56HL141878 from the NHLBI; and grants R56AG059677 and R01AG059677 from the NIA (both for Drs Epel and Laraia, PIs). The participation of Dr Chiu was supported by the University of California, San Francisco Osher Center research training fellowship program under grant T32AT003997 from NCCIH.

Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data Sharing Statement: See the Supplement .

Additional Contributions: We recognize the past and present NHLBI Growth and Health Study (NGHS) staff for their talents and dedication, without which the study and these analyses would not have been possible. We also thank the Nutrition Policy Institute for providing consultation and support with historical study data. Additionally, we express immense gratitude to Ake T. Lu, PhD, and Steve Horvath, PhD, now of Altos Labs, for their epigenetic clock expertise and consultation. Neither was financially compensated for their contributions beyond their usual salary. Of note, we thank the NGHS participants for their time and efforts over the years.


A visual approach to tracking emotional sentiment dynamics in social network commentaries

Original Article
Published: 05 September 2024
Volume 14 , article number 182 , ( 2024 )


Ismail Hossain, Sai Puppala, Md. Jahangir Alam & Sajedul Talukder


The expansion of social media has unlocked a real-time barometer of public opinion. This paper introduces a novel framework to analyze sentiment shifts in social network comment sections, a reflection of the broader public discourse over time. Leveraging a pre-trained uncased $RoBERTa_{large}$ model, we predict emotional scores from user comments, mapping these to key sentiment trends such as Approval, Toxicity, Obscenity, Threat, Hate, Offensive, and Neutral. Our methodology employs machine learning techniques to train models on a dataset that connects emotional scores with these trends, generating trend probability scores. We utilize a bottom-up recursive algorithm to aggregate emotional scores within comment threads, enabling the prediction of trend scores using three distinct aggregation methods. The results demonstrate that our emotional prediction model achieves an AUC of 0.92, and XGBoost stands out with an F1 score exceeding 0.40. Our research elucidates the temporal evolution of online public sentiment, enhancing the understanding of digital social dynamics and offering insights for strategic online interaction, intervention, and content moderation.
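The bottom-up recursive aggregation mentioned in the abstract can be pictured with a minimal sketch over a comment tree. Everything here is an assumption for illustration: the `Comment` class, the single scalar score per comment, and the use of mean, max, and sum as the three aggregation methods, which the abstract does not name. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List


@dataclass
class Comment:
    score: float                                  # emotion score of this comment alone
    replies: List["Comment"] = field(default_factory=list)


def aggregate(node: Comment, combine: Callable[[List[float]], float]) -> float:
    """Aggregate each reply subtree first, then combine with the node's own score."""
    child_values = [aggregate(reply, combine) for reply in node.replies]
    return combine([node.score] + child_values)


# Hypothetical thread: a root comment with two replies, one of which has its own reply.
thread = Comment(0.2, [Comment(0.8, [Comment(0.4)]), Comment(0.6)])

thread_mean = aggregate(thread, mean)   # averages level by level up the tree
thread_max = aggregate(thread, max)     # strongest score anywhere in the thread
thread_sum = aggregate(thread, sum)     # total score over all comments
```

Because each subtree is reduced before its parent, the mean variant weights a deep reply chain as a single value at each level, whereas the sum variant grows with thread size; which behavior is preferable depends on the trend being modeled.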


Data availability

No datasets were generated or analysed during the current study.


Acknowledgements

This research was supported by NSF Grant CNS-2153482.

Author information

All authors have contributed equally to this work.

Authors and Affiliations

Department of Computer Science, The University of Texas at El Paso, 1801 Hawthorne St., El Paso, TX, 79902, USA

Ismail Hossain, Md. Jahangir Alam & Sajedul Talukder

School of Computing, Southern Illinois University Carbondale, 1230 Lincoln Dr., Carbondale, IL, 62901, USA

Sai Puppala


Contributions

This research represents a collaborative effort where each author has significantly contributed to the development and execution of the work presented: Ismail Hossain (I.H.) and Sai Puppala (S.P.): These authors contributed equally to this work. I.H. and S.P. were instrumental in the conceptualization and design of the study. They focused on the development of the methodology and played a leading role in the analysis of emotional sentiment dynamics within social network commentaries. Both authors also contributed to the writing and editing of the manuscript, ensuring the clarity and coherence of the presentation. Md Jahangir Alam (M.J.A.): Contributed to both the data collection and the development of the analytical framework for sentiment analysis. M.J.A. was heavily involved in the preprocessing of data and conducted extensive tests on new datasets, contributing to the substantial expansion of the research findings. He also assisted in drafting and revising the manuscript, providing critical insights into the interpretation of results. Sajedul Talukder (S.T.): As the corresponding author, S.T. oversaw the entire project, ensuring the research aligned with the objectives and the manuscript met publication standards. He was responsible for project coordination, acquisition of funding, and provided guidance on the overall research direction. S.T. contributed to the refinement of the research methodology, analysis of results, and played a pivotal role in manuscript revision, focusing on the integration of feedback and enhancement of the manuscript's overall quality. All authors have reviewed the manuscript, contributed to its critical revision for important intellectual content, and approved the final version to be published. Each author agrees to be accountable for all aspects of the work, ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to Sajedul Talukder.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Hossain, I., Puppala, S., Alam, M.J. et al. A visual approach to tracking emotional sentiment dynamics in social network commentaries. Soc. Netw. Anal. Min. 14, 182 (2024). https://doi.org/10.1007/s13278-024-01332-8


Received: 01 March 2024

Revised: 01 August 2024

Accepted: 05 August 2024

Published: 05 September 2024

DOI: https://doi.org/10.1007/s13278-024-01332-8


Project 2025 partner president makes misleading claims about IVF and hospital treatment of women dealing with miscarriages

Written by Media Matters Staff

Published 09/13/24 12:57 PM EDT

Two claims from Family Research Council President Tony Perkins in this clip are misleading.

First, on IVF, Perkins denies that women have been forced “to travel out of state for in vitro fertilization treatments.” There have been reports to the contrary after Alabama's Supreme Court ruling that imperiled IVF. From CNN:

Yesterday, Goidel was days away from having her eggs retrieved at an Alabama fertility clinic, after three miscarriages and more than a $20,000 investment in a grueling in vitro fertilization journey. Now, she and her husband are packing for a flight to Texas tonight, in hopes of salvaging their shot at a successful pregnancy. After the Alabama Supreme Court ruled last week that frozen embryos are considered human beings and those who destroy them can be held liable for wrongful death, fertility clinics throughout the state began pausing IVF treatments out of fear of legal prosecution. Goidel said her provider, Alabama Fertility Specialists, called her Thursday morning and told her because she is so far along in the IVF process, the clinic was still willing to retrieve her eggs – but could not make any guarantees about whether they would be able to use them to make embryos, store or ship them.

At least one IVF clinic in Alabama has announced plans to shutter due to litigation concerns.

Second, Perkins claims that he is not aware “that pro-life laws in various states are prohibiting doctors for treating women who show up at a hospital because of a miscarriage.”

Perkins should read more news articles, because this has been widely reported. From The Associated Press:

More than 100 pregnant women in medical distress who sought help from emergency rooms were turned away or negligently treated since 2022, an Associated Press analysis of federal hospital investigations found. Two women — one in Florida and one in Texas — were left to miscarry in public restrooms. In Arkansas, a woman went into septic shock and her fetus died after an emergency room sent her home. At least four other women with ectopic pregnancies had trouble getting treatment, including one in California who needed a blood transfusion after she sat for nine hours in an emergency waiting room. ... In Texas, where doctors face up to 99 years of prison if convicted of performing an illegal abortion, medical and legal experts say the law is complicating decision-making around emergency pregnancy care. Although the state law says termination of ectopic pregnancies isn’t considered abortion, the draconian penalties scare Texas doctors from treating those patients, the Center for Reproductive Rights argues. “As fearful as hospitals and doctors are of running afoul of these state abortion bans, they also need to be concerned about running afoul of federal law,” said Marc Hearron, a center attorney. Hospitals face a federal investigation, hefty penalties and threats to their Medicare funding if they violate the federal law.

The Family Research Council is a Project 2025 partner.

Citation: From the September 12, 2024, edition of Family Research Council’s Washington Watch

TONY PERKINS (HOST): In Tuesday night's presidential debate, Vice President Kamala Harris gave a rambling and incoherent statement that seemed to imply the Supreme Court's overturning of Roe v. Wade had forced women to travel out of state for in vitro fertilization treatments.

PERKINS: It remains unclear what the vice president meant by that political word salad. I mean, IVF is legal in all 50 states, and I'm not sure where the plane and the strangers came in.

Well, joining me now to discuss this is Dr. Marguerite Duane. She is a board-certified family physician and the executive director of FACTS, the Fertility Appreciation Collaborative to Teach — Collaboration to Teach the Science, an organization dedicated to educating health care professionals and students about the scientifically valid natural-based family planning methods. Dr. Duane, welcome to Washington Watch .

MARGUERITE DUANE: Thank you so much for having me.

PERKINS: Now let me ask you, are you aware of any place in the country where IVF is not allowed?

DUANE: No. I'm not aware that IVF is not available throughout the country. My understanding, again, I'm not a lawyer. I'm a physician. But to my knowledge, IVF is legal. But where women do need to travel extensively is to seek physicians who are trained to provide comprehensive reproductive health care through a restorative lens, one that's really designed to treat the underlying causes of infertility.

And in fact, I've had patients drive four to six hours to see me to receive the kind of care that I'm trained to provide, and we currently train physicians across the country to provide. Again, care that is real women's health care that seeks to identify underlying causes of infertility and treat those through a restorative reproductive approach.

PERKINS: And is also respectful of human life.

PERKINS: You mentioned the miscarriages and the vice president, her comments there suggesting that pro-life laws in various states are prohibiting doctors for treating women who show up at a hospital because of a miscarriage. Again, I'm not aware of that.

DUANE: And it's simply not true. And I can tell you, as a physician who cares for patients who regularly experience miscarriage, we are trained to provide both medical and surgical treatments to treat miscarriage. Now the difference between a miscarriage and an abortion is with miscarriage, the embryo has already passed. The heart has stopped beating. The child is no longer alive.


The Debate Reviews Say Trump Lost, Harris Won: How It Happened

Fox News and MSNBC are united after the debate: Donald Trump lost and Kamala Harris won. Stretching well past the scheduled 90 minutes, the first and possibly only debate between the candidates, which aired on ABC, featured the Republican candidate almost immediately straying from his campaign’s playbook for the face-off, getting angry and increasingly personal — twice even hushing the first woman to serve as vice-president. If that weren’t enough, he repeated a bizarre lie about migrants eating pets in Ohio, among numerous other tangents. Democrats are ecstatic about Harris’s performance, while Republicans were left wincing at their man’s defensive posture and blaming the moderators for being too hard on him.

All our debate coverage

• Gabriel Debenedetti on the success of Kamala Harris’s debate strategy.
• Nia Prater on whether there will be a second debate.
• Jonathan Chait on how Trump was sabotaged by the online right.
• Photos and anonymous overheard comments from the New York Young Republican Club’s debate watch party.
• The Cut’s Laura Bassett on how Harris out-alphaed Trump.
• Jonathan Chait on the contrast Harris was able to draw with her Trump-baiting.
• Margaret Hartmann on Trump’s pet-eating tangent.
• Ed Kilgore on Trump’s torrent of denials.

Below is a reverse chronological account of what happened as it happened, including commentary and analysis from the entire Intelligencer team.

Gabriel Debenedetti

How Harris flummoxed Trump

From my new report on how the Harris team’s debate strategy played out:

It took only a few minutes for Trump to grow flustered by Harris’s reference to a negative analysis of his economic plans by professors at Penn’s Wharton School, his alma mater. Minutes later, she directly quoted a tweet of his praising Chinese leader Xi Jinping over Beijing’s handling of COVID, and he once again spluttered. Soon after, he mixed up Virginia for West Virginia when he went on a tirade about Democrats and “after birth” abortion. He also praised the “genius and heart and strength” of the six conservative Supreme Court justices who overturned Roe v. Wade —a historically unpopular move. As Trump refused to make eye contact with Harris and grimaced into his notes, I was reminded of what Celinda Lake, a senior Democratic pollster who works with the Harris campaign, told me a few hours earlier: Research shows that 70 percent of what voters take away from debates is the theater aspect, only 30 percent is the actual policy difference. Harris’s campaign relished the chance to throw Trump off his game after he won the first debate against Biden by simply letting his opponent expose himself as just too old for the job. This time, the Harris team ran an ad on Fox News and stationed billboards around the city taunting Trump about his smaller crowd sizes, an obsession of his that voters find childish. When Trump accused her of busing in paid crowds to her own events, Harris looked like she almost couldn’t believe he took the bait instead of responding to her claim that he doesn’t care about everyday voters. She laughed as Trump insisted that undocumented migrants were eating family pets in Ohio, a far-right conspiracy that took his focus far from his straightforward attempts to blame her for the migrant surge at the southern border. 
One top Democratic operative, who’d been basically comatose at that point early in the first debate, started texting me “YES” “YES” “YES” every few moments as Trump preached to the far-right corners of X more than persuadable voters in swing states.

You can read the rest here.

The pet-eating memes continue

A couple more indications of a Harris victory

To be clear, these results should be taken with a grain of salt, but:

And this isn’t a poll, but still interesting:

If there’s going to be another debate, it could only be on Fox News

Which begs the question:

Harris’s Trump-bait helped draw out an important contrast

As I argue in my review of the debate, Harris “baited Donald Trump into losing his temper, then used the visual contrast between them to establish herself as not only a plausible president but the only plausible president onstage.” Also:

The clearest success Harris registered was in performing the role of president. She repeatedly touted her economic plan, rebutting the charge she lacks ideas, which is intended to present her as a lightweight. She also did this by citing her foreign-policy experience, meeting with Volodymyr Zelenskyy and organizing a NATO response to Russia’s invasion. The importance of these validators might be overlooked, but many Americans have old-fashioned views of presidential qualifications, associating it with masculinity. Most important, she established herself as presidential by appearing calm and confident, in vivid contrast to the bellowing lunatic on the stage beside her.

Trump’s big debate denialathon

From my new take on Trump’s bad night:

Trump has a devoted following of people who believe his revisionist take on reality, who don’t accept the experts or the statistics or logic or the evidence of their own eyes and ears. It’s hard to imagine, however, that many persuadable people watching this debate will find it so easy to accept that Trump is right about everything and everyone else is wrong. To the extent that Trump made his war on reality so sweeping and absolute and furious on the stage in Philadelphia, he lost not just the debate but his grip.

Read the rest here.

But will this matter to swing voters?

Doing your own stint in the spin room is rarely a good sign

And now Taylor Swift has endorsed Kamala Harris

From her cat-featuring announcement post on Instagram following the debate, which Swift said she watched:

Recently I was made aware that AI of ‘me’ falsely endorsing Donald Trump’s presidential run was posted to his site. It really conjured up my fears around AI, and the dangers of spreading misinformation. It brought me to the conclusion that I need to be very transparent about my actual plans for this election as a voter. The simplest way to combat misinformation is with the truth. I will be casting my vote for Kamala Harris and Tim Walz in the 2024 Presidential Election. I’m voting for @kamalaharris because she fights for the rights and causes I believe need a warrior to champion them. I think she is a steady-handed, gifted leader and I believe we can accomplish so much more in this country if we are led by calm and not chaos. I was so heartened and impressed by her selection of running mate @timwalz, who has been standing up for LGBTQ+ rights, IVF, and a woman’s right to her own body for decades.

Lindsey Graham wants to fire … who, exactly?

Trump: ‘My best debate, ever’

That’s what he claimed in a post-debate Truth Social message (while also attacking Harris and the moderators):

I thought that was my best Debate, EVER, especially since it was THREE ON ONE!

Post-debate on Fox News

Harris already wants a rematch

Minutes after the first presidential debate between Trump and Harris ended, Harris campaign chair Jen O’Malley Dillon issued a statement claiming victory and extending an offer for a second matchup between the two candidates. “Vice President Harris is ready for a second debate. Is Donald Trump?” she wrote.

The Washington Post reports that Trump’s team appears game, for now, with campaign adviser Chris LaCivita saying of the request, “Of course. They need clean up.”

Meanwhile, during those paid messages

Reports pooler Sara Cook on the second commercial break:

The second the stage hand said they were clear for a 4 minute break, Trump turned towards the exit, gave a big sigh through closed lips, and walked off stage without looking at Harris. From the time the moderators announced they were going to break, Harris began writing on her notepad. She wrote continuously for the entire first two minutes of the break, occasionally bringing one hand to her chin or brushing hair behind her ear. She then reviewed what she wrote for the next minute, making a few tweaks, before putting the pen down and looking out around the room with her hands folded in front of her. She took a sip of water from a glass placed under the lectern. Trump walked back onstage 30 seconds before the end of break. He did not look at Harris, she did not look at him. Harris made small adjustments to her collar. Both candidates looked straight ahead until the program restarted. Again, no words were spoken.

Their closing statements

As it was throughout the debate, a clear contrast:

What is a debate win worth?

Julia Ioffe notes wryly that Hillary Clinton won her debates against Donald Trump, too. It’s an important reminder that what liberals and media critics consider a successful performance is not necessarily going to be persuasive to the small segment of swing voters that need to be persuaded in this election.

On health care, Trump has “concepts of a plan”

Donald Trump has promised a big beautiful Obamacare replacement for almost a decade without ever really articulating what it would look like. (It was even unclear when Republicans came close to repealing the Affordable Care Act in 2017.) But never fear: He finally unveiled some specifics on Tuesday night:

Concepts of a plan??

“I have concepts of a plan,” won’t rescue Trump from a disastrous performance, but to my weary brain, it’s gold. Not only does it sound like the title of a forgotten shoegaze album; it’s emblematic of Trump himself. He has nothing, really, just bigotry and a handful of vague positions. He specializes in vibes, and bad ones at that. That line was one of his more honest moments. It’ll rattle around in my mind palace for weeks to come.

Harris owns a gun?

After Trump accused Democrats of wanting to take away everyone’s firearms, Harris said something surprising: She is a gun owner. It’s not news, however. In 2019, according to CNN, her presidential campaign at the time said she purchased a handgun for personal protection and keeps it locked in a safe.

Trump doubles down on Harris race comments

When Trump was asked about his past comments on Harris’s race, he started out by saying that he doesn’t care at all about how she identifies. And then he doubled down.

“All I can say is I read where she was not Black, that she put out — I’ll say that. And then I read that she was Black and that’s okay,” he said of Harris, who is Black and Indian. “Either one was okay with me. That’s up to her.”

In response, Harris raised Trump’s past examples of racism, spending a significant amount of time on his treatment of the Central Park 5, who were heavily featured at the DNC last month. “It’s a tragedy that we have someone who wants to be president who has consistently used race to try to divide us,” she said.

Yusef Salaam, a New York City councilman and member of the Central Park 5, is expected to be in the spin room after the debate.

Trump is getting more mic time

And considering how he’s using that time, Harris is probably fine with that.

Harris uses Ukraine question to establish her bona fides

Kamala Harris’s response to Donald Trump on the Russia-Ukraine war is not focused on attacking Trump. Instead, she uses it to recount her foreign-policy work, meeting with Zelenskyy and NATO.

One of her most important obstacles to overcome is still that many voters question whether she, or any woman, is strong enough to serve as commander-in-chief.

Harris is looking at Trump, but he is not reciprocating

Harris is taking Trump seriously

I am surprised she’s not being more dismissive in her posture. What she is doing is effective, but I was anticipating her emphasizing, for instance, that the reason why she had to introduce herself to Trump at the start of the debate is because he skipped the inauguration, because he was a sore loser, etc. They are both taking each other extremely seriously in their exchanges.

Democratic reaction so far: PHEW

They know it’s not over and that there’s plenty to do, but the overriding feeling I’ve gotten from Democrats close to the Harris campaign over the debate’s first hour is immense relief. So many were scarred by the last debate and downplayed what Harris had to do tonight. But they’re unanimous now that her obvious strategy of getting under Trump’s skin has worked wonders.

One top Democrat who was catatonic during the last debate has just been texting me “YES” “YES” “YES” every few minutes, peaking as Trump rambled about pets being eaten and when Harris started laughing at him.

Their deeper feeling isn’t quite so gleeful. They know her most important audience tonight is undecided voters, not just people who hate Trump, and that this is almost certainly the largest audience she’ll get all campaign long. There’s a half-hour left, and Trump keeps hammering her on the border, one of her biggest weaknesses.

But they’re happy with how she got through the economics section and thrilled with her answers on abortion (her campaign adviser David Plouffe said on X that the campaign’s internal numbers showed a 40-point gap among undecided voters while they were talking), and they clearly see a path to success in letting him ramble incoherently while she tries to present herself as a chance to break beyond the messy, unproductive politics of the last decade. Harris’s campaign says that its live-testing of battleground-state undecided voters hit its lowest point when Trump was insisting he won the 2020 race.

At this point during the Trump-Biden debate, the president’s team was desperately hoping no one was watching. Right now, Harris’s is praying that everyone’s tuning in and that this is, like ABC keeps saying, the most consequential debate in history.

Trump says Biden hates Harris

In a comment that was even more startling than his description of her as a Marxist, Trump said of Harris that “Biden hates her.” Keep in mind that Biden hand-picked her as his vice-president, then made sure she rather than many other plausible Democrats was his successor when he withdrew from the 2024 race, and then spoke on her behalf at the convention and has been campaigning with her.

So who are you going to believe? Trump or your lying ears and eyes?

Unhappy Republican reactions are pouring in

Congressional Republicans have tuned into the Trump-Harris debate and, so far, they’re not liking what they’re seeing. Several conceded to reporters that Harris successfully forced Trump off his game:

Senator Lindsey Graham, an ally of Trump, took to social media to complain about the moderators who have heavily fact-checked the former president:

Another response:

Trump cites authoritarian Orban as his validator

After Kamala Harris taunted Trump with the contempt with which world leaders held the former president, Trump had one shining example of a foreign fan who is his validator: Hungarian authoritarian Viktor Orban! Aside from the fact that few viewers likely knew who he was talking about, the few who did were probably horrified. Whether you consider Orban a new Franco, or a new Perón, or a new Mussolini, he’s hardly a role model for American leadership.

The betting market says Harris is running away with it

For what it’s worth:

Trump says he had nothing to do with January 6

In an amazing turn of his extraordinarily frequent oscillations on what happened on January 6, Trump now says he had nothing to do with what happened at the Capitol. He just made a speech, and Nancy Pelosi (!) was responsible for what happened. He’s not acknowledging all the steps he took that led up to January 6 or, as David Muir tried to remind him, that he (not Pelosi, not Harris, not Biden) was president that day.

And he tried to answer January 6 question with … immigration

Trump also mentioned the shooting of Ashli Babbitt, a Capitol rioter — then pivoted immediately to immigration yet again. “She is the border czar,” he said falsely of Harris. “What about those people?” he asked. “When are they going to be prosecuted?” He then repeated a line from earlier, saying that crime rates are going down in other countries because criminals are crossing the border. (Violent crime in the United States is down, I should note.) Trump has nothing of substance to say; his only real attack point is immigration, immigration, immigration. He won’t take responsibility for the Capitol Riot and, minutes later, would not admit he lost the election to President Joe Biden before he brought up — you guessed it — immigration. Again.

ABC has fact-checked Trump in surprisingly vigorous fashion

As completely expected, Donald Trump spouted a lot of lies and gross exaggerations during Tuesday’s debate. But unlike on some other debate nights, the network in charge is doing some effective real-time fact-checking. At least three times, one of the two ABC moderators, David Muir and Linsey Davis, has stepped in and corrected Trump after particularly egregious answers.

“There is no state in this country where it’s legal to kill a baby after it’s born,” Muir announced after Trump falsely claimed otherwise during an answer on abortion.

Muir also informed the audience at home that there is no evidence of immigrants eating dogs in Ohio, which Trump claimed, spinning off a popular conservative conspiracy theory that raced across the internet this week.

A shushing attempt, too

Trump tries to blame Harris for assassination attempt

In a shocking moment, Trump seemed to directly accuse Harris of contributing to the attempt on his life, suggesting her rhetoric played a role. “I probably took a bullet to the head because of the things they said about me,” he said.

It appears to be part of a recent trend from Trump and his circle to raise the specter of conspiracy around the July assassination attempt ahead of the November election. On Monday, Trump’s wife, Melania, shared a video suggesting that there was “more to the story” of the shooting.

Harris continues to take hits on her 2019 positions

ABC asked Vice-President Harris about her previous positions calling for a fracking ban, a mandatory assault-weapons buyback, and decriminalizing immigration enforcement.

Harris in her reply says she supports fracking, and notes the Inflation Reduction Act, which she voted for, expanded fracking. But she doesn’t mention the other issues, and retreats into a generalized defense of her values. It’s her weakest response so far.

Trump brings a racist pet-eating hoax to the debate stage

Trump raised a debunked racist hoax on the stage, claiming that immigrants in Springfield, Ohio, are eating and killing residents’ pets. The rumors have been shared by Republican allies of Trump and his running mate, J.D. Vance. “In Springfield, they’re eating the dogs, the people that came in. They’re eating the cats. They’re eating the pets of the people that live there and this is what’s happening in our country,” he said.

Debate moderator David Muir cut in to correct Trump on his claim, but Trump continued on, saying that people have said so on TV.

When it was her turn to speak, Harris seemed pleased with the exchange. “I mean, talk about extreme,” she said with a laugh.

Trump owns our national abortion nightmare

Harris’s powerful abortion answer wasn’t just her strongest rhetorical moment; it was directly responsive to the contemporary hellscape of women sitting in parking lots bleeding out, patients being forced to cross state lines for their procedures, minors being forced to stay pregnant after assault, and IVF in the crosshairs. Before Roe v. Wade was overturned, this probably felt theoretical to a lot of Americans, and polls were all over the place. Now, all of the aforementioned stories are real.

Meanwhile, Trump’s trying to run an old playbook on abortion, the one the Susan B. Anthony List historically encouraged: trying to make Democrats squirm by bringing up later abortions, or as he put it to Hillary Clinton back in 2016, claiming that they support “the baby out of the womb of the mother just prior to the birth of the baby,” and lying about so-called post-birth abortions, which do not exist. (He was distorting comments made about newborn hospice and confusing West Virginia and Virginia in the process.) The problem for him is that he’s talking hypotheticals about a past that he can’t substantiate, while every day, Americans read headlines about real-life consequences that have profoundly affected public opinion.

The first person to bring up Hannibal Lecter was … Harris?

I have good and bad news if you had “Trump shouts out the ‘late great Hannibal Lecter’” on your debate bingo card. Harris brought up Trump’s favorite fictional cannibal as an example of the unhinged things Trump says at his rallies rather than focusing on ways to help the American people.

Harris puts Trump on the defensive on abortion

On abortion, former president Donald Trump offered a rambling, if familiar, answer, saying falsely that liberal states “have abortion in the ninth month.” Then he misspoke in the process of lying: He claimed the governor of West Virginia wanted to execute babies after birth, when he usually means Ralph Northam, the former governor of Virginia. “For 52 years they’ve been trying to get Roe v. Wade into the states,” he said and praised the “genius, heart and strength” of his chosen Supreme Court justices, who voted to overturn Roe. This is typical Trump: He wants to take credit for killing Roe, a decision that he claims, falsely, is popular, but he doesn’t want to answer direct questions about his own position on issues like Florida’s abortion referendum.

Then Vice-President Kamala Harris swiftly and decisively put him on the defensive, referring to “Trump abortion bans” in conservative-controlled states and describing the human consequences of those bans. Some don’t have exceptions for rape or incest, she said, adding, “That is immoral.” People do not have to “abandon their faith or their deeply held beliefs” in order to oppose the government — and Trump — making reproductive decisions for them.

It was an effective line of attack, and Trump didn’t credibly respond. All he could do was accuse Harris (again, falsely) of supporting abortion as late as the ninth month of pregnancy “and probably after birth.” If Trump thinks he can appeal to moderates and independents by claiming to support certain exceptions to abortion bans, he’s failing. His arsenal contains lies and not much else. As personal stories of harm emerge in states with bans on the books, it’s harder and harder for Trump to distance himself from the world he’s created — and would reinforce if reelected president.

Kamala Harris is definitely winning the reaction game

Harris, baiting Trump, flags his rallies

Trump dodges question about signing a national abortion ban

“Will you veto a national abortion ban?” asks a moderator. “Well, I won’t have to,” Trump replies. Trump hems and haws about whether a national abortion law will pass Congress. He is told that J.D. Vance promised he would veto a national abortion ban. Trump replies that he didn’t talk to Vance.

That sure sounds like he wouldn’t veto a ban.

Trump is angry — that wasn’t the plan

Trump’s advisers were preoccupied with two things ahead of the debate: (1) Prevent him from getting angered by whatever Harris says, which they worried would knock him off message. (2) Encourage him to hang back, in the hopes that Harris might be forced to talk more expansively, which they hoped would lead to her producing mangled sentences they could utilize in service of their argument that she speaks incoherently (as opposed to Trump, who is of course famously coherent).

Just a few minutes in, he’s already angry. It seemed to start with Harris mentioning the Wharton School, which was an artful way to trigger him and it worked instantly. Now he is speaking at a high volume and rather aggressively. He is still on message, but for how long? Harris meanwhile seems to be tailoring her facial expressions for memes. People looking in a befuddled way at Donald Trump is a robust genre already, and I expect she’ll make a meaningful contribution to that trove by the end of the night.

An early miscue

Trump seemed to briefly mix up the two Virginias during a winding response on abortion. While inaccurately claiming that states are performing abortions after nine months, Trump appeared to make a reference to former Virginia governor Ralph Northam, praising his successor and ally Glenn Youngkin, but said West Virginia instead. A mix-up that likely won’t endear him to the commonwealth.

Didn’t take long to get to the red-baiting

After initially claiming that Harris was just a Biden rubber-stamp and then that she had no policies at all, Trump suddenly lurched into a flat assertion that Harris is a Marxist, alluding to the occasional description of her father as a “Marxist economist.” He offered no explanation of this claim, but I guess Harris is lucky he didn’t call her a “communist” as he often has.

If Harris can even fight Trump to a draw on the economy, that is a win

The economy is one of Trump’s best issues, per polling. The race is basically tied, and Trump’s strength is the perception he is an economic mastermind — a perception that is winning over some voters who otherwise don’t like him. Trump needs to win a clear victory on the economy. I don’t think he did at all, but we’ll see what the viewers think.

Behold skeptical Kamala

Harris blames Trump for praising China’s COVID response.

Kamala Harris not only quoted Trump praising Xi Jinping’s handling of COVID; she noted China’s lack of transparency on the origins of the pandemic. That is an interesting position for her to take, and a correct one, in my view. But it’s also one conservatives have largely owned, because some progressives have treated the hypothesis that the pandemic emerged from a lab as a conspiracy theory. Harris seems to be taking the other side.

Harris invokes Project 2025 early

One of the Democrats’ most effective attack lines against Donald Trump and other Republicans this year has been Project 2025, the draconian playbook the Heritage Foundation and allied conservatives cooked up for a second Trump presidency. Harris mentioned it early on even though it had little to do with the question she was asked, and it likely won’t be the last time she hits it tonight.

Trump, on the defensive, claimed ignorance. “I haven’t read it, I don’t want to read it,” he said.

A handshake and then a collision

When the two candidates came out, one question was answered when Kamala Harris approached Trump with a handshake that he awkwardly answered. The first question to Harris reprised the famous 1980 Reagan debate question: “Are [we] better off than four years ago?” She did not answer but instead went into her stock “opportunity economy” message, followed by a brisk denunciation of Trump’s economic agenda of tax cuts and tariffs. Following up, Trump introduced alleged uncontrolled immigration as wrecking the economy, and in a series of follow-ups, the two candidates hammered each other along the lines we expected, with Harris citing Project 2025 and Trump mocking Harris’s policy specifics.

Getting into it, right off the bat

Harris forces a handshake.

After much speculation it wouldn’t happen, there was indeed a handshake, but it didn’t come easily. Harris clearly insisted and had to walk all the way to Trump’s lectern to make it happen. “Kamala Harris. Let’s have a good debate,” she said. Trump replied, “Nice to see you, have fun.” Awkward!

Hell of a rumor!

Will that mass deportation involve barbed wire and cattle cars?

If my colleague Sarah Jones is right that Trump could “get nasty — and racist — fast” on immigration once the debate begins, then Kamala Harris will have a strategic decision to make on how to handle one of Trump’s signature issues. Up until now, she’s basically dealt with immigration by endorsing the bipartisan border-control bill that Trump killed earlier this year and moving on to other issues. But should Trump really go wild, she might consider poking him a bit on the implications of his promise to launch the greatest “mass deportation” in American history, involving every undocumented immigrant. Because of their defensiveness on the issue, Democrats have not raised alarms about the details of this terrible-sounding plan or the implications for Latino citizens and legal immigrants, who may be hassled or even rounded up in such an effort. Trump needs to pay a price for this very un-American America First idea.

Meanwhile in our group chat

Staffer 1: The debate room looks like Avatar.
Staffer 2: rather aquatic
Staffer 1: the podiums look different height
Staffer 4: it’s making me feel a little insane
Staffer 1: Trump is going to lose it
Staffer 5: Shouldn’t the plural be “podia”? Yet it isn’t, strange
Staffer 1: I went to a state school
Four minutes later…
Staffer 5: FYI, according to this AI Overview, “The plural of the word ‘podium’ is ‘podiums’ or ‘podia’”
Six minutes later…
Staffer 6: Chiming in to note that these are neither podiums nor podia. They are lecterns. The podium is the thing you stand on, not the thing you stand at.

When Laura Loomer is your debate adviser

Far-right activist Laura Loomer was seen leaving Trump’s plane after it arrived in Philadelphia. Loomer’s presence is notable for her extremism: She has called Islam “a cancer” and celebrated the deaths of migrants who were crossing the Mediterranean. On an extremist podcast in 2017, Loomer, who is Jewish, said, “Someone asked me, ‘Are you pro-white nationalism?’ Yes. I’m pro-white nationalism.”

Nevertheless, Trump supported her failed 2020 congressional race in Florida, she has flown on his plane in the past, and he reportedly wanted to hire her for a campaign role — until aides intervened, the Washington Post reported. With Loomer onboard, Trump may be in a pugnacious mood, especially on immigration. On Truth Social yesterday and today, he repeatedly boosted the viral lie that Haitian immigrants in Springfield, Ohio, are kidnapping and eating pets, and Republicans have tried to link Vice-President Harris to President Biden’s immigration policy with that rumor and in other talking points. The debate has yet to start, but expect Trump to quickly get nasty — and racist — once it begins.

The $64,000 question

At the end of the June 27 debate between Joe Biden and Donald Trump, CNN moderators tried three times to get a clear answer from Trump as to whether he would accept defeat in November. Indeed, not that a single person noticed, but Biden’s last words before the candidates went to closing remarks trolled and mocked Trump for his refusal to answer what turned out to be the $64,000 question of the 2020 election:

You’re a whiner. When you lost the first time, you continued to appeal and appeal to courts all across the country. Not one single court in America said any of your claims had any merit, state or local, none. But you continue to promote this lie about somehow there’s all this misrepresentation, all the stealing. There’s no evidence of that at all. And I tell you what? I doubt whether you’ll accept it because you’re such a whiner.

The odds are very high that the same fraught question will come up tonight and that Trump will again hedge and change the subject. Unless the moderators can find a more precise way to elicit a clear answer, Harris may need to do so herself with a pledge of her own.

There are at least a couple of benchmarks moderators or Harris could suggest for a concrete agreement by the candidates not to let the contest go until another horrifying January in Washington. One would be to accept the results if the election is called by the Associated Press and all the major networks, including Fox News. Another is to accept the results as certified by governors (or the highest election official in each state), which federal law requires by December 11. If Trump rejects a moderator or Harris challenge to go along with any benchmark other than his subjective determination the election is “fair,” it will be safe to conclude he’s planning another election coup.

Trump brought a big entourage to Philly

Politico reports that Trump Force One arrived with a lot of extra passengers, including “Stephen Miller, Natalie Harp, Laura Loomer, Vince Haley, Ross Worthington, John Coale, Steve Witkoff, Lara Trump, Alina Habba, Chris LaCivita, Steven Cheung, Susie Wiles, Corey Lewandowski, Eric Trump, Taylor Budowich, Tulsi Gabbard, Rep. Matt Gaetz, Margo Martin, Jason Miller, Boris Ephsteyn, Walt Nauta and Dan Scavino.”

Will there be fact-checking?

ABC News seems rather noncommittal, per the New York Times:

Rick Klein, ABC News’s political director and a lead organizer of Tuesday’s debate, said in an interview that the moderators, David Muir and Linsey Davis, were “there to facilitate a discussion” and that “the debate belongs to the candidates.” Is there a role for the moderators to fact-check? “I don’t think it’s a ‘yes’ or ‘no’ proposition,” Mr. Klein said. “We’re not making a commitment to fact-check everything, or fact-check nothing, in either direction. We’re there to keep a conversation going, and to facilitate a good solid debate, and that entails a lot of things in terms of asking questions, moving the conversation along, making sure that it’s civilized.”

What Democrats are expecting — and hoping for

Greetings from the extremely air-conditioned press filing center–slash–spin room in Philadelphia, where I just settled in after almost running straight into a very busy looking Marco Rubio at my hotel a few blocks away. (He must be here spinning for Donald Trump.)

I spent most of today checking in with Democrats inside of and close to Kamala Harris’s campaign to see how they’re feeling, what they expect, what they want to see, and what they’re nervous about. I got a lot of different answers, but one thing stuck out: Basically, all of them agreed that more pressure is on Harris tonight, if only because she’s the new character in the race and the one voters are still interested in hearing more from. (The consensus: Voters know exactly who Trump is and don’t need any new information about him, thank you very much.)

Harris knows this, obviously. As I reported over the weekend, she hasn’t been prepping to deliver some sort of devastating knockout blow to Trump but instead has been thinking about the best ways to present herself as representing a new political era. That’s probably going to mean talking plenty about Trump’s record, naturally — but just as much, if not more, about her vision for the economy.

Of course, we’ll see whether this all goes to plan or, rather, how quickly it veers into unexpected territory. As one Democratic pollster told me this afternoon, reliable research about debate audiences shows 70 percent of what matters to voters is the visual and the performance rather than the substance of what the candidates say.

So yes, Harris will be eager to let Trump be Trump, to put it mildly. Her campaign has been trolling Trump on the airwaves and with billboards about, uh, crowd size here in Philly. If he goes unhinged early, they’ll consider it a win. One top Democrat I talked to didn’t disagree that the pressure was on her but said the bar was pretty low after Biden’s performance this summer. Instead, this person suggested, Harris’s job is just to be the normal adult onstage. Isn’t that what exhausted voters want?

It’s a special guest spin-off!

Spin-room drama abounds:

Tim Walz’s pre-frame

At a campaign stop in Arizona, he told supporters that Harris would use the debate to introduce their ticket to more of the country — and the contrast with Trump would be obvious:

Tonight you’re going to watch Vice President Harris lay out a plan for this country, a new way forward. You’re going to hear her talk about an economy that is an opportunity economy where everybody matters. She’s going to talk about education being a path to a better future, not long term student loan debt. She’s going to talk about tackling some of the toughest problems like climate change and doing it in a way that grows our economy. Now if you did a split screen to that, on the other side of that screen, you’re going to see a nearly 80 year old man who’s in it for himself talk about revenge and talk about how bad this country is, and talk us down on everything he does. … Let’s not let a single person, make the case that there is not an absolutely crystal clear difference of a positive forward America, or one that is small, petty, backwards and we’re done with it.

Trump is demanding a government shutdown over mythical noncitizen voting

Hours before the debate, Donald Trump added a surreal note to the event by pitching a fit on Truth Social and demanding that congressional Republicans shut down the federal government at the end of September if Democrats don’t accept a ridiculous and redundant proposal to federalize state election systems in order to address a completely made-up crisis over noncitizen voting:

If Republicans in the House, and Senate, don’t get absolute assurances on Election Security, THEY SHOULD, IN NO WAY, SHAPE, OR FORM, GO FORWARD WITH A CONTINUING RESOLUTION ON THE BUDGET. THE DEMOCRATS ARE TRYING TO “STUFF” VOTER REGISTRATIONS WITH ILLEGAL ALIENS. DON’T LET IT HAPPEN - CLOSE IT DOWN!!!

By way of background, House Republicans earlier this year pushed through the so-called SAVE Act, reflecting Trump’s 100 percent unsubstantiated claims that Democrats are planning to flood the polls with voting by noncitizens. Noncitizen voting is already illegal in all 50 states, with prison sentences and deportation the available penalties for the incredibly rare violation.

Congressional Republicans led by House Speaker Mike Johnson understood all along this was an empty “messaging” bill not designed to become law but to underline a MAGA campaign talking point. But now Trump has blown up that harmless if demagogic gesture by demanding that Johnson (and also Senate Republican Leader Mitch McConnell, who is likely to openly mock this gesture) refuse to go along with a stopgap spending plan at the end of the fiscal year that is necessary to keep the federal government operating. There is zero chance the Senate or the White House will go along with this demand, which would require all 50 states to completely redo their process for voter registration right before a national election for absolutely no good reason. Johnson agreed to prioritize this dumb legislation in the first place because he needed Trump’s protection from a potential coup by the House Freedom Caucus, which was angry at Johnson for not shutting down the government earlier this year. Now, the dispute could become very real for federal employees and beneficiaries of key federal programs and services.

This is a very old theme for Trump despite its fictional underpinnings. When he won in 2016, he complained that he would have won the national popular vote (which he lost by over 2-and-a-half million votes) if not for “millions of illegal votes.” He offered zero evidence for this claim. He’s brought back the phantom menace of noncitizen voting this year as part of a broader claim that Democrats have opened up the borders to bring in migrants who will immediately be marched to the polls to reelect their socialist benefactors. You can understand how this hoax appeals to Trump since it combines his signature immigration and “stolen election” themes. Either Harris or the debate moderators should consider demanding that Trump cite some actual evidence that any of this is happening, not that hard-core MAGA folk need any for this version of the Great Replacement Theory.

The bets are against a handshake

On the political betting site Polymarket, most bettors don’t think Harris and Trump will shake hands tonight:

The odds are probably even worse than that, since there hasn’t been a presidential debate handshake since the first Trump-Clinton debate in 2016.

What Harris and Trump need to do

From my debate preview this morning, Harris has her work cut out for her:

Without question, [she] has the more complicated task: defining herself to viewers as an agent of change from the Biden-Trump era of politics, and a much safer option than an extremist second Trump administration. This means anticipating and rebutting Trump claims that she is responsible for Biden’s alleged policy failures and is more radical than Biden himself. And it also means casting some light through the fog of endless commentary about Trump to convincingly express concerns about what he will do if restored to power.

Trump, meanwhile, needs to focus on pigeonholing:

Trump’s biggest advantage is the extremely low standard he has set throughout his career for either coherence or civility. Almost anyone else would be afflicted with a dilemma as to whether to accuse Harris of being Biden 2.0 or a “communist,” since Biden is nobody’s idea of a dedicated Marxist-Leninist revolutionary. Trump can blithely pursue both angles of attack simultaneously, because that’s just who he is. Calling Harris a “radical” or a “Marxist” or a “communist” is what passes for a substantive comment from the former president, and he would be wise to stick with ideologically freighted criticism rather than slandering her personally (i.e., he should leave the blatantly racist and sexist patter to MAGA social media). Above all, the 45th president needs to do everything he can to fan doubts about Harris, making her out to be the “risky change” candidate and returning the election to a competition between highly motivated party bases with swing voters ultimately focused on their unhappiness with life as it is.

What time is the debate tonight, and how can you watch it?

It will be broadcast live at 9 p.m. ET on ABC and simulcast on multiple other networks, including C-Span, PBS, MSNBC, and Fox News. The debate will also be streamed live on ABC.com and ABC News’ YouTube channel (for people without cable or streaming-service subscriptions), as well as on ABC News Live, Disney+, and Hulu.

This post has been updated a lot.

COMMENTS

Social Network Analysis 101: Ultimate Guide
Definition of Social Network Analysis (SNA) Social Network Analysis, or SNA, is a research method used to visualize and analyze relationships and connections between entities or individuals within a network. Imagine mapping the relationships between different departments in a corporation.
Social Network Analysis: A Survey on Process, Tools, and Application
Due to the explosive rise of online social networks, social network analysis (SNA) has emerged as a significant academic field in recent years. Understanding and examining social relationships in networks through network analysis opens up numerous research avenues in sociology, literature, media, biology, computer science, sports, and more.
Social Network Analysis: An Example of Fusion Between Quantitative and
Abstract A quantitative approach to social network analysis involves the application of mathematical and statistical techniques and graphical presentation of results. Nonetheless—as with all sciences—subjectivity is an integral aspect of network analysis, manifested in the selection of measures to describe connection patterns and actors' positions (e.g., choosing a centrality indicator ...
Social Network Analysis: History, Concepts, and Research
1 Introduction. Social network analysis (SNA), in essence, is not a formal theory in social science, but rather an approach for investigating social structures, which is why SNA is often referred to as structural analysis [1]. The most important difference between social network analysis and the traditional or classic social research approach ...
New Developments in Social Network Analysis
This review of social network analysis focuses on identifying recent trends in interpersonal social networks research in organizations, and generating new research directions, with an emphasis on conceptual foundations. It is organized around two broad social network topics: structural holes and brokerage and the nature of ties. New research directions include adding affect, behavior, and ...
Social Network Analysis
Social Network Analysis refers to the study conducted with an awareness of social networks, including connections with other analysts in the field. It involves examining relationships between individuals or groups to understand patterns and dynamics within social structures. AI generated definition based on: Social Networks, 2005.
The four dimensions of social network analysis: An overview of research
Highlights • Up-to-date literature review of basic research and application domains in social networks. • Definition of a new set of metrics to measure the capacity of SNA frameworks and tools. • Quantitative analysis of social network analysis tools and frameworks (SNA). • Evaluation of 20 popular SNA software tools according to the new set of metrics. • SNA software technology ...
The SAGE Handbook of Social Network Analysis
Learn from the experts how to apply social network analysis to various fields and topics with this comprehensive handbook.
(PDF) Social network analysis: An overview
Social network analysis (SNA) is a core pursuit of analyzing social networks today. In addition to the usual statistical techniques of data analysis, these networks are investigated using SNA ...
Research Designs for Social Network Analysis
Research design for social network analysis (SNA), as for any other types of research, is a process during which the research question and set of methods that enable to answer the stated question are described. Social network analysis is a multidisciplinary research area, and in consequence a wide range of approaches to analyze network data exists.
New research methods & algorithms in social network analysis
Highlights • New trends and application of advanced data science and artificial intelligence techniques for knowledge extraction from social networks. • Selected papers related to the application of machine learning, soft computing, and computational intelligence to complex social media-based domains. • Current contributions and challenges in social media analysis, social network ...
Use of social network analysis in health research: a scoping review
Introduction Social networks can affect health beliefs, behaviours and outcomes through various mechanisms, including social support, social influence and information diffusion. Social network analysis (SNA), an approach which emerged from the relational perspective in social theory, has been increasingly used in health research. This paper outlines the protocol for a scoping review of ...
The Role of Social Network Analysis in Social Media Research
Social network analysis uses a variety of mathematical techniques, such as maximum likelihood estimation and p-models, for studying stochastic and dynamic events to investigate the structural features of social media usage, which could be reflective of real contexts with certain levels of chaos, or entropy.
Social network analysis
Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. [1] It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them.
Social Network Analysis
Social Network Analysis Social Network Analysis (SNA) is an analytical method used to study social structures through the use of networks and graph theory. It identifies the relationships between individuals, organizations, or other entities and examines the patterns and implications of these relationships.
Social Network Analysis: From Graph Theory to Applications with Python
Social network analysis is the process of investigating social structures through the use of networks and graph theory. This article introduces data scientists to the theory of social networks, with a short introduction to graph theory and information spread.
Characterizing an online, science-based affinity space using topic
Previous research exploring the affinity space concept in online social networking spaces has focused primarily on health science networks. The research by Sharma et al. (Citation 2021) examined a much smaller network of 158 participants and 514 posts in a diabetes-focused affinity space. When examining the affinity space, Sharma and colleagues ...
Social Network Analysis
Social Network analysis is the study of structure, and how it influences health, and it is based on theoretical constructs of sociology and mathematical foundations of graph theory. Structure refers to the regularities in the patterning of relationships among individuals, groups and/or organizations. When social network analysis is undertaken ...
Home
Social Network Analysis and Mining is a multidisciplinary journal focusing on theoretical and experimental work related to social network analysis and mining. Serves a wide range of researchers from computer science, network science, social sciences, mathematical sciences, medical, biological, financial, management, and political sciences.
Big data analytics meets social media: A systematic review of
Hence, big data analytic techniques and frameworks are commonly exploited in Social Network Analysis (SNA). By the ever-increasing growth of social networks, the analysis of social data, to describe and find communication patterns among users and understand their behaviors, has attracted much attention.
New Book on Social Network Analysis
Professor Song Yang in the Department of Sociology and Criminology published a new book, Social Network Analysis in Action, by Springer. This edited volume includes six cutting-edge chapters on various aspects of social network analysis, starting with basics of social network theories, research designs, data analytics, moving to advanced topics in social network analysis, data mining from ...
Understanding Social Network Analysis: A Complete Guide
Unravel the complexities of Social Network Analysis (SNA) with our complete guide. Explore methodologies, applications, challenges, and ethical considerations.
Social network analysis: New ethical approaches through collective
Research in social network analysis (SNA) faces unprecedented ethical challenges today due to both technological developments ('big' data) and a growi…
Digital Entrepreneurial Orientation and Green Innovation in the VUCA
Based on resource-based view (RBV) and social network theory (SNT), we developed a model to examine the role of cross-organizational improvisation (COI) and social ties (specifically, business and political ties) in the relationship between DEO and green innovation (GI). ... theoretical analysis predominates over empirical research (Hains ...
Twitter Sentiment Analysis with LSTM Neural Networks
This project delves into sentiment analysis on Twitter using Long Short-Term Memory Neural Networks in conjunction with Global Vectors for Word Representation (GloVe) to highlight the potential of LSTM neural networks for sentiment analysis on social media platforms like Twitter.
Who is Hispanic?
The Census Bureau first asked everybody in the U.S. about Hispanic ethnicity in 1980. But it made some efforts before then to count people who today would be considered Hispanic. The Census Bureau also has a long history of changing labels and shifting categories. In the 1930 census, for example, the race question had a category for "Mexican."
Essential Nutrients, Added Sugar Intake, and Epigenetic ...
The National Heart Lung and Blood Institute Growth and Health Study Research Group. Obesity and cardiovascular disease risk factors in black and white girls: the NHLBI Growth and Health Study. Am J Public Health . 1992;82(12):1613-1620. doi: 10.2105/AJPH.82.12.1613 PubMed Google Scholar Cross
A visual approach to tracking emotional sentiment dynamics in social
Research Objectives In this article, we investigate the following research questions on sentiment analysis and trend prediction in social media comment sections: (RQ1): Can we design a tool to effectively track and analyze the evolution of sentiment trends in social media comment sections across different time frames, focusing on both short-term fluctuations and long-term shifts in public opinion?
Project 2025 partner president makes misleading claims about IVF and
The Family Research Council is a Project 2025 partner. Video file Citation From the September 12, 2024, edition of Family Research Council's Washington Watch
Who Won the Trump-Harris Debate? Highlights, Analysis
From my new report on how the Harris team's debate strategy played out: It took only a few minutes for Trump to grow flustered by Harris's reference to a negative analysis of his economic ...

Social Network Analysis 101: Ultimate Guide

Table of Contents

Introduction

Definition of Social Network Analysis (SNA)

The Importance of SNA

Brief Historical Overview of SNA

Fundamentals of SNA

Nodes and Edges

Network Types

Network Properties

Dyadic and Triadic Relationships

Homophily and Heterophily

Network Topologies

Theoretical Background of SNA

Strength of Weak Ties Theory

Structural Hole Theory

Small World Network Theory

Barabási–Albert (Scale-Free Network) Model

Data Collection and Preparation

Primary Methods for Collecting SNA Data

Secondary Sources of SNA Data

Ethical Considerations in Data Collection

Preparing Data for Analysis

Network Analysis Methods & Techniques

Basic Technique: Network Centrality

Advanced Techniques: Clusters and Equivalence

Visualizing Networks

Software and Tools for SNA

Introduction to Popular SNA Tools

Choosing the Right Tool for Your Analysis:

PARTNER CPRM: A Community Partner Relationship Management System for Network Mapping

SNA Case Studies

Case Study 1: Leveraging SNA for Program Evaluation

Case Study 2: Empowering Coalition-building

Case Study 3: Boosting Employee Engagement

Challenges and Future Directions in Network Analysis

The Limitations of SNA

Current Trends and Future Predictions

Conclusion: Social Network Analysis 101

Resources and Further Reading

Recommended Books on SNA

Online Resources and Courses

Journals and Research Papers on SNA

Frequently Asked Questions about SNA

Social Network Analysis: A Survey on Process, Tools, and Application

Recommendations

Broad Learning: An Emerging Area in Social Network Analysis

Understanding user behavior in a local social media platform by social network analysis

Annual Review of Organizational Psychology and Organizational Behavior

Social network analysis: An overview

STRENGTHS AND LIMITATIONS OF THIS STUDY

Introduction

Social network analysis

Study rationale

Methods and analysis

Patient and public involvement

Step 1: identifying the research question

Step 2: identifying relevant studies

Supplemental material

Step 3: study selection