hypothesis generating vs testing


Data-driven hypothesis generation in clinical research: what we learned from a human subject study


Hypothesis generation is an early and critical step in any hypothesis-driven clinical research project. Because it is not yet a well-understood cognitive process, the need to improve the process goes unrecognized. Without an impactful hypothesis, the significance of any research project can be questionable, regardless of the rigor or diligence applied in other steps of the study, e.g., study design, data collection, and result analysis. In this perspective article, the authors first provide a literature review on the following topics: scientific thinking, reasoning, medical reasoning, literature-based discovery, and a field study to explore scientific thinking and discovery. Over the years, research on scientific thinking has made excellent progress in cognitive science and its applied areas: education, medicine, and biomedical research. However, a review of the literature reveals a lack of original studies on hypothesis generation in clinical research. The authors then summarize their first human participant study exploring data-driven hypothesis generation by clinical researchers in a simulated setting. The results indicate that a secondary data analytical tool, VIADS (a visual interactive analytic tool for filtering, summarizing, and visualizing large health data sets coded with hierarchical terminologies), can shorten the average time participants need to generate a hypothesis and reduce the number of cognitive events needed to generate each hypothesis. As a counterpoint, the hypotheses generated with VIADS received significantly lower ratings for feasibility. Despite its small scale, the study confirmed the feasibility of conducting a human participant study directly to explore the hypothesis generation process in clinical research.
This study provides supporting evidence to conduct a larger-scale study with a specifically designed tool to facilitate the hypothesis-generation process among inexperienced clinical researchers. A larger study could provide generalizable evidence, which in turn can potentially improve clinical research productivity and the overall clinical research enterprise.

Article Details

The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.


Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

State your research hypothesis as a null hypothesis (H 0) and an alternate hypothesis (H a or H 1).
Collect data in a way designed to test the hypothesis.
Perform an appropriate statistical test.
Decide whether to reject or fail to reject your null hypothesis.
Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.


After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H 0) and alternate (H a) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

H 0 : Men are, on average, not taller than women.
H a : Men are, on average, taller than women.


For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means that differences this large would be unlikely to arise by chance alone if the null hypothesis were true.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means that a difference of the size you measured could plausibly arise by chance alone even if the null hypothesis were true.
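The comparison of between-group and within-group variance can be made concrete with a small sketch. The height values below are invented for illustration, and the function name `variance_components` is hypothetical; the ratio it feeds is the familiar F-ratio from a one-way analysis of variance, used here only to show the idea, not as the specific test the guide recommends.

```python
from statistics import mean

def variance_components(groups):
    """Return (between_group_variance, within_group_variance) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)
    # Between-group: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group: how far each value sits from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_between / (k - 1), ss_within / (n_total - k)

# Hypothetical heights in cm (invented data).
separated = [[178, 180, 182, 181, 179], [163, 165, 164, 166, 162]]
overlapping = [[170, 182, 165, 176, 172], [168, 175, 163, 179, 170]]

between_sep, within_sep = variance_components(separated)
between_over, within_over = variance_components(overlapping)
f_separated = between_sep / within_sep
f_overlapping = between_over / within_over

print("separated groups, between/within ratio:", round(f_separated, 2))
print("overlapping groups, between/within ratio:", round(f_overlapping, 2))
```

A large ratio corresponds to the low p-value case described above, and a ratio near or below 1 corresponds to the high p-value case.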

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data.

For the height example, a statistical test comparing the two groups will give you:

an estimate of the difference in average height between the two groups.
a p-value showing how likely you are to see a difference at least this large if the null hypothesis of no difference is true.
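As a rough illustration of how a test produces both an estimate and a p-value, the sketch below runs a one-sided permutation test on invented height data. The permutation test is a stand-in chosen because it needs only the Python standard library; it is not necessarily the test the guide has in mind, and the function name and the data are assumptions.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(group_a) > mean(group_b).

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical heights in cm, invented for illustration only.
men = [178, 172, 181, 175, 169, 184, 177, 173]
women = [165, 170, 162, 168, 174, 160, 166, 163]

difference, p_value = permutation_test(men, women)
print(f"estimated difference: {difference:.1f} cm, one-sided p = {p_value:.4f}")
```

The p-value here is the share of label shufflings that produce a difference at least as large as the observed one, which matches the definition in the second list item.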

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis (Type I error).
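To see what the significance level controls, the following sketch simulates many experiments in which the null hypothesis is true and counts how often it is rejected anyway. It assumes normally distributed heights with a known standard deviation so that a simple z-test can be used; the parameter values and function names are hypothetical.

```python
import random
from statistics import NormalDist, mean

def one_sided_z_test_p(sample_a, sample_b, sigma):
    """p-value for H0: no difference, against Ha: mean(a) > mean(b), with known sigma."""
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 1 - NormalDist().cdf(z)

def false_positive_rate(alpha, n_experiments=2000, n_per_group=20, seed=1):
    """Fraction of simulated experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution, so H0 holds by construction.
        a = [rng.gauss(170, 7) for _ in range(n_per_group)]
        b = [rng.gauss(170, 7) for _ in range(n_per_group)]
        if one_sided_z_test_p(a, b, sigma=7) < alpha:
            rejections += 1
    return rejections / n_experiments

rate_05 = false_positive_rate(0.05)
rate_01 = false_positive_rate(0.01)
print(f"alpha = 0.05: rejected a true H0 in {rate_05:.1%} of experiments")
print(f"alpha = 0.01: rejected a true H0 in {rate_01:.1%} of experiments")
```

The rejection rates should land close to the chosen thresholds, which is why a stricter threshold lowers the risk of a Type I error.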


The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis. But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis.


Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.


Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved September 18, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/



February 3rd, 2016

Putting hypotheses to the test: we must hold ourselves accountable to decisions made before we see the data.


We are giving $1,000 prizes to 1,000 scholars simply for making clear when data were used to generate or test a hypothesis. Science is the best tool we have for understanding the way the natural world works. Unfortunately, it is in our imperfect hands. Though scientists are curious and can be quite clever, we also fall victim to biases that can cloud our vision. We seek rewards from our community, we ignore information that contradicts what we believe, and we are capable of elaborate rationalizations for our decisions. We are masters of self-deception.

Yet we don’t want to be. Many scientists choose their career because they are curious and want to find real answers to meaningful questions. In its idealized form, science is a process of proposing explanations and then using data to expose their weaknesses and improve them. This process is both elegant and brutal. It is elegant when we find a new way to explain the world, a way that no one has thought of before. It is brutal in a way that is familiar to any graduate student who has proposed an experiment to a committee or to any researcher who has submitted a paper for peer-review. Logical errors, alternative explanations, and falsification are not just common – they are baked into the process.

Image credit: Winnowing Grain Eastman Johnson Museum of Fine Arts, Boston

Using data to generate potential discoveries and using data to subject those discoveries to tests are distinct processes. This distinction is known as exploratory (or hypothesis-generating) research and confirmatory (or hypothesis-testing) research. In the daily practice of doing research, it is easy to confuse which one is being done. But there is a way to keep them apart: preregistration. Preregistration defines how a hypothesis or research question will be tested – the methodology and analysis plan. It is written down in advance of looking at the data, and it maximizes the diagnosticity of the statistical inferences used to test the hypothesis. After the confirmatory test, the data can then be subjected to any exploratory analyses to identify new hypotheses that can be the focus of a new study. In this way, preregistration provides an unambiguous distinction between exploratory and confirmatory research. The two actions, building and tearing down, are both crucial to advancing our knowledge. Building pushes our potential knowledge a bit further than it was before. Tearing down separates the wheat from the chaff. It exposes that new potential explanation to every conceivable test to see if it survives.

To illustrate how confirmatory and exploratory approaches can be easily confused, picture a path through a garden, forking at regular intervals, as it spreads out into a wide tree. Each split in this garden of forking paths is a decision that can be made when analysing a data set. Do you exclude these samples because they are too extreme? Do you control for income/age/height/wealth? Do you use the mean or median of the measurements? Each decision can be perfectly justifiable and seem insignificant in the moment. After a few of these decisions there exists a surprisingly large number of reasonable analyses. One quickly reaches the point where there are so many of these reasonable analyses, that the traditional threshold of statistical significance, p < .05, or 1 in 20, can be obtained by chance alone.
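The garden of forking paths can be simulated. The sketch below generates studies in which the outcome is pure noise, then compares the false positive rate of one planned analysis with the rate obtained when a researcher may also pick any of four subgroup analyses. The subgroup forks, sample sizes, and the known-variance z-test are all assumptions made for this illustration; they are not taken from the article.

```python
import random
from statistics import NormalDist, mean

SIGMA = 1.0  # assumed known standard deviation of the simulated outcome

def two_sided_p(treated, control):
    """Two-sided z-test p-value for a difference in means with known SIGMA."""
    se = SIGMA * (1 / len(treated) + 1 / len(control)) ** 0.5
    z = (mean(treated) - mean(control)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_forking_paths(n_studies=2000, n_per_cell=10, seed=2):
    """Return (planned_rate, any_path_rate) of false positives at p < .05."""
    rng = random.Random(seed)
    # Each fork keeps a different subset of records; the first is the planned analysis.
    paths = [
        lambda sex, age: True,
        lambda sex, age: sex == "f",
        lambda sex, age: sex == "m",
        lambda sex, age: age == "young",
        lambda sex, age: age == "old",
    ]
    planned_hits = 0
    any_path_hits = 0
    for _ in range(n_studies):
        # The outcome is noise: there is no true effect of group in any subgroup.
        records = [
            (group, sex, age, rng.gauss(0, SIGMA))
            for group in ("treated", "control")
            for sex in ("f", "m")
            for age in ("young", "old")
            for _ in range(n_per_cell)
        ]
        p_values = []
        for keep in paths:
            treated = [y for g, s, a, y in records if g == "treated" and keep(s, a)]
            control = [y for g, s, a, y in records if g == "control" and keep(s, a)]
            p_values.append(two_sided_p(treated, control))
        if p_values[0] < 0.05:
            planned_hits += 1
        if min(p_values) < 0.05:
            any_path_hits += 1
    return planned_hits / n_studies, any_path_hits / n_studies

planned_rate, any_path_rate = simulate_forking_paths()
print(f"planned analysis only: {planned_rate:.1%} false positives")
print(f"best of five forks:    {any_path_rate:.1%} false positives")
```

Even with only five candidate analyses, choosing the most favorable one after seeing the data pushes the false positive rate well above the nominal 5%.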

If we don’t have strong reasons to make these decisions ahead of time, we are simply exploring the dataset for the path that tells the most interesting story. Once we find that interesting story, bolstered by the weight of statistical significance, every decision on that path becomes even more justified, and all of the reasonable, alternative paths are forgotten. Without us realizing what we have done, the diagnosticity of our statistical inferences is gone. We have no idea if our significant result is a product of accumulated luck with random error in the data, or if it is revealing a truly unusual result worthy of interpretation.

This is why we must hold ourselves accountable to decisions made before seeing the data. Without putting those reasons into a time-stamped, uneditable plan, it becomes nearly impossible to avoid making decisions that lead to the most interesting story. This is what preregistration does. Without preregistration, we effectively change our hypothesis as we make those decisions along the forking path. The work that we thought was confirmatory becomes exploratory without us even realizing it.
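One way to picture a time-stamped plan whose later edits are detectable is to store a cryptographic digest of the plan text. The sketch below is only an illustration of that idea: the field names and helper functions are hypothetical, and a local digest does not replace depositing the plan with a registry, which is what actually makes it uneditable.

```python
import hashlib
import json
from datetime import datetime, timezone

def plan_digest(plan):
    """SHA-256 digest of the plan, serialized with sorted keys so it is reproducible."""
    canonical = json.dumps(plan, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def freeze_plan(plan):
    """Return a record holding a UTC timestamp and the digest of the plan."""
    return {
        "registered_at": datetime.now(timezone.utc).isoformat(),
        "sha256": plan_digest(plan),
    }

def matches(plan, record):
    """True if the plan is byte-for-byte the one that was frozen."""
    return plan_digest(plan) == record["sha256"]

# Hypothetical analysis plan written before looking at the data.
plan = {
    "hypothesis": "Men are, on average, taller than women.",
    "test": "one-sided comparison of group means",
    "alpha": 0.05,
    "exclusions": "none",
}
record = freeze_plan(plan)

# A decision changed after seeing the data no longer matches the frozen record.
edited_plan = dict(plan, exclusions="drop heights above 200 cm")
print(matches(plan, record), matches(edited_plan, record))
```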

I am advocating for a way to make sure the data we use to create our explanations is separated from the data that we use to test those explanations. Preregistration does not put science in chains. Scientists should be free to explore the garden and to advance knowledge. Novelty, happenstance, and unexpected findings are core elements of discovery. However, when it comes time to put our new explanations to the test, we will make progress more efficiently and effectively by being as rigorous and as free from bias as possible.
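When new data cannot be collected, one common way to keep the two uses of data apart is to split an existing data set once, before any analysis, into an exploratory part and a confirmatory part. The article does not describe this procedure; the sketch below is an assumption-laden illustration, with hypothetical participant IDs and a hypothetical function name.

```python
import random

def split_for_exploration(record_ids, confirmatory_fraction=0.5, seed=2016):
    """Randomly split IDs into (exploratory, confirmatory) sets, reproducibly."""
    rng = random.Random(seed)
    shuffled = list(record_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - confirmatory_fraction))
    return shuffled[:cut], shuffled[cut:]

participant_ids = list(range(1, 201))  # hypothetical participant IDs
exploratory_ids, confirmatory_ids = split_for_exploration(participant_ids)

# Explore freely in the first set; touch the second set only to run the
# analysis that was written down in advance.
print(len(exploratory_ids), len(confirmatory_ids))
```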

Preregistration is effective. After the United States required that all clinical trials of new treatments on human subjects be preregistered, the rate of finding a significant effect on the primary outcome variable fell from 57% to just 8% within a group of 55 cardiovascular studies. This suggests that flexibility in analytical decisions had an enormous effect on the analysis and publication of these large studies. Preregistration is supported by journals and research funders. Taking this step will show that you are taking every reasonable precaution to reach the most robust conclusions possible, and will improve the weight of your assertions.

Most scientists, when testing a hypothesis, do not specify key analytical decisions prior to looking through a dataset. It’s not what we’re trained to do. We at the Center for Open Science want to change that. We will be giving 1,000 researchers $1,000 prizes for publishing the results of preregistered work. You can be one of them. Begin your preregistration by going to https://cos.io/prereg.

Note: This article gives the views of the author(s), and not the position of the LSE Impact blog, nor of the London School of Economics. Please review our Comments Policy if you have any concerns on posting a comment below.

About the Author:

David Mellor is a Project Manager at the Center for Open Science and works to encourage preregistration. He received his PhD from Rutgers University in Ecology and Evolution and has been an active researcher in the behavioral ecology and citizen science communities.

I strongly agree with almost all of this. One question, though. I sometimes take part in studies that use path models. It can happen that a referee suggests an additional pathway that makes sense to us. But this would not have been in the original specification of the model. Come to think of it this kind of thing must happen pretty often. How would you view that?

That is a great point and is a very frequent occurrence. I think that the vast majority of papers come out of peer review with one or more changes in how the data are analyzed. The best way to handle that is with transparency: “The following, additional paths (or tests, interactions, correlations, etc.) were conducted after data collection was complete…” The important distinction is to not present those new pathways as simply part of the a-priori tests or to lump them with the same analyses presented initially and planned ahead of time. This way, the reader will be able to properly situate those new tests in the complete body of evidence presented in the paper. After data collection and initial analysis, any new tests have been influenced by the data and are, in essence, a new hypothesis that is now being tested with the same data that was used to create it. That new test can be confirmed with a later follow-up study using newly collected data.

Doesn’t this just say – we can only be honest by being rigid? It carries hypothetico-deductive ‘logic’ to a silly extreme, ignoring the inherently iterative process of theorization, recognition of interesting phenomena, and data analysis. But, creative research is not like this. How can you formulate meaningful hypotheses without thinking about and recognizing patterning in the data – the two go hand in hand, and are not the same as simply ‘milking’ data for significant results.


Hi Patrick, Thank you for commenting. I very much agree that meaningful hypotheses cannot be made without recognizing patterns in the data. That may be the best way to make a reasonable hypothesis. However, the same data that are used to create the hypothesis cannot be used to test that same hypothesis, and this is what preregistration does. It makes it clear to ourselves exactly what the hypothesis is before seeing the data, so that the data aren’t then used to subtly change/create a new hypothesis. If it does, fine, great! But that is hypothesis building, not hypothesis testing. That is exploratory work, not confirmatory work.


Open access
Published: 10 October 2012

Approaches to informed consent for hypothesis-testing and hypothesis-generating clinical genomics research

Flavia M Facio, Julie C Sapp, Amy Linn & Leslie G Biesecker

BMC Medical Genomics volume 5, Article number: 45 (2012)


Massively-parallel sequencing (MPS) technologies create challenges for informed consent of research participants given the enormous scale of the data and the wide range of potential results.

We propose that the consent process in these studies be based on whether they use MPS to test a hypothesis or to generate hypotheses. To demonstrate the differences in these approaches to informed consent, we describe the consent processes for two MPS studies. The purpose of our hypothesis-testing study is to elucidate the etiology of rare phenotypes using MPS. The purpose of our hypothesis-generating study is to test the feasibility of using MPS to generate clinical hypotheses, and to approach the return of results as an experimental manipulation. Issues to consider in both designs include: volume and nature of the potential results, primary versus secondary results, return of individual results, duty to warn, length of interaction, target population, and privacy and confidentiality.

The categorization of MPS studies as hypothesis-testing versus hypothesis-generating can help to clarify the issue of so-called incidental or secondary results for the consent process, and aid the communication of the research goals to study participants.


Advances in DNA sequencing technologies and concomitant cost reductions have made the use of massively-parallel sequencing (MPS) in clinical research practicable for many researchers. Implementations of MPS include whole genome sequencing and whole exome sequencing, which we consider to be the same, for the purposes of informed consent. A challenge for researchers employing these technologies is to develop appropriate informed consent [ 1 , 2 ], given the enormous amount of information generated for each research participant, and the wide range of medically-relevant genetic results. Most of the informed consent challenges raised by MPS are not novel – what is novel is the scale and scope of genetic interrogation, and the opportunity to develop novel clinical research paradigms.

Massively-parallel sequencing has the capacity to detect nearly any disease-causing gene variant, including late-onset disorders, such as neurologic or cancer-susceptibility syndromes, subclinical disease or endo-phenotypes, such as impaired fasting glucose, and heterozygous carriers of traits inherited in a recessive pattern. Not only is the range of the disorders broad, but the variants have a wide range of relative risks from very high to nearly zero. This is a key distinction of MPS when compared to common SNP variant detection (using so-called gene chips). Because some variants discovered by MPS can be highly penetrant, the detection of such variants can have enormous medical and counseling impact. While many of these informed consent issues have been addressed previously [ 1 , 3 ], the use of MPS in clinical research combines these issues and is on a scale that is orders of magnitude greater than previous study designs.

The initial clinical research uses of MPS were a brute force approach to the identification of mutations for rare Mendelian disorders [ 4 ]. This is a variation of positional cloning (also known as gene mapping) and thus a form of classical hypothesis-testing research. The hypothesis is that the phenotype under study is caused by a genetic variant and a suite of techniques is employed (in this case MPS) to identify that causative variant. The application of this technology in this setting is of great promise and will identify causative gene variants for numerous traits, with some predicting that the majority of Mendelian disorders will be elucidated in 5–10 years.

The second of these pathways to discovery is a more novel approach of generating and then sifting MPS results as the raw material to allow the generation of clinical hypotheses, which are in turn used to design clinical experiments to discover the phenotype that is associated with that genotype. This approach we term hypothesis-generating clinical genomics. These hypothesis-generating studies require a consent process that provides the participant with an understanding of scale and scope of the interrogation, which is based on a contextual understanding of the goal and overall organization of the research since specific risks and benefits can be difficult to delineate [ 5 , 6 ]. Importantly, participants need to understand the notion that the researcher is exploring their genomes in an open-ended fashion, that the goal of the experiment is not predictable at the outset, and that the participant will be presented with downstream situations that are not currently foreseeable.

We outline here our approaches to informed consent for our hypothesis-testing and hypothesis-generating MPS research studies. We propose that the consent process be tailored depending on which of these two designs is used, and whether the research aims include study of the return of results.

General issues regarding return of results

Participants in our protocols have the option to learn their potentially clinically relevant genetic variant results. The issue of return of results is controversial and the theoretical arguments for and against the return of results have been extensively debated [ 7 ]. Although an increasing body of literature describes the approaches taken by a few groups no clear consensus exists in either the clinical genomics or bioethics community [ 8 ]. At one end of the spectrum there are those who argue that no results should be returned [ 9 ], and at the other end others contend that the entire sequence should be presented to the research participant [ 10 – 12 ]. In between these extremes lies a qualified or intermediate disclosure policy [ 13 , 14 ]. We take the intermediate position in both of our protocols by giving research participants the choice to receive results, including variants deemed to be clinically actionable [ 3 , 15 ]. Additionally, both protocols are investigating participants’ intentions towards receiving different types of results in order to inform the disclosure policies within the projects and in the broader community [ 16 ]. Because one of our research goals is to study the issues surrounding return of results, it is appropriate and necessary to return results. Thus, the following discussion focuses on issues pertinent to studies that plan to return results.

Issues to consider

Issue #1: primary versus secondary variant results and the open-ended nature of clinical genomics.

In our hypothesis-testing study we distinguish variants as either primary or secondary variants, the distinction reflecting the purpose of the study. A primary variant is a mutation that causes the phenotype that is under study, i.e., the hypothesis that is being tested in the study. A secondary variant is any mutation result not related to the disorder under study, but discovered as part of the quest for the primary variant.

We prefer the term ‘secondary’ to ‘incidental’ because the latter is an adjective indicating chance occurrence, and the discovery of a disease causing mutation by MPS cannot be considered a chance occurrence. The word ‘incidental’ also suggests a lesser degree of importance or impact and it is important to recognize that secondary findings can be of greater medical or personal impact than primary findings.

The consent discussion about results potentially available from participation in a hypothesis-testing study is framed in terms of the study goal, and we assume a high degree of alignment between participants’ goals and the researchers’ aims with respect to primary variants. Participants are, in general, highly motivated to learn the primary variant result and we presume that this motivation contributed to their decision to enroll in the study, similar to motivations for those who have been involved in positional cloning studies. This motivation may not hold for secondary variants, but our approach is to offer them the opportunity to learn secondary and actionable variants that may substantially alter susceptibility to, or reproductive risk for, disease.

In the hypothesis-generating study design no categorical distinction (primary vs. secondary) is made among pathogenic variants, i.e., all variants are treated the same without the label of ‘primary’ or ‘secondary’. This is because we are not using MPS to uncover genetic variants for a specific disease, and any of the variants could potentially be used for hypothesis generation. We suggest that this is the most novel issue with respect to informed consent as the study is open-ended regarding its goals and downstream research activities. This is challenging for informed consent because it is impossible to know what types of hypotheses may be generated at the time of enrollment and consent.

Because the downstream research topics and activities are impossible to predict in hypothesis-generating research, subjects must be consented initially to the open-ended nature of the project. During the course of the study, they must be iteratively re-consented as hypotheses are generated from the genomic data and more specific follow-up studies are designed and proposed to test those newly generated hypotheses. These downstream, iterative consents will vary in their formality, and the degree to which they need to be reviewed and approved. Some general procedures can be approved in advance; for example, it may be anticipated that segregation studies would be useful to determine causality for sequence variants, or the investigator may simply wish to obtain some additional targeted medical or family history from the research subject. This could be approved prospectively by the IRB, with the iterative consent with the subject comprising a verbal discussion of the nature of the trait for which the segregation analysis or additional information is being sought. More specific or more invasive or risky iterative analyses would necessitate review and approval by the IRB with written informed consent.

Informed consent approach

The informed consent process must reflect the fundamental study design distinction of hypothesis-testing versus hypothesis-generating clinical genomics research. For the latter, the challenge is to help the research subjects understand that they are enrolling in a study that could lead to innumerable downstream research activities and goals. The informed consent process must be, like the research, iterative, and involve ongoing communication and consent with respect to those downstream activities.

Issue #2: Volume and nature of information

Whole genome sequencing can elucidate an enormous number of variations for a given individual. A typical whole genome sequence yields ~4,000,000 sequence variations. A whole exome sequence limits the interrogation to the coding regions of genes (about 1–1.5% of the genome) and generates typically 30,000-50,000 gene variants. While most are benign or of unknown consequence, some are associated with a significant increased risk of disease for the individual and/or their family members. For example, the typical human is a carrier for three to five deleterious genetic variants or mutations that cause severe recessive diseases [ 17 , 18 ]. In addition, there are over 30 known cancer susceptibility syndromes, which in aggregate may affect more than 1/500 patients, and the sequence variants that cause these disorders can be readily detected with MPS. These variants can have extremely high relative risks. For some disorders, a rare variant can be associated with a relative risk of greater than 1,000. This is in contrast with common SNP typing which detects variants associated with small relative risks (typically on the order of 1.2-1.5). It is arguable whether the latter type of variant has any clinical utility as an individual test.
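The contrast between relative risks of 1.2-1.5 and relative risks above 1,000 is easier to judge in absolute terms. The sketch below multiplies a baseline risk by each relative risk; the baseline of 1 in 2,000 among non-carriers is a hypothetical value chosen for illustration, not a figure from the paper, and the function name is an assumption.

```python
def absolute_risk(baseline_risk, relative_risk):
    """Risk for carriers = baseline risk in non-carriers x relative risk, capped at 1."""
    return min(1.0, baseline_risk * relative_risk)

baseline = 0.0005  # hypothetical: 1 in 2,000 among non-carriers

scenarios = [
    ("common SNP, RR 1.2", 1.2),
    ("common SNP, RR 1.5", 1.5),
    ("rare variant, RR 1,000", 1000),
]
for label, rr in scenarios:
    risk = absolute_risk(baseline, rr)
    print(f"{label}: absolute risk {risk:.4%}")
```

Under this assumed baseline, the common SNP variants barely move the absolute risk, while the rare high-risk variant changes it by orders of magnitude, which is the distinction the paragraph draws.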

Conveying the full scope of genomic interrogation planned for each sample and the volume of information generated for a given participant is impossible. The goal and challenge in this instance is to give the participant as realistic a picture as possible of the likely amount of clinically actionable results the technology can generate. Our approach is two-fold: to give the subjects the clear message that the number and nature of the findings is enormous and literally impossible to describe in a comprehensive manner and to use illustrative examples of the spectrum of these results.

To provide examples, we bin genetic variants into broad categories, as follows: heterozygous carriers of genetic variants implicated in recessive conditions (e.g., CFTR p.Phe508del and cystic fibrosis); variants that cause a treatable disorder that may be present, but asymptomatic or undiagnosed (e.g., LDLR p.Trp87X, familial hypercholesterolemia); variants that predispose to later-onset conditions (e.g., BRCA2 c.5946delT (commonly known as c.6174delT), breast and ovarian cancer susceptibility); variants that predispose to late-onset but untreatable disorders (e.g., frontotemporal dementia MAPT p.Pro301Leu).
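The binning described above can be sketched as a small lookup structure. This is an illustrative Python sketch only, not the study's actual software: the names `ResultBin`, `EXEMPLARS`, and `exemplar_for` are hypothetical and introduced here for illustration, while the variants and conditions are the exemplars listed in the text.

```python
from enum import Enum


class ResultBin(Enum):
    """Broad illustrative categories of results (hypothetical names)."""
    RECESSIVE_CARRIER = "heterozygous carrier of a variant implicated in a recessive condition"
    TREATABLE_UNDIAGNOSED = "variant causing a treatable disorder that may be asymptomatic or undiagnosed"
    LATER_ONSET_PREDISPOSITION = "variant predisposing to a later-onset condition"
    LATE_ONSET_UNTREATABLE = "variant predisposing to a late-onset but untreatable disorder"


# Exemplars taken from the text: (gene and variant, associated condition).
EXEMPLARS = {
    ResultBin.RECESSIVE_CARRIER: ("CFTR p.Phe508del", "cystic fibrosis"),
    ResultBin.TREATABLE_UNDIAGNOSED: ("LDLR p.Trp87X", "familial hypercholesterolemia"),
    ResultBin.LATER_ONSET_PREDISPOSITION: (
        "BRCA2 c.5946delT",
        "breast and ovarian cancer susceptibility",
    ),
    ResultBin.LATE_ONSET_UNTREATABLE: ("MAPT p.Pro301Leu", "frontotemporal dementia"),
}


def exemplar_for(result_bin):
    """Return a one-line example that could be shown during a consent discussion."""
    variant, condition = EXEMPLARS[result_bin]
    return f"{variant} ({condition})"


if __name__ == "__main__":
    for result_bin in ResultBin:
        print(f"{result_bin.value}: {exemplar_for(result_bin)}")
```

Keeping one exemplar per bin mirrors the stated goal of illustrating the spectrum of results rather than enumerating every possible finding.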

Additionally, the scale and scope of the results make it nearly certain that all participants will be found to harbor disease-causing mutations. This is because the interrogation of all genes brings to light the fact that the average human carries 3–5 recessive deleterious variants in addition to the risks for later-onset or incompletely penetrant dominant disorders. This reality can be unsettling and surprising to research subjects, and we believe it is important to address this early in the process, not downstream in the iterative phase. It is essential for the participants to choose whether MPS research is appropriate for them, taking into account their personal views and values.

Communicate to participants both the overwhelming scale and scope of genetic results they may opt to receive and provide them with specific disease examples that illustrate the kinds of decisions they may need to make as the results become available. These examples should also assist the research subjects in deciding whether to participate in the study and, if so, in anticipating the kinds of decisions they may need to make in the future as results become available.

Issue #3: Return of individual genotype results

The return of individual genotype results from MPS presents a new challenge in the clinical research environment, again because of the scale and breadth of the results. The genetic and medical counseling can be challenging because of the volume of results generated, participants’ expectations, the many different categories of results, and the length of time for the information to be available. We suggest that the most reasonable practice is to take a conservative approach and disclose only clinically actionable results. To this end, the absence of a deleterious gene variant (or a negative result) would not be disclosed to research participants. It is our understanding that it is mandatory to validate any individual results that are returned to research subjects in a CLIA-certified laboratory. Using current clinical practice as a standard or benchmark, we suggest that until other approaches are shown to be appropriate and effective, disclosure should take place during a face-to-face encounter involving a multidisciplinary team (geneticist, genetic counselor, and specialists on an ad-hoc basis based on the phenotype in question).

During the initial consent, participants are alerted to the fact that in the future the study team will contact them by telephone and their previously-stated preferences and impressions about receiving primary and secondary variant results will be reviewed. The logistics and details of this future conversation feature prominently in the initial informed consent session, as it is challenging to make and to receive such calls. Participants make a choice to learn or not learn a result each time a result becomes available. Once a participant makes the decision to learn a genotype result, the variant is confirmed in a CLIA lab, and a report is generated. The results are communicated to the participant during a face-to-face meeting with a geneticist and genetic counselor, and with the participation of other specialists depending on the case and the participant’s preferences. These phone discussions are seen as an extension of the initial informed consent process and as opportunities for the participants to make decisions in a more relevant and current context (compared to the original informed consent session). We see this as an iterative approach to consent, also known as circular consent [ 5 ]. Participants who opt not to learn a specific result can still be contacted later if other results become available, unless they choose not to be contacted by us any longer.
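The per-result flow described above can be sketched as a short function. This is a minimal illustration under stated assumptions, not the study's actual procedure: `Participant`, `handle_new_result`, and the `confirm_in_clia_lab` callback are hypothetical names, and the "not confirmed" outcome is an assumption, since the text does not say what happens when confirmation fails.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical record of one participant's contact status and disclosures."""
    wants_contact: bool = True  # False once they ask not to be contacted again
    disclosed: list = field(default_factory=list)


def handle_new_result(participant, result_id, opts_to_learn, confirm_in_clia_lab):
    """Process one newly available result and return a short status string."""
    if not participant.wants_contact:
        return "not contacted"
    # The choice is made afresh each time a result becomes available.
    if not opts_to_learn:
        return "declined"  # the participant can still be contacted about later results
    # Confirmation happens only after the participant opts to learn the result.
    if not confirm_in_clia_lab(result_id):
        return "not confirmed"  # assumption: an unconfirmed finding is not disclosed
    # Stands in for the face-to-face disclosure meeting described in the text.
    participant.disclosed.append(result_id)
    return "disclosed"


if __name__ == "__main__":
    p = Participant()
    print(handle_new_result(p, "result-1", False, lambda r: True))
    print(handle_new_result(p, "result-2", True, lambda r: True))
    print(p.disclosed)
```

Passing the confirmation step in as a callback keeps the sketch independent of any particular laboratory system.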

This approach to returning results is challenged by the hypothesis-generating genomics research approach. Participants in our hypothesis-testing protocol are not asked to make a decision about learning individual genotype results at the time of consent. This is because we cannot know the nature of the future potential finding at the time of the original consent. Rather, they are engaged in a discussion of what they currently imagine their preferences might be at some future date, again using exemplar disorders and hypothetical scenarios of hypothesis-generating studies.

In the hypothesis-generating study, we have distinct approaches for variants in known disease-causing genes versus variants in genes that are hypothesized to cause disease (the latter being the operative hypothesis-generating activity). For the former, the results are handled in a manner quite similar to the hypothesis-testing study. In the latter case, the participant may be asked if they would be willing to return for further phenotyping to help us determine the nature of the variant of uncertain clinical significance (VUCS). The participant is typically informed that they have a sequence variant and that we would like to learn, through clinical research, whether this variant has any phenotypic or clinical significance. It is emphasized that current knowledge does not show that the variant causes any phenotype and the chances are high that the variant is benign. However, neither the gene nor the sequence variant is disclosed and the research finding is not confirmed in a CLIA-certified lab. This type of VUCS would only be communicated back to the participant if the clinical research showed that the variant was causative, the return of the result was determined to be medically appropriate by our Mutation Advisory Committee, and the variant was confirmed in a CLIA-certified laboratory.

For the return of mutations in known disease-causing genes, the initial consent cannot comprehensively inform subjects of the nature of the diseases, because of the scale and scope of the potential results. Instead, exemplars are given to elicit general preferences, which are then affirmed or refined at the time results are available. Hypothesis-generating studies require that subjects receive sufficient information to make an informed choice about participation in the specific follow-up study, with return of individual results only if the cause-and-effect relationship is established, with appropriate oversight.

Issue #4: Duty to warn

Given the breadth of MPS gene interrogation, it is reasonable to anticipate that occasional participants may have mutations that pose a likely severe negative consequence, which we classify as “panic” results. This models clinical and research practice for the return of results such as a pulmonary mass or high serum potassium level. In contrast to the above-mentioned autosomal recessive carrier states that are expected to be nearly universal, genetic panic results should be uncommon. However, they should not be considered unanticipated; it is obvious that such variants will be detected, and the informed consent process should anticipate them. Examples would be deleterious variants for malignant hyperthermia or Long QT Syndrome, either of which carries a substantial risk of sudden death, a risk that can be mitigated.

Both our hypothesis-testing and hypothesis-generating studies include mechanisms for the participants to indicate the types of results that they wish to have returned to them. In the hypothesis-testing mode of research this is primarily to respect the autonomy of the participants, but in addition, for the hypothesis-generating study we are assessing the motivations and interests of the subjects in various types of results and manipulating the return of results as an experimental aim. It is our clinical research experience that participants are challenged by making decisions regarding possible future results that are rare, but potentially severe. As well, the medical and social contexts of the subjects evolve over time, and the consent that was obtained at enrollment may not be relevant or appropriate at a later time when such a result arises. This is particularly relevant for a research study that is ongoing for substantial periods of time (see also point #7, below).

To address these issues we have consented the subjects to the potential return of “panic” results, irrespective of their preferences at the initial consent session. In effect, the consent process is for some participants a consent to override their preference.
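The override described above reduces to a simple rule, sketched here in Python. The function name `should_return_result` and its parameters are hypothetical, and the sketch deliberately leaves out how a result comes to be classified as a “panic” result, which the text treats as a matter of clinical judgment.

```python
def should_return_result(is_panic_result, participant_prefers_return):
    """Decide whether a result is returned under the consent approach sketched above."""
    # A "panic" result is returned irrespective of the stated preference,
    # because participants consented to this override at enrollment.
    if is_panic_result:
        return True
    # Otherwise the participant's own preference governs.
    return participant_prefers_return


if __name__ == "__main__":
    for is_panic in (False, True):
        for prefers in (False, True):
            decision = should_return_result(is_panic, prefers)
            print(f"panic={is_panic}, prefers_return={prefers} -> return={decision}")
```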

In both hypothesis-testing and hypothesis-generating research it is important to outline circumstances in which researchers’ duty-to-warn may result in a return of results that may be contrary to the preferences of the subject. It is essential that the subjects understand this approach to unusually severe mutation results. Subjects who are uncomfortable with this approach to return of results are encouraged to decline enrollment.

Issue #5: Length of researcher and participant interaction

Approaches to MPS data are evolving rapidly and it is anticipated that this ongoing research into the significance of DNA variants will continue for years or decades. The different purposes of the two study designs lead to different endpoints in terms of the researcher’s responsibility to analyze results. In our hypothesis-testing research, discussion of the relationship of the participants to the researchers is framed in terms of the discovery of the primary variant. We ask participants to be willing to interact with us for a period of months or years, as it is impossible for us to set a specific timeline to determine the cause of the disorder under investigation (if it is ever discovered). While attempts to elucidate the primary variant are underway, participants’ genomic data are periodically annotated using the most current bioinformatic methodologies available. We conceptualize our commitment to return re-annotated and updated results to participants as diminishing, but not disappearing, after this initial disclosure of results. As the primary aim of the study has been accomplished, less attention will be directed to the characterization of ancillary genomic data, yet we believe we retain an obligation to share highly clinically actionable findings with participants should we obtain them.

In the hypothesis-generating study the researcher’s responsibility to annotate participants’ genomes/exomes is ongoing. This is ongoing because, as noted above, one of the experimental aims is to study the motivations and interests of the subjects in these types of results. Determining how this motivation and interest fares over time is an important research goal. During the informed consent discussion it is emphasized that the iterative nature of result interpretation will lead to multiple meetings for the disclosure of clinically actionable results, and that the participant may be contacted months or years after the date of enrollment. Additionally, it is outlined that the participant will make a choice about learning the result each time he/she is re-contacted about the availability of a research finding, and that finding will only be confirmed in a CLIA-certified laboratory if the participant opts to learn the information. Participants who return to discuss results are reminded that they will be contacted in the future if and when other results deemed to be clinically actionable are found for that individual.

Describe the nature, mutual commitments, and duration of the researcher-participant relationship to participants. For hypothesis-testing studies it is appropriate that the intensity of the clinical annotation of secondary variants may decline when the primary goal of the study is met. For hypothesis-generating studies, such interactions may continue for as long as there are variants to be further evaluated and as long as the subject retains an interest in participating.

Issue #6: Target population

The informed consent process needs to take into account the target population in terms of their disease phenotype, age, and whether the goal is to enroll individual participants or families. These considerations represent the greatest divergence in approaches to informed consent when comparing hypothesis-testing and hypothesis-generating research. In our two studies, the hypothesis-testing study focuses on rare diseases and often family participation, whereas the hypothesis-generating study focuses on more common diseases and unrelated index cases. There are an infinite number of study designs and investigators may adapt our approaches to informed consent for their own designs.

Our hypothesis-testing protocol enrolls both individual participants and families (most commonly trios), the latter being more common. In hypothesis-testing research, many participants are either affected by a genetic disease or are a close relative (typically a parent) of a person with a genetic disease. The research participants must weigh their hope for, and personal meaning ascribed to, learning the genetic cause for their disorder against the possibility of being in a position to learn a significant amount of unanticipated information. Discussing and addressing the potential discrepancy of the participants’ expectations of the value of their results and what they may realistically stand to learn (both desired and undesired information) is a central component of the informed consent process.

In our hypothesis-testing protocol, when parents are consenting on behalf of a minor child, we review with them the issues surrounding genetic testing of children and discuss their attitudes regarding their child’s autonomy and their parental decision-making values. Because family trios (most often mother-father-child) are enrolled together, we discuss how one individual’s preferences regarding results may be disrupted or superseded by another family member’s choice and communication of that individual’s knowledge.

In contrast, our hypothesis-generating protocol enrolls as probands or primary participants older, unrelated individuals [ 19 ]. Most participants are self-selected in terms of their decision to enroll and are not enrolled because they or a relative have a rare disease. Participants in the hypothesis-generating protocol are consented for future exploration of any and all possible phenotypes. This is a key distinguishing feature of this hypothesis-generating approach to research, which is a different paradigm – going from genotype to phenotype. The participants may be invited for additional phenotyping. In fact, multiple satellite studies are ongoing to evaluate various subsets of participants for different phenotypes. The key with the consent for these subjects is to initially communicate to the subjects the general approach – that their genome will be explored, variations will be identified, and they may be re-contacted for a potential follow-up study to understand the potential relationship of that variant to their phenotype. These subsequent consents for follow-up studies are considered an iterative consent process, which is similar to the Informed Cohort concept [ 20 ].

Hypothesis-generating research is a novel approach to clinical research design and requires an ongoing, iterative approach to informed consent. For hypothesis-testing research, a key informed consent issue is for the subjects to balance the desire for information on the primary disease-causing mutation with the pros and cons of obtaining possibly undesired information on secondary variants.

Issue #7: Privacy and confidentiality

In MPS studies, privacy and confidentiality is a complex and multifaceted issue. Some potential challenges include: the deposition of genetic and phenotypic data in public databases, the placement of CLIA-validated results in the individual’s medical chart, and the discovery of secondary variants in relatives of affected probands in family-based (typically hypothesis-testing) research.

The field of genomics has a tradition of deposition of data in publicly accessible databases. Participants in our protocols are informed that the goal of sharing de-identified information in public databases is to advance research, and that there are methods in place to maximize the privacy and confidentiality of personally identifiable information. However, the deposition of genomic-scale data for an individual participant, such as an MPS sequence, is far above the minimal amount of data needed to uniquely identify the sample [ 21 , 22 ]. Therefore, the participants should be made aware that the scale of the data could allow analysts to connect sequence data to individuals by matching variants in the deposited research data to other data from that person. As well, the public deposition of data in some cases is an irrevocable decision. Once the data are deposited and distributed, it may be impossible to remove the data from all computer servers, should the subject decide to withdraw from the study.

Additionally, participants are informed that once a result is CLIA-certified, that result is placed in the individual’s medical chart of the clinical research institution and may be accessible by third parties. Although there are state and federal laws to protect individuals against genetic discrimination, including GINA, this law has not yet been tested in the courts. This is explained to participants up front at the time of enrollment and a more detailed discussion takes place at the time of results disclosure. To offer additional protection in the event of a court subpoena, a Certificate of Confidentiality has been obtained in the hypothesis-testing and hypothesis-generating protocols. The discussion surrounding privacy and confidentiality is approached in a similar manner in both protocols.

The third issue regarding confidentiality is that MPS can generate many results in each individual and it is highly likely that some, if not all, of the variants detected in one research participant may be present in another research participant (e.g., a parent). This is again a consequence of the scale and breadth of MPS in that the large number of variants that can be detected in each participant makes it exceedingly likely that their relatives share many of these variants and that their genetic risks of rare diseases may be measurably altered. It is important to communicate to the participants that it is likely that such variants can be detected and that they may have implications for other members of the family, and that the consented individuals, or their parent, may need to communicate those results to other members of the family.

The informed consent should include discussion of public deposition of data, the entry of CLIA-validated results into medical records, and the likely discovery of variants with implications for family members.

We describe an approach to the informed consent process as a mutual opportunity for researchers and participants to assess one another’s goals in MPS protocols that employ both hypothesis-generating and hypothesis-testing methodologies. The use of MPS in clinical research requires adaptation of established processes of human subjects protections. The potentially overwhelming scale of information generated by MPS necessitates that investigators and IRBs adapt traditional approaches to consenting the subjects. Because nearly all subjects will have a clinically actionable result, investigators must implement a thoughtful plan for consent regarding results disclosure, including setting a threshold for the types of information that should be disclosed to the participants.

While some of the informed consent issues for MPS are independent of the study design, others should be adapted based on whether the research study is employing MPS to test a hypothesis (i.e., find the cause of a rare condition in an affected cohort), or to generate hypotheses (i.e., find deleterious or potentially deleterious variants that warrant participant follow-up and further investigation). For example, the health-related attributes of the study cohort (healthy individuals versus disease patients) are likely to influence participants’ motivations and expectations of MPS, and in the case of a disease cohort create the need to dichotomize the genetic variants into primary and secondary. Conversely, issues inherent to MPS technology are central to the informed consent approach in both types of studies. The availability of MPS allows a paradigm shift in genetics research – no longer are investigators constrained to long-standing approaches of hypothesis-testing modes of research. The scale of MPS allows investigators to proceed from genotype to phenotype, and leads to new challenges for genetic and medical counseling. Research participants receiving results from MPS might not present with a personal and/or family history suggestive of conditions revealed by their genotypic variants, and consequently might not perceive their a priori risk to be elevated for those conditions.

Participants’ motivations to have whole genome/exome sequencing at this early stage are important to take into consideration in the informed consent process. Initial qualitative data suggest that individuals enroll in the hypothesis-generating study because of altruism in promoting research, and a desire to learn about genetic factors that contribute to their own health and disease risk [ 23 ]. Most participants expect that genomic information will improve the overall knowledge of disease causes and treatments. Moreover, data on research participants’ preferences to receive different types of genetic results suggest that they have strong intentions to receive all types of results [ 16 ]. However, they are able to discern between the types and quality of information they could learn, and demonstrate stronger attitudes to learn clinically actionable and carrier status results when compared to results that are uncertain or not clinically actionable. These findings provide initial insights into the value these early adopters place on information generated by high-throughput sequencing studies, and help us tailor the informed consent process to this group of individuals. However, more empirical data are needed to guide the informed consent process, including data on research participants’ ability to receive results for multiple disorders and traits.

Participants in both types of studies are engaged in a discussion of the complex and dynamic nature of genomic annotation so that they may make an informed decision about participation and may be aware of the need to revisit results learned at additional time points in the future. As well, we advocate a process whereby investigators retain some latitude with respect to the most serious, potentially life-threatening mutations. While it is mandatory to respect the autonomy of research subjects, this does not mean that investigators must accede to the research subject’s views of these “panic” results. In a paradoxical way, the research participant and the researcher can agree that the latter can maintain a small, but initially ambiguous degree of latitude with respect to these most serious variants. In the course of utilizing MPS technology for further elucidation of the genetic architecture of health and disease, it is imperative that research participants and researchers be engaged in a continuous discussion about the state of scientific knowledge and the types of information that could potentially be learned from MPS. Although resource-intensive, this “partnership model” [ 2 ] or informed cohort approach to informed consent promotes respect for participants, and allows evaluation of the benefits and harms of disclosure in a more timely and relevant manner.

We have here proposed a categorization of massively-parallel clinical genomics research studies as hypothesis-testing versus hypothesis-generating to help clarify the issue of so-called incidental or secondary results for the consent process, and aid the communication of the research goals to study participants. By using this categorization approach and considering seven important features of this kind of research (Primary versus secondary variant results and the open-ended nature of clinical genomics, Volume and nature of information, Return of individual genotype results, Duty to warn, Length of researcher and participant interaction, Target population, and Privacy and confidentiality) researchers can design an informed consent process that is open, transparent, and appropriately balances risks and benefits of this exciting approach to heritable disease research.

This study was supported by funding from the Intramural Research Program of the National Human Genome Research Institute. The authors have no conflicts to declare.

Netzer C, Klein C, Kohlhase J, Kubisch C: New challenges for informed consent through whole genome array testing. J Med Genet. 2009, 46: 495-496. 10.1136/jmg.2009.068015.

McGuire AL, Beskow LM: Informed consent in genomics and genetic research. Annu Rev Genomics Hum Genet. 2010, 11: 361-381. 10.1146/annurev-genom-082509-141711.

Bookman EB, Langehorne AA, Eckfeldt JH, Glass KC, Jarvik GP, Klag M, Koski G, Motulsky A, Wilfond B, Manolio TA, Fabsitz RR, Luepker RV, NHLBI Working Group: Reporting genetic results in research studies: Summary and recommendations of an NHLBI Working Group. Am J Med Genet A. 2006, 140: 1033-1040.

Ng PC, Kirkness EF: Whole genome sequencing. Methods Mol Biol. 2010, 628: 215-226. 10.1007/978-1-60327-367-1_12.

Mascalzoni D, Hicks A, Pramstaller P, Wjst M: Informed consent in the genomics era. PLoS Med. 2008, 5: e192-10.1371/journal.pmed.0050192.

Rotimi CN, Marshall PA: Tailoring the process of informed consent in genetic and genomic research. Genome Med. 2010, 2: 20-10.1186/gm141.

Bredenoord AL, Kroes HY, Cuppen E, Parker M, van Delden JJ: Disclosure of individual genetic data to research participants: the debate reconsidered. Trends Genet. 2011, 27: 41-47. 10.1016/j.tig.2010.11.004.

Kronenthal C, Delaney SK, Christman MF: Broadening research consent in the era of genome-informed medicine. Genet Med. 2012, 14: 432-436. 10.1038/gim.2011.76.

Forsberg JS, Hansson MG, Eriksson S: Changing perspectives in biobank research: from individual rights to concerns about public health regarding the return of results. Eur J Hum Genet. 2009, 17: 1544-1549. 10.1038/ejhg.2009.87.

Shalowitz DI, Miller FG: Disclosing individual results of clinical research: implications of respect for participants. JAMA. 2005, 294: 737-740. 10.1001/jama.294.6.737.

Fernandez CV, Kodish E, Weijer C: Informing study participants of research results: an ethical imperative. IRB. 2003, 25: 12-19.

McGuire AL, Lupski JR: Personal genome research: what should the participant be told?. Trends Genet. 2010, 26: 199-201. 10.1016/j.tig.2009.12.007.

Wolf SM, Lawrenz FP, Nelson CA, Kahn JP, Cho MK, Clayton EW, Fletcher JG, Georgieff MK, Hammerschmidt D, Hudson K, Illes J, Kapur V, Keane MA, Koenig BA, Leroy BS, McFarland EG, Paradise J, Parker LS, Terry SF, Van Ness B, Wilfond BS: Managing incidental findings in human subjects research: analysis and recommendations. J Law Med Ethics. 2008, 36: 219-248. 10.1111/j.1748-720X.2008.00266.x.

Kohane IS, Taylor PL: Multidimensional results reporting to participants in genomic studies: Getting it right. Sci Transl Med. 2010, 2: 37cm19-10.1126/scitranslmed.3000809.

Fabsitz RR, McGuire A, Sharp RR, Puggal M, Beskow LM, Biesecker LG, Bookman E, Burke W, Burchard EG, Church G, Clayton EW, Eckfeldt JH, Fernandez CV, Fisher R, Fullerton SM, Gabriel S, Gachupin F, James C, Jarvik GP, Kittles R, Leib JR, O'Donnell C, O'Rourke PP, Rodriguez LL, Schully SD, Shuldiner AR, Sze RK, Thakuria JV, Wolf SM, Burke GL, National Heart, Lung, and Blood Institute working group: Ethical and practical guidelines for reporting genetic research results to study participants: updated guidelines from a national heart, lung, and blood institute working group. Circ Cardiovasc Genet. 2010, 3: 574-580. 10.1161/CIRCGENETICS.110.958827.

Facio FM, Fisher T, Eidem H, Brooks S, Linn A, Biesecker LG, Biesecker BB: Intentions to receive individual results from whole-genome sequencing among participants in the ClinSeq™ study. Eur J Hum Genet. in press.

Morton NE: The detection and estimation of linkage between the genes for elliptocytosis and the Rh blood type. Am J Hum Genet. 1956, 8: 80-96.

Morton NE: The mutational load due to detrimental genes in man. Am J Hum Genet. 1960, 12: 348-364.

Biesecker LG, Mullikin JC, Facio FM, Turner C, Cherukuri PF, Blakesley RW, Bouffard GG, Chines PS, Cruz P, Hansen NF, Teer JK, Maskeri B, Young AC, Manolio TA, Wilson AF, Finkel T, Hwang P, Arai A, Remaley AT, Sachdev V, Shamburek R, Cannon RO, Green ED, NISC Comparative Sequencing Program: The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine. Genome Res. 2009, 19: 1665-1674. 10.1101/gr.092841.109.

Kohane IS, Mandl KD, Taylor PL, Holm IA, Nigrin DJ, Kunkel LM: Medicine. Reestablishing the researcher-patient compact. Science. 2007, 316: 836-837. 10.1126/science.1135489.

Lin Z, Owen AB, Altman RB: Genomic Research and Human Subject Privacy. Science. 2004, 305: 183-10.1126/science.1095019.

Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF, Craig DW: Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008, 29: e1000167.

Facio FM, Brooks S, Loewenstein J, Green S, Biesecker LG, Biesecker BB: Motivators for participation in a whole-genome sequencing study: implications for translational genomics research. Eur J Hum Genet. 2011, 19: 1213-1217. 10.1038/ejhg.2011.123.

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1755-8794/5/45/prepub

Author information

Authors and affiliations.

National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA

Flavia M Facio, Julie C Sapp, Amy Linn & Leslie G Biesecker

Kennedy Krieger Institute, Baltimore, MD, USA

Corresponding author

Correspondence to Leslie G Biesecker .

Additional information

Competing interests.

LGB is an uncompensated consultant to, and collaborates with, the Illumina Corp.

Authors’ contributions

FMF and JCS drafted the initial manuscript. LGB organized and edited the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article.

Facio, F.M., Sapp, J.C., Linn, A. et al. Approaches to informed consent for hypothesis-testing and hypothesis-generating clinical genomics research. BMC Med Genomics 5, 45 (2012). https://doi.org/10.1186/1755-8794-5-45

Received: 07 November 2011

Accepted: 05 October 2012

Published: 10 October 2012

DOI: https://doi.org/10.1186/1755-8794-5-45

Whole genome sequencing
Whole exome sequencing
Informed consent

BMC Medical Genomics

ISSN: 1755-8794

Machine Learning as a Tool for Hypothesis Generation

Jens Ludwig, Sendhil Mullainathan, Machine Learning as a Tool for Hypothesis Generation, The Quarterly Journal of Economics, Volume 139, Issue 2, May 2024, Pages 751–827, https://doi.org/10.1093/qje/qjad055

While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not. We illustrate the procedure with a concrete application: judge decisions about whom to jail. We begin with a striking fact: the defendant’s face alone matters greatly for the judge’s jailing decision. In fact, an algorithm given only the pixels in the defendant’s mug shot accounts for up to half of the predictable variation. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: they are not explained by demographics (e.g., race) or existing psychology research, nor are they already known (even if tacitly) to people or experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional data set (e.g., cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our article is that hypothesis generation is a valuable activity, and we hope this encourages future work in this largely “prescientific” stage of science.

Online ISSN 1531-4650
Print ISSN 0033-5533
Copyright © 2024 President and Fellows of Harvard College
Hypothesis-generating research and predictive medicine

Affiliation.

1 National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
PMID: 23817045
PMCID: PMC3698497
DOI: 10.1101/gr.157826.113

Genomics has profoundly changed biology by scaling data acquisition, which has provided researchers with the opportunity to interrogate biology in novel and creative ways. No longer constrained by low-throughput assays, researchers have developed hypothesis-generating approaches to understand the molecular basis of nature, both normal and pathological. The paradigm of hypothesis-generating research does not replace or undermine hypothesis-testing modes of research; instead, it complements them and has facilitated discoveries that may not have been possible with hypothesis-testing research. The hypothesis-generating mode of research has been primarily practiced in basic science but has recently been extended to clinical-translational work as well. Just as in basic science, this approach to research can facilitate insights into human health and disease mechanisms and provide the crucially needed data set of the full spectrum of genotype-phenotype correlations. Finally, the paradigm of hypothesis-generating research is conceptually similar to the underpinning of predictive genomic medicine, which has the potential to shift medicine from a primarily population- or cohort-based activity to one that instead uses individual susceptibility, prognostic, and pharmacogenetic profiles to maximize the efficacy and minimize the iatrogenic effects of medical interventions.

Annual Review of Psychology

Creative Hypothesis Generating in Psychology: Some Useful Heuristics

Volume 48, 1997, Review Article

William J. McGuire 1
Affiliation: Department of Psychology, Yale University, 2 Hillhouse Avenue, P.O. Box 208205, New Haven, Connecticut 06520-8205
Vol. 48:1-30 (Volume publication date February 1997) https://doi.org/10.1146/annurev.psych.48.1.1
© Annual Reviews

To correct a common imbalance in methodology courses, focusing almost entirely on hypothesis-testing issues to the neglect of hypothesis-generating issues which are at least as important, 49 creative heuristics are described, divided into 5 categories and 14 subcategories. Each of these heuristics has often been used to generate hypotheses in psychological research, and each is teachable to students. The 49 heuristics range from common sense perceptiveness of the oddity of natural occurrences to use of sophisticated quantitative data analyses in ways that provoke new insights.

Article Type: Review Article

Why Hypotheses Beat Goals

Not long ago, it became fashionable to embrace failure as a sign of a company’s willingness to take risks. This trend lost favor as executives recognized that what they wanted was learning, not necessarily failure. Every failure can be attributed to a raft of missteps, and many failures do not automatically contribute to future success.

Certainly, if companies want to aggressively pursue learning, they must accept that failures will happen. But the practice of simply setting goals and then being nonchalant if they fail is inadequate.

Instead, companies should focus organizational energy on hypothesis generation and testing. Hypotheses force individuals to articulate in advance why they believe a given course of action will succeed. A failure then exposes an incorrect hypothesis — which can more reliably convert into organizational learning.

What Exactly Is a Hypothesis?

When my son was in second grade, his teacher regularly introduced topics by asking students to state some initial assumptions. For example, she introduced a unit on whales by asking: How big is a blue whale? The students all knew blue whales were big, but how big? Guesses ranged from the size of the classroom to the size of two elephants to the length of all the students in class lined up in a row. Students then set out to measure the classroom and the length of the row they formed, and they looked up the size of an elephant. They compared their results with the measurements of the whale and learned how close their estimates were.

Note that in this example, there is much more going on than just learning the size of a whale. Students were learning to recognize assumptions, make intelligent guesses based on those assumptions, determine how to test the accuracy of their guesses, and then assess the results.

This is the essence of hypothesis generation. A hypothesis emerges from a set of underlying assumptions. It is an articulation of how those assumptions are expected to play out in a given context. In short, a hypothesis is an intelligent, articulated guess that is the basis for taking action and assessing outcomes.

Hypothesis generation in companies becomes powerful if people are forced to articulate and justify their assumptions. It makes the path from hypothesis to expected outcomes clear enough that, should the anticipated outcomes fail to materialize, people will agree that the hypothesis was faulty.

Building a culture of effective hypothesizing can lead to more thoughtful actions and a better understanding of outcomes. Not only will failures be more likely to lead to future successes, but successes will foster future successes.

Why Is Hypothesis Generation Important?

Digital technologies are creating new business opportunities, but as I’ve noted in earlier columns, companies must experiment to learn both what is possible and what customers want. Most companies are relying on empowered, agile teams to conduct these experiments. That’s because teams can rapidly hypothesize, test, and learn.

Hypothesis generation contrasts starkly with more traditional management approaches designed for process optimization. Process optimization involves telling employees both what to do and how to do it. Process optimization is fine for stable business processes that have been standardized for consistency. (Standardized processes can usually be automated, specifically because they are stable.) Increasingly, however, companies need their people to steer efforts that involve uncertainty and change. That’s when organizational learning and hypothesis generation are particularly important.

Shifting to a culture that encourages empowered teams to hypothesize isn’t easy. Established hierarchies have developed managers accustomed to directing employees on how to accomplish their objectives. Those managers invariably rose to power by being the smartest person in the room. Such managers can struggle with the requirements for leading empowered teams. They may recognize the need to hold teams accountable for outcomes rather than specific tasks, but they may not be clear about how to guide team efforts.

Some newer companies have baked this concept into their organizational structure. Leaders at the Swedish digital music service Spotify note that it is essential to provide clear missions to teams. A clear mission sets up a team to articulate measurable goals. Teams can then hypothesize how they can best accomplish those goals. The role of leaders is to quiz teams about their hypotheses and challenge their logic if those hypotheses appear to lack support.

A leader at another company told me that accountability for outcomes starts with hypotheses. If a team cannot articulate what it intends to do and what outcomes it anticipates, it is unlikely that team will deliver on its mission. In short, the success of empowered teams depends upon management shifting from directing employees to guiding their development of hypotheses. This is how leaders hold their teams accountable for outcomes.

Members of empowered teams are not the only people who need to hone their ability to hypothesize. Leaders in companies that want to seize digital opportunities are learning through their experiments which strategies hold real promise for future success. They must, in effect, hypothesize about what will make the company successful in a digital economy. If they take the next step and articulate those hypotheses and establish metrics for assessing the outcomes of their actions, they will facilitate learning about the company’s long-term success. Hypothesis generation can become a critical competency throughout a company.

How Does a Company Become Proficient at Hypothesizing?

Most business leaders have embraced the importance of evidence-based decision-making. But developing a culture of evidence-based decision-making by promoting hypothesis generation is a new challenge.

For one thing, many hypotheses are sloppy. While many people naturally hypothesize and take actions based on their hypotheses, their underlying assumptions may go unexamined. Often, they don’t clearly articulate the premise itself. The better hypotheses are straightforward and succinctly written. They’re pointed about the suppositions they’re based on. And they’re shared, allowing an audience to examine the assumptions (are they accurate?) and the postulate itself (is it an intelligent, articulated guess that is the basis for taking action and assessing outcomes?).

Seven-Eleven Japan offers a case in how to do hypotheses right.

For over 30 years, Seven-Eleven Japan was the most profitable retailer in Japan. It achieved that stature by relying on each store’s salesclerks to decide what items to stock on that store’s shelves. Many of the salesclerks were part-time, but they were each responsible for maximizing turnover for one part of the store’s inventory, and they received detailed reports so they could monitor their own performance.

The language of hypothesis formulation was part of their process. Each week, Seven-Eleven Japan counselors visited the stores and asked salesclerks three questions:

What did you hypothesize this week? (That is, what did you order?)
How did you do? (That is, did you sell what you ordered?)
How will you do better next week? (That is, how will you incorporate the learning?)

By repeatedly asking these questions and checking the data for results, counselors helped people throughout the company hypothesize, test, and learn. The result was consistently strong inventory turnover and profitability.

How can other companies get started on this path? Evidence-based decision-making requires data — good data, as the Seven-Eleven Japan example shows. But rather than get bogged down with the limits of a company’s data, I would argue that companies can start to change their culture by constantly exposing individual hypotheses. Those hypotheses will highlight what data matters most — and the need of teams to test hypotheses will help generate enthusiasm for cleaning up bad data. A sense of accountability for generating and testing hypotheses then fosters a culture of evidence-based decision-making.

The uncertainties and speed of change in the current business environment render traditional management approaches ineffective. To create the agile, evidence-based, learning culture your business needs to succeed in a digital economy, I suggest that instead of asking “What is your goal?” you make it a habit to ask “What is your hypothesis?”

About the Author

Jeanne Ross is principal research scientist for MIT’s Center for Information Systems Research. Follow CISR on Twitter @mit_cisr.

v.19(7); 2019 Jul

Hypothesis tests

• Hypothesis tests are used to assess whether a difference between two samples represents a real difference between the populations from which the samples were taken.
• A null hypothesis of ‘no difference’ is taken as a starting point, and we calculate the probability that both sets of data came from the same population. This probability is expressed as a p-value.
• When the null hypothesis is false, p-values tend to be small. When the null hypothesis is true, any p-value is equally likely.

Learning objectives

By reading this article, you should be able to:

• Explain why hypothesis testing is used.
• Use a table to determine which hypothesis test should be used for a particular situation.
• Interpret a p-value.

A hypothesis test is a procedure used in statistics to assess whether a particular viewpoint is likely to be true. Hypothesis tests follow a strict protocol, and they generate a ‘p-value’, on the basis of which a decision is made about the truth of the hypothesis under investigation. The routine statistical ‘tests’ used in research (t-tests, χ² tests, Mann–Whitney tests, etc.) are all hypothesis tests, and in spite of their differences they are all used in essentially the same way. But why do we use them at all?

Comparing the heights of two individuals is easy: we can measure their height in a standardised way and compare them. When we want to compare the heights of two small well-defined groups (for example two groups of children), we need to use a summary statistic that we can calculate for each group. Such summaries (means, medians, etc.) form the basis of descriptive statistics, and are well described elsewhere [1]. However, a problem arises when we try to compare very large groups or populations: it may be impractical or even impossible to take a measurement from everyone in the population, and by the time you do so, the population itself will have changed. A similar problem arises when we try to describe the effects of drugs: for example, by how much on average does a particular vasopressor increase MAP?

To solve this problem, we use random samples to estimate values for populations. By convention, the values we calculate from samples are referred to as statistics and denoted by Latin letters ($\bar{x}$ for sample mean; SD for sample standard deviation) while the unknown population values are called parameters, and denoted by Greek letters (μ for population mean, σ for population standard deviation).

Inferential statistics describes the methods we use to estimate population parameters from random samples; how we can quantify the level of inaccuracy in a sample statistic; and how we can go on to use these estimates to compare populations.

Sampling error

There are many reasons why a sample may give an inaccurate picture of the population it represents: it may be biased, it may not be big enough, and it may not be truly random. However, even if we have been careful to avoid these pitfalls, there is an inherent difference between the sample and the population at large. To illustrate this, let us imagine that the actual average height of males in London is 174 cm. If I were to sample 100 male Londoners and take a mean of their heights, I would be very unlikely to get exactly 174 cm. Furthermore, if somebody else were to perform the same exercise, it would be unlikely that they would get the same answer as I did. The sample mean is different each time it is taken, and the way it differs from the actual mean of the population is described by the standard error of the mean (standard error, or SEM). The standard error is larger if there is a lot of variation in the population, and becomes smaller as the sample size increases. It is calculated thus:

$$\mathrm{SEM} = \frac{\mathrm{SD}}{\sqrt{n}}$$

where SD is the sample standard deviation, and n is the sample size.
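A short simulation can make the idea of sampling error concrete. The sketch below is illustrative only: it assumes heights are normally distributed with a population mean of 174 cm (the figure used above) and a population SD of 7 cm, which is an invented value. It draws many samples of 100, and compares the spread of the resulting sample means with the standard error estimated from a single sample (SD divided by the square root of n).

```python
import math
import random
import statistics

def standard_error(sample):
    """SEM: the sample SD divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
POP_MEAN, POP_SD, N = 174.0, 7.0, 100  # POP_SD is an assumed value

# Repeat the "sample 100 Londoners" exercise many times.
sample_means = []
for _ in range(2000):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

# In practice we only ever have one sample, so we estimate the SEM from it.
one_sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
sem_estimate = standard_error(one_sample)
spread_of_means = statistics.stdev(sample_means)

print(f"SEM estimated from one sample: {sem_estimate:.2f} cm")
print(f"SD of 2000 sample means:       {spread_of_means:.2f} cm")
```

Under these assumptions both numbers should sit near 7 / √100 = 0.7 cm, which is the sense in which the standard error describes how far a sample mean typically lies from the population mean.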

As errors are normally distributed, we can use this to estimate a 95% confidence interval on our sample mean as follows:

$$\bar{x} \pm (1.96 \times \mathrm{SEM})$$

We can interpret this as meaning ‘We are 95% confident that the actual mean is within this range.’

Some confusion arises at this point between the SD and the standard error. The SD is a measure of variation in the sample. The range $\bar{x} \pm (1.96 \times \mathrm{SD})$ will normally contain 95% of all your data. It can be used to illustrate the spread of the data and shows what values are likely. In contrast, standard error tells you about the precision of the mean and is used to calculate confidence intervals.
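The difference between the two ranges is easier to see side by side. The following is a minimal sketch using ten invented height measurements; the function names are hypothetical. Note that with a sample this small the 1.96 multiplier is only a rough approximation, and it is used here simply to mirror the formulas in the text.

```python
import math
import statistics

def mean_ci95(sample):
    """95% confidence interval for the mean: mean ± 1.96 × SEM."""
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * sem, mean + 1.96 * sem

def data_range95(sample):
    """Range expected to hold about 95% of the data: mean ± 1.96 × SD."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    return mean - 1.96 * sd, mean + 1.96 * sd

# Hypothetical heights in cm, invented for illustration only.
heights = [168, 171, 174, 176, 179, 172, 175, 170, 181, 174]

ci_low, ci_high = mean_ci95(heights)
sd_low, sd_high = data_range95(heights)

print(f"Mean: {statistics.fmean(heights):.1f} cm")
print(f"95% CI of the mean (uses SEM): {ci_low:.1f} to {ci_high:.1f} cm")
print(f"Spread of the data (uses SD):  {sd_low:.1f} to {sd_high:.1f} cm")
```

The confidence interval is always the narrower of the two, by a factor of √n, because it describes the precision of the mean rather than the spread of individual values.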

One straightforward way to compare two samples is to use confidence intervals. If we calculate the mean height of two groups and find that the 95% confidence intervals do not overlap, this can be taken as evidence of a difference between the two means. This method of statistical inference is reasonably intuitive and can be used in many situations [2]. Many journals, however, prefer to report inferential statistics using p-values.

Inference testing using a null hypothesis

In 1925, the British statistician R.A. Fisher described a technique for comparing groups using a null hypothesis, a method which has dominated statistical comparison ever since. The technique itself is rather straightforward, but often gets lost in the mechanics of how it is done. To illustrate, imagine we want to compare the HR of two different groups of people. We take a random sample from each group, which we call our data. Then:

(i) Assume that both samples came from the same group. This is our ‘null hypothesis’.
(ii) Calculate the probability that an experiment would give us these data, assuming that the null hypothesis is true. We express this probability as a p-value, a number between 0 and 1, where 0 is ‘impossible’ and 1 is ‘certain’.
(iii) If the probability of the data is low, we reject the null hypothesis and conclude that there must be a difference between the two groups.
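The three steps above can be sketched directly in code. The example below uses a permutation test, chosen here only because it follows the steps literally (it is not the specific test the article goes on to describe): if both samples came from the same group, the group labels are arbitrary, so we shuffle them and see how often a difference as large as the observed one turns up. The heart rate values and the function name are invented for illustration.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Step (i): under the null hypothesis the group labels are arbitrary.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        # Step (ii): count shuffles at least as extreme as the observed data.
        if diff >= observed:
            extreme += 1
    # Adding 1 to both counts avoids reporting a p-value of exactly zero.
    return (extreme + 1) / (n_shuffles + 1)

# Hypothetical heart rates (beats per minute), invented for illustration.
hr_group_1 = [62, 65, 70, 68, 64, 66, 71, 63, 67, 69]
hr_group_2 = [74, 78, 72, 80, 76, 75, 79, 73, 77, 81]

p = permutation_p_value(hr_group_1, hr_group_2)

# Step (iii): a low probability leads us to reject the null hypothesis.
decision = "reject" if p < 0.05 else "do not reject"
print(f"p = {p:.4f}: {decision} the null hypothesis")
```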

Formally, we can define a p-value as ‘the probability of finding the observed result or a more extreme result, if the null hypothesis were true.’ Standard practice is to set a cut-off at p < 0.05 (this cut-off is termed the alpha value). If the null hypothesis were true, a result such as this would only occur 5% of the time or less; this in turn would indicate that the null hypothesis itself is unlikely. Fisher described the process as follows: ‘Set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.’ [3] This probably remains the most succinct description of the procedure.
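One consequence of this definition is that, when the null hypothesis really is true, about 5% of experiments will still fall below the 0.05 cut-off. The sketch below checks this by simulation. To keep the arithmetic simple it uses a z-test with a known population SD, which is an assumption made for this illustration; the population values (mean 70, SD 10) and the function name are likewise invented.

```python
import math
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample_a, sample_b, sigma):
    """Two-sided p-value for a difference in means when the population SD is known."""
    n_a, n_b = len(sample_a), len(sample_b)
    se = sigma * math.sqrt(1 / n_a + 1 / n_b)
    z = (fmean(sample_a) - fmean(sample_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(1)  # fixed seed so the sketch is repeatable
SIGMA, N, TRIALS = 10.0, 30, 4000

# Both samples are drawn from the SAME population, so the null hypothesis is true.
p_values = []
for _ in range(TRIALS):
    a = [rng.gauss(70, SIGMA) for _ in range(N)]
    b = [rng.gauss(70, SIGMA) for _ in range(N)]
    p_values.append(z_test_p_value(a, b, SIGMA))

false_positive_rate = sum(p < 0.05 for p in p_values) / TRIALS
print(f"Proportion of 'significant' results under the null: {false_positive_rate:.3f}")
```

The proportion should be close to 0.05, which is what the alpha value means: it is the rate at which a true null hypothesis is rejected by chance alone.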

A question which often arises at this point is ‘Why do we use a null hypothesis?’ The simple answer is that it is easy: we can readily describe what we would expect of our data under a null hypothesis, we know how data would behave, and we can readily work out the probability of getting the result that we did. It therefore makes a very simple starting point for our probability assessment. All probabilities require a set of starting conditions, in much the same way that measuring the distance to London needs a starting point. The null hypothesis can be thought of as an easy place to put the start of your ruler.

If a null hypothesis is rejected, an alternate hypothesis must be adopted in its place. The null and alternate hypotheses must be mutually exclusive, but must also between them describe all situations. If a null hypothesis is ‘no difference exists’ then the alternate should be simply ‘a difference exists’.

Hypothesis testing in practice

The components of a hypothesis test can be readily described using the acronym GOST: identify the Groups you wish to compare; define the Outcome to be measured; collect and Summarise the data; then evaluate the likelihood of the null hypothesis, using a Test statistic.

When considering groups, think first about how many. Is there just one group being compared against an audit standard, or are you comparing one group with another? Some studies may wish to compare more than two groups. Another situation may involve a single group measured at different points in time, for example before or after a particular treatment. In this situation each participant is compared with themselves, and this is often referred to as a ‘paired’ or a ‘repeated measures’ design. It is possible to combine these types of groups—for example a researcher may measure arterial BP on a number of different occasions in five different groups of patients. Such studies can be difficult, both to analyse and interpret.

In other studies we may want to see how a continuous variable (such as age or height) affects the outcomes. These techniques involve regression analysis, and are beyond the scope of this article.

The outcome measures are the data being collected. This may be a continuous measure, such as temperature or BMI, or it may be a categorical measure, such as ASA status or surgical specialty. Often, inexperienced researchers will strive to collect lots of outcome measures in an attempt to find something that differs between the groups of interest; if this is done, a ‘primary outcome measure’ should be identified before the research begins. In addition, the results of any hypothesis tests will need to be corrected for multiple measures.
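The text does not say which correction to use for multiple measures. As one illustration, the sketch below applies the Bonferroni correction, a simple and conservative choice assumed here for the example: each p-value is multiplied by the number of tests performed before being compared with the cut-off. The p-values are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) using the Bonferroni correction."""
    m = len(p_values)
    # Multiply each p-value by the number of tests, capping at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]

# Hypothetical p-values from five outcome measures, invented for illustration.
raw = [0.004, 0.03, 0.04, 0.20, 0.65]
adjusted, reject = bonferroni(raw)

for p_raw, p_adj, flag in zip(raw, adjusted, reject):
    verdict = "significant" if flag else "not significant"
    print(f"raw p = {p_raw:.3f}  adjusted p = {p_adj:.3f}  {verdict}")
```

Three of the raw values fall below 0.05, but only the smallest survives correction, which shows why collecting many outcome measures without a prespecified primary outcome can mislead.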

The summary and the test statistic will be defined by the type of data that have been collected. The test statistic is calculated then transformed into a p-value using tables or software. It is worth looking at two common tests in a little more detail: the χ² test, and the t-test.

Categorical data: the χ² test

The χ² test of independence is a test for comparing categorical outcomes in two or more groups. For example, a number of trials have compared surgical site infections in patients who have been given different concentrations of oxygen perioperatively. In the PROXI trial [4], 685 patients received oxygen 80%, and 701 patients received oxygen 30%. In the 80% group there were 131 infections, while in the 30% group there were 141 infections. In this study, the groups were oxygen 80% and oxygen 30%, and the outcome measure was the presence of a surgical site infection.

The summary is a table (Table 1), and the hypothesis test compares this table (the ‘observed’ table) with the table that would be expected if the proportion of infections in each group was the same (the ‘expected’ table). The test statistic is χ², from which a p-value is calculated. In this instance the p-value is 0.64, which means that a difference at least as large as this would occur 64% of the time if the null hypothesis were true. We thus have no evidence to reject the null hypothesis; the observed difference is entirely consistent with sampling variation rather than an inherent difference between the two groups.

Table 1

Summary of the results of the PROXI trial. Figures are numbers of patients.

		Group
		Oxygen 80%	Oxygen 30%
Outcome	Infection	131	141
Outcome	No infection	554	560
Total		685	701
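The calculation behind Table 1 can be sketched in a few lines of Python. This is an illustrative reconstruction, not the article's own code: it assumes a Pearson χ² statistic without continuity correction, and uses the fact that with one degree of freedom the upper tail of the χ² distribution can be written with the complementary error function.

```python
import math

# Observed counts from Table 1 (PROXI trial): rows are outcomes, columns are groups.
observed = [
    [131, 141],  # infection:    oxygen 80%, oxygen 30%
    [554, 560],  # no infection: oxygen 80%, oxygen 30%
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell if the infection proportion were the same in both groups.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic (assumption: no continuity correction).
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# A 2x2 table has 1 degree of freedom, for which the upper tail probability
# of the chi-squared distribution equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

Under these assumptions the sketch should give a p-value of about 0.64, in line with the figure quoted in the text. Applying a continuity correction would give a somewhat larger value.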

Continuous data: the t-test

The t-test is a statistical method for comparing means, and is one of the most widely used hypothesis tests. Imagine a study where we try to see if there is a difference in the onset time of a new neuromuscular blocking agent compared with suxamethonium. We could enlist 100 volunteers, give them a general anaesthetic, and randomise 50 of them to receive the new drug and 50 of them to receive suxamethonium. We then time how long it takes (in seconds) to have ideal intubation conditions, as measured by a quantitative nerve stimulator. Our data are therefore a list of times. In this case, the groups are ‘new drug’ and suxamethonium, and the outcome is time, measured in seconds. This can be summarised by using means; the hypothesis test will compare the means of the two groups, using a p-value calculated from a ‘t statistic’. Hopefully it is becoming obvious at this point that the test statistic is usually identified by a letter, and this letter is often cited in the name of the test.
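As a rough sketch of how the t statistic is obtained for two independent groups, the Python below uses a pooled-variance (equal variance) form. The onset times are invented for illustration and the group size is much smaller than the 50 per group described above. Converting t into a p-value is left to tables or software, as the text describes.

```python
import math
from statistics import mean, variance

# Hypothetical onset times in seconds (invented for illustration, not trial data).
new_drug = [58, 62, 65, 59, 61, 64, 60, 63]
suxamethonium = [50, 54, 52, 49, 55, 51, 53, 52]

n1, n2 = len(new_drug), len(suxamethonium)

# Pooled variance: the two sample variances weighted by their degrees of freedom.
pooled_var = (
    (n1 - 1) * variance(new_drug) + (n2 - 1) * variance(suxamethonium)
) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# The t statistic is the difference in means measured in standard errors.
t_stat = (mean(new_drug) - mean(suxamethonium)) / std_error
degrees_of_freedom = n1 + n2 - 2

print(f"t = {t_stat:.2f} on {degrees_of_freedom} degrees of freedom")
```

The larger the difference in means relative to its standard error, the larger the t statistic and the smaller the resulting p-value.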

The t-test comes in a number of guises, depending on the comparison being made. A single sample can be compared with a standard (Is the BMI of school leavers in this town different from the national average?); two samples can be compared with each other, as in the example above; or the same study subjects can be measured at two different times. The last case is referred to as a paired t-test, because each participant provides a pair of measurements, such as in a pre- and postintervention study.

A large number of methods for testing hypotheses exist; the commonest ones and their uses are described in Table 2. In each case, the test can be described by detailing the groups being compared (Table 2, columns), the outcome measures (rows), the summary, and the test statistic. The decision to use a particular test or method should be made during the planning stages of a trial or experiment. At this stage, an estimate needs to be made of how many test subjects will be needed. Such calculations are described in detail elsewhere. 5

Table 2

The principal types of hypothesis test. Tests comparing more than two samples can indicate that one group differs from the others, but will not identify which. Subsequent ‘post hoc’ testing is required if a difference is found.

Type of data	Number of groups
	1 (comparison with a standard)	1 (before and after)	2	More than 2	Measured over a continuous range
Categorical	Binomial test	McNemar's test	χ² test, or Fisher's exact test (2×2 tables), or comparison of proportions	χ² test	Logistic regression
Continuous (normal)	One-sample t-test	Paired t-test	Independent samples t-test	Analysis of variance (ANOVA)	Regression analysis, correlation
Continuous (non-parametric)	Sign test (for median)	Sign test, or Wilcoxon matched-pairs test	Mann–Whitney test	Kruskal–Wallis test	Spearman's rank correlation

Controversies surrounding hypothesis testing

Although hypothesis tests have been the basis of modern science since the middle of the 20th century, they have been plagued by misconceptions from the outset. This has led to what has been described as a crisis in science in the last few years, and some journals have gone so far as to ban p-values outright. 6 This is not because of any flaw in the concept of a p-value, but because of a lack of understanding of what it means.

Possibly the most pervasive misunderstanding is the belief that the p-value is the chance that the null hypothesis is true, or that the p-value represents the frequency with which you will be wrong if you reject the null hypothesis (i.e. claim to have found a difference). This interpretation has frequently made it into the literature, and is a very easy trap to fall into when discussing hypothesis tests. To avoid this, it is important to remember that the p-value is telling us something about our sample, not about the null hypothesis. Put in simple terms, we would like to know the probability that the null hypothesis is true, given our data. The p-value tells us the probability of getting these data if the null hypothesis were true, which is not the same thing. This fallacy is referred to as ‘flipping the conditional’; the probability of an outcome under certain conditions is not the same as the probability of those conditions given that the outcome has happened.

A useful example is to imagine a magic trick in which you select a card from a normal deck of 52 cards, and the performer reveals your chosen card in a surprising manner. If the performer were relying purely on chance, this would only happen on average once in every 52 attempts. On the basis of this, we conclude that it is unlikely that the magician is simply relying on chance. Although the example is simple, we have just performed an entire hypothesis test. We have declared a null hypothesis (the performer was relying on chance); we have even calculated a p-value (1 in 52, ≈0.02); and on the basis of this low p-value we have rejected our null hypothesis. We would, however, be wrong to suggest that there is a probability of 0.02 that the performer is relying on chance; that is not what our figure of 0.02 is telling us.
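To see why the two probabilities differ, Bayes' theorem can be applied to the card trick. The sketch below is illustrative and rests on two assumptions not made in the article: a performer using a genuine trick always reveals the right card, and we hold some prior belief that the performer is merely guessing.

```python
def prob_chance_given_reveal(prior_chance, p_reveal_if_trick=1.0, deck_size=52):
    """Bayes' theorem: P(performer relied on chance | correct card revealed)."""
    p_reveal_if_chance = 1 / deck_size  # the p-value from the text, 1 in 52
    numerator = p_reveal_if_chance * prior_chance
    denominator = numerator + p_reveal_if_trick * (1 - prior_chance)
    return numerator / denominator

# Assumed prior beliefs that the performer is merely guessing (illustrative values).
for prior in (0.01, 0.5, 0.9):
    posterior = prob_chance_given_reveal(prior)
    print(f"prior P(chance) = {prior:.2f} -> P(chance | reveal) = {posterior:.4f}")
```

The probability that the performer relied on chance changes with the prior belief, whereas the p-value of 1 in 52 stays fixed. The p-value alone therefore cannot be read as the probability that the null hypothesis is true.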

To explore this further we can create two populations, and watch what happens when we use simulation to take repeated samples to compare these populations. Computers allow us to do this repeatedly, and to see what p-values are generated (see Supplementary online material). 7 Fig 1 illustrates the results of 100,000 simulated t-tests, generated in two sets of circumstances. In Fig 1a, we have a situation in which there is a difference between the two populations. The p-values cluster below the 0.05 cut-off, although there is a small proportion with p>0.05. Interestingly, the proportion of comparisons where p<0.05 is 0.8 or 80%, which is the power of the study (the sample size was specifically calculated to give a power of 80%).

Fig 1. The p-values generated when 100,000 t-tests are used to compare two samples taken from defined populations. (a) The populations have a difference and the p-values are mostly significant. (b) The samples were taken from the same population (i.e. the null hypothesis is true) and the p-values are distributed uniformly.

Figure 1b depicts the situation where repeated samples are taken from the same parent population (i.e. the null hypothesis is true). Somewhat surprisingly, all p-values occur with equal frequency, with p<0.05 occurring exactly 5% of the time. Thus, when the null hypothesis is true, a type I error will occur with a frequency equal to the alpha significance cut-off.
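A simulation of this kind can be sketched with the Python standard library. This is not the author's supplementary code, and it makes several simplifying assumptions: both populations are normal with a known standard deviation of 1, so a z-test stands in for the t-test; the difference in panel (a) is 0.5 standard deviations; 63 participants per group are used, which gives roughly 80% power for that difference; and 5,000 comparisons are run per scenario instead of 100,000 to keep the run time short.

```python
import math
import random
from statistics import NormalDist, fmean

def simulate_p_values(true_difference, n_per_group=63, n_tests=5000, seed=1):
    """Draw two samples repeatedly and return the two-sided p-value of each comparison.

    Simplifying assumption: both populations are normal with a known SD of 1,
    so a z-test is used in place of the t-test described in the text.
    """
    rng = random.Random(seed)
    std_error = math.sqrt(2 / n_per_group)  # SE of the difference in means when SD = 1
    standard_normal = NormalDist()
    p_values = []
    for _ in range(n_tests):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(true_difference, 1) for _ in range(n_per_group)]
        z = (fmean(group_b) - fmean(group_a)) / std_error
        p_values.append(2 * (1 - standard_normal.cdf(abs(z))))
    return p_values

def proportion_significant(p_values, alpha=0.05):
    """Fraction of simulated comparisons with p below the cut-off."""
    return sum(p < alpha for p in p_values) / len(p_values)

# (a) The populations differ by 0.5 SD, so the null hypothesis is false.
power_estimate = proportion_significant(simulate_p_values(true_difference=0.5))
# (b) Both samples come from the same population, so the null hypothesis is true.
type_i_rate = proportion_significant(simulate_p_values(true_difference=0.0, seed=2))

print(f"(a) proportion with p < 0.05: {power_estimate:.3f}")
print(f"(b) proportion with p < 0.05: {type_i_rate:.3f}")
```

Under these assumptions the first proportion should settle near 0.8 (the power) and the second near 0.05 (the alpha cut-off), mirroring the two panels described above.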

Figure 1 highlights the underlying problem: when presented with a p-value <0.05, is it possible, with no further information, to determine whether you are looking at something from Fig 1a or Fig 1b?

Finally, it cannot be stressed enough that although hypothesis testing identifies whether or not a difference is likely, it is up to us as clinicians to decide whether or not a statistically significant difference is also significant clinically.

Hypothesis testing: what next?

As mentioned above, some have suggested moving away from p-values, but it is not entirely clear what we should use instead. Some sources have advocated focussing more on effect size; however, without a measure of significance we have merely returned to our original problem: how do we know that our difference is not just a result of sampling variation?

One solution is to use Bayesian statistics. Up until very recently, these techniques have been considered both too difficult and not sufficiently rigorous. However, recent advances in computing have led to the development of Bayesian equivalents of a number of standard hypothesis tests. 8 These generate a ‘Bayes Factor’ (BF), which tells us how much more (or less) likely the alternative hypothesis is after our experiment. A BF of 1.0 indicates that the likelihood of the alternative hypothesis has not changed. A BF of 10 indicates that the odds in favour of the alternative hypothesis are 10 times greater than we originally thought. A number of classifications for BF exist; a BF greater than 10 can be considered ‘strong evidence’, while a BF greater than 100 can be classed as ‘decisive’.
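In terms of odds, the BF acts as a multiplier: the posterior odds equal the prior odds multiplied by the BF. The short Python sketch below shows how a given BF changes belief in the alternative hypothesis. The starting probability of 0.5 is an assumption chosen for illustration.

```python
def update_with_bayes_factor(prior_probability, bayes_factor):
    """Return the posterior probability of the alternative hypothesis."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * bayes_factor  # the BF multiplies the odds
    return posterior_odds / (1 + posterior_odds)

# Assumed prior probability of 0.5 for the alternative hypothesis (illustrative).
for bf in (1, 10, 100):
    posterior = update_with_bayes_factor(0.5, bf)
    print(f"BF = {bf:>3}: posterior probability = {posterior:.3f}")
```

A BF of 1 leaves the starting belief unchanged, while larger values move it progressively closer to certainty. The same BF applied to a more sceptical prior gives a correspondingly lower posterior probability.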

Figures such as the BF can be quoted in conjunction with the traditional p-value, but it remains to be seen whether they will become mainstream.

Declaration of interest

The author declares that they have no conflict of interest.

The associated MCQs (to support CME/CPD activity) will be accessible at www.bjaed.org/cme/home by subscribers to BJA Education .

Jason Walker FRCA FRSS BSc (Hons) Math Stat is a consultant anaesthetist at Ysbyty Gwynedd Hospital, Bangor, Wales, and an honorary senior lecturer at Bangor University. He is vice chair of his local research ethics committee, and an examiner for the Primary FRCA.

Matrix codes: 1A03, 2A04, 3J03

Supplementary data to this article can be found online at https://doi.org/10.1016/j.bjae.2019.03.006 .


Machine Learning as a Tool for Hypothesis Generation

While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not. We illustrate the procedure with a concrete application: judge decisions about who to jail. We begin with a striking fact: The defendant’s face alone matters greatly for the judge’s jailing decision. In fact, an algorithm given only the pixels in the defendant’s mugshot accounts for up to half of the predictable variation. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: They are not explained by demographics (e.g. race) or existing psychology research; nor are they already known (even if tacitly) to people or even experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional dataset (e.g. cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our paper is that hypothesis generation is in and of itself a valuable activity, and we hope this encourages future work in this largely “pre-scientific” stage of science.

This is a revised version of Chicago Booth working paper 22-15 “Algorithmic Behavioral Science: Machine Learning as a Tool for Scientific Discovery.” We gratefully acknowledge support from the Alfred P. Sloan Foundation, Emmanuel Roman, and the Center for Applied Artificial Intelligence at the University of Chicago. For valuable comments we thank Andrei Shliefer, Larry Katz and five anonymous referees, as well as Marianne Bertrand, Jesse Bruhn, Steven Durlauf, Joel Ferguson, Emma Harrington, Supreet Kaur, Matteo Magnaricotte, Dev Patel, Betsy Levy Paluck, Roberto Rocha, Evan Rose, Suproteem Sarkar, Josh Schwartzstein, Nick Swanson, Nadav Tadelis, Richard Thaler, Alex Todorov, Jenny Wang and Heather Yang, as well as seminar participants at Bocconi, Brown, Columbia, ETH Zurich, Harvard, MIT, Stanford, the University of California Berkeley, the University of Chicago, the University of Pennsylvania, the 2022 Behavioral Economics Annual Meetings and the 2022 NBER summer institute. For invaluable assistance with the data and analysis we thank Cecilia Cook, Logan Crowl, Arshia Elyaderani, and especially Jonas Knecht and James Ross. This research was reviewed by the University of Chicago Social and Behavioral Sciences Institutional Review Board (IRB20-0917) and deemed exempt because the project relies on secondary analysis of public data sources. All opinions and any errors are of course our own. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.


Published Versions

Jens Ludwig & Sendhil Mullainathan, 2024. "Machine Learning as a Tool for Hypothesis Generation," The Quarterly Journal of Economics, vol 139(2), pages 751-827.


