SUDAAN


SUDAAN is a proprietary statistical software package for the analysis of correlated data, including correlated data encountered in complex sample surveys. SUDAAN originated in 1972 at RTI International (the trade name of Research Triangle Institute). Individual commercial licenses are sold for $1,460 per year or $3,450 for a permanent license.


RTI International

RTI International is a nonprofit organization headquartered in the Research Triangle Park in North Carolina. RTI provides research and technical services. It was founded in 1958 with $500,000 in funding from local businesses and the three North Carolina universities that form the Research Triangle. RTI research has covered topics like HIV/AIDS, healthcare, education curriculum and the environment, among others. The US Agency for International Development accounts for about 35 percent of RTI's research revenue.


Current version

SUDAAN Release 11.0.3, released in May 2018, is a single program consisting of a family of thirteen analytic procedures used to analyze data from complex sample surveys and other observational and experimental studies involving repeated measures and cluster-correlated data. It provides estimates that account for complex design features of a study, including unequally weighted data, stratification, and clustering.
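As a rough illustration of why these design features matter, the sketch below uses hypothetical data and plain NumPy (not SUDAAN itself) to contrast a design-weighted mean with a naive unweighted one in a two-stratum sample:

```python
import numpy as np

# Hypothetical two-stratum sample: stratum 2 was sampled at a much lower
# rate, so its units carry larger weights (inverse selection probabilities).
strata  = np.array([1, 1, 1, 2, 2, 2])
weights = np.array([10.0, 10.0, 10.0, 40.0, 40.0, 40.0])
y       = np.array([3.0, 4.0, 5.0, 7.0, 8.0, 9.0])

# Design-based estimate: weight each observation by its sampling weight.
weighted_mean = np.sum(weights * y) / np.sum(weights)      # -> 7.2

# The naive unweighted mean ignores the design and is biased here,
# because stratum 2 is under-represented in the sample.
unweighted_mean = y.mean()                                 # -> 6.0

# Per-stratum weighted means show where the overall estimate comes from.
stratum_means = {
    s: np.sum(weights[strata == s] * y[strata == s]) / np.sum(weights[strata == s])
    for s in np.unique(strata)
}
```

The unweighted mean understates the estimate because the heavily weighted stratum is under-represented in the sample; SUDAAN applies this same weighting logic across all of its procedures.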


Example fields of use

SUDAAN enables the analysis of correlated data encountered in various fields of statistical research, including random-digit-dialing telephone surveys, clinical trials, and toxicology studies.


Strengths

SUDAAN's strength lies in its ability to compute standard errors of ratio estimates, means, totals, regression coefficients, and other statistics in accordance with the sample design, greatly increasing the accuracy and validity of results. Many, if not most, data sets require attention to correlation and weighting, yet few statistical software packages let the user specify how the data are correlated and weighted. For many years SUDAAN was the only broadly applicable software for the analysis of correlated and weighted data; Mplus now offers similar capabilities for a much broader set of models.
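To make the variance-estimation idea concrete, here is a minimal sketch, again with hypothetical data and plain NumPy rather than SUDAAN, of a Taylor-linearized standard error for a weighted mean (a ratio of estimated totals) under a single-stage with-replacement design:

```python
import numpy as np

# Hypothetical with-replacement design in which each sampled unit is its own PSU.
weights = np.array([2.0, 2.0, 3.0, 3.0, 5.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(y)

# The weighted mean is a ratio of estimated totals: R = sum(w*y) / sum(w).
R = np.sum(weights * y) / np.sum(weights)

# Taylor series linearization: expand the nonlinear ratio to first order and
# estimate the variance of the linear substitute z_i instead.
z = weights * (y - R) / np.sum(weights)

# With-replacement variance of the linearized values (z sums to zero by construction).
var_R = n / (n - 1) * np.sum((z - z.mean()) ** 2)
se_R = np.sqrt(var_R)
```

A full design-based implementation such as SUDAAN's sums the linearized values within primary sampling units and strata; this toy version treats each unit as its own PSU in a single stratum.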


All of SUDAAN's analytic procedures offer three popular robust variance estimation methods: Taylor series linearization (generalized estimating equations, GEE, for regression models), the jackknife, and balanced repeated replication (BRR).
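The jackknife, for example, can be sketched in a few lines (hypothetical data, plain NumPy; SUDAAN's own implementation handles the strata, clusters, and user-supplied replicate weights that this toy version omits):

```python
import numpy as np

weights = np.array([2.0, 2.0, 3.0, 3.0, 5.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(y)

def weighted_mean(w, v):
    return np.sum(w * v) / np.sum(w)

theta_hat = weighted_mean(weights, y)

# Delete-one replicates: recompute the estimate with each unit left out in turn.
replicates = np.array(
    [weighted_mean(np.delete(weights, i), np.delete(y, i)) for i in range(n)]
)

# Jackknife variance: scaled spread of the replicate estimates around their mean.
var_jk = (n - 1) / n * np.sum((replicates - replicates.mean()) ** 2)
se_jk = np.sqrt(var_jk)
```

The same delete-one recipe applies to any statistic, which is why replication methods like the jackknife and BRR are attractive for complex estimators without closed-form variances.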


Operating systems

SUDAAN runs on several computing platforms, including Windows 7/10, DOS, and Linux, either as a stand-alone statistical software tool or in SAS-callable format (SAS version 9).
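In SAS-callable mode, a SUDAAN procedure is invoked with SAS-style statements. The fragment below is an illustrative sketch only; the dataset and variable names (`survey`, `stratum`, `psu`, `smpwt`, `income`) are hypothetical:

```sas
PROC DESCRIPT DATA=survey FILETYPE=SAS DESIGN=WR;  /* with-replacement design */
  NEST stratum psu;      /* stratification and clustering structure */
  WEIGHT smpwt;          /* sampling weight variable */
  VAR income;            /* analysis variable */
  PRINT NSUM MEAN SEMEAN;
```

The NEST and WEIGHT statements are how SUDAAN is told about the design features (strata, clusters, weights) that its variance estimators account for.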
