## Statistics Training

The following statistics courses are offered:

- Statistics in Practice (3 days)
- Design of Experiments - A step-by-step approach (3 one-day modules)
- Split-plot experiments: design and data analysis (2 days)
- Multivariate analysis / Datamining (2 days)

# Statistics in Practice

Statistics doesn't have a good reputation: for most people it means too much mathematics, strange logic or obscure recipes. Yet many decisions are taken on the basis of data, and that's what modern statistics is all about: understanding data and coming to meaningful conclusions. This training focuses on graphical analysis and "common sense" statistics. It contains far less mathematics than a typical stats course. What is not reduced is the emphasis on applying statistics in everyday practice in business and industry.

**Course objective**

As a result of this course, you'll develop a real "feel" for statistics. You'll be able to choose an appropriate technique for the most common types of problems, and interpret the results correctly.

**Prior knowledge**

No prior knowledge is required.

**Course contents**

Module 1 (2 days)

- Descriptive statistics
  - Graphical techniques: scatter plots, histogram, dotplot, boxplot, normal probability plot
  - Descriptive statistics: means, median, variance, IQR, ...
  - Describing the similarity between variables: covariance & correlation
  - Autocorrelation

- Good data collection practice
  - Sampling strategies
  - Paired comparisons

- Dealing with random variables (probability distributions)
  - Properties of distributions of random variables
  - Distributions for discrete and continuous variables: Binomial, Poisson, normal distribution, Weibull, ...

- Functions of random variables: the z, χ², t and F distributions
- Confidence intervals for means, difference in means, variances, proportions, capability indices, ...
- Hypothesis testing
  - Hypothesis testing with confidence intervals
  - Classical hypothesis testing
  - Statistically significant versus practically relevant
  - Type I and Type II errors
  - Power and sample size calculations

- One-way ANOVA
- Simple Linear Regression

Module 2 (1 day)

- Two-way ANOVA
- Random effects and Nested ANOVA - Variance Components Analysis (R&r study)
- Polynomial Regression

Some cases & applications:

- detecting a change in a process
- judging the difference between two products or systems
- calculating the number of observations needed to detect a certain improvement
- investigating the effect of different types of a constituent on the product properties
- investigating the effect of a process parameter on a characteristic
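
As an illustration of the sample-size case listed above, here is a minimal sketch in Python. It is not taken from the course material: the function name `sample_size_per_group` and its parameters are hypothetical, and it assumes a two-sided comparison of two group means with a known, common standard deviation, using the normal approximation.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate observations per group for a two-sample comparison of means.

    Assumes a two-sided test, equal group sizes and a known common standard
    deviation `sigma`; `delta` is the smallest difference worth detecting.
    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # controls the Type I error
    z_power = standard_normal.inv_cdf(power)          # controls the Type II error
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)


# Hypothetical case: detect an improvement of one standard deviation.
print(sample_size_per_group(delta=1.0, sigma=1.0))
```

Under these assumptions, halving the improvement to be detected roughly quadruples the number of observations needed. For small samples the normal approximation slightly underestimates the requirement compared with a calculation based on the t-distribution.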

**Course set-up**

The first two days cover all the basic statistical concepts and techniques that participants need for a correct statistical analysis of their results, whether these come from experiments or from other sources. The third day expands the statistical toolbox with methods such as two-way ANOVA, nested designs for identifying the most important sources of variation (e.g. for an R&r study) and polynomial regression.

Theory will alternate with hands-on computer exercises.


# Design of Experiments - A step-by-step approach

Whatever you want to investigate or optimise, the most efficient way to go about it is Design of Experiments (DOE), also known as Experimental Design, Statistical Design of Experiments (SDE) or Multi-Variable Testing (MVT).

**Course objective**

At the end of the course the participants will be able to formalise a problem, find the appropriate design type and, except for complex problems, construct this design. The participants will also master the statistical analysis of standard designs for continuous variables.

**Prior knowledge**

For Module I, no prior knowledge is assumed. For modules II and III a basic knowledge of statistics - as treated in the course Statistics in Practice - is recommended.

**Course contents**

- One Variable At a Time versus Experimental Design
- The concept of interacting variables
- Replication, 2-level blocking variables and randomisation
- 2-level designs: Full Factorial, Fractional Factorial, Minimum-Run designs, Foldover designs, Confounding, Resolution
- Multi-level Response-Surface-Model designs
- Power analysis: what is the smallest effect I can detect, and how many experiments will it take to detect an effect of a particular size?
- Analysing the results with Analysis of Variance
- Residual analysis and graphical validation
- Visualisation of the results
- Response transformation
- Multi-response optimisation
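
To make the ideas of a 2-level full factorial design and of interacting variables concrete, here is a minimal sketch in Python. It is an illustration only, not part of the course material: the helper names `full_factorial` and `effect` are hypothetical, and the response values are generated from an assumed formula rather than measured.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs of a 2-level full factorial design, coded as -1 / +1."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def effect(design, response, columns):
    """Estimate a main effect (one column) or an interaction (several columns).

    The effect is the mean response where the contrast is +1 minus the mean
    response where the contrast is -1.
    """
    contrast = []
    for run in design:
        sign = 1
        for column in columns:
            sign *= run[column]
        contrast.append(sign)
    high = [y for s, y in zip(contrast, response) if s == 1]
    low = [y for s, y in zip(contrast, response) if s == -1]
    return sum(high) / len(high) - sum(low) / len(low)


# Two factors A and B; the response formula below is invented for illustration.
design = full_factorial(2)
response = [10 + 2 * a + a * b for a, b in design]

print("A :", effect(design, response, [0]))
print("B :", effect(design, response, [1]))
print("AB:", effect(design, response, [0, 1]))
```

In this made-up example factor B has no main effect of its own, yet it changes how strongly A acts on the response. A one-variable-at-a-time study that varies B at a single fixed level of A cannot separate these two situations, whereas the factorial design estimates the interaction directly.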

**Course set-up**

During the first three days the topics listed above are treated theoretically and illustrated with exercises. Day four is an Experimental Design "game" in which the participants go through all phases of a project: from problem analysis, through choosing a design, to analysing simulated data and reporting the results.

# Split-plot experiments: design and data analysis

According to the textbooks, designed experiments should be completely randomized. Many industrial experiments, however, involve restrictions in randomization due to cost and time constraints. The presence of restrictions in randomization necessitates the use of split-plot designs. This leads to correlated responses, so that generalized least squares is the preferred estimation technique for the factor effects. This course discusses the analysis of data from split-plot experiments in detail. In addition, the course will discuss how to generate designs for split-plot experiments for a variety of practical scenarios, ranging from screening experiments to response surface experiments to mixture-process variable experiments.
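
The link between restricted randomization, correlated responses and generalized least squares can be sketched with the usual split-plot model. The notation below is a common textbook convention chosen for illustration, not necessarily the one used in the course: $n$ runs are grouped in $b$ whole plots, $Z$ is the $n \times b$ indicator matrix assigning runs to whole plots, $\gamma$ holds the whole-plot errors and $\varepsilon$ the run-to-run errors.

```latex
\begin{align*}
  Y &= X\beta + Z\gamma + \varepsilon,
  \qquad \gamma \sim N(0, \sigma_\gamma^2 I_b),
  \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2 I_n), \\
  V &= \operatorname{cov}(Y) = \sigma_\varepsilon^2 I_n + \sigma_\gamma^2 Z Z^{\top}, \\
  \hat\beta_{\mathrm{GLS}} &= \left(X^{\top} V^{-1} X\right)^{-1} X^{\top} V^{-1} Y .
\end{align*}
```

Two runs in the same whole plot share the same element of $\gamma$, so their responses have covariance $\sigma_\gamma^2$, while runs in different whole plots are uncorrelated. Ordinary least squares ignores this structure, which is why generalized least squares, with the variance components typically estimated by restricted maximum likelihood, is preferred.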

**Course objective**

At the end of the course, participants will recognize restrictions in randomization and understand their consequences for the design and analysis of experiments. Participants will be able to analyze data from split-plot experiments and draw conclusions from the analysis. In addition, participants will learn how to set up their own split-plot experiments in an optimal fashion and compare different designs in terms of power for detecting active effects and in terms of prediction performance. Participants will also learn that strip-plot and split-split-plot designs, which are often used for studying multi-stage processes, are straightforward extensions of split-plot designs, and that their design and analysis are hardly more complicated than those of split-plot designs.

Throughout the course, all concepts are illustrated, and all analyses, design generation and design comparisons are performed, in JMP, which is the only software capable of generating efficient split-plot, strip-plot and split-split-plot designs for any practical situation and of analyzing the resulting data in a user-friendly way.

**Prior knowledge**

A first course in design of experiments is required, as well as some familiarity with hypothesis testing and regression analysis.

**Course contents**

Introduction to split-plot design (agricultural origin, industrial applications)

Split-plot model

- Generalized least squares
- Restricted maximum likelihood
- Kenward-Roger degrees of freedom
- Hypothesis testing
- Ordinary versus generalized least squares

Optimal experimental design

- D-optimal designs
- I-optimal designs
- An algorithm for constructing optimal split-plot designs
- Diagnostics (relative variance of factor effect estimates, power, fraction of design space plots, simulation study)

Examples

- Split-plot screening designs
- Wind-tunnel experiment
- A mixture experiment involving process variables

Extensions

- Split-split-plot design: a cheese-making experiment
- Strip-plot design: a battery-cell experiment

**Course set-up**

Throughout this hands-on course, all concepts are illustrated immediately in the software JMP. It is a two-day course.

**Trainer**

This course is taught by Prof Peter Goos of the University of Antwerp. Prof Goos is a leading expert in the field of split-plot design and analysis.


# Multivariate analysis / Datamining

Massive amounts of data are collected and often stowed away in databases without further analysis. Or only simple graphical analyses are performed, which may be insufficient to bring the valuable information to the surface. If a large number of factors are at play, a multivariate datamining approach is the only alternative. This course is guaranteed to bring you some fascinating eye-openers and mind-boggling insights.

**Course objective**

This course will open your mind to multivariate thinking and introduce you to a class of more advanced methods.

**Prior knowledge**

Although prior knowledge is not strictly required, most participants will have previously attended statistics or DOE courses or have some experience in analyzing data.

**Course contents**

**Day 1: Exploratory multivariate analysis**

- Visualisation of big datasets
- Principal Component Analysis (PCA)
- Cluster analysis: searching for groups of similar samples

**Day 2: Quantitative analysis: in search of cause-effect relations**

- Multiple Linear Regression (MLR) with uncorrelated variables
- Multiple Linear Regression (MLR) with correlated variables
- Stepwise regression
- The collinearity problem
- An overview of the pitfalls

- Principal Component Regression (PCR)
- Partial Least Squares (PLS)
- Interpretation of PCR and PLS models
- Validation of regression models
- Detection of outliers and non-linearities
- Prediction with regression models

- Some alternatives


**Day 3: Quantitative analysis: the sequel + specific applications**

- Feasibility study: does a quantitative analysis make sense?
- Classification (supervised pattern recognition): predicting class membership
  - Linear Discriminant Analysis (LDA)
  - Soft Independent Modeling of Class Analogy (SIMCA)
  - PLS-DA

- Specific applications:
  - QSAR / QSPR (Quantitative Structure Activity / Property Relations)
  - Multivariate SPC (M-SPC)
  - Principal Properties Design
  - ...

**Course set-up**

Day 1 covers the qualitative aspects of multivariate data analysis: exploring the data and searching for correlations, clusters, outliers, ...

Day 2 moves on to model building: searching for relations between groups of variables. The emphasis is on applying and interpreting the different techniques correctly, not on the underlying theory. The course matter can be applied immediately in real-life exercises on a PC.