The Statistics Report gives a statistical (psychometric) analysis of your cuLearn quiz, and the questions within it. You can use the Statistics Report to perform an item analysis for cuLearn quiz questions. An item analysis can help you to identify problematic questions that you will want to re-examine and perhaps even re-write.

Your cuLearn Quiz Statistics report includes three sections:

• Quiz Information – a summary of the entire quiz
• Quiz Structure Analysis – an analysis of all quiz questions in a table format, including links to edit individual questions or access a detailed analysis of a particular question
• Statistics for question positions – a bar graph of the percent of correct answers (Facility index) and the Discriminative efficiency index.

## Why Perform an Item Analysis?

There are many reasons to conduct an item analysis of your questions. Ideally, your questions and the responses your students select will allow those who have learned the skills/content you taught to show that learning, whereas students who are less prepared will select plausible distractors that will similarly reveal their lack of understanding. It is rare, however, that all of your questions will function in this way, especially on a first attempt. Sometimes students who do well on the exam as a whole may do poorly on a smaller subset of questions, and students who do poorly on the exam as a whole may do well on another subset of questions. Those outcomes suggest that the question is too difficult or flawed, or that students are guessing. In other instances, a very high proportion of students may answer a question correctly or incorrectly, suggesting that the question is either too easy or too difficult.

Of the statistics available to you in the Quiz tool in cuLearn, the two that deserve the most attention are 1) the Facility Index and 2) the Discrimination Index.

### Facility Index

The Facility Index (sometimes referred to as the “P-value”) refers to item difficulty, and it shows the percentage of students who answered a question correctly. The Facility Index ranges from 0% to 100%. The higher (i.e., closer to 100%) the Facility Index score for a question, the easier it is; that is, many of the students who did well on the exam as a whole, as well as many of those who didn’t, selected the correct answer. There is some minor disagreement in the literature on this topic about how to interpret the numbers, but a good guideline is as follows:

• >85%: very easy items, omit or re-write
• 35-85%: good/acceptable difficulty level (“optimal difficulty level is 0.50 for maximum discrimination between high and low achievers” [Measurement and Evaluation Centre])
• <35%: very difficult items, omit or re-write
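As a rough illustration of how the Facility Index is derived, the sketch below computes it from a list of per-student scores on one question and applies the 35%/85% guideline. The function names and the sample data are hypothetical and are not part of cuLearn; the report calculates this value for you.

```python
def facility_index(scores, max_mark=1.0):
    """Mean score on one question, as a percentage of its maximum mark."""
    if not scores:
        raise ValueError("need at least one attempt")
    return 100.0 * sum(scores) / (len(scores) * max_mark)


def difficulty_band(facility):
    """Classify a Facility Index using the 35%/85% guideline."""
    if facility > 85:
        return "very easy"
    if facility < 35:
        return "very difficult"
    return "acceptable"


# Hypothetical data: 1 = correct, 0 = incorrect, one entry per student.
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
f = facility_index(responses)
print(f"Facility Index: {f:.0f}% ({difficulty_band(f)})")
```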

### Discrimination Index

The Discrimination Index compares how individual students did on a particular question with how they did on the exam overall. The Discrimination Index can range from -100% to 100%, although for a well-functioning question it falls between 0% and 100% (a negative value suggests that the question is probably invalid). The higher (i.e., closer to 100%) the Discrimination Index score for a question, the more ‘discriminating’ the question is; that is, most students who did well on the exam answered this question correctly, and most students who did poorly on the exam answered this question incorrectly. Most literature on this topic is fairly uniform and suggests adhering to the following guideline:

• 40% and above: very good items, excellent discrimination
• 30-39%: good items, good discrimination
• 20-29%: fairly good items, acceptable discrimination
• Below 20%: poor items, poor discrimination
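To make the idea concrete, here is a minimal sketch that treats the Discrimination Index as the correlation between students' scores on one question and their scores on the rest of the quiz, expressed as a percentage. The helper names and sample scores are hypothetical, and the sketch ignores question weighting, so values in your report may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns None when either list has no spread."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def discrimination_index(item_scores, total_scores):
    """Correlate question scores with the rest-of-quiz scores, in percent."""
    rest = [total - item for item, total in zip(item_scores, total_scores)]
    r = pearson(item_scores, rest)
    return None if r is None else 100.0 * r


# Hypothetical data: six students, question scores and quiz totals.
item = [1, 1, 1, 0, 0, 0]
totals = [9, 8, 7, 4, 3, 2]
print(discrimination_index(item, totals))
```

A question that every student answers the same way has no spread, so the sketch returns `None` rather than a percentage.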

## Performing an Item Analysis in cuLearn

You can perform an item analysis for a cuLearn quiz in 3 steps:

1. Click on your quiz.
2. Click on the Settings Gear icon on the top-right of the page.
3. Select Statistics from the drop-down menu.

You will be taken to a page where you can view and download your quiz statistics report.

NOTE: If your quiz allowed one attempt, your quiz statistics report will automatically be calculated using the highest graded attempt. If your quiz allowed more than one attempt, your quiz statistics are calculated using the Grading method you selected in your Quiz settings (highest graded attempt, all attempts, first attempts, last attempt).

If your quiz allowed multiple attempts, you can change how your quiz statistics are calculated in the following way:

1. Under Statistics calculation settings, select one of the following options in the calculate statistics from drop-down menu:
• all attempts
• first attempts
• last attempt
2. Click Show report.

### Evaluating Question Difficulty

1. On your quiz statistics page, scroll down to Quiz structure analysis.
2. For each question, examine the Facility Index score.
3. Flag and investigate questions that score below 35% or above 85%; these are questions that may be either too easy (above 85%) or too difficult (below 35%).

### Evaluating Question Effectiveness

1. On your quiz statistics page, scroll down to Quiz structure analysis.
2. For each question, examine the Discrimination Index score.
3. Flag and investigate questions that score less than 20%; you may also want to consider doing the same for those that score in the 20-29% range.
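If you copy the Facility Index and Discrimination Index values out of the report, the two checks above can be applied in one pass. The sketch below assumes each question is represented as a dictionary with hypothetical keys `name`, `facility`, and `discrimination` (both in percent); these keys are illustrative and are not the column names of the downloaded report.

```python
def flag_questions(rows, easy=85, hard=35, weak=20):
    """Return {question name: [reasons]} for questions worth re-examining."""
    flagged = {}
    for row in rows:
        reasons = []
        if row["facility"] > easy:
            reasons.append("too easy")
        elif row["facility"] < hard:
            reasons.append("too difficult")
        if row["discrimination"] < weak:
            reasons.append("poor discrimination")
        if reasons:
            flagged[row["name"]] = reasons
    return flagged


# Hypothetical values typed in from a statistics report.
report = [
    {"name": "Q1", "facility": 92, "discrimination": 15},
    {"name": "Q2", "facility": 60, "discrimination": 45},
    {"name": "Q3", "facility": 20, "discrimination": 25},
]
for name, reasons in flag_questions(report).items():
    print(name, "->", ", ".join(reasons))
```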

### What About the Other “Quiz Structure Analysis” Columns?

There are other columns in the “Quiz structure analysis” section of your Quiz on cuLearn. These include Standard Deviation, Random Guess Score, Intended Weight, Effective Weight, and the Discriminative Efficiency. You can learn more about each of these items under Quiz Information Key Terms and Quiz Structure Analysis Key Terms.

## Viewing statistics for an individual question

1. On your quiz statistics page, scroll down to Quiz structure analysis.
2. In the Quiz Structure analysis table, click on a Question name.
3. The statistics report includes the following three sections:
• Question information – This section provides basic information about the question (see screenshot below).
• Question statistics – This section repeats the information from Quiz structure analysis table relating to your selected question (see screenshot below).
• Analysis of Responses – This section provides a frequency analysis of the different responses that were given to each part of the question. The details of the analysis depend on the question type, and not all question types support this (e.g., essay question responses cannot be analyzed).

TIP: To go back to your main statistics page, click Back to main statistics report page at the bottom of your report.

## Viewing statistics for a random question

If your quiz includes random question sets, the Quiz Structure Analysis table in your Statistics Report will include one row for the random question, followed by rows for each real question that was randomly selected during the quiz.

To view random question statistics:

1. On your quiz statistics page, scroll down to Quiz structure analysis.
2. In the Quiz Structure Analysis table, locate the random question row.
3. Under Range of statistics for these questions, click View details.
4. In the Structural analysis table, you will see a row for each question included in the random question set. For example (see screenshot below), if your quiz included a random question for question number 10, the Structural analysis table will include a row for the random question (Q# 10) followed by a row for each real question included in the random question set (Q# 10.1, 10.2, etc.).
5. Under the Question name column, click on a question name to view the statistics report for an individual question.

## Editing and Regrading Quiz Questions

If you edit a question after students have completed the quiz, you will need to regrade the quiz in your quiz settings. You won’t be able to add or remove questions if students have already completed the quiz, but you can adjust the answers or maximum mark to:

• Nullify the question so that no students receive a mark for the question
• Change which answer is correct so that the marks can be redistributed to students accordingly
• Set all answers to be correct so that every student receives a mark for answering the question

You can edit a question on the Editing Quiz page of your quiz settings, or you can click the blue gear icon next to a question name in the Quiz Structure Analysis table of the Statistics Report.

See the Quiz Regrading page for instructions on editing quiz questions and regrading quizzes.

## Quiz Information Key Terms

Average grade: For discriminating, deferred-feedback tests, aim for between 50% and 75%. Values outside these limits need thinking about. Interactive tests with multiple tries invariably lead to higher averages.

Median grade: Half the students score less than this figure.

Standard deviation: A measure of the spread of scores about the mean. Aim for values between 12% and 18%. A smaller value suggests that scores are too bunched up.

Skewness: A measure of the asymmetry of the distribution of scores. Zero implies a perfectly symmetrical distribution, positive values a ‘tail’ to the right and negative values a ‘tail’ to the left.

Aim for a value of -1.0. If it is too negative, it may indicate a lack of discrimination between students who do better than average. Similarly, a large positive value (greater than 1.0) may indicate a lack of discrimination near the pass/fail border.

Kurtosis: Kurtosis is a measure of the flatness of the distribution.

A normal, bell shaped, distribution has a kurtosis of zero. The greater the kurtosis, the more peaked is the distribution, without much of a tail on either side.

Aim for a value in the range 0-1. A value greater than 1 may indicate that the test is not discriminating very well between very good or very bad students and those who are average.
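For readers who want to see what these two measures capture, the sketch below computes skewness and kurtosis from the simple moments of a list of grades, with kurtosis reported relative to the normal distribution (so a bell-shaped distribution gives roughly zero). This is an illustrative textbook calculation under that assumption; the quiz report may apply small-sample corrections, so its figures can differ slightly. The function name and data are hypothetical.

```python
def skewness_and_kurtosis(scores):
    """Moment-based skewness and excess kurtosis of a list of scores."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # variance
    if m2 == 0:
        raise ValueError("all scores are identical")
    m3 = sum((s - mean) ** 3 for s in scores) / n
    m4 = sum((s - mean) ** 4 for s in scores) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # zero for a normal distribution
    return skewness, kurtosis


# Hypothetical grades with a long tail to the right.
skew, kurt = skewness_and_kurtosis([40, 42, 45, 47, 50, 52, 90])
print(f"skewness={skew:.2f}, kurtosis={kurt:.2f}")
```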

Coefficient of internal consistency (CIC): It is impossible to get internal consistency much above 90%. Anything above 75% is satisfactory. If the value is below 64%, the test as a whole is unsatisfactory and remedial measures should be considered.

A low value indicates either that some of the questions are not very good at discriminating between students of different ability and hence that the differences between total scores owe a good deal to chance or that some of the questions are testing a different quality from the rest and that these two qualities do not correlate well – i.e. the test as a whole is inhomogeneous.
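One common way to estimate internal consistency is Cronbach's alpha, which compares the spread of individual question scores with the spread of total scores. The sketch below assumes that approach and expresses the result as a percentage; the function names and the score matrix are hypothetical, and it is meant only to show why questions that do not move together lower the value.

```python
def variance(values):
    """Population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def internal_consistency(score_rows):
    """Cronbach's alpha in percent; one row per student, one score per question."""
    n_questions = len(score_rows[0])
    if n_questions < 2:
        raise ValueError("need at least two questions")
    item_variances = [
        variance([row[q] for row in score_rows]) for q in range(n_questions)
    ]
    total_variance = variance([sum(row) for row in score_rows])
    if total_variance == 0:
        raise ValueError("total scores have no spread")
    ratio = sum(item_variances) / total_variance
    return 100.0 * n_questions / (n_questions - 1) * (1.0 - ratio)


# Hypothetical quiz: two questions that always agree give the maximum value.
print(internal_consistency([[1, 1], [0, 0], [1, 1], [0, 0]]))
```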

Error ratio (ER): This is related to CIC according to the table below. It estimates the percentage of the standard deviation which is due to chance effects rather than to genuine differences of ability between students. Values of ER in excess of 50% cannot be regarded as satisfactory: they imply that less than half the standard deviation is due to differences in ability and the rest to chance effects.

| CIC (%) | ER (%) |
|---|---|
| 100 | 0 |
| 99 | 10 |
| 96 | 20 |
| 91 | 30 |
| 84 | 40 |
| 75 | 50 |
| 64 | 60 |
| 51 | 70 |

Standard error (SE): This is SD x ER/100. It estimates how much of the SD is due to chance effects and is a measure of the uncertainty in any given student’s score. If the same student took an equivalent test, his or her score could be expected to lie within ±SE of the previous score. The smaller the value of SE the better the test, but it is difficult to get it below 5% or 6%. A value of 8% corresponds to half a grade difference on a typical scale – if the SE exceeds this, it is likely that a substantial proportion of the students will be wrongly graded in the sense that the grades awarded do not accurately indicate their true abilities.
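The CIC/ER table is consistent with the relationship ER = 100 × √(1 − CIC/100), and SE follows from the formula above. The sketch below simply encodes those two relationships so you can check values between the table rows; the function names are hypothetical.

```python
from math import sqrt


def error_ratio(cic):
    """Error ratio (%) for a coefficient of internal consistency (%)."""
    if not 0 <= cic <= 100:
        raise ValueError("CIC must be between 0 and 100")
    return 100.0 * sqrt(1.0 - cic / 100.0)


def standard_error(sd, cic):
    """Standard error: SD x ER / 100, in the same units as SD."""
    return sd * error_ratio(cic) / 100.0


# A quiz with SD = 15% and CIC = 75% has ER = 50% and SE = 7.5%.
print(error_ratio(75), standard_error(15, 75))
```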

## Quiz Structure Analysis (Question Statistics) Key Terms

Q#: Shows the question number (position), question type icon, and preview and edit icons

Question name: The name is also a link to the detailed analysis of this question (see Viewing statistics for an individual question above).

Attempts: How many students attempted this question.

Facility index (F): The percentage of students that answered the question correctly (mean score of students on the item).

| F (%) | Interpretation |
|---|---|
| 5 or less | Extremely difficult or something wrong with the question. |
| 6-10 | Very difficult. |
| 11-20 | Difficult. |
| 21-34 | Moderately difficult. |
| 35-65 | About right for the average student. |
| 66-80 | Fairly easy. |
| 81-89 | Easy. |
| 90-94 | Very easy. |
| 95-100 | Extremely easy. |

Standard deviation (SD): How much variation there was in the scores for this question or a measure of the spread of scores about the mean and hence the extent to which the question might discriminate. If F is very high or very low it is impossible for the spread to be large. Note however that a good SD does not automatically ensure good discrimination. A value of SD less than about a third of the question maximum (i.e. 33%) in the table is not generally satisfactory.

Random guess score (RGS): This is the mean score students would be expected to get for a random guess at the question. Random guess scores are only available for questions that use some form of multiple choice. All random guess scores are for deferred feedback only and assume the simplest situation e.g. for multiple response questions students will be told how many answers are correct.

Values above 40% are unsatisfactory, which shows that True/False questions must be used sparingly in summative tests.
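For a single-answer multiple choice question, a reasonable way to picture the random guess score is as the average grade across all options, since a guessing student is equally likely to pick any of them. The sketch below makes that assumption; the function name is hypothetical, and each option is given as the fraction of the mark it earns (1.0 for the correct answer, 0.0 for a distractor).

```python
def random_guess_score(option_fractions):
    """Expected score (%) when one option is picked uniformly at random."""
    if not option_fractions:
        raise ValueError("need at least one option")
    return 100.0 * sum(option_fractions) / len(option_fractions)


# True/False: one of two options is correct.
print(random_guess_score([1.0, 0.0]))
# Four-option multiple choice with one correct answer.
print(random_guess_score([1.0, 0.0, 0.0, 0.0]))
```

Under this assumption a True/False question sits above the 40% threshold, while a question with four or more options falls below it.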

Intended weight: The question weight expressed as a percentage of the overall test score.

Effective weight: An estimate of the weight the question actually has in contributing to the overall spread of scores. The effective weights should add to 100% – but read on.

The intended weight and effective weight are intended to be compared. If the effective weight is greater than the intended weight it shows the question has a greater share in the spread of scores than may have been intended. If it is less than the intended weight it shows that it is not having as much effect in spreading out the scores as was intended.

The calculation of the effective weight relies on taking the square root of the covariance of the question scores with overall performance. If a question’s scores vary in the opposite way to the overall score, the covariance is negative, which indicates a very odd question that is testing something different from the rest. The effective weight of such a question cannot be calculated, and a warning message is displayed instead.
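The sketch below illustrates that calculation under a simplifying assumption: each question's effective weight is the square root of its covariance with the total score, divided by the sum of those square roots and expressed as a percentage. Questions with a negative covariance are returned as `None`, mirroring the case where no value can be calculated. The function names and data are hypothetical, and how the real report treats the remaining questions in that case is not confirmed here.

```python
from math import sqrt


def covariance(xs, ys):
    """Population covariance of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def effective_weights(score_rows):
    """Effective weight (%) per question; one row per student."""
    totals = [sum(row) for row in score_rows]
    n_questions = len(score_rows[0])
    roots = []
    for q in range(n_questions):
        cov = covariance([row[q] for row in score_rows], totals)
        # A negative covariance has no real square root.
        roots.append(sqrt(cov) if cov >= 0 else None)
    denominator = sum(r for r in roots if r is not None)
    if denominator == 0:
        return [None] * n_questions
    return [None if r is None else 100.0 * r / denominator for r in roots]


# Hypothetical quiz: two questions that contribute equally to the spread.
print(effective_weights([[1, 1], [0, 0], [1, 1], [0, 0]]))
```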

Discrimination index: This is the correlation between the weighted scores on the question and those on the rest of the test. It indicates how effective the question is at sorting out able students from those who are less able (students who score highly on this question are the same students who score highly on the whole quiz). The results should be interpreted as follows:

| Index (%) | Interpretation |
|---|---|
| 50 and above | Very good discrimination |
| 30 – 49 | Adequate discrimination |
| 20 – 29 | Weak discrimination |
| 0 – 19 | Very weak discrimination |
| Negative | Question probably invalid |

Discrimination efficiency: Another measure that is similar to the Discrimination index. This statistic attempts to estimate how good the discrimination index is relative to the difficulty of the question.

An item which is very easy or very difficult cannot discriminate between students of different ability, because most of them get the same score on that question. Maximum discrimination requires a facility index in the range 30% – 70% (although such a value is no guarantee of a high discrimination index).

The discrimination efficiency will very rarely approach 100%, but values in excess of 50% should be achievable. Lower values indicate that the question is not nearly as effective at discriminating between students of different ability as it might be and therefore is not a particularly good question.
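One way to picture this statistic is as the observed covariance between question scores and rest-of-quiz scores, divided by the largest covariance those same scores could produce (obtained by pairing the highest question scores with the highest rest-of-quiz scores). The sketch below assumes that definition; the function names and data are hypothetical, and the report's exact calculation may differ.

```python
def covariance(xs, ys):
    """Population covariance of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def discriminative_efficiency(item_scores, rest_scores):
    """Observed covariance as a percentage of the best achievable one."""
    observed = covariance(item_scores, rest_scores)
    # Sorting both lists pairs high with high, giving the maximum covariance.
    best = covariance(sorted(item_scores), sorted(rest_scores))
    if best == 0:
        return None  # no spread, so efficiency is undefined
    return 100.0 * observed / best


# Hypothetical data: four students, question scores and rest-of-quiz scores.
print(discriminative_efficiency([1, 0, 1, 0], [4, 3, 2, 1]))
```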

## Extra Resources

https://docs.moodle.org/dev/Quiz_report_statistics

https://docs.moodle.org/dev/Quiz_statistics_calculations