Talent Assessment | 2 Min Read

Psychometric Properties of a Test: Reliability, Validity and Norming

Psychometric Properties: An Introduction

Human qualities such as attitude, competency, proficiency, accomplishment and belief, among other constructs, are routinely evaluated by administering psychometric tests, which are developed and applied using standardized protocols. Test-takers are usually concerned mainly with their results, as they are rarely in a position to assess the technical characteristics of the tool. Even so, most recognize that the relevance and usefulness of any interpretation of results depend on the test’s core features. In technical terms, these internal test attributes are called psychometric properties.

The psychometric properties of a test are derived from data gathered with the assessment and indicate how well it evaluates the construct of interest. A test can be considered valid only if it has been subjected to statistical analyses that confirm it has adequate psychometric properties.

A good psychometric test must have three fundamental properties: reliability, validity and norming. Whether the goal is hiring or developing employees, choosing the right set of assessments can make or break a business.

Besides validity and reliability, the quality of a test also depends on its standardization, that is, on whether it has been normed for relevant characteristics such as age, gender, education, profession and employability.

Why Are Psychometric Properties Important?

Different psychometric properties provide distinct insights into a test’s meaningfulness, appropriateness and usefulness (or rather, its validity). Let’s say a test is publicized as an important measure for diagnosing a mental disorder such as bipolar disorder. The psychometric properties of a test present the test creators and users with satisfactory evidence of whether the tool performs as portrayed. 

Each psychometric property concerns a particular feature of the test. Some psychometric characteristics speak volumes about the quality of the whole test, while others concern its constituent parts, sections and even individual items. For instance, when considered in totality, the psychometric properties of a test could reveal whether the test assesses a single construct or multiple constructs.

Whether a test measures one dimension or several is itself a psychometric property of the instrument. Another psychometric property could indicate whether the instrument evaluates the target construct equally well for both men and women. We can call this a psychometric property of gender fairness. Yet other psychometric properties furnish evidence of whether a test assesses a construct consistently (reliability).

Psychometric properties are most often expressed quantitatively, using a numerical quantity such as a coefficient or an index. For example, the reliability coefficient is a numerical value with which most students and professionals are familiar. Although reliability is usually described as a feature of a test, it can be expressed as a number.

Likewise, many other psychometric properties of a test are expressed numerically. However, a single number is not always the best means of conveying a psychometric property. Validity, for example, cannot be meaningfully reduced to a single value or index. It is an encompassing psychometric property, and explaining it requires a thorough discussion that summarizes a substantial body of evidence.

One must explore and learn about the different psychometric properties of tests for two key reasons:

  • The knowledge enables test-makers to create useful tests. Psychometricians and other experts who create tests must analyze and describe how their tests function in order to build them to a predefined quality level.
  • Awareness of the different psychometric properties of a test helps ensure that the information gained using the instrument provides a firm foundation for making the right decisions. Counselors, psychologists, policy personnel, educators and several other professionals often base their decisions on the data collected from tests.

What Are Psychometric Properties of Standardized Tests?

A standardized test is administered and scored in a consistent or “standard” manner. Such tests are designed to keep the questions, administration conditions, scoring procedures and interpretations consistent.

Standardized testing could consist of true-false, multiple-choice, authentic assessments or essays. It’s possible to shape any form of assessment into a standardized test. When creating psychometric evaluations, responses to questions are measured on scales, and these scales are generally most valid once they have been standardized after creation.

The three psychometric characteristics one must consider when creating or standardizing psychometric tests are reliability, validity and norming.

What Is Reliability in Psychometrics?

Psychometric reliability refers to the degree to which test scores are consistent and free from measurement error. In other words, does the test measure whatever it measures consistently? It is the consistency with which the measurement tool produces scores from which inferences can be made.

For instance, a test measuring intelligence should yield roughly the same score for the same person each time it is taken within a short period (provided the test-taker’s intelligence has not changed).

A test is reliable as long as it produces similar results over time, repeated administration, or similar circumstances.

Take a professional dart player as an example. If they consistently hit the same spot on the board under specified conditions, even when that spot is not the bull’s eye, their aim is reliable. However, this says nothing about psychometric validity. Applied to psychometric assessments, a reliable test is one that produces stable results over time.

Over the years, scholars and researchers have developed multiple ways to check for psychometric reliability. These include testing the same participants at different points in time or presenting the participants with different versions of the same test to evaluate the consistency of their scores. An assessment must demonstrate adequate reliability before it can be considered valid, although reliability alone does not establish validity.

What Are the Four Types of Reliability?

The four types of psychometric reliability are:

  • Parallel Forms Reliability: Two equivalent versions of a test cover the same content with different items, and yield similar results for each test-taker.
  • Internal Consistency Reliability: Items within the test are examined to see whether they measure the same construct, that is, whether responses to them are consistent with one another.
  • Inter-Rater Reliability: When two or more raters score the same psychometric test in the same manner, inter-scorer consistency is high.
  • Test-Retest Reliability: The same test is administered to the same people at different times, and the test-takers display consistent scores across administrations.
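
Two of these reliability types are commonly summarized with a single coefficient. The Python sketch below is a minimal illustration rather than a production implementation: it computes Cronbach’s alpha, a widely used internal consistency coefficient, and a Pearson correlation between two sittings as a test-retest estimate. The function names and all scores are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import mean, pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of item scores per respondent."""
    k = len(responses[0])                        # number of items
    items = list(zip(*responses))                # transpose: one tuple per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical data: 5 respondents answering 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)

# Hypothetical total scores for the same 5 people on two occasions.
first_sitting = [18, 10, 13, 19, 6]
second_sitting = [17, 11, 13, 18, 7]
retest_r = pearson_r(first_sitting, second_sitting)

print(f"Cronbach's alpha: {alpha:.2f}")
print(f"Test-retest correlation: {retest_r:.2f}")
```

For both coefficients, values closer to 1 indicate more consistent measurement. With real data, the sample would need to be far larger than this toy example.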

Is Psychometric Testing Reliable?

Well-constructed psychometric tests can be as reliable as many medical tests, and sometimes more so. However, there can be minor discrepancies in psychometric reliability because individuals have different thoughts, feelings or ideas at various points in time, leading to variance in scores. Several factors (both stable traits and momentary issues) can result in variation in test scores.

Stable traits include weight, height and other such characteristics. Momentary inconsistency is attributed to factors such as the health of the test-taker or their understanding of a particular test item.

Why Does Psychometric Test Reliability Count?

Reliability is essential for psychometric tests. After all, a test that yields different results each time is of little use, especially if scores can affect employee selection, retention and promotion.

Errors in Reliability

Psychometricians identify two different categories of errors:

  • Systematic errors: These arise from the way the test is constructed and are built into the test.
  • Unsystematic errors: These result from random factors such as how the test is given or taken.

Numerous factors influence the psychometric reliability of tests. The interval between two test sessions affects test-retest and alternate (parallel) forms reliability. The similarity of content, and subjects’ expectations about different testing elements, affect parallel forms reliability as well as split-half and internal consistency reliability.

Changes in subjects over time, such as their environment, physical state, and emotional and mental well-being, must also be considered while assessing the reliability of psychometric tests. Test-based factors such as inadequate testing instructions, biased scoring, a lack of objectivity and guessing on the part of the test-taker also influence the psychometric reliability of tests. A test can therefore yield reliable estimates on some occasions and less stable results on others (Geisinger, 2013).

The reliability of your test depends on the following factors:

Construction of Items/ Questions

Test designers construct the questions of a psychometric test to assess a mental quality (for example, motivation). The difficulty level of the questions, or the confusion they create through ambiguity, can negatively influence reliability. Biases in interpreting the items and errors in question construction can only be corrected if test instructions are properly implemented and the redesign and research process is active and ongoing.

Administration

Administration of the test is another area where systematic errors can occur. Instructions accompanying the test should be precise and well-defined. Errors in the guidance provided to the test-takers or the administrators can have several adverse effects on the reliability of the test. Guidelines that hinder accurate interpretation could lower test reliability.

Scoring

Psychometric reliability also depends on the test having a well-defined scoring system that makes it possible to interpret the results. All tests include instructions on scoring. Errors such as drawing conclusions without a basis or substantial proof can lower the reliability of the test. Test construction relies on research to provide evidence for the conclusions drawn. If there is a systematic error in the test design phase, this can also impact reliability.

Environmental Factors

Extremes in temperature or audio-visual distractions can influence test scores’ reliability. Errors in administering the psychometric test can also impact the reliability of the scores obtained. Human error is equally possible, and interpretation or scoring can be influenced by the examiner’s attitude toward the test-taker.

Test-Taker

The person being examined may be influenced by social desirability and give answers that do not reflect their actual choices. Other factors that influence test-takers include anxiety, bias and physical factors such as illness or sleep deprivation.

How to Overcome Psychometric Test Reliability Issues?

  • Test Length: Increasing the test length can improve reliability. Other things being equal, a longer test made up of items of comparable quality is more reliable.
  • Speed Test: Speed versus power in psychometric testing is an age-old debate. Speed tests are designed so that not all test-takers can complete every item in the time allowed. Power tests give test-takers ample opportunity to attempt every item, so scores depend on item difficulty rather than speed. Reliability can be evaluated more readily when the items can be completed. For speed tests, internal consistency estimates (including split-half) tend to be inflated, so test-retest or parallel forms methods are generally preferred.
  • Group Homogeneity: The more heterogeneous the group of test-takers, and hence the more spread out their scores, the higher the reliability estimate tends to be.
  • Item Difficulty: When there is low variability among test scores, reliability decreases. A test so easy that every test-taker answers every item correctly cannot measure individual differences.
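
The effect of test length on reliability is often estimated with the Spearman-Brown prediction formula, which the article does not name but which is the standard tool for this question. The Python sketch below assumes that any added items are comparable in quality to the existing ones. The function name and the starting reliability of 0.70 are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by `length_factor`.

    Assumes the added (or removed) items are comparable to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


current = 0.70  # hypothetical reliability of the existing test

for factor in (0.5, 1, 2, 3):
    predicted = spearman_brown(current, factor)
    print(f"length x{factor}: predicted reliability {predicted:.3f}")
```

The gains diminish as the test grows, so lengthening a test is worth weighing against test-taker fatigue and administration time.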

What Is Psychometric Validity?

Psychometric validity is qualitatively defined as the test’s capacity to measure what it claims to measure. Suffice it to say, a test with high validity has test items (questions) that remain closely linked with the test’s intended focus.

It is reasonable to expect a test used by organizations to indicate how a candidate would perform in a particular job. With that in mind, it is worth reiterating the difference between psychometric reliability and validity: the former is a prerequisite for the latter.

Let’s consider the same dart player. In repeated trials, they continue to miss the mark consistently by about two inches. Of course, this implies a reliable aim: each shot hits the board in a region two inches from the target. Yet it’s difficult not to question their validity as a professional compared to their peers, considering they don’t hit the bull’s eye, which is the aim of all professional dart players.

Psychometric validity and reliability go together, but reliability by no means indicates the validity of a test. As our example suggests, having the first without the second means consistency without accuracy.
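
The dart example can be sketched as a small simulation. The Python snippet below is an illustrative analogy only: it treats the spread of throws as a stand-in for (un)reliability and the average miss as a stand-in for (in)validity. The player labels, offsets and spreads are hypothetical, and distances are measured along a single horizontal axis for simplicity.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def throw(offset, spread, n=1000):
    """Simulate n throws; values are horizontal distances (inches) from the bull's eye."""
    return [rng.gauss(offset, spread) for _ in range(n)]


players = {
    "consistent, two inches off": throw(offset=2.0, spread=0.1),
    "consistent, on target": throw(offset=0.0, spread=0.1),
    "scattered around target": throw(offset=0.0, spread=2.0),
}

for name, throws in players.items():
    # Small spread = reliable; average miss near zero = on target (valid).
    print(f"{name}: average miss {mean(throws):+.2f} in, spread {pstdev(throws):.2f} in")
```

The first player has a tight spread but a persistent two-inch miss, which mirrors a test that is reliable without being valid.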

Even when a test is both reliable and valid, its results remain open to question. An assessment is of little use without quantifiable results, yet, as is often said, human beings are far from perfectly measurable.

Validity, in short, concerns how closely a test’s items stay connected to the construct the test is meant to measure, and it takes several forms.

What Are the Four Types of Validity?

The four types of validity are:

  • Content Validity: Is the content appropriate, and does it cover all aspects of the construct?
  • Construct Validity: How well does the test measure the particular construct it is designed to measure?
  • Face Validity: Does the test appear, on the surface, to measure what it intends to measure?
  • Criterion Validity: Do the results of the test correspond to an external benchmark, such as an established test?

The Importance of Norming in Psychometric Tests

Psychometric tests are often normed against groups for comparison. Norming avoids evaluating individual items or questions in isolation and instead compares an individual’s total score with the scores of a representative sample.

A representative sample means using a group of children when developing a test for children and a group of adults when developing a test for adults. Depending on the target population, samples are generally made representative with respect to demographic factors such as age, gender, education and religion.

Norming is standard practice primarily because a psychometric test score of, say, 30 correct out of 40 is meaningless unless it is compared with the performance of others at a similar level on the same test. The practice of using relative scores gains more importance when interpreting ability test results.

When you score at the 94th percentile on a trait such as extraversion, you know that you are more extraverted than 94% of the sample group from whom the test-makers derived the norms.
Conversely, if you scored 94% on a math test, it implies that you answered about 94 in every 100 questions correctly.
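
The difference between a percentage score and a percentile can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: both norm groups are invented, the function names are hypothetical, and the percentile rank is defined here as the share of the norm group scoring strictly below the candidate (other definitions also count ties).

```python
def percent_correct(correct, total):
    """Raw performance: share of questions answered correctly."""
    return 100 * correct / total


def percentile_rank(score, norm_scores):
    """Relative performance: share of the norm group scoring strictly below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


raw_score = 30  # the "30 correct out of 40" example from the text

# Two hypothetical norm groups who took the same 40-question test.
general_group = [18, 20, 22, 24, 25, 26, 27, 28, 29, 31]
specialist_group = [28, 31, 32, 33, 34, 35, 36, 36, 37, 38]

print(f"Percent correct: {percent_correct(raw_score, 40):.0f}%")
print(f"Percentile vs general group: {percentile_rank(raw_score, general_group):.0f}")
print(f"Percentile vs specialist group: {percentile_rank(raw_score, specialist_group):.0f}")
```

The same raw score lands at very different percentiles depending on the comparison group, which is why the choice of norm group matters.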

However, it’s important to note that every test has an appropriate norm group. Scores are most informative when they are also interpreted in the context of the role. For example, if a role involves numerical work but little real-world time pressure, a candidate with below-average results on numerical reasoning tests may deserve the benefit of the doubt.

Where possible, it also makes sense to take the candidate’s response style into account when interpreting percentile scores. Response style concerns the trade-off between speed and accuracy: some people work through ability tests, which are part of psychometrics, slowly and emphasize precision, while others cover more items with lower accuracy.

Psychological constructs such as personality have no right or wrong answers, so they cannot be scored as percentages correct. This is why academics and researchers resort to norming, among other methods, to make sense of scores on personality assessments.

You can also read Understanding the Science Behind Psychometric Tests.

How Mercer | Mettl Can Help

Mercer | Mettl helps organizations and recruiters make well-informed decisions about recruitment, training and development of potential and existing employees using reliable and valid psychometric tests.

Originally published April 12 2018, Updated September 22 2020
