Human qualities such as attitude, competency, proficiency, accomplishment and belief, among other constructs, are routinely evaluated by administering psychometric tests, which are formulated and applied using standardized protocols. Test-takers are usually concerned mainly with the results of a test, as they are rarely in a position to assess the technical characteristics of the tool. Even so, most take some interest in a test’s internal components because they realize that the relevance and usefulness of the interpreted results depend on its core features. In technical terms, these internal test attributes are called psychometric properties.
The psychometric properties of a test are derived from the data gathered with the assessment and indicate how well it evaluates the construct of interest. Developing a valid test requires subjecting it to statistical analyses that ascertain whether it has adequate psychometric properties.
A good psychometric test must have three fundamental properties: reliability, validity and norming. Whether you are hiring or developing employees, choosing the correct set of assessments can make or break a business.
Besides the validity and reliability of a test, its standardization also shapes its properties, that is, how the assessment has been normed for aspects such as age, gender, education, profession and employability.
Different psychometric properties provide distinct insights into a test’s meaningfulness, appropriateness and usefulness (or rather, its validity). Let’s say a test is publicized as an important measure for diagnosing a mental disorder such as bipolar disorder. The psychometric properties of a test present the test creators and users with satisfactory evidence of whether the tool performs as portrayed.
Each psychometric property of a test focuses on a particular feature. Some psychometric characteristics describe the quality of the whole test, while others concern its constituent parts, sections and even individual items. For instance, when considered in totality, the psychometric properties of a test could reveal whether the test assesses a single construct or multiple constructs.
Whether a test measures only one dimension or several is itself a psychometric property of the instrument, namely its dimensionality. Another psychometric property could indicate whether the instrument evaluates the target construct equally well for both men and women. We can call this a psychometric property of gender equality. Yet other psychometric properties furnish evidence of whether a test assesses a construct consistently (reliability).
Psychometric properties are most often expressed quantitatively. Numerical quantities such as a coefficient or an index are used to represent the property. For example, the reliability coefficient is a numerical value with which most students and professionals are familiar. Although psychometric reliability is described as a feature of a test, it can be expressed as a single quantitative value.
Likewise, many other psychometric properties of a test are expressed numerically. However, a single number is not always the best means of conveying a specific psychometric property. For example, validity cannot be meaningfully reduced to a single value or index. It is an encompassing psychometric property, and explaining it requires an exhaustive discussion that summarizes a substantial body of evidence.
A standardized test is administered and scored in a consistent or “standard” manner. Such tests are designed so that the questions, administration conditions, scoring procedures and interpretations stay consistent for every test-taker.
Standardized testing could consist of true-false, multiple-choice, authentic assessments or essays. It’s possible to shape any form of assessment into a standardized test. When psychometric evaluations are created, questions are measured on scales, and these scales are generally most valid once they have been standardized after creation.
Here are the three psychometric characteristics one must consider when creating/standardizing psychometric tests:
Psychometric reliability refers to the degree to which test scores are consistent and free from measurement error. In other words, does the test produce the same result each time it measures the same thing? It is the consistency with which the measurement tool produces scores from which inferences can be made.
For instance, a test measuring intelligence should yield the same score for the same person each time they complete it, with only a short period between sittings (provided the test-taker’s intelligence has not changed).
A test is reliable as long as it produces similar results over time, repeated administration, or similar circumstances.
Take a professional dart player as an example. If they consistently hit the same spot on the board under specified conditions, they are a reliable player, even if that spot is not the bull’s eye. Consistency alone, however, does not account for psychometric validity. Applied to psychometric assessments, a reliable test is one that produces stable results over time.
Over the years, scholars and researchers uncovered multiple ways to check for psychometric reliability. Some include testing the same participants at different points of time or presenting the participants with varying versions of the same test to evaluate their consistency levels. An assessment must demonstrate excellent reliability to qualify for validity.
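Testing the same participants at two points in time is commonly summarized as a correlation between the two sets of scores. The following is a minimal sketch of that idea, assuming a Pearson correlation is used as the reliability estimate; the function name and the scores are hypothetical and are not taken from any particular test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five people on two sittings
first_sitting = [12, 15, 20, 22, 30]
second_sitting = [13, 14, 21, 23, 29]

test_retest_r = pearson_r(first_sitting, second_sitting)
print(round(test_retest_r, 3))
```

A value close to 1 suggests that people keep roughly the same rank order across sittings. The sketch does not handle the case where everyone has the same score, in which case the correlation is undefined.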
The four types of psychometric reliability are:
Well-constructed psychometric tests can be as reliable as many medical tests, and sometimes more so. However, there can be minor discrepancies in psychometric reliability because individuals have different thoughts, feelings or ideas at various points in time, leading to variance in scores. Several factors (both stable traits and momentary issues) can result in variation in test scores.
Stable traits include weight, height, and other such characteristics. Momentary inconsistency is attributed to different things such as the health of the test-takers, an understanding of a particular test item and so forth.
Reliability is essential for psychometric tests. After all, a test that yields different results each time is of little use, especially if scores can affect employee selection, retention and promotion.
Numerous factors influence the psychometric reliability of tests. The time elapsed between two test sessions affects test-retest and alternate/parallel forms reliability. The similarity of content and the subjects’ expectations regarding different testing elements affect alternate/parallel forms reliability, along with split-half and internal consistency reliability.
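Internal consistency is often reported as Cronbach’s alpha, which compares the variance of individual items with the variance of total scores. The sketch below assumes that statistic is the one of interest; the function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per person)."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 test-takers")
    # Transpose rows into per-item columns and sum the item variances
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(col) for col in item_columns)
    # Variance of each person's total score (assumed non-zero here)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five people to four items on a 1-5 scale
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Higher values indicate that the items vary together, which is what one would expect if they measure the same construct. What counts as an acceptable value depends on the purpose of the test.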
Changes in subjects over time, such as their environment, physical state, emotional and mental well-being, must also be considered while assessing the reliability of psychometric tests. Test-based factors such as inadequate testing instructions, biased scoring, lacking objectivity and guessing on the part of the test-taker also influence the psychometric reliability of tests. Tests can generate reliable estimates sometimes and not so stable results other times (Geisinger, 2013).
The reliability of your test depends on the following factors:
Test designers construct the questions of a psychometric test to assess a mental quality (for example, motivation). The difficulty level of the questions, or the confusion they create through ambiguity, can negatively influence reliability. Biases in interpreting the items and errors in question construction can only be corrected if test instructions are properly implemented and the redesign and research process is active and ongoing.
Administration of the test is another area where systematic errors can occur. Instructions accompanying the test should be precise and well-defined. Errors in the guidance provided to the test-takers or the administrators can have several adverse effects on the reliability of the test. Guidelines that hinder accurate interpretation could lower test reliability.
Psychometric reliability also means that the test has a particular scoring system, by which interpreting the results is possible. All tests comprise instructions on scoring. Errors such as conclusions without basis or substantial proof can lower the reliability of the test. Test construction is associated with research to provide evidence for the conclusions drawn. If there is a systematic error in the test design phase, this can also impact reliability.
Extremes in temperature or audio-visual distractions can influence test scores’ reliability. Errors in administering the psychometric test can also impact the reliability of the scores obtained. Human error is equally possible, and interpretation or scoring can be influenced by the examiner’s attitude toward the test-taker.
The person being examined may suffer from social desirability concerns and give answers that do not reflect actual choices. Other factors that influence the test-takers include anxiety, bias and physical factors such as illness or sleep deprivation.
Psychometric validity is the extent to which a test measures what it claims to measure. In a test with high validity, the test items (questions) remain closely linked with the test’s intended focus.
It is reasonable to expect a test used by organizations to indicate how a candidate would perform in a particular job. With that in mind, it is worth reiterating the difference between psychometric reliability and validity, with the former being a prerequisite to the latter.
Let’s consider the same dart player. In repeated trials, they consistently miss the mark by about two inches. Of course, this implies a reliable aim: each shot hits the board in a region two inches from the target. Still, it’s difficult not to question their validity as a professional compared to their peers, considering that hitting the bull’s eye is the aim of all professional dart players.
Psychometric validity and reliability go together, but reliability by no means indicates the validity of a test. As our example suggests, having the first without the second means the results are highly consistent but inaccurate.
Even when a test is both reliable and valid, its results should be interpreted with care. An assessment is of little use without quantifiable results, yet, as is often said, human beings are far from perfectly measurable.
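The dart analogy can be put into numbers by separating how far the throws land from the target on average from how tightly they cluster. This is only an illustrative sketch of the analogy, not a formal measure of reliability or validity; the function name and the offsets (signed distances from the bull’s eye in inches, along one axis) are hypothetical.

```python
from statistics import mean, stdev


def summarize_throws(offsets):
    """Return (average offset, spread) for signed offsets from the bull's eye."""
    return mean(offsets), stdev(offsets)


# Consistent but off target: tightly clustered about two inches away
consistent_but_off = [2.1, 1.9, 2.0, 2.2, 1.8]
# Centered on the target on average, but widely scattered
centered_but_scattered = [-3.0, 2.5, 0.5, -1.0, 1.0]

off_bias, off_spread = summarize_throws(consistent_but_off)
centered_bias, centered_spread = summarize_throws(centered_but_scattered)

print(round(off_bias, 2), round(off_spread, 2))
print(round(centered_bias, 2), round(centered_spread, 2))
```

The first player has a small spread (consistent, like a reliable test) but a large average offset (off target, like a test that lacks validity). The second player is on target only on average, and the large spread makes any single throw hard to trust.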
Psychometric tests are often normed against groups for comparison. Norming does not evaluate individual items or questions; instead, it compares an individual’s total score with the scores of a representative sample.
A representative sample means using a group of children when developing a test for children and an adult group when developing a test for adults. Samples are generally made representative of the target population based on demographic factors such as age, gender, education and religion.
This is standard practice because a psychometric test score of, say, 30 correct out of 40 is meaningless unless it is compared with the performance of others at a similar level on the same test. The practice of using relative scores gains more importance when interpreting ability test results.
When you score at the 94th percentile on a trait such as extraversion, you know that you are more extraverted than 94% of the sample group from which the test-makers derived the norms.
Conversely, if you scored 94% on a math test, it means that you answered 94 out of every 100 questions correctly.
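The difference between a percentile and a percentage can be shown with the 30-out-of-40 score mentioned earlier. This is a minimal sketch, assuming a percentile rank is defined as the share of the norm group scoring strictly below the candidate (other definitions also count ties); the function names and the norm group are hypothetical.

```python
def percent_correct(num_correct, num_items):
    """Share of items answered correctly, as a percentage."""
    return 100 * num_correct / num_items


def percentile_rank(score, norm_scores):
    """Percentage of the norm group that scored strictly below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


# Hypothetical raw scores (out of 40) of a 20-person norm group
norm_group = [12, 15, 17, 18, 19, 20, 21, 22, 23, 24,
              25, 26, 27, 28, 28, 29, 31, 33, 35, 38]

print(percent_correct(30, 40))          # 75.0
print(percentile_rank(30, norm_group))  # 80.0
```

The same raw score of 30 is 75% correct, yet it sits at the 80th percentile of this particular norm group. Against a stronger or weaker norm group, the percentile would change while the percentage would not.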
However, it’s important to note that every test has an appropriate norm group. Scores are more informative when the psychometric results are also read in the context of the role. For example, if the role involves numerical work but without the time pressure of a timed test, someone with below-average results on numerical reasoning tests may be given the benefit of the doubt.
Where possible, it also makes sense to take the candidate’s response style into account when interpreting percentile scores. Response style has to do with speed and accuracy: some people prefer a slower approach to ability tests, which are part of psychometrics, and emphasize precision. Others cover more items with lower accuracy.
Measures of psychological constructs such as personality have no right or wrong answers and cannot be marked using percentages. This is why academics and researchers resort to norming, among other methods, to make sense of scores on personality assessments.
Mercer | Mettl helps organizations and recruiters make well-informed decisions about recruitment, training and development of potential and existing employees using reliable and valid psychometric tests.
Originally published April 12 2018, Updated September 22 2020