Mettl’s assessments have been the biggest filter in our recruitment process. Their platform has helped us reach a higher volume of applicants. Mettl keeps innovating on its products and tries to introduce a new aspect to everything.
A good psychometric test must have three fundamental properties: reliability, validity, and norming. Whether you are hiring or developing employees, choosing the correct set of assessments can make or break a business.
In addition to validity and reliability, the way a test is standardized, that is, normed against aspects such as age, gender, occupation, employability, and education, also determines the properties of the assessment.
The reliability of a psychometric test refers to the consistency of scores obtained under the repeated testing of the same individual on the same test under identical conditions (including no changes within the person).
Since this ideal is impossible to meet, one instead collects evidence of reliability and expresses it as a coefficient that ranges from .00 to 1.00. A perfectly reliable test would have a reliability coefficient (for example, Cronbach’s alpha) of 1.00, and a completely unreliable test would have a reliability coefficient of .00.
For instance, suppose a batsman at the 4th position scores a century in every match of a series. He is a reliable player because he is consistent; however, his consistency alone cannot assure victory.
The reliability of a test can be evaluated in two broad ways: internal consistency and test-retest reliability.
Internal consistency looks at whether items that target the same competency produce similar responses. Test-takers may notice that several items in a test aim to measure the same competency. This is intentional: asking about the same concept in several ways makes its measurement more accurate.
For instance, to evaluate how satisfied your customers are with your customer service, you would measure overall satisfaction using several related items. Responses should be on a Likert scale ranging from Strongly Agree to Strongly Disagree. Some of the specific items could portray:
If the survey is reliable, that is, it has good internal consistency, a respondent’s answers would be similar across all three items, whether that means strongly agreeing or strongly disagreeing with all of them.
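To make the idea of internal consistency concrete, here is a minimal sketch of how Cronbach’s alpha could be computed for a three-item Likert survey. The function name `cronbach_alpha` and the response data are hypothetical and only for illustration; responses are coded 1 (Strongly Disagree) to 5 (Strongly Agree), which is an assumption about the coding rather than something prescribed above.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one column per item). Uses sample variances throughout."""
    n_items = len(responses[0])
    # Variance of each item across all respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from five customers to three satisfaction items.
survey = [
    [5, 5, 4],
    [4, 4, 4],
    [2, 1, 2],
    [3, 3, 3],
    [1, 2, 1],
]

alpha = cronbach_alpha(survey)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Because each respondent in this made-up data answers the three items in nearly the same way, alpha comes out close to 1.00. If answers to the items were unrelated, it would fall towards .00.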
Unlike internal consistency, test-retest reliability is a time-dependent way to measure the reliability of an assessment. It checks whether candidates respond to the items the same way each time they take the test. It is computed as the correlation (using the Pearson coefficient) between scores from the first administration and scores from a second administration at a later time.
For instance, a person’s IQ does not suddenly change or jump drastically. So, IQ tests taken a month apart by the same set of candidates should produce very similar results. The correlation between those results gives the test-retest reliability of the IQ test.
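As a rough illustration of test-retest reliability, the sketch below computes the Pearson coefficient between two sittings of the same test. The helper `pearson_r` is written out by hand so the arithmetic is visible, and the IQ scores are invented for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


# Hypothetical IQ scores for the same six candidates, one month apart.
first_sitting = [98, 105, 112, 120, 91, 130]
second_sitting = [100, 103, 115, 118, 93, 128]

retest_reliability = pearson_r(first_sitting, second_sitting)
print(f"Test-retest reliability: {retest_reliability:.3f}")
```

A coefficient near 1.00 means candidates kept roughly the same relative standing across both sittings, which is what a stable trait such as IQ should produce.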
Validity is commonly defined as a test’s capacity to measure what it claims to measure. In a psychometric test with high validity, the items remain firmly connected to the test’s intended focus.
Let’s take the case of a weighing machine. If the machine shows 75 kg every time the same person steps on it, the machine is reliable; validity concerns whether the person actually weighs 75 kg.
Let’s go through the most important types of psychometric validity:
Predictive validity tells us how precisely a tool predicts a certain outcome, and it is the strongest form of validation. In our case, a better tool is one that predicts how well an individual will perform in their job.
For instance, the predictive validity of the JEE (an engineering entrance exam) is measured through the correlation of students’ JEE scores with their undergraduate scores. If high scorers in the JEE perform better in their undergraduate studies than those who score low, then the JEE is predictively valid.
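The comparison of high and low scorers described above can be sketched in a few lines. This is only an illustration under stated assumptions: the entrance scores and undergraduate grade points are made up, the split at the median is one simple choice among many, and the function name `outcome_by_score_group` is hypothetical.

```python
from statistics import mean, median


def outcome_by_score_group(test_scores, outcomes):
    """Split candidates at the median test score and return the mean
    later outcome for the high-scoring and low-scoring groups.
    Assumes the scores are not all identical, so both groups are non-empty."""
    cutoff = median(test_scores)
    high = [o for s, o in zip(test_scores, outcomes) if s > cutoff]
    low = [o for s, o in zip(test_scores, outcomes) if s <= cutoff]
    return mean(high), mean(low)


# Hypothetical entrance-exam scores and later undergraduate grade points.
entrance_scores = [310, 250, 180, 290, 140, 220, 200, 330]
undergrad_grades = [9.1, 8.0, 6.9, 8.6, 6.2, 7.4, 7.6, 9.3]

high_mean, low_mean = outcome_by_score_group(entrance_scores, undergrad_grades)
print(f"High scorers' mean grade: {high_mean:.2f}")
print(f"Low scorers' mean grade:  {low_mean:.2f}")
```

If the high-scoring group does clearly better later on, that is evidence in favour of predictive validity. A fuller analysis would also report the correlation between the two sets of scores.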
Convergent validity is the second most precise validation methodology. To establish evidence of it, scores on a test must relate to scores on other tests or variables that purport to measure similar traits or constructs.
For instance, IPL 2018 saw the use of the Decision Review System (DRS), which acts as a second, hawk-eyed check on LBW and other such vital decisions that the field umpires also rule on. If the field umpire’s decision and the DRS decision turn out to be the same, this supports the validity of the decision.
Construct validity, unlike the former two, goes beyond the basic question of whether a test measures an attribute. It comes after predictive and convergent validation have been applied, and it is more oriented towards whether the interpretations of test scores are consistent with the theoretical and observational terms around the construct.
The phenomenon to be measured must exist in the first place. Different approaches (factor analysis and other correlational methods) are combined to establish the overall construct validity of a test.
For instance, suppose you wish to develop a new measure to assess intelligence. Construct validity is established by checking that the new measure’s results agree with what the theory of intelligence predicts.
Psychometric tests are frequently normed, or standardized, against groups for comparison. Norming does not look at individual items or questions; rather, it considers an individual’s total score in comparison with a representative sample.
A representative sample is a group of people similar to the intended test-takers that is used when developing a test. For example, we use a group of children when developing a test for children and a group of adults when developing a test for adults.
Different test takers have very different performance levels, and therefore their scores differ quite a bit. The key attributes of norming are:
Assesses the relative performance of test-takers
Brings value to the assessments
Indicates the real standing of a candidate within the pool
Leads to a better interpretation of results
Can provide priority attention to top scorers
When you score at the 97th percentile on a trait like openness to experience, it means you are more open than 97% of the sample group, whereas a score of 97% would simply mean you scored 97 out of 100.
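The difference between a percentile and a percentage is easy to see in code. The sketch below assumes one common definition of percentile rank (the share of the norm group scoring strictly below the candidate). The norm sample, the raw score, and the 50-point maximum are all invented for illustration.

```python
def percentile_rank(score, norm_sample):
    """Percentage of the norm sample that scored strictly below `score`."""
    below = sum(1 for s in norm_sample if s < score)
    return 100.0 * below / len(norm_sample)


def percentage_score(raw_score, max_score):
    """Raw score expressed as a percentage of the maximum possible score."""
    return 100.0 * raw_score / max_score


# Hypothetical raw scores of a ten-person norm group on a 50-point test.
norm_group = [20, 22, 25, 27, 30, 31, 33, 35, 38, 40]
candidate_raw = 36

rank = percentile_rank(candidate_raw, norm_group)
pct = percentage_score(candidate_raw, 50)
print(f"Percentile rank: {rank:.0f}")   # position relative to the norm group
print(f"Percentage score: {pct:.0f}%")  # share of the maximum marks
```

The same raw score yields two different numbers because the percentile depends on how everyone else in the norm group performed, while the percentage depends only on the maximum marks.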
The demographic parameters of the norming sample, broken down by gender, are as follows:
Mercer | Mettl is a robust platform that provides psychometric tests to help recruiters and companies make well-informed decisions about the recruitment, training, and promotion of candidates and employees. The extensive library of tests and simulators can help you create customized assessments to evaluate the underlying abilities and current skills of shortlisted candidates. Explore a wide range of psychometric, cognitive, role-centric, and technical assessments to onboard the right people.
Originally published December 21 2018, Updated June 16 2020
Abhilash works with the Content Marketing team of Mercer|Mettl. He has been contributing his bit to the world of online business for some years now. Abhilash is experienced in content marketing, along with SEO. He’s fond of writing useful posts, helping people, traveling, and savoring delicacies.