Get awesome marketing content related to Hiring & L&D in your inbox each week

Stay up-to-date with the latest marketing, sales, and service tips and news
English Test

Talent Assessment | 6 Min Read

SpeechX: An AI-powered Automated English Evaluation Tool for BPOs


A globally interconnected marketplace has lent English considerable clout as a level playing field for exchanging information and ideas. English-speaking skills are paramount in any consumer-centric role in the service sector, cutting across industries from hospitality, retail, travel, and tourism to insurance, sales, and banking. They play a critical role in communicating the nuances of a product or service up for sale. Recruiters scouting for the right talent to further their organization’s commercial objectives therefore give these skills notable importance. The argument holds especially true for BPO players, who must hire candidates with apt English-speaking skills to maintain a favorable consumer interface. They are on the lookout for people who will become the first line of contact for their existing and prospective consumers. The margin for error is virtually non-existent.

Hiring is a complicated, multi-layered process that demands the very best of human and technological skills. It involves several stages, from managing the logistics to ensuring the sanctity of the exercise itself.

In the established process, BPO companies evaluate candidates’ English-speaking skills by hiring Voice and Accent (VNA) trainers/assessors. These professionals are trained in a specific framework called the ‘Common European Framework of Reference for Languages’ (CEFR). (For the uninitiated, CEFR is a set of guidelines used to describe the achievements of learners of foreign languages throughout Europe and, increasingly, the world over.) While these trainers/assessors may be trained on other frameworks, they are primarily trained on CEFR, a common standard for English evaluation.

How Do VNA Trainers Assess Prospective Hires?

VNA trainers evaluate the English-speaking ability of a prospective hire from several perspectives, including pronunciation, fluency, accent, and grammar. A VNA trainer/assessor scrutinizes errors against these criteria to understand whether a candidate can be trained to overcome them. To that end, errors are divided into trainable and non-trainable errors. Non-trainable errors, also termed fatal errors, are those that cannot be readily corrected through training; they may be grammatical in nature as well.

Conversely, trainable/non-fatal errors are the ones that can be worked upon and addressed in a relatively short time. For instance, pronouncing a definite ‘T’ or ‘D’ sound while speaking is a non-fatal, trainable error. This process enables VNAs to ascertain the English-speaking and comprehension skills of candidates and determine their suitability for employment.

However, this process is not rigid. Over the years, each company/organization has developed its own set of nuances and best practices to make hiring a seamless and hassle-free exercise. Candidates are subjected to multiple rounds of screening by VNA trainers/assessors, who follow a defined set of processes to make hiring decisions.

Challenges With The Existing Process

Scaling for Mass-Hiring Is Resource-Intensive:

The process of employing VNA trainers/assessors works exceptionally well for BPO hiring and has, therefore, rightly been the established norm for a long time. Its efficacy in small-scale hiring is undisputed, but BPOs often struggle when planning to hire at a larger scale. Large-scale hiring using VNA trainers/assessors poses a unique set of challenges. Scaling the process is both resource-intensive and time-consuming, and it is highly likely to make the entire recruitment process sluggish, adversely impacting business plans and the company’s balance sheet.

Human-Led Methods Are Prone to Bias:

This is not to make light of their efforts, but a process led by VNA trainers is certainly not free from bias, as every trainer brings his/her own understanding, likes, and dislikes. The resulting lack of consistency across multiple trainers is a further challenge in hiring at scale. The BPO industry faces persistently high attrition rates and is, therefore, perennially in hiring mode. Such a labor-intensive process is prone to errors, which may dilute the outcome and inadvertently lower the quality of the hire.

Given these constraints in scaling the VNA-led process quickly and in a financially viable manner, BPOs usually rely on automated English assessment tools to aid the evaluation process for hiring the right set of candidates.

Is There an Answer to the Challenge of Conducting Mass Assessment for Customer-facing Roles in BPOs? 

BPOs routinely face the challenges mentioned above. However, there is a tool available on the market that addresses these pain-points by auto-evaluating candidates’ English-speaking skills and simulating the VNA experience. An Artificial Intelligence (AI)-based automatic English evaluation tool can bypass the problems BPOs face when they undertake mass hiring by employing VNA trainers/assessors.

Introducing SpeechX

Mercer | Mettl is going to unveil an innovative tool called ‘SpeechX’ to address these existing challenges of the BPO industry. Powered by reliable Artificial Intelligence Speech Technology, this assessment tool is fully machine-administered and auto-graded to test a non-native speaker’s ability to speak and understand English. It is a scalable means of assessing prospective hires’ capability with a high level of accuracy by simulating a VNA trainer. It is also a beneficial and ready-to-use assessment solution for corporate houses to hire for critical client-facing roles and sales profiles.

SpeechX is a video-proctored assessment that analyzes a candidate’s proficiency across two key dimensions – the ability to listen and to articulate clearly. It reviews linguistics to identify correct and incorrect information in the candidate’s speech and detects errors in reading sentences and extempore speech. It also undertakes para-linguistic voice analytics to measure the quality and clarity of a candidate’s statements.

Why Does SpeechX Simulate a VNA Trainer?

A VNA trainer determines a candidate’s employability using guidelines based on the CEFR framework, and SpeechX is designed to simulate that evaluation. SpeechX rates the candidate along the lines of a VNA trainer to establish the candidate’s level of English proficiency and employability. It checks the critical parameters of pronunciation, grammar, fluency, and listening skills, factoring in the nuances examined by VNA trainers, thereby simulating the entire process.

SpeechX And CEFR

The CEFR framework is one of the many frameworks to assess a non-native English speaker’s English-speaking proficiency. It was designed at the beginning of the 1990s by the Council of Europe to foster continent-wide collaboration among language teachers. It is interesting to note that as the framework was designed for all European languages, it can also be applied to test a person’s French or Spanish-speaking skills!

The framework has three core dimensions: language activities, the domains in which they occur, and the competencies drawn upon when engaging in them. It principally divides learners into three segments, which are then further divided into six levels of language proficiency. The three segments are basic users, independent users, and proficient users. A basic user is categorized into two levels, i.e., A1 (beginner) and A2 (elementary). An independent user is classified into B1 (intermediate) and B2 (upper-intermediate). Similarly, a proficient user is bracketed into C1 (advanced) and C2 (proficiency). A CEFR test determines the competency level and provides a score to the candidate, which is utilized to decide his/her employability.
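The three segments and six levels described above can be written down as a small lookup table. The following is only an illustrative sketch in Python: the names `Segment`, `CEFR_LEVELS`, `describe_level`, and `meets_minimum` are made up for this example and are not part of CEFR or of SpeechX, and the idea of comparing a candidate against a minimum level is an assumption about how a recruiter might use the result.

```python
from enum import Enum


class Segment(Enum):
    BASIC = "basic user"
    INDEPENDENT = "independent user"
    PROFICIENT = "proficient user"


# Level code -> (segment, label), in ascending order of proficiency,
# exactly as listed in the paragraph above.
CEFR_LEVELS = {
    "A1": (Segment.BASIC, "beginner"),
    "A2": (Segment.BASIC, "elementary"),
    "B1": (Segment.INDEPENDENT, "intermediate"),
    "B2": (Segment.INDEPENDENT, "upper-intermediate"),
    "C1": (Segment.PROFICIENT, "advanced"),
    "C2": (Segment.PROFICIENT, "proficiency"),
}


def describe_level(code: str) -> str:
    """Return a readable description of a CEFR level code such as 'b2'."""
    key = code.strip().upper()
    segment, label = CEFR_LEVELS[key]
    return f"{key}: {label} ({segment.value})"


def meets_minimum(code: str, minimum: str) -> bool:
    """Hypothetical helper: is `code` at or above the `minimum` level?"""
    order = list(CEFR_LEVELS)  # dicts preserve insertion order
    return order.index(code.strip().upper()) >= order.index(minimum.strip().upper())
```

For example, `describe_level("b2")` gives `B2: upper-intermediate (independent user)`, and `meets_minimum("A2", "B1")` is false because A2 sits below B1 in the ordering.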

SpeechX has incorporated the elements of CEFR and condensed them into four components: pronunciation, fluency, grammar, and listening comprehension.

Pronunciation:

To ascertain trainable and non-trainable errors.

Fluency:

To understand whether a candidate can speak fluent English while conversing.

Grammar:

To verify a candidate’s level of understanding of grammar and to detect trainable and non-trainable errors.

Listening Comprehension:

A candidate is given a listening comprehension exercise and assessed on his/her ability to listen and comprehend.

High Accuracy with Carnegie Speech

SpeechX combines proprietary voice analytics with Carnegie Speech’s patented Speech Recognition Engine and Pinpointing Technology. The platform draws on over thirty years of experience in undertaking assessments. It can listen to a non-native speaker of English and identify errors down to the level of the individual sound, or phoneme. As a result, Carnegie Speech’s engine is often referred to as the most accurate system on the market by universities in the USA, including Stanford, Yale, and Northwestern, to name a few. By processing hundreds of millions of speech assessments every year from countries the world over, Carnegie Speech has gained extensive expertise in Speech Technology.

Best Practices of Customer-Facing Roles

SpeechX has been designed after assimilating insights from various SMEs, VNA trainers, and BPO industry experts to combine the best practices of the BPO industry. Just as a VNA trainer assesses trainable and non-trainable errors, SpeechX flags whether the gaps shown in a candidate’s evaluation are trainable.

While evaluating pronunciation, it checks for more than eight critical non-trainable mistakes. SpeechX reviews fluency by measuring it across more than ten dimensions, such as speaking, prosody, intonation, and pausing. It evaluates a candidate’s understanding of grammar rules. Further, it assesses a candidate’s listening ability, looking for fact-based and inference-based understanding.
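The idea of flagging whether a candidate’s gaps are trainable can be pictured with a short Python sketch. It is purely illustrative and does not describe how SpeechX works internally: the `DetectedError` record, the `summarize` function, and the rule that a single non-trainable error makes the gaps non-trainable are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DetectedError:
    component: str    # e.g. "pronunciation", "fluency", "grammar"
    description: str
    trainable: bool   # False marks a fatal / non-trainable error


def summarize(errors: List[DetectedError]) -> Dict[str, object]:
    """Count trainable and non-trainable errors for one candidate.

    Assumed rule (not taken from SpeechX): the candidate's gaps count as
    trainable only when no non-trainable error was detected.
    """
    fatal = [e for e in errors if not e.trainable]
    return {
        "trainable_count": len(errors) - len(fatal),
        "non_trainable_count": len(fatal),
        "gaps_trainable": not fatal,
    }


# Usage: the 'T'/'D' example comes from the article; the second entry is a
# placeholder, since the article does not name a specific fatal error.
candidate_errors = [
    DetectedError("pronunciation", "definite 'T' or 'D' sound", trainable=True),
    DetectedError("grammar", "placeholder fatal error", trainable=False),
]
report = summarize(candidate_errors)
```

Here `report["gaps_trainable"]` is false because one of the two errors is marked non-trainable; with only the first error, it would be true.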

Challenges With The Existing Tools On The Market

1. Impersonation and Cheating:

Impersonation and fraud are a persistent risk in large-scale recruitment. Despite the best intentions of VNA assessors, such unfair means are employed. Current tools on the market that rely on Interactive Voice Response (IVR) are susceptible to cheating because the test-taker cannot be monitored.

2. Accuracy:

There have been several reports of glaring discrepancies between VNA trainers’ results and those of the existing tools. This mismatch raises questions about the validity of the tests and calls for a re-evaluation.

3. The Lack of Ease of Use:

The IVR-based setup is challenging to administer, and the user experience often suffers too. It also requires considerable logistics, which can be time-consuming and resource-intensive for organizations.

How SpeechX Solves These Pain-Points

1. Impersonation and Cheating:

SpeechX uses AI-based video monitoring to deter candidates from using unfair means, thereby ensuring the sanctity of the exercise. It auto-generates cheating flags by using AI-proctoring technology.

2. Accuracy:

SpeechX is powered by Carnegie Speech’s world-class speech evaluation and recognition technology. This patented and reliable technology ensures a high degree of accuracy in the assessment results.

3. The Lack of Ease of Use:

It is a computer-based process and does not face challenges enumerated in the IVR process. It is a smoother and hassle-free experience overall.

The Key Elements/USPs of SpeechX

1. A Detailed Report:

A candidate’s performance is summarized in a comprehensive, actionable, and objective report that is immediately available for action. The report can be accessed in real-time and compared across multiple applicants, as well as across business and educational enterprises. It includes both a CEFR score and a SpeechX score.

2. Accessibility:

It is accessible on the cloud. Therefore, test results can be accessed at a moment’s notice, without the hassle of managing logistics. It can be used on computers and smartphones, which gives candidates a high degree of mobility.

The Mercer | Mettl Way

SpeechX is a part of a holistic suite of assessments. Mercer | Mettl’s suite of solutions is geared to provide you with comprehensive, 360-degree assessments. These include psychometric, domain, language proficiency, and cognitive tools. This gamut of offerings ensures that you have all the possible means of assessment in one place, with the highest levels of safety, security, and convenience, so that you can hire the right candidate through better talent measurement.


Changing times call for newer methods of talent assessment. The BPO industry is one of the few sectors with a constant influx of workers. It is challenged with a considerably high attrition rate, as much as ten percent higher than the industry average of 35 percent. It requires a cost-effective, smart, and time-saving mechanism to improve efficiency and lower input costs. With an AI-backed solution, companies can be assured of hiring the right talent at scale, without overextending themselves financially or otherwise.

Originally published August 14 2020, Updated November 18 2020


Written by

Shashank has been working in the publishing and online industry for eight-plus years now. He has donned many hats and has reported on diverse industry verticals, including aviation, tourism, hospitality, etc. He is currently the senior editor at Mercer | Mettl.
