AI and occupational evaluation
AI’s role in evaluating job prestige and value spotlighted by new ILO research
Using AI tools to evaluate a range of occupations, in comparison with human assessments, offers potential benefits such as efficiency, cost effectiveness and speed, but some obstacles still need to be overcome.
2 February 2024
GENEVA (ILO News) – New research on how artificial intelligence (AI) evaluates the prestige and social value of occupations has been published by the International Labour Organization (ILO), shedding light on both the potential and the risks of using such methods for sociological and occupational research.
The paper, A Technological Construction of Society: Comparing GPT-4 and Human Respondents for Occupational Evaluation in the UK, compares evaluations of occupations made by GPT-4 (a type of Large Language Model (LLM) AI that can recognize and generate text) with those of a high-quality survey undertaken in the United Kingdom.
Occupational evaluation captures people’s perceptions of occupations in society. The researchers used the most universally applicable occupational classification – ILO’s International Standard Classification of Occupations (ISCO-08) – to organize jobs into clearly defined groups according to their tasks and duties.
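For orientation, ISCO-08 organizes all occupations into ten 1-digit major groups, which are further subdivided into more detailed sub-major, minor and unit groups. The minimal Python sketch below shows only that top level of the hierarchy; the variable name and the roll-up example are illustrative and not drawn from the paper.

```python
# ISCO-08 1-digit major groups (the top level of the classification).
# The dictionary name and the example below are illustrative, not taken from the paper.
ISCO_08_MAJOR_GROUPS = {
    0: "Armed Forces Occupations",
    1: "Managers",
    2: "Professionals",
    3: "Technicians and Associate Professionals",
    4: "Clerical Support Workers",
    5: "Service and Sales Workers",
    6: "Skilled Agricultural, Forestry and Fishery Workers",
    7: "Craft and Related Trades Workers",
    8: "Plant and Machine Operators, and Assemblers",
    9: "Elementary Occupations",
}

# A 4-digit ISCO-08 unit-group code rolls up to its major group by its first digit,
# e.g. code 2211 (Generalist Medical Practitioners) belongs to major group 2.
print(ISCO_08_MAJOR_GROUPS[2211 // 1000])  # -> Professionals
```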
First, human respondents in the UK were asked to rank the prestige and social value of a selection of occupations. GPT-4 was then asked to provide a similar ranking, taking the role of 100 random respondents with what it would consider an “average UK profile”. The human ratings were subsequently compared with these algorithmic views in order to understand how closely the AI system was able to predict human opinions, and whether its way of perceiving human views aligned with particular demographic groups.
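The paper’s exact prompts and settings are not reproduced in this article; the sketch below is only an illustration, under assumed wording and parameters, of how GPT-4 could be asked to role-play survey respondents via the OpenAI Python SDK.

```python
# Illustrative sketch only: the study's actual prompts, parameters and occupation
# list are not shown here. Model name, prompt wording and the helper function are assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

OCCUPATIONS = ["Medical doctor", "Software developer", "Street vendor"]  # example subset

def rate_occupation(occupation: str) -> str:
    """Ask the model to role-play an 'average UK' respondent and rate one occupation."""
    prompt = (
        "Adopt the role of a randomly selected survey respondent with an average UK profile. "
        f"On a scale from 0 (lowest) to 100 (highest), rate the prestige and the social value "
        f"of the occupation: {occupation}. Reply with two numbers: prestige, social value."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # sampling variation stands in for different simulated respondents
    )
    return response.choices[0].message.content

# Repeating the call many times per occupation (e.g. 100) yields a distribution
# of simulated ratings that can be averaged and compared with the human survey.
for occ in OCCUPATIONS:
    print(occ, "->", rate_occupation(occ))
```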
The study found a high correlation between the results generated by the two different approaches. GPT-4 showed strong proficiency in predicting average UK views on the prestige and social value of individual occupations, and in using these predictions to create relative occupational rankings. This “algorithmic understanding” of general human opinions could potentially allow AI to be used for occupational research, with benefits including efficiency, cost effectiveness, speed, and accuracy in capturing general tendencies.
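To give a sense of how agreement between human and algorithmic rankings can be quantified, the following sketch computes a Spearman rank correlation on made-up prestige scores; the numbers and the SciPy-based approach are illustrative assumptions, not the paper’s data or reported method.

```python
# Hypothetical scores for illustration only; not the paper's data or results.
from scipy.stats import spearmanr

human_prestige = {"Medical doctor": 92, "Software developer": 74, "Street vendor": 35}
gpt4_prestige  = {"Medical doctor": 90, "Software developer": 83, "Street vendor": 30}

occupations = list(human_prestige)
rho, p_value = spearmanr(
    [human_prestige[o] for o in occupations],
    [gpt4_prestige[o] for o in occupations],
)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")  # high rho = similar rankings
```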
However, the study also revealed some issues. The AI model tended to overestimate the prestige and value of occupations associated with the digital economy or with strong marketing and sales components. It also underestimated, in comparison with the human evaluators, the prestige and social value given to some illicit or traditionally stigmatised occupations. In addition, when the researchers adjusted the AI’s instructions, it proved unable to reproduce the hierarchies of prestige and social value of occupations as perceived by demographic minorities in the UK context.
The paper cautions that current LLMs tend to primarily reflect the opinions of Western, educated, industrialized, rich and democratic (WEIRD) populations, which are a global demographic minority but which have produced the majority of the data on which such AI models have been trained. Therefore, while they can be a helpful complementary research tool – for example in processing large amounts of unstructured text, voice and image inputs – they carry a serious risk of omitting the views of demographic minorities or vulnerable groups. The researchers argue that these limitations should be carefully considered when applying AI systems to the world of work, for example when providing career advice or conducting algorithmic performance evaluations.
The paper was co-authored by Paweł Gmyrek of the ILO, Christoph Lutz of the Norwegian Business School, and Gemma Newlands of the Oxford Internet Institute.
Related content
A Technological Construction of Society: Comparing GPT-4 and Human Respondents for Occupational Evaluation in the UK
ILO Working paper 102