Teamwork Situational Judgement Test (SJT-TW)


The Teamwork Situational Judgment Test (SJT-TW; Gatzka & Volmer, 2017) consists of 12 situation descriptions with four response options and assesses individual teamwork effectiveness. So far, only a German version of this test exists. To translate the SJT-TW to English, we utilized the TRAPD procedure (Harkness, 2003). TRAPD is an acronym for several steps needed to produce high-quality translations of questionnaires, namely translation, review, adjudication, pretesting and documentation. Results from a pilot study provide preliminary evidence for item and test score properties of the translated test when compared to the original German version.

  • Language Documentation: English
  • Language Items: English
  • Number of Items: 12
  • Survey Mode: paper-pencil, CASI
  • Processing Time: 10 minutes
  • Reliability: Cronbach’s alpha = .52
  • Validity: no validity evidence for the translated version
  • Construct: teamwork effectiveness
  • Catchwords: teamwork, team spirit, SJT
  • Item(s) used in Representative Survey: no
  • URL Data archive:
  • Status of Development: tried


In contrast to the German test version, we only asked participants to pick the best out of four response options. The German version additionally asks for the worst response option. We chose this shortened instruction to reduce the test duration in an initial pilot study. The complete instruction is presented within square brackets and in grey font color.

We recommend a computer-assisted format to administer the test. A computer-assisted format usually forces test-takers to give the correct number of responses. However, in a paper-pencil format, this may not be obvious to test-takers. Thus, the following sentence should be included in the test instruction subsequent to the sentence “Please always select the best and the worst option for each situation” if a paper-pencil format instead of a computer assisted one is used:

“Please select exactly two response options for each situation. Mark (+) for the best solution and (−) for the worst solution.”

When asking only for one (here: the best) option in a paper-pencil format, please insert the following sentence in the test instruction subsequent to the sentence “Please always select the best option for each situation”:

 “Please select exactly one response option for each situation.”


Below, 12 situations are described as they may occur in the occupational daily routine of teams or working groups. For each situation, four different behavioural options are presented.

Please pick the most [and least] suitable behaviour for each situation.

For some situations, it may be difficult for you to decide as certain details are not specified, you did not experience a similar situation before, or you consider some options very similar. However, please choose the alternative[s] that you generally take for the best [and worst] solution.

Please always select the best [and the worst] option for each situation. [Please do not indicate the same answer as the best and worst solution.]

Please do not skip any situation.


Your team has a task that is fundamentally different to previous tasks and covers completely new aspects. In addition, it is very likely that aspects of the task will change in the medium term.

What should your team do [and not do] in such a situation?

a) Some members of the team do not assist with the task to stay flexible. ( [+] ) [( X )]
b) All aspects of the tasks are assigned to several competent members of the team. ( [+] ) [( − )]
c) The team asks a supervisor to assign task aspects. ( [+] ) [( − )]
d) Task aspects are assigned as needed in regular meetings. (  X  ) [( − )]


Table 1

Items of the English Version of the SJT-TW

Situation Items   Given Answers
1 You are temporarily subjected to personal stress that also affects your occupational activity. A briefly acquainted colleague asks you about the reason for your decline in performance and offers help with your task. What should you do [and not do] in such a situation?
a) You confirm an increase in personal stress and accept the help.
b) You do not mention your personal problems, but accept the help.
c) You explain your specific situation and ask for help with your task.
d) You thank your colleague for the feedback but politely decline the offer.
2 Your team has clearly allocated all areas of responsibility. However, you incidentally notice that some team members in another area are challenged by a task that you have experience in. What should you do [and not do] in such a situation?
a) You tell your colleagues about your experience and offer advisory support.
b) You mention your expertise and offer active support.
c) You ask whether your experience or advice is desired and if so, when.
d) You respect your colleagues’ responsibilities and stay out of it.
3 Your team has made a lot of progress working on a complex task when some unforeseen developments occur. Therefore, your tediously achieved results are no longer completely up to date. What should your team do [and not do] in such a situation?
a) Slight shortcomings will be tolerated due to the advanced progress of the work.
b) The team asks the customer or superior for their assessment of the situation.
c) Team members immediately discuss possible consequences in a meeting.
d) Changes are retrospectively implemented through intensive additional work.
4 You notice a sudden but continuous decrease in performance in one of your team members, whom you have experienced as competent and reliable. Other sources report that this colleague currently has some personal problems. What should you do [and not do] in such a situation?
a) You and the other team members discuss how to support this colleague.
b) You respect your colleague’s privacy and do not get involved in private matters.
c) You help your colleague without asking questions.
d) You ask your colleague if they want to talk about their problems.
5 Some of your colleagues discuss various aspects of a team task during a meeting. Your area of responsibility is not the focus of their discussion, which is why you hold back and refrain from partaking in it. What should you do [and not do] in such a situation?
a) You mentally prepare the discussion points that you wanted to address.
b) You use the opportunity to broaden your knowledge about other parts of the task.
c) You carefully steer the conversation towards a more familiar topic that you can engage in.
d) You attentively look for information that could be important for your area of responsibility.
6 You are transferred to an already existing team. During a brief introduction, your contact person tells you that all team members have their own area of responsibility. Without providing any further details, your contact person instructs you on your own area of responsibility. What should you do [and not do] in such a situation?
a) You limit your questions to your area of responsibility, as nothing else should concern you.
b) You hold back your curiosity and carefully listen to your contact person’s explanations.
c) You decide to become familiar with the other areas of responsibility on your own after the conversation.
d) You ask for the basic workflows and interdependencies in the team.
7 You have to inform another team member about a complex issue from your area of responsibility. It is of utmost importance for your team’s success that the other person takes note of your concern and that no uncertainties are left. What should you do [and not do] in such a situation?
a) You prepare a timesaving summary that you personally deliver.
b) You send a detailed report and ask for an acknowledgement of receipt.
c) You arrange a personal meeting with the other team member.
d) You send a message and explicitly request the other team member to contact you if any uncertainties are left.
8 You incidentally notice that another team member struggles to finish their work on time. You have already completed your tasks. However, you want to double check your work and, if necessary, improve some details before the deadline. What should you do [and not do] in such a situation?
a) You ask the team member in a confidential conversation whether they need help.
b) You contain yourself because you do not want the team member to appear incompetent.
c) You carefully address your observation in the next team meeting.
d) You finish your own tasks first before offering your help.
9 Together with your team members, you are setting objectives for each member for an upcoming task. What should your team do [and not do] in such a situation?
a) The team sets objectives that are positive, clearly defined and easily verifiable.
b) The team sets objectives that are specific, challenging and agreed upon by the whole team.
c) The team sets objectives that are moderately difficult and comprehensible to the whole team.
d) The team sets objectives that are easily attainable, open and flexible concerning time management.
10 You are working on a task that is mainly in your area of responsibility. When you present your intended procedure during a meeting, some team members from other areas speak up and add suggestions for changes and adaptations. What should you do [and not do] in such a situation?
a) You take note of suggestions and discuss them with everyone involved.
b) You reflect on what changes might be sensible and ask for details.
c) You politely point out that you have a better overview of the task due to your expertise.
d) You try to include as many of the suggested changes as possible in your plan.
11 Due to external circumstances, your team was unable to finish an important task on time. Since every team member has given their very best, there is considerable disappointment. When a new task comes up, you notice that low morale and poor motivation are impairing the team. What should you do [and not do] in such a situation?
a) You remind your team of past successes to spark new motivation.
b) You address your concerns in front of the whole team and encourage a discussion.
c) You ask for a team meeting to put the failure behind you.
d) You give the other team members the time to regain their motivation.
12 Together with your team members, you are planning how to tackle an upcoming task. The team’s success in mastering this challenge depends on several factors, some of which are difficult to predict. What should your team do [and not do] in such a situation?
a) The team discusses all possible developments in advance and works out a strategy for each of them.
b) The planning proceeds in small steps in order to allow quick adaptations.
c) The team waits with further planning until all uncertainties are eliminated.
d) The team focuses especially on currently available facts for the planning.

Response specifications

The answers are given in a forced-choice format. The best and the worst solution have to be identified. The respondent’s task is to indicate how they (or the entire team) should behave.


For each scenario there is a predefined best and worst solution, which can be taken from the scoring key. If test-takers correctly choose the best solution, the response is coded as “1”. If test-takers correctly choose the worst solution, the response is also coded as “1”. If test-takers select the best solution as their worst solution or vice versa, the response is scored as “-1”. All remaining responses are scored as “0”. Item scores may be obtained by summing the best-response score and the worst-response score of each scenario. Thus, each scenario can have values from -2 to +2. To obtain a score for the total test, values across scenarios are added up to an unweighted sum score. The total test score may range from -24 to +24. Test scores may also be obtained separately for best and worst responses across scenarios. When analyzing best and worst responses separately (or only one of them to reduce participation time, as we did in this pilot study), item scores for each scenario can range from -1 to +1 and test scores across scenarios can range from -12 to +12.
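The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the key used in the example (best = d, worst = a) is taken from the sample item in the test instruction, not from the scoring key of the 12 test items, which is not reproduced here.

```python
def score_pick(pick: str, target: str, opposite: str) -> int:
    """Score a single pick: 1 if it matches the keyed option,
    -1 if it matches the opposite keyed option, 0 otherwise."""
    if pick == target:
        return 1
    if pick == opposite:
        return -1
    return 0


def score_scenario(best_pick: str, worst_pick: str,
                   key_best: str, key_worst: str) -> int:
    """Item score for one scenario (range -2 to +2): the sum of the
    best-response score and the worst-response score."""
    best_score = score_pick(best_pick, key_best, key_worst)
    worst_score = score_pick(worst_pick, key_worst, key_best)
    return best_score + worst_score


def score_test(responses, key) -> int:
    """Unweighted sum score across scenarios.

    responses: list of (best_pick, worst_pick) tuples, one per scenario
    key:       list of (key_best, key_worst) tuples, one per scenario
    """
    return sum(
        score_scenario(best, worst, key_best, key_worst)
        for (best, worst), (key_best, key_worst) in zip(responses, key)
    )


# Hypothetical key based on the instruction example only (best = d, worst = a).
example_key = [("d", "a")]
print(score_test([("d", "a")], example_key))  # both picks match the key
print(score_test([("a", "d")], example_key))  # best and worst are swapped
print(score_test([("b", "c")], example_key))  # neither keyed option was picked
```

For the shortened format that asks only for the best option, `score_pick` alone yields the item score in the range -1 to +1.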

Adequate methods may be applied to deal with missing values (e.g., multiple imputation or full information maximum likelihood).

Application field

This test should be applied to assess knowledge about teamwork effectiveness in research settings (given the lack of validity evidence for the translated version of the test, we do not encourage its use beyond research settings). This test can be applied independently from actual teams or team tasks. For instance, the original development study (Gatzka & Volmer, 2017) validated this test with a student sample as well as a sample of employees. It may be particularly useful for teamwork research (see Gatzka & Volmer, 2017). The test may be applied in a computer-assisted or a paper-pencil self-administered questionnaire format. For this study, we chose a computer-assisted questionnaire format. On average, participants took 6.57 minutes (SD = 1.63) to complete the shortened test asking only for the best option. Hence, participation will take approximately 10 minutes if test-takers are asked to pick both the best and worst response options.


The reported test is a translation of the German Situational Judgment Test for Teamwork (SJT-TW; Gatzka & Volmer, 2017). SJTs are popular tools in personnel selection and are traditionally defined as low-fidelity simulations (Motowidlo et al., 1990). Most SJTs consist of written situation descriptions and several behavioural response options, of which test-takers choose the one most similar to how they should or would behave in the given situation (McDaniel & Nguyen, 2001). As such, they sample knowledge about effective behaviours in relevant situations for work-related criteria (Motowidlo et al., 1990; Weekley et al., 2015). Meta-analyses confirmed the predictive power of SJTs for job performance criteria (Christian et al., 2010; McDaniel et al., 2001, 2007).

Effective teamwork can be best described as a set of various behaviours rather than a single, narrow construct (Salas et al., 2005; Rousseau et al., 2006). Gatzka and Volmer (2017) integrated results from two reviews on teamwork to develop a working model and to identify core elements of team effectiveness (Salas et al., 2005; Rousseau et al., 2006). Furthermore, they considered two models that have already been implemented in test procedures (O’Neil et al., 1997; Stevens & Campion, 1994) as well as further reviews on team processes and team efficacy (Kozlowski & Ilgen, 2006; Marks et al., 2001; Mathieu et al., 2008). The working model (Gatzka & Volmer, 2017) consisted of 30 behaviours particularly relevant for teamwork success, which can be categorized into seven dimensions: (1) evaluation of the operational framework, (2) planning and organisation, (3) cooperation, (4) communication, (5) monitoring and adaptation, (6) help behaviour and support and (7) motivation and cohesion.

Gatzka and Volmer (2017) identified SJTs as a suitable tool for the assessment of teamwork effectiveness. These authors demonstrated that the SJT-TW correlated with measures of teamwork skills and even predicted supervisor-rated contextual and teamwork performance. Overall, the original version of the test was well in line with contemporary conceptualizations of teamwork effectiveness and thus a valuable tool for teamwork research and personnel selection (Gatzka & Volmer, 2017).

Beyond the intended use of the SJT-TW for teamwork research, the test may be useful for research on the underlying psychological processes of SJTs. Despite the well-established criterion-related validity of SJTs, it remains unclear why SJTs work as assessment methods (e.g., Freudenstein et al., 2020; Lievens & Motowidlo, 2016; McDaniel et al., 2016; Schäpers et al., 2019). For instance, the role of situations for test-takers’ responses to SJT items is subject to debate. Some argue in favour of processes that are similar to those underlying behaviour in real-life situations, while others advocate context-independent constructs (e.g., Freudenstein et al., 2020; Lievens & Motowidlo, 2016; Schäpers et al., 2019). However, the number of SJTs that are available to research is limited. Thus, an English translation of the SJT-TW would further enable research about underlying processes of SJTs.

Scale development

Item generation and translation

Gatzka and Volmer (2017) used the 30 behaviours from their working model on team effectiveness to develop a Situational Judgment Test. Their final test consists of 12 hypothetical situations or scenarios that reflect a problem concerning teamwork and four behavioural response options for each situation. Test-takers are asked to indicate the best and worst solution for each situation. The SJT showed substantial correlations with related constructs and job-related criteria.

To translate the SJT-TW to English, we utilized the TRAPD procedure (Harkness, 2003). TRAPD is an acronym for several steps needed to produce high-quality translations of questionnaires, namely translation, review, adjudication, pretesting and documentation. We created two independent translations of the SJT-TW. The overall aim was to retain as much original item content and structure as possible. Both translators were fluent in spoken and written English and had expertise in SJT research. However, both translators were neither native speakers nor professional translators. The first author reviewed both translations and merged them into a single version. Afterwards the translators revised this version with regard to word flow and completeness of the original item content. Two independent native speakers then additionally reviewed this revised test version. All changes were adopted accordingly. Next, a senior researcher with high expertise in psychological assessment and SJT research made final changes to the translation.

In this study, we pilot-tested the translated SJT with a small sample to gauge whether test-takers understood all items and to inspect preliminary response patterns. We instructed participants to pick the response option that best resembles what they should do in each of the 12 scenarios. To reduce the time needed to participate, we did not instruct participants to pick the response option that resembles the worst solution. This is contrary to the original test format. We scored responses with “1” if they reflected the most effective response, with “-1” if they reflected the most ineffective response, and all remaining responses with “0”. Please note that interpretations of these results are only preliminary and should be made with caution due to the small sample size. Data was analysed with R (version 3.6.1; R Core Team, 2019) and the R package psych (version 1.8.12; Revelle, 2018).


Data for the English version of the SJT-TW was collected in 2019 from the following convenience sample from the United States: N = 20 native speakers (American English) from Amazon MTurk; sex: 40% female; age: M [min; max] = 35.25 [25; 53], SD = 9.21. Most participants (75%) were gainfully employed during the time of data collection. Participants had either an undergraduate (45%) or graduate degree (20%) or received vocational training (5%). The remaining 30% of the sample graduated high school. Test-takers received $1 for participation. No a-priori power analysis was conducted, as this was a pilot study. No missing values occurred.

Item parameter

Table 2 presents item parameters for the 12 SJT items. Item distributions were somewhat similar to those of the German version (Gatzka & Volmer, 2017). The range of item-total correlations was also comparable between the German and the English version of the SJT-TW, with a slightly higher mean of item-total correlations for the English version (rit = .22 vs. .17). The internal consistency of SJTs is typically low (Catano et al., 2012; Kasten & Freund, 2016). Thus, small item-total correlations were to be expected. However, item 11 showed a negative item-total correlation (rit = -.22), and the correlation for item 7 was close to zero (rit = -.02). This may be due to the small sample size of this pilot study. Nevertheless, if a negative item-total correlation persists in future applications, the affected item should be excluded from further analyses.

The reported item-total correlations presume a single factor structure of the SJTs. This is in line with recommendations by Gatzka and Volmer (2017). However, these authors also proposed a two-factor structure of the SJT-TW (Factor 1: Items 2, 3, 5, 6, 7, 9, 10, 12; Factor 2: Items 1, 4, 8, 11). Gatzka and Volmer (2017) argued that this factor structure can only be interpreted as preliminary evidence due to the small number of items and low internal consistencies of the two factors. They concluded that only a total test score should be calculated. An investigation of the factor structure of the translated SJT-TW was not sensible due to the small sample size of N = 20.
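As an illustration of how item-total correlations of this kind can be computed under a single-factor assumption, the following Python sketch correlates each item with the sum of the remaining items. Two points are assumptions rather than facts from this documentation: the source does not state whether the reported values are corrected (item excluded from the total) or uncorrected, and the small data matrix in the example is made up for demonstration and is not the pilot data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences.
    Returns NaN if either sequence has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return float("nan")
    return cov / sqrt(ss_x * ss_y)


def corrected_item_total(rows):
    """Corrected item-total correlation for every item.

    rows: one list of item scores per respondent (no missing values).
    Each item is correlated with the sum of all OTHER items, so the
    item itself does not inflate the correlation.
    """
    n_items = len(rows[0])
    result = []
    for i in range(n_items):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]
        result.append(pearson(item, rest))
    return result


# Made-up scores (-1, 0, 1) for five respondents on three items.
demo_rows = [
    [1, 1, 0],
    [0, 1, -1],
    [1, 0, 1],
    [-1, 0, 0],
    [0, -1, 1],
]
for number, r in enumerate(corrected_item_total(demo_rows), start=1):
    print(f"Item {number}: r_it = {r:.2f}")
```

With only 20 respondents, as in this pilot study, such correlations are estimated imprecisely, which is consistent with the caution expressed above.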


Table 2

Means, Standard Deviations, Skew, Kurtosis and Item-Total-Correlations of the Manifest Items

  M SD Skew Kurtosis rit
Item 1 0.70 0.47 -0.81 -1.41 0.43
Item 2 0.00 0.65 0.00 -0.74 0.39
Item 3 0.25 0.55 0.11 -0.60 0.04
Item 4 -0.20 0.41 -1.39 -0.07 0.13
Item 5 0.40 0.60 -0.34 -0.95 0.49
Item 6 0.40 0.50 0.38 -1.95 0.25
Item 7 0.25 0.91 -0.47 -1.68 -0.02
Item 8 0.20 0.70 -0.25 -1.06 0.38
Item 9 0.25 0.72 -0.36 -1.12 0.11
Item 10 0.15 0.67 -0.15 -0.93 0.32
Item 11 0.25 0.44 1.07 -0.89 -0.22
Item 12 0.15 0.75 -0.22 -1.27 0.29

Note. Scale ranging from -1 to 1, as test-takers were only asked to pick the best response option; N = 20.

Quality criteria


The English translation of the SJT-TW is a standardised psychological instrument like the German original SJT-TW. Each test-taker receives the same instruction, items and response options. The answers are evaluated by means of a solution key. Hence, objectivity of application and evaluation is assured. Due to the ambiguous factor structure of the SJT-TW, individual test scores should be interpreted with care. Rather than allocating psychological meaning to individual test scores, sum scores of the SJT-TW should be interpreted as indicators that correlate with various constructs such as job performance and team skills. This is not unique to this specific test but rather representative of most SJTs (see McDaniel et al., 2016).


The reliability of the scale was estimated with Cronbach’s alpha as a measure of internal consistency. Coefficient alpha of the 12 SJT items was α = .52. Although this internal consistency is insufficient, it is similar to the value found in the sample of employees in the original validation study (α = .44; Gatzka & Volmer, 2017). Moreover, the internal consistency of the SJT-TW is in line with meta-analyses on the internal consistency of SJTs (Catano et al., 2012; Kasten & Freund, 2016). These low values generally reflect the ambiguous factor structure of SJTs.
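For readers who want to reproduce this type of estimate, a minimal Python sketch of coefficient alpha follows. It uses the standard formula α = k / (k − 1) × (1 − Σ item variances / variance of the sum score). The function name and the example data are assumptions made for illustration; the analysis reported here was run in R with the psych package, and the pilot data are not reproduced.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows: one list of item scores per respondent (no missing values).
    Uses sample variances (n - 1 in the denominator) throughout.
    Raises ZeroDivisionError if the sum score has zero variance.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores (-1, 0, 1) for five respondents on three items.
demo_rows = [
    [1, 1, 0],
    [0, 1, -1],
    [1, 0, 1],
    [-1, 0, 0],
    [0, -1, 1],
]
print(f"alpha = {cronbach_alpha(demo_rows):.2f}")
```

Alpha rises when items covary strongly relative to their individual variances, which is why heterogeneous item sets such as SJTs tend to yield low values.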


Based on results from this very small sample, we tentatively conclude that the overall scale worked very similarly to the German version and did not cause any major inconsistencies. Still, a proper validation study is needed before using the English SJT-TW beyond research settings. We consider the current version a research version, which should not be used in high-stakes settings.

Descriptive statistics (scaling)

The test sum score had a mean of 2.80 (SD = 3.02) with a skewness of -1.04 and a kurtosis of 0.37. Thus, participants chose on average more correct than incorrect response options. This result is in line with the German test version (Gatzka & Volmer, 2017).
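Because no missing values occurred, the mean of the sum score must equal the sum of the item means reported in Table 2. The short Python check below illustrates this arithmetic with the values from Table 2; the variable names are chosen for this sketch only.

```python
# Item means as reported in Table 2 (items 1 to 12).
item_means = [0.70, 0.00, 0.25, -0.20, 0.40, 0.40,
              0.25, 0.20, 0.25, 0.15, 0.25, 0.15]

# With complete data, the mean of a sum equals the sum of the means.
sum_of_item_means = round(sum(item_means), 2)
print(sum_of_item_means)
```

The result agrees with the reported sum score mean of 2.80.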

Further quality criteria

Completing the test takes about 10 minutes; the shortened version asking only for the best option took 6.57 minutes on average (SD = 1.63). This indicates that the test is a very economical instrument. Research also suggests that SJTs are less susceptible to faking behaviour, especially when compared to personality self-reports (Kasten et al., 2018).

Further literature

Gatzka, T., & Volmer, J. (2017). Situational Judgment Test für Teamarbeit (SJT-TA). In Zusammenstellung sozialwissenschaftlicher Items und Skalen.