Test Methodology

Updated May 13, 2026

On this page

01. What This Test Is
    A Big Five questionnaire under MBTI labels, not the official MBTI
02. Why We Publish All of This
    The credibility moat: competitors don't publish their item banks
03. The 50-Item Bank
    Every item, verbatim, with its source and reverse-keying flag
04. How Scoring Works
    Signed-sum across 10 items per dichotomy, with worked example
05. Reliability and Validity
    The honest empirical numbers, and what we cannot claim
06. The Forer Effect
    Why MBTI-style tests feel uncannily accurate, named explicitly
07. Item Provenance and License
    The 44 IPIP items + 6 original sharpeners, public domain
08. Disclaimer and Trademarks
    Not the official MBTI; not affiliated with The Myers & Briggs Foundation
09. Sources
    The peer-reviewed work behind every claim on this page

What This Test Is

This is an MBTI-style personality assessment that uses public-domain Big Five items mapped to four-letter MBTI labels. It is not the official Myers-Briggs Type Indicator® (MBTI®) instrument, and we are not affiliated with The Myers-Briggs Company, The Myers & Briggs Foundation, or 16Personalities.

The pattern we follow is the same one 16Personalities popularised: administer a Big Five questionnaire under the hood, then translate the underlying scores into MBTI letters. The mapping is well-replicated in the personality literature — Extraversion lines up with MBTI’s E/I, Openness with N/S, Agreeableness with F/T, and Conscientiousness with J/P (McCrae & Costa, 1989). A fifth Big Five trait, Neuroticism, sits alongside the four MBTI letters as an optional “A/T” identity dimension — the same five-letter scheme 16Personalities uses (e.g., INTJ-A vs INTJ-T).

The questionnaire is 50 items long, scored on a 7-point Likert scale, and takes most people about five minutes. We use that bank because the underlying items — 44 from the public-domain International Personality Item Pool — have been studied for over two decades and are free for commercial use. The other six items are our own, written to sharpen three dichotomies where Big Five items underperform; see the item provenance section.

Why We Publish All of This

Most online personality tests treat their item bank as proprietary and their statistics as marketing copy. We do the opposite. Every question on the test is listed below in plain text. The scoring algorithm is documented and the source code is open. The reliability and validity numbers are reported in the form psychologists would expect — including the unflattering ones.

We publish this for three reasons. First, transparency is a credibility moat that closed competitors cannot easily match. Second, naming what a personality test can and cannot do protects readers from over-interpreting a result. Third, openly attributing the item bank to IPIP is the right way to use a public-domain resource — the people who built that resource deserve the citation, and the science is better when results are reproducible.

A reader who reaches this page is almost certainly the skeptical kind. They have either taken the test, glanced at the result, and want to know how seriously to take it; or they have not taken it yet and want to know whether it is worth their time. Either way, the rest of this page treats them as such — straight talk, no marketing.

The 50-Item Bank

All 50 items appear below in the order they are presented during the test. The first 40 contribute to the four MBTI letters; the final 10 measure Neuroticism and feed the optional A/T identity letter. “Reverse-keyed” means that agreeing with the item counts against the positive pole of its dichotomy (E, N, F, or J; high Neuroticism for the final ten) rather than for it. Mixing item directions this way is a standard practice that helps detect inattentive responding.

#	Item	Dichotomy	Reverse?	Source
1	I am the life of the party.	E/I	—	IPIP-50 #1 (E+)
2	I talk to a lot of different people at parties.	E/I	—	IPIP-50 #31 (E+)
3	I start conversations.	E/I	—	IPIP-50 #21 (E+)
4	I don't mind being the center of attention.	E/I	—	IPIP-50 #41 (E+)
5	I feel comfortable around people.	E/I	—	IPIP-50 #11 (E+)
6	I don't talk a lot.	E/I	Yes	IPIP-50 #6 (E-)
7	I keep in the background.	E/I	Yes	IPIP-50 #16 (E-)
8	I have little to say.	E/I	Yes	IPIP-50 #26 (E-)
9	I don't like to draw attention to myself.	E/I	Yes	IPIP-50 #36 (E-)
10	I am quiet around strangers.	E/I	Yes	IPIP-50 #46 (E-)
11	I have a vivid imagination.	S/N	—	IPIP-50 #15 (O+)
12	I have excellent ideas.	S/N	—	IPIP-50 #25 (O+)
13	I am quick to understand things.	S/N	—	IPIP-50 #35 (O+)
14	I spend time reflecting on things.	S/N	—	IPIP-50 #45 (O+)
15	I am full of ideas.	S/N	—	IPIP-50 #50 (O+)
16	I have difficulty understanding abstract ideas.	S/N	Yes	IPIP-50 #10 (O-)
17	I am not interested in abstract ideas.	S/N	Yes	IPIP-50 #20 (O-)
18	I do not have a good imagination.	S/N	Yes	IPIP-50 #30 (O-)
19	I prefer concrete facts to abstract patterns.	S/N	Yes	Original
20	I often notice connections between unrelated ideas.	S/N	—	Original
21	I sympathize with others' feelings.	T/F	—	IPIP-50 #17 (A+)
22	I feel others' emotions.	T/F	—	IPIP-50 #42 (A+)
23	I have a soft heart.	T/F	—	IPIP-50 #27 (A+)
24	I take time out for others.	T/F	—	IPIP-50 #37 (A+)
25	I make people feel at ease.	T/F	—	IPIP-50 #47 (A+)
26	I feel little concern for others.	T/F	Yes	IPIP-50 #2 (A-)
27	I insult people.	T/F	Yes	IPIP-50 #12 (A-)
28	I am not really interested in others.	T/F	Yes	IPIP-50 #32 (A-)
29	When making a tough call, I weigh logic over feelings.	T/F	Yes	Original
30	I would rather be fair than kind when the two clash.	T/F	Yes	Original
31	I am always prepared.	J/P	—	IPIP-50 #3 (C+)
32	I pay attention to details.	J/P	—	IPIP-50 #13 (C+)
33	I get chores done right away.	J/P	—	IPIP-50 #23 (C+)
34	I like order.	J/P	—	IPIP-50 #33 (C+)
35	I follow a schedule.	J/P	—	IPIP-50 #43 (C+)
36	I am exacting in my work.	J/P	—	IPIP-50 #48 (C+)
37	I leave my belongings around.	J/P	Yes	IPIP-50 #8 (C-)
38	I make a mess of things.	J/P	Yes	IPIP-50 #18 (C-)
39	I prefer to keep my options open rather than commit early.	J/P	Yes	Original
40	I feel calmer once a decision is made and locked in.	J/P	—	Original
41	I get stressed out easily.	N (telemetry)	—	IPIP-50 #4 (N+)
42	I am relaxed most of the time.	N (telemetry)	Yes	IPIP-50 #9 (N-)
43	I worry about things.	N (telemetry)	—	IPIP-50 #14 (N+)
44	I seldom feel blue.	N (telemetry)	Yes	IPIP-50 #19 (N-)
45	I am easily disturbed.	N (telemetry)	—	IPIP-50 #24 (N+)
46	I get upset easily.	N (telemetry)	—	IPIP-50 #29 (N+)
47	I change my mood a lot.	N (telemetry)	—	IPIP-50 #34 (N+)
48	I have frequent mood swings.	N (telemetry)	—	IPIP-50 #39 (N+)
49	I get irritated easily.	N (telemetry)	—	IPIP-50 #44 (N+)
50	I often feel blue.	N (telemetry)	—	IPIP-50 #49 (N+)

44 of these items are reproduced verbatim from the IPIP-50 (Goldberg, 1992; ipip.ori.org). The six items marked “Original” were authored for this test and are released into the public domain (CC0) so any other tool can use them too. See item provenance and license below.
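For readers who want to work with the bank programmatically, the table's columns map naturally onto a small record type. The sketch below is illustrative only: the `Item` class, its field names, and the helper functions are assumptions made for this example, not the published source code, and only four rows from the table are included.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    number: int
    text: str
    dichotomy: str  # "E/I", "S/N", "T/F", "J/P", or "N (telemetry)"
    reverse: bool   # True when the table's "Reverse?" column says Yes
    source: str     # an IPIP-50 reference, or "Original"


# A four-row excerpt copied from the table above; the full bank has 50 rows.
SAMPLE_BANK = [
    Item(1, "I am the life of the party.", "E/I", False, "IPIP-50 #1 (E+)"),
    Item(7, "I keep in the background.", "E/I", True, "IPIP-50 #16 (E-)"),
    Item(19, "I prefer concrete facts to abstract patterns.", "S/N", True, "Original"),
    Item(40, "I feel calmer once a decision is made and locked in.", "J/P", False, "Original"),
]


def items_per_dichotomy(bank):
    """Count how many items feed each dichotomy."""
    return Counter(item.dichotomy for item in bank)


def original_item_numbers(bank):
    """List the question numbers that are not drawn from the IPIP-50."""
    return [item.number for item in bank if item.source == "Original"]
```

Run against all 50 rows, the same two helpers give a quick check that each dichotomy has ten items and that exactly six items are marked Original.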

How Scoring Works

Each response is a number from 1 to 7. We subtract 4 to centre the scale at zero, giving a signed value from −3 to +3. For reverse-keyed items, we flip the sign. Then we sum the signed values across the ten items that share a dichotomy.

The sign of the resulting sum picks the letter. If the E/I sum is positive, the letter is E; if negative, the letter is I. The other three dichotomies work the same way, with the convention that high Openness maps to N (so the “positive pole” for the S/N items is N) and high Agreeableness maps to F (so the “positive pole” for T/F is F). The magnitude of the sum becomes the percentage bar on the result page, scaled so 50% means “tied” and 100% means every item was answered at the extreme for the same pole.

Worked example. Suppose you give a 6 (“Agree”) to I am the life of the party and a 2 (“Disagree”) to I keep in the background. Both responses lean extraverted. The first item contributes 6 − 4 = +2 to the E/I sum directly. The second item is reverse-keyed, so 2 − 4 = −2 becomes +2 after the flip. Net effect: +4 toward E across just two items.
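The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the function names are hypothetical, and the linear scaling of the percentage bar (50% at a sum of zero, 100% at the maximum possible sum) is an assumption consistent with the description rather than a confirmed formula.

```python
def signed_value(response: int, reverse: bool) -> int:
    """Centre a 1-7 response at zero and flip the sign for reverse-keyed items."""
    if not 1 <= response <= 7:
        raise ValueError("response must be between 1 and 7")
    value = response - 4
    return -value if reverse else value


def dichotomy_sum(answers) -> int:
    """Sum signed values over (response, reverse) pairs for one dichotomy."""
    return sum(signed_value(response, reverse) for response, reverse in answers)


def letter_for(total: int, positive: str, negative: str):
    """Pick the letter from the sign of the sum; None marks an exact tie."""
    if total > 0:
        return positive
    if total < 0:
        return negative
    return None


def percentage(total: int, n_items: int = 10) -> float:
    """Assumed linear scaling: 50 at a tie, 100 when every item is at the extreme."""
    max_abs = 3 * n_items
    return 50 + 50 * abs(total) / max_abs


# The worked example: a 6 on a forward-keyed item and a 2 on a reverse-keyed item.
example_total = dichotomy_sum([(6, False), (2, True)])
example_letter = letter_for(example_total, "E", "I")
```

Here `example_total` is 4 and `example_letter` is "E", matching the arithmetic in the worked example.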

If a dichotomy comes out exactly zero, we still need to pick a letter. The tiebreak is deterministic: we take a SHA-256 hash of the answer array and use one byte of it as a coin flip. The same answer set always produces the same letter on every device, so resubmitting identical answers does not change the outcome. A tied result is still, in a literal sense, a coin flip, and we flag it on the result page as “borderline” rather than hiding it.
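A deterministic hash-based tiebreak of this kind might look like the sketch below. The details are assumptions made for illustration: how the answer array is serialised, which byte of the digest is read, and how that byte is turned into a choice (here, its parity) are not specified on this page.

```python
import hashlib


def tiebreak(answers, positive: str, negative: str, byte_index: int = 0) -> str:
    """Resolve an exact tie from a SHA-256 digest of the full answer array.

    Assumed details: answers are joined with commas before hashing, and the
    parity of one digest byte acts as the coin flip.
    """
    payload = ",".join(str(a) for a in answers).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return positive if digest[byte_index] % 2 == 0 else negative


# Any fixed answer array always resolves to the same letter.
sample_answers = [4] * 50
first = tiebreak(sample_answers, "E", "I")
second = tiebreak(sample_answers, "E", "I")
```

Because the digest depends only on the answers, `first` and `second` are always equal, which is the property that makes the result reproducible across devices.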

A dichotomy is flagged as borderline whenever its score sits in the bottom 20% of its possible range (e.g., a T/F sum between −6 and +6 out of a possible ±30). When that happens, the result page also offers the alternate type as a link so readers can compare both possibilities side by side. Three small caveats live alongside the score: we never run an answer through a hidden “personality calculator”; we do not adjust for age, gender, or country; and we keep a 6-point confidence band around the displayed percentage so the bar is not falsely precise.
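Taking the 20% rule literally on a ±30 range gives a cutoff of 6 points. The sketch below uses that literal reading; treating the boundary as inclusive, and the helper names themselves, are assumptions for illustration rather than the published implementation.

```python
# Each position in a four-letter type and the two letters it can hold.
PAIRS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def is_borderline(total: int, n_items: int = 10, percent: int = 20) -> bool:
    """True when |total| falls within the bottom `percent` of the possible range.

    Integer arithmetic avoids floating-point edge cases at the boundary.
    """
    max_abs = 3 * n_items
    return abs(total) * 100 <= percent * max_abs


def alternate_type(type_code: str, position: int) -> str:
    """Flip the letter at one position to get the alternate type to link to."""
    first, second = PAIRS[position]
    current = type_code[position]
    if current not in (first, second):
        raise ValueError(f"unexpected letter {current!r} at position {position}")
    flipped = second if current == first else first
    return type_code[:position] + flipped + type_code[position + 1:]
```

For example, a T/F sum of 4 is borderline under this reading, and `alternate_type("INTJ", 2)` gives "INFJ" as the type to compare against.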

Reliability and Validity

Reliability and validity are two different questions. Reliability asks whether the test gives consistent answers; validity asks whether those answers mean what we say they mean. The honest picture for MBTI-style four-letter typing — including this one — is that internal-consistency reliability is good, test-retest reliability of the four letters is poor, and predictive validity for job performance is essentially zero. Each of those claims has a peer-reviewed citation, and each matters for how you should read your result.

Internal Consistency: α ≈ .80

Internal consistency measures whether the items inside one dichotomy hang together statistically. Capraro & Capraro (2002), a meta-analysis of MBTI reliability across many studies, put Cronbach’s alpha at about .80 to .87 for the four dichotomies, with T/F being the most variable across studies (.64–.87). Alpha above .70 is considered adequate for research instruments and above .80 for applied use. Our 50-item bank is designed to land in this range, and we will publish the empirical alpha from our own user data once we have a stable pilot sample.
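For reference, Cronbach's alpha for one dichotomy can be computed from a respondents-by-items matrix with the standard formula: alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The sketch below is a generic textbook implementation, not our analysis pipeline; it assumes the inputs are already signed values (reverse-keyed items flipped) and that there are no missing responses.

```python
from statistics import variance


def cronbach_alpha(rows) -> float:
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumes reverse-keyed items have already been flipped and no data is missing.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

Two sanity checks follow from the formula: items that move in perfect lockstep give an alpha of 1, and items that are completely unrelated give an alpha of 0.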

Test-Retest Reliability of the Four-Letter Type: 39–76%

Test-retest is the question most readers actually care about: if you take the test again next month, will you get the same answer? For four-letter MBTI typing, the answer is that you might not. McCarley & Carskadon (1983) found that only 47% of subjects retained all four letters after a five-week interval — i.e., more than half of respondents flipped on at least one dichotomy. Pittenger (2005) reviewed the broader literature and put the four-letter retest range at 39% to 76% across studies. The reason is structural: when one of your scores sits near the midpoint, a small shift in how you happened to answer one or two questions on a given day can flip the letter, even though your underlying preference has not changed.
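A rough back-of-envelope calculation shows how a low four-letter retest rate can coexist with fairly stable individual letters. It rests on a simplifying assumption that is ours, not the cited studies': that the four dichotomies flip independently and are equally stable. Under that assumption, the per-letter retention rate is the fourth root of the four-letter rate.

```python
def per_letter_retention(four_letter_rate: float, letters: int = 4) -> float:
    """Per-letter retention implied by a whole-type retention rate.

    Assumes the letters flip independently and with equal probability,
    which is a simplification for illustration only.
    """
    if not 0 < four_letter_rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return four_letter_rate ** (1 / letters)


implied = per_letter_retention(0.47)
```

With the 47% figure, `implied` comes out at roughly 0.83: each letter would be retained about five times out of six, yet the chance of keeping all four is still under half.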

This is why we surface the percentage bars and the borderline flag rather than just showing the letters. The continuous score behind each letter is far more stable — per-dichotomy retest correlations land in the .74 to .83 range across competitor instruments — and reporting it gives you the signal that the four-letter shorthand throws away.

Predictive Validity for Job Performance: r ≈ .02

Furnham (2018), reviewing decades of literature in the Encyclopedia of Personality and Individual Differences, put the average correlation between MBTI type and job performance at approximately r = .02. That is statistical noise. The U.S. National Academy of Sciences review of the MBTI (Druckman & Bjork, 1991, In the Mind’s Eye, Chapter 5) concluded that “the popularity of this instrument in the absence of proven scientific worth is troublesome.”

For comparison, Big Five Conscientiousness — measured directly, not under an MBTI label — predicts job performance at about r = .22 corrected (Barrick & Mount, 1991). That is meaningfully better but still far from a guarantee. The takeaway is that no personality measure is a good single predictor of how someone will perform in a specific role.
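One way to put these correlations in perspective is to square them, which gives the share of variance in the outcome that the predictor accounts for. This is a rough illustration only: the .22 figure is a corrected meta-analytic estimate, so squaring it is a loose guide rather than a precise statement about any one study.

```python
def variance_explained(r: float) -> float:
    """Share of outcome variance accounted for by a correlation of r."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


mbti_share = variance_explained(0.02)               # about 0.0004, or 0.04%
conscientiousness_share = variance_explained(0.22)  # about 0.0484, or 4.84%
```

On this reading, MBTI type accounts for a negligible fraction of the variation in job performance, and even Conscientiousness accounts for under 5%.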

What We Can and Cannot Claim

With the numbers above on the table, here is what we are willing to put our name to and what we are not:

This test measures preferences with internal consistency that meets standard psychometric thresholds.
It helps you find jobs that people like you tend to enjoy and stay in. That is a different and weaker claim than predicting job performance.
The mapping from Big Five items to MBTI labels is well-replicated and academically defensible.
We cannot claim this test predicts job performance. The evidence does not support that, and we will not pretend otherwise.
We cannot claim the four-letter type is your “true personality.” A four-letter shorthand drops too much information to be that.
We do not claim the test is clinically validated. It is not. For clinical assessment, please see a licensed practitioner.

The Forer Effect (Also Called the Barnum Effect)

There is a reason MBTI-style results feel uncannily accurate even when the statistics are weaker than the marketing suggests. The reason has a name: the Forer effect, sometimes called the Barnum effect. We name it here because skipping past it would be dishonest.

In a 1949 study, psychologist Bertram Forer gave 39 university students a personality description he claimed had been “uniquely” generated from their individual responses. Students rated the description at 4.30 out of 5.0 for accuracy. They had all received the identical generic vignette, with sentences like “You have a great deal of unused capacity which you have not turned to your advantage,” “You have a tendency to be critical of yourself,” and “Some of your aspirations tend to be pretty unrealistic.” The descriptions worked because they were broad, positive, self-affirming, and just specific enough to feel personal.

Every successful MBTI-style test exploits this effect. 16Personalities’ per-type copy is heavily Forer-effect optimised. Truity’s is similar. We are not pretending we have escaped it. The result page you see after taking this test will feel accurate partly because the test measured something real about your preferences, and partly because the description is written to be agreeable.

What we try to do differently is fight the effect in two specific places. First, the percentage bars on the results page are real and surfaced honestly — including the borderline annotation when a dichotomy is too close to call. A reader who would otherwise over-identify with their type can see how thin the margin actually was. Second, this methodology page exists at all. Naming the Forer effect, citing the criticism, and pointing to alternate interpretations is the slow way to build credibility, but it is the only way that survives a skeptical second read.

Both things can be true at once: the result page is engaging partly because of the Forer effect, and the underlying preferences are measured with real psychometric tools. The first does not invalidate the second. It just means a single test result should not carry more weight than it can.

Item Provenance and License

44 of the 50 items come from the International Personality Item Pool (IPIP). Specifically, they are reproduced verbatim from the IPIP-50, a 50-item public-domain Big Five scale developed by Lewis R. Goldberg and colleagues. Goldberg published the IPIP as a deliberate alternative to commercial personality instruments, and the items are free to use, including commercially. The canonical citation is:

Goldberg, L. R. (1992). The development of markers for the Big-Five factor structure. Psychological Assessment, 4(1), 26–42.

Source: https://ipip.ori.org/New_IPIP-50-item-scale.htm. Public-domain status: https://ipip.ori.org/newPermission.htm.

The remaining 6 items were authored by PersonalityCareers. They sharpen three dichotomies where Big Five items underperform their MBTI counterparts: the S/N items in IPIP focus on imagination and intellect rather than the “facts vs patterns” framing native to MBTI; the T/F items in IPIP measure Agreeableness in a way that captures “kind vs unkind” more than “logic vs harmony”; and the J/P items in IPIP measure organisation and orderliness rather than the closure-preference flavour that distinguishes Judging from Perceiving. The six originals are the items listed as Original in the bank table above (questions 19, 20, 29, 30, 39, and 40).

All six items were written with the same constraints as the IPIP items: first-person, single-idea, at or below an eighth-grade reading level, no culture-bound idioms. Each was verified at the passage level for Flesch-Kincaid readability (grade 5.3 averaged across the six). We will revise any of them if pilot data shows interpretation drift, and we will note the revision here when that happens.

License. The six original items are released under a CC0 / public-domain dedication so any other test, researcher, or tool can use them. The IPIP items retain their original public-domain status. The combined 50-item bank, the scoring algorithm, and this methodology document are all reproducible without permission.

Disclaimer and Trademarks

About this test. This is a personality assessment for self-reflection and entertainment. It is not a clinical diagnosis, medical advice, or career counselling, and the results should not be used as the basis for major life decisions without consulting a qualified professional. The test, results, and commentary are provided “as is” with no warranty.

Trademark notice. Myers-Briggs Type Indicator®, Myers-Briggs®, and MBTI® are registered trademarks of The Myers & Briggs Foundation in the United States and other countries. PersonalityCareers is not affiliated with, endorsed by, or licensed by The Myers-Briggs Company or The Myers & Briggs Foundation. This test is an original assessment using public-domain items from the International Personality Item Pool (Goldberg, L. R., 1999, ipip.ori.org). It is not the official MBTI® instrument.

Privacy. Your answers are stored in your browser’s localStorage, not on our servers. We send anonymous events to Google Analytics — including your response per question and total time — so we can monitor test quality and detect issues like straight-line answering. These events carry no name, email, or other identifier that could link a result back to you. See the Privacy Policy for the full data-handling notice.

Sources

Every empirical claim on this page can be traced to one of the references below. Where a single study is the authority for a number, that study is cited inline above; where the claim is broad, multiple references support it.

Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1–26.
Capraro, R. M., & Capraro, M. M. (2002). Myers-Briggs Type Indicator score reliability across studies: A meta-analytic reliability generalization study. Educational and Psychological Measurement, 62(4), 590–602.
Druckman, D., & Bjork, R. A. (Eds.). (1991). In the Mind’s Eye: Enhancing Human Performance (Ch. 5). National Academies Press.
Forer, B. R. (1949). The fallacy of personal validation: A classroom demonstration of gullibility. Journal of Abnormal and Social Psychology, 44(1), 118–123.
Furnham, A. (2018). Myers-Briggs Type Indicator (MBTI). In V. Zeigler-Hill & T. K. Shackelford (Eds.), Encyclopedia of Personality and Individual Differences. Springer.
Goldberg, L. R. (1992). The development of markers for the Big-Five factor structure. Psychological Assessment, 4(1), 26–42.
Goldberg, L. R. (1999+). International Personality Item Pool (IPIP). https://ipip.ori.org/.
McCarley, N. G., & Carskadon, T. G. (1983). Test–retest reliabilities of scales and subscales of the Myers-Briggs Type Indicator and of criteria for clinical interpretive hypotheses involving them. Research in Psychological Type, 6, 24–36.
McCrae, R. R., & Costa, P. T. (1989). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 57(1), 17–40.
Pittenger, D. J. (2005). Cautionary comments regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57(3), 210–221.
