Kevin Langdon, P.O. Box 795, Berkeley, CA 94701; (510) 524-0345; [old e-mail address]; April 12, 1997
To the TNS Psychometrics Committee:
New Chairman of This Committee
As you all know by now, Dr. Greg Grove has been elected Chairman of the TNS Psychometrics Committee. The vote was unanimous among those submitting votes, but once again I did not hear from Henry Milligan. Henry, please let us know whether you're still interested in serving on this Committee.
I was pleased to see Greg's first issue of the PsyCom Newsletter and the accompanying materials. It's important that we have a Chairman who can publish the Newsletter regularly as a conduit through which we can accomplish needed business.
Reevaluation of TNS Qualifying Scores
The thirteen tests now accepted for TNS admission are:
F. High-Range Standard Tests
| 19. Raven Advanced Progressive Matrices | 32 |
| 25. Terman Concept Mastery Test (CMT), Form T | 160 |
G. Mensa Tests
| 5. California Test of Mental Maturity (CTMM) | 150 |
| 7. Cattell Verbal | 173 |
H. College Admission Tests
| 1. Admission Test for Graduate Study in Business (ATGSB, GMAT) | 746 |
| 2. American College Testing Program (ACT) | 32 |
| 3. American Council on Education (ACE) | 142 |
| 9. Graduate Record Exam (GRE aptitude, Verbal + Quantitative) | 1500 |
| 11. Law School Admissions Test (LSAT) | 764 |
| 13. Miller Analogies Test (MAT) | 93 |
| 22. Scholastic Aptitude Test (SAT, prior to April 1995) | 1470 |
I. High-Range, Unsupervised and Untimed, ``Home-Brew'' Tests
| 10. Langdon Adult Intelligence Test (LAIT) | 150 |
| 12. Mega Test | 24 |
We now need to consider four further matters:
1. Examination of whether some of the tests we now accept are actually suitable for TNS admission. The prime candidates for reexamination are the CTMM and the Cattell Verbal (the ceiling of the latter is below three sigma for adults; also, there is a separate American version; we need to find out about the norming of this version). I believe that this category is the most critical. We should make decisions on these two tests immediately, as we advertise our qualifying scores in the Mensa Bulletin.
2. Examination of whether scores on the rescaled SAT (April 1995 and after) can be used for admission to the society, given the reduction in ceiling (I'm inclined to think that the SAT no longer discriminates at the three-sigma level). In any case, we can't accept rescaled scores without examining this matter. I'm not inclined to give this high priority, as I doubt we'll be able to use SAT scores after April 1995.
3. Reexamination of our cutoff scores on the tests we accept, especially on the Raven, the Terman Concept Mastery, and the less-frequently-used college admission tests. Medium priority, in my opinion.
4. Examination of several high-range tests which are not currently on our accepted list. We should take a look at Ron Hoeflin's Titan and Ultra tests and Polymath Systems' Four Sigma Qualifying Test, Polymath Intellectual Ability Scale, Langdon Short Form Intelligence Test, and Mobius Test. (All but the Mobius are no longer scored, but several hundred people have made 99.9th percentile scores on them; the PIAS has been taken by over 1400 people.) Later, we will also need to examine Ron Hoeflin's Power Test, the Eight Item Test by Alan Aax, Paul Cooijmans' Test for Genius, and Polymath Systems' forthcoming Stratospheric Test of Attention in Reasoning. Michael Madow will participate as a member of this Committee in my place in making and voting on motions regarding my tests.
The most urgent among these is the Titan Test. Norming samples to date for the other tests which have been released are still too small for a proper norming, with the exception of the PIAS and the LSFIT. (A sample of at least 100 is considered a minimum for a proper statistical treatment; 200 would be better.) Only 50 people have taken the Mobius Test to date; the Ultra Test, the EIT, and the Test for Genius also have tiny samples so far. Examining the Titan Test is a high priority. The norming of the Titan is very closely related to that of the Mega Test, so we should take another look at that at the same time. I have completed a norming of the Mega based on a data set Ron sent me in 1984. My norms are fairly close to Ron's over most of the test range. Comparative data indicates that the Titan may be a little harder than the Mega.
Opinions of readers of the PsyCom Newsletter on relative priorities, on any of the above questions, and on other matters which we should be looking at are invited.
Comments on Greg Grove's Brochure Draft
This is a very good approach. There are a few details that could use a little adjustment.
In the second line of the first paragraph following ``The Triple Nine Society,'' insert ``the'' before ``adventure'' and remove the word ``a''.
In the third paragraph, eliminate the comma at the end of line 1 and change line 2 to read ``(including unemployment and retirement) and varied educational backgrounds,''.
At the end of the fourth paragraph, change ``the'' to ``an''.
In the heading ``High Range Standarized Tests,'' the word should be simply ``Standard''. All the categories of tests we accept are standardized (normed) or we wouldn't have them on our list.
In the heading ``College Admissions Tests,'' change the word ``Admissions'' to ``Admission''.
The phrase ``prior to 1995'' (under the SAT) should read ``prior to April 1995''.
In the last paragraph before Jacquelinne's name and address, the first sentence should be replaced with the following text: ``Yearly dues are $20. Nonmembers may subscribe to the Society's journal, Vidya, for $20/year.''
In the same paragraph, line 2, there should be a comma following ``journal''.
In the second footnote (``**''), Dr. Hoeflin's address needs to be updated. His current address is P.O. Box 539, New York, NY 10101.
All paragraphs should be justified, and the qualifying scores in the right-hand column should have their units positions aligned.
There could be additional changes as a result of pending actions. This Committee needs to examine the Titan Test, the PIAS and LSFIT, and the Mensa tests and update our list of qualifying scores if necessary. Also, one of my motions contained in the March 15 ExCom Memo would make ISPE members eligible for TNS membership without further testing only if they're over 18.
Finally, the TNS logo should appear at the top of the first page. With the logo, the brochure will fit nicely on two sides of an 8-1/2 x 11 sheet, which will make it easy to handle.
Comments on Paul Cooijmans' Letter to the Committee
Paul is correct that I made the motion that resulted in the appointment of the present Psychometrics Committee, after examining credentials submitted by volunteers. I chose the members of the Committee based on their depth of background in psychometrics. Subsequently, I received an expression of interest in the Committee from Paul Cooijmans, who might also be a good candidate if he learns some basic psychometric statistics.
A summary of the credentials of all six volunteers who submitted their qualifications appeared in the June 25 ExCom Memo and is reprinted below:
Julia Cybele Cachia: Psychometric studies for Mensa and ISPE, knowledgeable about the literature on many standard tests, creator of a table showing relationships among scores on various I.Q. and aptitude tests.
Greg Grove: Ph.D. in education, course work in parametric and nonparametric statistics and individual assessment instruments, familiarity with many standard tests, developer of I.Q./aptitude tests, has operated an educational assessment service.
Robert Kopp: Degree in psychology, advanced course work in statistics, familiar with a number of statistical packages, experience with ANOVA, multiple regression, canonical correlation, and factor analysis (article on this subject published in ``Teaching of Psychology'').
Kevin Langdon: Developer of a number of high-range intelligence tests, member of psychometrics committees of the Triple Nine and Prometheus Societies, many published articles on psychometrics in the journals of the higher-I.Q. societies.
Michael Madow: Harvard-trained psychiatrist, psychoanalyst, psychopharmacologist, interested in intelligence and intellectual creativity.
Henry Milligan: Course work in multi-variable calculus, statistical methods, deterministic and stochastic systems analysis, and statistical modeling, at Princeton University.
Paul mentioned a ``paradox'' in his letter: While we accept members of ISPE into TNS without further testing, the ISPE accepts some tests that we don't. There are two somewhat incompatible considerations involved. We wish to maintain our psychometric standards but we are also locked in a competition, of a sort, with the ISPE for the hearts and minds of members of the higher-I.Q.-societies community. The ISPE is an authoritarian organization; TNS was founded on the principles of democracy and member rights. Many people among the TNS leadership feel that it is very important to offer ISPE members a democratic alternative.
I appreciate Paul's bringing to our attention the age-correction factor, the lack of a consistent score-reporting system, and the low ceiling of the Cattell Culture Fair, something I had not been aware of before. While we do not currently accept this test, it might be considered in the future. Based on this information, I recommend against accepting the Cattell Culture Fair for admission to TNS.
The trouble with adding verbal tests in other languages to our list of qualifying scores is that we are not aware of well-constructed and well-normed high-ceiling tests with high g-loadings in languages other than English. Perhaps Paul would be interested in researching this for us.
I agree with Paul that we should pay close attention to the methods used in norming the tests we're evaluating, including the question of item weighting. In general, psychometricians have concluded that weighting test items does not produce any better accuracy, but the authors of the high-range tests of interest to us face a different problem: how to reach ceilings high enough to allow these tests to be used as admission instruments for higher-level groups such as the Prometheus and Mega societies--and item weighting is useful for this purpose.
It is certainly appropriate for the Committee to examine Paul's tests along with various other instruments, as possible admission tests. I will have more to say about Paul's Test for Genius below.
Paul wrote:
One IQ point is really nothing at all. It would probably be more in accordance with the accuracy of our tests to divide the standard deviation into 3, rather than into 15, 16 or--heaven forbid!--24 . . .
One point should be nothing at all. If the resolution of the point scale is not greater than the resolution of the test, this will give rise to avoidable rounding errors.
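To make the arithmetic concrete, here is a minimal Python sketch. It assumes that reported scores are simply rounded to the nearest whole point; the function name is hypothetical and is not taken from Paul's letter or from any scoring procedure in use.

```python
def max_rounding_error_in_sd(points_per_sd):
    """Worst-case rounding error, in standard deviations.

    Assumption: scores are rounded to the nearest whole point, so a
    reported score can be off by at most half a point, and one point
    is worth 1 / points_per_sd standard deviations.
    """
    if points_per_sd <= 0:
        raise ValueError("points_per_sd must be positive")
    return 0.5 / points_per_sd


# Compare the scale Paul proposes (3 points per sigma) with the
# customary scales of 15, 16, and 24 points per sigma.
for points in (3, 15, 16, 24):
    error = max_rounding_error_in_sd(points)
    print(f"{points:2d} points per sigma: up to {error:.4f} sigma lost to rounding")
```

Under this assumption a three-point scale can misstate a score by as much as one sixth of a standard deviation, against one thirtieth on a fifteen-point scale. Whether that matters depends on how the figure compares with the test's own standard error.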
Although I don't agree with everything that Paul has written, he has some good ideas and a lot of energy. I appreciate his contribution to the exchange here in the PsyCom Newsletter.
Comments on Paul Cooijmans' Test for Genius
I question the face validity of this test. Neither analogies nor number series provide good measures of g, as both are contaminated, to a significant degree, with other factors.
Verbal analogies are notoriously idiosyncratic; this item type comes closest to justifying the view that I.Q. tests only test whether one thinks like the test-maker (or at least whether one shares a conceptual background with him). Perhaps not coincidentally, this is also the most common item type on poor-quality tests. Paul's analogies clearly depend heavily on acquired knowledge; some of them make use of obscure vocabulary. Items 12, 13, and 19 have been permanently compromised by the presence of answers instead of question marks; this means that the 45-item ``long form'' of the test can never be normed. The analogy in #19 is incorrect. Many items are simply weird.
Number series are much more easily solved if one has a background in mathematics, particularly in number theory. They're easy to devise but boring to solve, which creates a loading on motivation and persistence (a problem for all these new, high-range tests, particularly for the Hoeflin tests).
The sample size of 38 used in what Paul calls a ``norming'' is insufficient for anything more than a preliminary norming. A sample of at least 100, and preferably 200, is considered a minimum for an adequate statistical treatment, though my data seems to stabilize reasonably when I've got about
If the items were weighted as Paul suggested, it would be important not to use any item which has not been solved by enough people (at least five, and preferably ten) that the frequency of solving can reasonably be said to have been estimated. Only 21 items qualify by this criterion, but this is probably adequate to do decent statistics on a non-multiple-choice test, and as Paul accumulates a larger sample of testees, more items will be solved by adequate numbers of testees.
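The screening rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each testee's answers are recorded as a list of 0/1 item scores, and the function name and data layout are hypothetical rather than anything Paul or I actually use.

```python
def items_with_enough_solvers(responses, min_solvers=5):
    """Return the indices of items solved by at least min_solvers testees.

    responses: one list per testee, holding a 0 (missed) or 1 (solved)
    for every item, in the same item order for all testees.
    min_solvers: 5 is the bare minimum suggested in the text; 10 is the
    preferred figure.
    """
    if not responses:
        return []
    n_items = len(responses[0])
    solved_counts = [0] * n_items
    for row in responses:
        if len(row) != n_items:
            raise ValueError("every testee needs a score for every item")
        for item_index, score in enumerate(row):
            solved_counts[item_index] += score
    return [i for i, count in enumerate(solved_counts) if count >= min_solvers]
```

Items that fail the screen are not necessarily bad; there is simply not yet enough data to estimate how hard they are, so any weight assigned to them would be unreliable.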
Paul's selection of previous scores used, as he described it, seems arbitrary and statistically questionable; this renders any norming based on them highly suspect.
Throwing out high scores because they don't seem to correspond to scores of the same testees on the object test will improve the overall previous score/object-test score correlation, but it's not correct statistical procedure. And Paul should certainly know better than to hitch his norming to his other tests without exhibiting either the tests or norms for them.
Paul wrote, ``The letter A tests only cover the IQ range below 3 sigma, the letter B tests cover the range from 3 sigma up, and the letter C tests cover the entire range.'' What does this mean? Were there just no scores above 3 sigma in category A or were such scores excluded? Were there no low scores on category B tests?
What makes this really hard to follow is that Paul has not bothered to tell us how many scores were used on each test, let alone provide separate correlation figures between the Test for Genius and these other tests.
A better estimator of the quality of an item than the mean score of those answering it correctly is the point biserial correlation between the scores on the item (0 or 1) and overall scores (or its square root). Dividing by the number of testees answering the item correctly is reasonable if there is an adequate sample of testees answering the item correctly.
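For readers who want to try this, the point-biserial correlation can be computed with nothing beyond the Python standard library. The sketch below uses the textbook formula and assumes item scores are coded 0 or 1; the function name is hypothetical, and this is not a description of how any particular test was actually normed.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between one 0/1 item and overall scores.

    Uses r = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean
    overall scores of testees who solved and missed the item, s is the
    population standard deviation of all overall scores, p is the
    proportion solving the item, and q = 1 - p.
    """
    if len(item_scores) != len(total_scores):
        raise ValueError("need one item score per testee")
    solved = [t for i, t in zip(item_scores, total_scores) if i == 1]
    missed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    if not solved or not missed:
        raise ValueError("item must be solved by some testees and missed by others")
    spread = pstdev(total_scores)
    if spread == 0:
        raise ValueError("overall scores have no variance")
    p = len(solved) / len(total_scores)
    return (mean(solved) - mean(missed)) / spread * sqrt(p * (1 - p))
```

One design choice is left open here: whether the overall score should include the item being evaluated. Leaving the item out of the total avoids inflating the correlation, which matters most on short tests.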
No item analysis is presented. There is no estimate of reliability, nor of correlation with other tests, individually or in the aggregate, or standard error. Paul's account of his statistical procedure leaves many gaps.
My conclusions regarding this test are:
1. The quality of the items is uneven; what appear to be bad items are included in the norming.
2. The sample is inadequate for a proper norming.
3. The statistical treatment is flawed and incomplete.
4. We should not accept this test for admission to TNS at this time.
[Note: Paul Cooijmans has rectified some of the problems mentioned here in his more recent norming studies.]
The LSFIT
Greg Grove asked me to submit one of my tests for evaluation. Along with this letter, I have sent Greg copies of the Langdon Short Form Intelligence Test and the LSFIT statistical report of January 23, 1996.
While the LSFIT is no longer scored, 175 people took it before January 1, 1994; of these, 55 earned scores within the 99.9th percentile. A small percentage of these people are already members of TNS, but most of this group will be candidates for membership if this Committee accepts the LSFIT for admission to TNS.
I'll be happy to answer any question anyone may have about this norming.