Psychological testing is the branch of psychology in which we construct and use standardized tests in order to understand individual differences.
Psychological testing is a term that refers to the use of psychological tests. It refers to all the possible uses, applications, and underlying concepts of psychological tests.
A test is defined as a series of questions on the basis of which information is sought.
A psychological test is an objective and standardized measure of a sample of behavior.
A psychological test is a standardized measure that assesses, quantitatively or qualitatively, one or more aspects of a trait by means of a sample of verbal or non-verbal behavior.
PURPOSE OF A PSYCHOLOGICAL TEST:
In psychology and education, the purpose of a test is two-fold.
- First, it attempts to compare the same individual on two or more traits.
- Second, two or more persons may be compared on the same trait. Such a measurement may be either quantitative or qualitative.
CHARACTERISTICS OF A GOOD TEST:
For a test to be scientifically sound, it must possess the following characteristics;
A test must have the trait of objectivity, that is, it must be free from the subjective element so that there is complete interpersonal agreement among experts regarding the meaning of items and the scoring of the test. Objectivity here refers to two aspects of the test, i.e.
- OBJECTIVITY OF ITEMS:
By objectivity of items is meant that the items should be phrased in such a manner that they are interpreted in exactly the same way by all those who are taking the test. For ensuring the objectivity of items, items must have uniformity of order of presentation (either ascending or descending).
- OBJECTIVITY OF SCORING:
By objectivity of scoring is meant that the scoring method of the test should be a standard one so that complete uniformity can be maintained when the test is scored by different experts at different times.
A test must also be reliable. Reliability is the “self-correlation of the test.” It shows the extent to which the results obtained are consistent when the test is administered once or more than once on the same sample with a reasonable gap. Consistency in results obtained in a single administration is the index of internal consistency of the test, and consistency in results obtained upon testing and retesting is the index of temporal consistency.
Reliability thus includes both internal consistency and temporal consistency. A test to be called sound must be reliable because reliability indicates the extent to which the scores obtained in the test are free from those internal defects of standardization which are likely to produce errors of measurement.
Validity is another prerequisite for a test to be sound. Validity indicates the extent to which the test measures what it intends to measure when compared with some outside independent criterion. In other words, it is the correlation of the test with some outside criterion.
The criterion should be independent and should be regarded as the best index of the trait or ability being measured by the test. Generally, the validity of a test is dependent upon reliability because a test that yields inconsistent results (poor reliability) is ordinarily not expected to correlate with an outside independent criterion.
A test must also be guided by certain norms. Norms refer to the “average performance of the representative sample on a given test.” There are four common types of norms;
- Age norm
- Grade norm
- Percentile norms
- Standard score norms.
Depending upon the purpose and use, a test constructor prepares any of the above norms for his test. Norms help in the interpretation of the scores. In the absence of norms, no meaning can be attached to the score obtained on the test.
A test must also be practicable from the point of view of the time taken in its completion, its length, scoring, etc. In other words, the test should not be lengthy, and the scoring method must be neither difficult nor one which can only be carried out by a highly specialized person.
“GENERAL STEPS OF TEST CONSTRUCTION”
The development of a good psychological test requires thoughtful and sound application of established principles of test construction. Before the real work of test construction begins, the test constructor takes some broad decisions about the major objectives of the test in general terms and the population for whom the test is intended, and also indicates the possible conditions under which the test can be used and its important uses.
These preliminary decisions have far-reaching consequences. For example, a test constructor may decide to construct an intelligence test meant for students of tenth grade broadly aiming at diagnosing the manipulative and organizational ability of the pupils. Having decided the above preliminary things, the test constructor goes ahead with the following steps:
- Writing items for the test.
- Preliminary administration of the test.
- Reliability of the final test.
- The validity of the final test.
- Preparation of norms for the final test.
- Preparation of manual and reproduction of the test.
The first step in test construction is careful planning. At this stage, the test constructor addresses the following issues;
- DEFINITION OF THE CONSTRUCT:
Definition of the construct to be measured by the proposed test.
- OBJECTIVE OF THE TEST:
The author has to spell out the broad and specific objectives of the test in clear terms: that is, who are the prospective users (for example, vocational counselors, clinical psychologists, educationalists) and for what purpose or purposes will they use the test?
What will be the appropriate age range, educational level, and cultural background of the examinees who would find it desirable to take the test?
- CONTENT OF THE TEST:
What will be the content of the test? Is this content coverage different from that of the existing tests developed for the same or similar purposes? Is it culture-specific?
- TEST FORMAT:
The author has to decide what the nature of the items will be, that is, whether the test will be multiple-choice, true-false, inventive-response, or in some other form.
- TYPE OF INSTRUCTIONS:
What would be the type of instructions, i.e., written or delivered orally?
- TEST ADMINISTRATION:
Will the test be administered individually or in groups? Will the test be designed or modified for computer administration? A detailed arrangement for the preliminary and final administrations should be considered.
- USER QUALIFICATION AND PROFESSIONAL COMPETENCE:
What special training or qualifications will be necessary for administering or interpreting the test?
- PROBABLE LENGTH, TIME AND STATISTICAL METHODS:
The test constructor must decide the probable length of the test, the time for its completion, and the statistical methods to be used.
- METHOD OF SAMPLING:
What would be the method of sampling, i.e., random or selective?
- ETHICAL AND SOCIAL CONSIDERATIONS:
Is there any potential harm for the examinees resulting from the administration of this test? Are there any safeguards built into the recommended testing procedure to prevent any sort of harm to anyone involved in the use of this test?
- INTERPRETATION OF SCORES:
How will the scores be interpreted? Will the scores of an examinee be compared to others in the criteria group or will they be used to assess mastery of a specific content area? To answer this question, the author has to decide whether the proposed test will be criterion-referenced or norm-referenced.
- MANUAL AND REPRODUCTION OF TEST:
Planning also includes the total number of reproductions and the preparation of a manual.
- WRITING DOWN ITEMS:
An item is “a single question or task that is not often broken down into any smaller units” (Bean, 1953:15).
EXAMPLE: An arithmetical problem may be an item, a manipulative task may be an item, a mechanical puzzle may be an item, and likewise sleeplessness may also be an item of a test.
Items in a test are just like atoms in matter, that is, they are indivisible.
The second step in test construction is the writing of the items of the test. Item writing starts with the planning done earlier. If the test constructor decides to prepare an essay test, then essay items are written down.
However, if he decides to construct an objective test, he writes down the objective items such as the alternative response item, matching item, multiple-choice item, completion item, short answer item, a pictorial form of item, etc. Depending upon the purpose, he decides to write any of these objective types of items.
PREREQUISITES FOR ITEM WRITING:
Item writing is essentially a creative art. There are no set rules to guide and guarantee the writing of good items. A lot depends upon the item writer’s intuition, imagination, experience, practice, and ingenuity. However, there are some essential prerequisites that must be met if the item writer wants to write good and appropriate items. These requirements are briefly discussed as follows;
- COMMAND ON SUBJECT MATTER:
The item writer must have a thorough knowledge and complete mastery of the subject matter. In other words, he must be fully acquainted with all the facts, principles, misconceptions, and fallacies in a particular field so that he may be able to write good and appropriate items.
- FULLY AWARE OF THE POPULATION:
The item writer must be fully aware of those persons for whom the test is meant. He must also be aware of the intelligence level of those persons so that he may manipulate the difficulty level of the items for proper adjustment with their ability level. He must also be able to avoid irrelevant clues to correct responses.
- FAMILIARITY WITH DIFFERENT TYPES OF ITEMS:
The item writer must be familiar with different types of items along with their advantages and disadvantages. He must also be aware of the characteristics of good items and the common probable errors in writing items.
- COMMAND ON LANGUAGE:
The item writer must have a large vocabulary. He must know the different meanings of a word so that confusion in writing the items may be avoided. He must be able to convey the meaning of the items in the simplest possible language.
- EXPERT OPINION:
After writing down the items, they must be submitted to a group of subject experts for their criticism or suggestions, which must then be duly modified.
- CULTIVATE A RICH SOURCE OF IDEAS:
The item writer must also cultivate a rich source of ideas for items. This is because ideas are not produced in the mind automatically but rather require certain factors or stimuli. Common sources of such ideas are textbooks, journals, discussions, interview questions, course outlines, and other instructional materials.
CHARACTERISTICS OF A GOOD ITEM:
An item must have the following characteristics;
- FREE FROM AMBIGUITY:
An item should be phrased in such a manner that there is no ambiguity regarding its meaning for both the item writer and the examinees who take the test.
- MODERATELY DIFFICULT:
The item should not be too easy or too difficult.
- DISCRIMINATING POWER:
It must have discriminating power, that is, it must clearly distinguish between those who possess the trait and those who do not.
- TO THE POINT:
It should not be concerned with the trivial aspects of the subject matter, that is, it must only measure the significant aspects of knowledge or understanding.
- NOT ENCOURAGE GUESSWORK:
As far as possible, it should not encourage guesswork by the subjects.
- CLEAR IN READING:
It should not present any difficulty in reading.
- INDEPENDENT FOR ITS MEANING:
It should not be such that its meaning is dependent upon another item and/or it can be answered by referring to another item.
GENERAL GUIDELINES FOR ITEM WRITING:
Writing items is a matter of precision. It is perhaps more like computer programming than writing prose. The task of the item writer is to focus the attention of a large group of examinees, varying in background experience, environmental exposure, and ability level, on a single idea. Such a situation requires extreme care in the choice of words. The item writer must keep in view some general guidelines that are essential for writing good items. These are listed below;
CLARITY OF THE ITEM:
The clarity in writing test items is one of the main requirements for an item to be considered good. Items must not be written as “verbal puzzles”. They must be able to discriminate between those who are competent and those who are not. This is possible only when the items have been written in simple and clear language. The items must not be a test of the examinee’s ability to understand the language.
The item writer should be very cautious particularly in writing the objective items because each such item provides more or less an isolated bit of knowledge and there the problem of clarity is more serious. If the objective item is a vague one, it will create difficulty in understanding and the validity of the item will be adversely affected. Vagueness in writing items may be because of several reasons such as poor thinking and incompetence of the item writer.
NON-FUNCTIONAL WORDS SHOULD BE AVOIDED:
Non-functional words must not be included in the items as they tend to lower the validity of the item. Non-functional words refer to those words which make no contribution towards the appropriate and correct choice of a response by the examinees. Such words are often included by the item writer in an attempt to make the correct answer less obvious or to provide a good distractor.
AVOID IRRELEVANT ACCURACIES:
The item writer must make sure that irrelevant accuracies unintentionally incorporated in the items are avoided. Such irrelevant accuracies reflect poor critical thinking on the part of the item writer. They may also lead the examinees to think that the statement is true.
DIFFICULTY LEVEL SHOULD BE ADAPTABLE:
The item must not be too easy or too difficult for the examinees. The level of difficulty of the item should be adaptable to the level of understanding of the examinees. Although it is a fact that exact decisions regarding the difficulty value of an item can be taken only after some statistical techniques have been employed, yet an experienced item writer is capable of controlling the difficulty value beforehand and making it adaptable to the examinees.
In certain forms of objective-type items, such as multiple-choice items and matching items, it is very easy to increase or decrease the difficulty value of the item. In general, when the response alternatives are made homogeneous, the difficulty value of the item is increased; but when the response alternatives, except the correct one, are made heterogeneous, the examinee is likely to spot the correct answer sooner and thus the level of difficulty is decreased.
The item writer must keep in view the characteristics of both the ideal examinees and the typical examinees. If he keeps the ideal examinees (who are fewer in number) in view and ignores the typical examinees, the test items are likely to be unreasonably difficult.
STEREOTYPED WORDS SHOULD BE AVOIDED:
The use of stereotyped words either in the stem or in the alternative responses must be avoided because these facilitate rote learners in guessing the correct answer. Moreover, such stereotyped words fail to discriminate between those who really know and understand the subject and those who do not; thus they do not provide an adequate and discriminating measure. The most obvious way of getting rid of such words is to paraphrase them in a different manner so that those who really know the answer can pick up the meaning.
IRRELEVANT CLUES MUST BE AVOIDED:
Irrelevant clues must be avoided. These are sometimes provided in several forms such as clang association, verbal association, length of the answer, keeping a different foil among homogenous foils, giving the same order of the correct answer, etc. In general, such clues tend to decrease the difficulty level of the item because they provide an easy route to the correct answer.
The common observation is that the examinees who do not know the correct answer, choose any of these irrelevant clues and answer on that basis. The item writer must, therefore, take special care to avoid such irrelevant clues. Specific determiners like never, always, all, none must also be avoided because they are also irrelevant clues to the correct answer, especially in the two-alternative items.
INTERLOCKING ITEMS MUST BE AVOIDED:
Interlocking items must be avoided. Interlocking items, also known as interdependent items, are items that can be answered only by referring to other items. In other words, when responding correctly to an item is dependent upon the correct response to another item, the item constitutes an example of an interlocking or interdependent item. For example:
- Sociometry is a technique used to study the affective structure of groups. True/False
- It is a kind of projective technique. True/False
- It was developed by Moreno et al. True/False
The above examples illustrate interlocking items. Answers to items 2 and 3 can only be given when the examinee knows the correct answer to item 1. Such items should be avoided because they do not provide an equal chance for all examinees to answer.
NUMBER OF ITEMS:
The item writer is also frequently faced with the problem of determining the exact number of items. As a matter of fact, there is no hard and fast rule regarding this. Previous studies have shown that the number of items is usually linked with the desired level of the reliability coefficient of the test. Studies have revealed that usually 25-30 dichotomous items are needed to reach a reliability coefficient as high as 0.80, whereas 15-20 items are needed to reach the same level of reliability when multipoint items are used.
These are the minimum number of items which should be retained after item analysis. An item writer should always write almost TWICE the number of items to be retained finally. Thus, if he wants 30 items in the final test, he should write 60 items.
In the speed test, the number of items to be written is entirely dependent upon the intuitive judgment of the test constructor. On the basis of his previous experiences, he decides that a certain number of items can be answered with the given time limit.
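The link between test length and reliability noted above can be sketched with the Spearman-Brown prophecy formula, which predicts how reliability changes when a test is lengthened or shortened. The function name and the figures below are illustrative, not from the text:

```python
def spearman_brown(r_current: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`
    (e.g. 2.0 means doubling the number of items)."""
    return (length_factor * r_current) / (1 + (length_factor - 1) * r_current)

# Hypothetical: doubling a test whose reliability is 0.67 is
# predicted to raise its reliability to about 0.80.
print(round(spearman_brown(0.67, 2.0), 2))  # 0.8
```

This is one reason item writers draft roughly twice the items finally retained: item analysis discards weak items, and the formula shows how quickly reliability falls if the surviving pool is too small.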
ARRANGEMENT OF ITEMS:
After the items have been written down, they are reviewed by some experts or by the item writer himself and then arranged in the order in which they are to appear in the final test. Generally, items are arranged in increasing order of difficulty; those having the same form (say alternative-response, matching, multiple-choice, etc.) and dealing with the same content are placed together.
- PRELIMINARY ADMINISTRATION:
Before proceeding to the administration of the test, it should be reviewed by at least three experts. When the items have been written down and modified in the light of the suggestions and criticisms given by the experts, the test is said to be ready for experimental try-out.
THE EXPERIMENTAL TRY-OUT/ PRE-TRY-OUT:
The first administration of the test is called EXPERIMENTAL TRY-OUT or PRE-TRY-OUT. The sample size for experimental try out should be 100.
The purpose of the experimental try-out is manifold. According to Conrad (1951), the main purposes of the experimental try-out of any psychological or educational test are as follows:
- DETERMINES VAGUENESS AND WEAKNESSES:
Finding out the major weaknesses, omissions, ambiguities and inadequacies of the Items.
- DETERMINING DIFFICULTY LEVEL OF EACH ITEM:
Experimental tryout helps in determining the difficulty level of each item, which in turn helps in their proper distribution in the final form.
- DETERMINES TIME LIMIT:
Helps in determining a reasonable time limit for the test.
- DETERMINES APPROPRIATE LENGTH OF A TEST:
Determining the appropriate length of the test. In other words, it helps in determining the number of items to be included in the final form.
- IDENTIFYING WEAKNESSES IN DIRECTIONS:
Identifying any weaknesses and vagueness in the directions or instructions of the test.
The second preliminary administration is called the PROPER TRY-OUT. At this stage, the test is administered to a sample of 400 examinees who must be similar to those for whom the test is intended.
The proper try-out is carried out for the item analysis. ITEM ANALYSIS is the technique of selecting discriminating items for the final composition of the test. It aims at obtaining three kinds of information regarding the items, namely;
- ITEM DIFFICULTY:
Item difficulty is the proportion or percentage of the examinees or individuals who answer the item correctly.
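As a minimal sketch (Python, with hypothetical response data), item difficulty is simply the proportion of correct responses to the item:

```python
def item_difficulty(responses):
    """Proportion of examinees answering the item correctly.
    `responses` is a list of 1 (correct) / 0 (incorrect) values."""
    return sum(responses) / len(responses)

# Hypothetical: 7 of 10 examinees answered this item correctly.
print(item_difficulty([1, 1, 1, 0, 1, 0, 1, 1, 0, 1]))  # 0.7
```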
- DISCRIMINATORY POWER OF THE ITEMS:
The discriminatory power of an item refers to the extent to which the item discriminates successfully between those who possess the trait in larger amounts and those who possess it in lesser amounts.
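One common way to quantify this is the discrimination index D = p_upper − p_lower, comparing how the highest- and lowest-scoring examinees fare on the item. The sketch below assumes the conventional upper/lower 27% grouping, and the data are hypothetical:

```python
def discrimination_index(examinees, fraction=0.27):
    """D = p_upper - p_lower for one item.
    `examinees` is a list of (total_test_score, item_correct) pairs,
    where item_correct is 1 or 0. The top and bottom `fraction` of
    examinees, ranked by total score, form the comparison groups."""
    ranked = sorted(examinees, key=lambda e: e[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    p_upper = sum(correct for _, correct in ranked[:k]) / k
    p_lower = sum(correct for _, correct in ranked[-k:]) / k
    return p_upper - p_lower

# Hypothetical: high scorers pass the item, low scorers fail it.
data = [(95, 1), (90, 1), (80, 1), (70, 1), (60, 0),
        (55, 1), (50, 0), (40, 0), (30, 0), (20, 0)]
print(discrimination_index(data))  # 1.0
```

A value near 1.0 means the item separates the groups sharply; a value near 0 (or negative) flags an item that fails to discriminate.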
- EFFECTIVENESS OF DISTRACTORS:
Item analysis determines which distractors are non-functional.
The third preliminary administration is called the FINAL TRY-OUT. The sample for the final administration should be at least 100. At this stage, the items selected after item analysis constitute the test in its final form. It is carried out to detect the minor defects that may not have been detected by the first two preliminary administrations.
The final administration indicates how effective the test will be when administered to the sample for which it is really intended. Thus, the preliminary administration serves as a kind of “DRESS REHEARSAL,” providing a final check on the procedure of administration of the test and its time limits.
After final tryout, expert opinion should be considered again.
- RELIABILITY OF THE FINAL TEST:
On the basis of the experimental or empirical try-out, the test is finally composed of the selected items. The final test is then administered to a fresh sample to check its reliability, which indicates the consistency of scores.
In simple words, it is defined as the degree to which a measurement is consistent. If findings from the research are replicated consistently then they are reliable.
Reliability also refers to the self-correlation of a test. A correlation coefficient can be used to assess the degree of reliability; if a test is reliable it should show a high positive correlation.
Types of Reliability
- Internal reliability
- External reliability
Internal reliability assesses the consistency of results across items within a test.
External reliability refers to the extent to which a measure varies from one use to another.
Errors in Reliability:
At times scores are not consistent because some other factors also affect reliability. There is always a chance of a 5% error in reliability, which is acceptable.
TYPES OF ERRORS
- Random error
- Systematic error
Random error exists in every measurement and is often a major source of uncertainty. These errors have no particular assignable cause and can never be totally eliminated or corrected. They are caused by many uncontrollable variables that are an inevitable part of every analysis made by human beings. These variables are impossible to identify, and even if we identify some, they cannot be measured because most of them are so small.
Systematic error is caused by instruments, machines, and measuring tools; it is not due to individuals. Systematic error is acceptable because it can be identified and corrected.
WAYS OF FINDING RELIABILITY:
Following are the methods to check reliability;
- Test-retest method
- Alternate form
- Split-half method
The test-retest method is the oldest and most commonly used method of testing reliability. It assesses the external consistency of a test. Examples of appropriate tests include questionnaires and psychometric tests. It measures the stability of a test over time.
A typical assessment would involve giving participants the same test on two separate occasions, with everything from start to end kept the same in both administrations. The results of the first test are then correlated with the results of the second test. If the same or similar results are obtained, external reliability is established.
The timing of the test is important: if the interval is too brief, participants may recall information from the first test, which could bias the results. Alternatively, if the interval is too long, it is feasible that participants could have changed in some important way, which could also bias the results.
The utility and worth of a psychological test decrease with time, so the test should be revised and updated. When tests are not revised, systematic error may arise.
In the alternate-form method, two equivalent forms of the test are administered to the same group of examinees. An individual is given one form of the test and, after a period of time, a different version of the same test. The two forms of the test are then correlated to yield a coefficient of equivalence.
- Positive point
In the alternate-form method there is no need to wait a long time between administrations.
- Negative point
It is a very hectic and risky task to make two tests of equivalent level.
The split-half method assesses the internal consistency of a test. It measures the extent to which all parts of the test contribute equally to what is being measured. The test is typically split into odd- and even-numbered items. The reason is that when constructing a test we usually arrange the items in order of increasing difficulty; if we put items (1, 2, …, 10) in one half and items (11, 12, …, 20) in the other, all the easy items would go to one group and all the difficult items to the other.
When we split the test we should split it by the same format/theme, e.g., multiple-choice questions with multiple-choice questions, or blanks with blanks.
- VALIDITY OF THE TEST:
It refers to the extent to which a test measures what it claims to measure.
If a test is reliable, it is not necessarily valid; but if a test is valid, it must be reliable.
TYPES OF VALIDITY:
- External validity
- Internal validity
External validity is the extent to which the results of a research study can be generalized to different situations, different groups of people, different settings, different conditions, etc.
Internal validity is basically the extent to which a study is free from flaws and any differences in a measurement are due to the independent variable.
TYPES OF VALIDITY
- Face validity
- Construct validity
- Criterion-related validity
- FACE VALIDITY
Face validity is determined by a review of the items and not through the use of statistical analysis. Face validity is not investigated through formal procedures. Instead, anyone who looks over the test, including examinees, may develop an informed opinion as to whether or not the test is measuring what it is supposed to measure. While it is clearly of some value to have the test appear valid, face validity alone is insufficient for establishing that the test is measuring what it claims to measure.
- CONSTRUCT VALIDITY:
It implies using the construct correctly (concepts, ideas, notions). Construct validity seeks agreement between a theoretical concept and a specific measuring device or procedure.
For example, a test of intelligence nowadays must include measures of multiple intelligences rather than just logical-mathematical and linguistic ability measures.
CRITERION RELATED VALIDITY:
It states that the criteria should be clearly defined by the teacher in advance. It has to take into account other teachers’ criteria to be standardized, and it also needs to demonstrate the accuracy of a measure or procedure compared to another measure or procedure which has already been demonstrated to be valid.
- PREPARATION OF NORMS FOR THE FINAL TEST:
When psychologists design a test to be used in a variety of settings, they usually set up a scale for comparison by establishing norms.
Norm is defined as the average performance or scores of a large sample representative of a specified population. Norms are prepared to meaningfully interpret the scores obtained on the test for as we know, the obtained scores on the test themselves convey no meaning regarding the ability or trait being measured. But when these are compared with the norms, a meaningful inference can immediately be drawn.
Types of norms:
- Age norms
- Grade norms
- Percentile norms
- Standard scores norms
Not all of these types of norms are suited to every test. Keeping in view the purpose and type of test, the test constructor develops a suitable norm for the test.
- AGE NORM
Age norms indicate the average performance of different samples of test takers who were at various ages at the time the test was administered.
If the measurement under consideration is height in inches, for example, we know that scores (heights) for children will gradually increase at various rates as a function of age up to the middle-to-late teens.
The child of any chronological age whose performance on a valid test of intellectual ability indicated that he or she had intellectual ability similar to that of the average child of some other age was said to have the mental age of the norm group in which his or her test score fell.
The reasoning here was that irrespective of chronological age, children with the same mental age could be expected to read the same level of material, solve the same kinds of math problems, and reason with a similar level of judgment. But some have complained that the concept of mental age is too broad and that although a 6-year-old might, for example, perform intellectually like a 12-year-old, the 6-year-old might not be very similar at all to the average 12-year-old socially, psychologically, and otherwise.
- GRADE NORMS:
Grade norms were designed to indicate the average test performance of test-takers in a given school grade. Grade norms are developed by administering the test to representative samples of children over a range of consecutive grade levels.
Like age norms, grade norms have widespread application with children of elementary school age. The thought here is that children learn and develop at varying rates, but in ways that are in some respects predictable.
One drawback of grade norms is that they are useful only with respect to years and months of schooling completed. They have little or no applicability to children who are not yet in school or who are out of school.
- PERCENTILE NORMS:
The percentile system is a ranking of test scores that indicates the proportion of scores falling below a given score. A percentile is an expression of the percentage of people whose score on a test or measure falls below a particular raw score. A more familiar description of test performance, the concept of percentage correct, must be distinguished from the concept of a percentile.
A percentile is a converted score that refers to a percentage of test takers.
Percentage correct refers to the distribution of raw scores, more specifically, to the number of items that were answered correctly, divided by the total number of items and multiplied by one hundred.
Because percentiles are easily calculated, they are a popular way of organizing test data and are very adaptable to a wide range of tests.
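The distinction between the two concepts can be sketched with hypothetical data:

```python
def percentile_rank(norm_scores, score):
    """Percentage of scores in the norm group that fall below `score`.
    (Some texts also credit half of any tied scores; this sketch does not.)"""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

def percentage_correct(num_correct, num_items):
    """Items answered correctly, divided by the total, times 100."""
    return 100 * num_correct / num_items

# Hypothetical norm-group scores for ten examinees:
norms = [35, 42, 47, 51, 55, 58, 60, 64, 68, 75]
print(percentile_rank(norms, 60))   # 60.0 -- a score of 60 beats 6 of the 10 norm scores
print(percentage_correct(45, 60))   # 75.0 -- but 75% correct is not the 75th percentile
```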
- STANDARD SCORE NORMS:
When a raw score is converted by means of a formula, it becomes a standard score.
For example, marks obtained on a paper out of 100 are applicable only in a specific context, but when they are converted into a GPA they become standard scores.
- PREPARATION OF MANUAL AND REPRODUCTION OF THE TEST:
The last step in test construction is the preparation of a manual for the test. In the manual, the test constructor reports the psychometric properties of the test, norms, and references. This gives a clear indication of the procedures of test administration, the scoring methods, and the time limits, if any, of the test.
It also includes instructions as well as the details of the arrangement of materials, that is, whether items have been arranged in random order or in any other order. The test constructor finally orders the printing of the test and the manual.