A. Validity Analysis
Validity comes from the word valid, which means appropriate. Applied to a test, a test is valid if it consists of the right questions; a test has a degree of validity if it asks what it is supposed to ask. There are two ways to determine the validity (accuracy of measurement) of a test: from the test as a whole (as a totality) and from its individual items. Analyzing the test as a totality can be done with two approaches, the rational approach (logical analysis) and the empirical approach (empirical analysis).
The rational approach can be carried out in two ways: 1) by examining the content of the test, and 2) by examining its structure or construct. The first relates to the subject matter covered by the questions, the second to how the questions themselves are framed. Both perspectives on validity are discussed below.
1. Content Validity
Content validity is validity judged from the content of the test itself, that is, the extent to which the test content representatively covers the whole subject matter or material that should be tested. Content validity is identified by analyzing and examining the content of the test; in other words, this type of validity is established through the material being tested.
The material that students must master is generally described in the Competency Standards (SK). The Competency Standards are then elaborated into Basic Competences (KD), and each Basic Competence is detailed into several indicators. The examination material is drawn from these indicators. Thus, a test can be said to fulfil rational content validity if its material refers to the indicators that have been set.
2. Construct Validity
Construct validity can be defined as validity judged from the arrangement, framework, or structure of the test. A test is valid in terms of its construct if it correctly reflects a construct in psychological theory. Psychology holds that each child's mental life comprises three aspects: cognitive, affective, and psychomotor. A test therefore has construct validity if its construction covers these three aspects.
Analyzing the construct validity of an achievement test can be done by matching the aspects of thinking contained in the test items against the aspects of thinking contained in the indicators. Thus, an achievement test satisfies construct validity if a logical or rational analysis shows that the aspects of thinking revealed through the test items correctly reflect the aspects of thinking that the indicators require to be revealed.
Analysis with the empirical approach is done by looking at the test results. If the test results accurately predict the students' achievement in a later period, or if the test is administered twice within a short interval and the two sets of results agree, the test can be regarded as valid. With the first ability (the test is able to predict), the test is said to have predictive validity. With the second ability (the test administered twice gives similar results), the test is said to have concurrent validity.
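Both empirical forms of validity are usually expressed as a correlation coefficient between two sets of scores. A minimal, illustrative sketch (the score lists below are invented for illustration and are not data from this article) using Pearson's product-moment correlation:

```python
# Illustrative sketch: empirical validity expressed as a Pearson correlation.
# The scores below are made-up examples, not data from this article.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Predictive validity: test scores vs. later achievement of the same students.
entrance_test = [65, 70, 80, 55, 90]
later_grades  = [60, 72, 85, 50, 88]
print("predictive validity r =", round(pearson_r(entrance_test, later_grades), 2))

# Concurrent validity (as described above): two administrations a short time apart.
first_try  = [65, 70, 80, 55, 90]
second_try = [68, 69, 78, 57, 91]
print("concurrent validity r =", round(pearson_r(first_try, second_try), 2))
```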
B. Reliability
Reliability comes from the word reliable, meaning steady or dependable. There are three ways to test reliability. 1) Parallel forms: two tests with the same purpose, level of difficulty, and composition, but with different items, are given to the same students. If test series A (whose reliability is being sought) and test series B give the same results (a high coefficient), the test is reliable. 2) Test-retest: the same questions are given to the same students at different times. If the results of the first administration agree with the results of the second (the students keep their relative positions), the test is reliable.
3) Split-half: the same questions are given to the same students at the same time, and the items are divided into two groups, either odd- and even-numbered items or the total set split in half; for example, with 30 items, the first group is items 1-15 and the second group items 16-30. Reliability is then examined by comparing the results of the two groups of items (a minimal code sketch of this split-half way follows below).
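A minimal sketch of the split-half way, with hypothetical 0/1 item answers that are not data from this article; the two half-scores are compared with a simple correlation (requires Python 3.10+ for statistics.correlation):

```python
# Minimal split-half sketch (hypothetical 0/1 answers, not data from this article).
from statistics import correlation  # Python 3.10+

# answers[student] = list of 0/1 scores on the items of one test
answers = [
    [1, 1, 1, 0, 1, 1, 0, 1],   # student A
    [1, 0, 1, 1, 0, 1, 1, 0],   # student B
    [1, 1, 0, 1, 1, 0, 1, 1],   # student C
    [0, 1, 0, 0, 1, 0, 0, 1],   # student D
    [0, 0, 1, 0, 0, 0, 1, 0],   # student E
]

# Split each student's answers into odd-numbered and even-numbered items.
odd_half  = [sum(row[0::2]) for row in answers]   # items 1, 3, 5, ...
even_half = [sum(row[1::2]) for row in answers]   # items 2, 4, 6, ...

# The correlation between the two half-scores indicates how consistent the test is.
r_half = correlation(odd_half, even_half)
print("correlation between halves:", round(r_half, 2))
```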
Determining the reliability of an objective test and of an essay test uses different formulas. For an essay test, the Alpha formula is used:

r11 = ( n / (n − 1) ) × ( 1 − ΣSi² / St² )

r11 : reliability coefficient of the test
n : number of items in the test
1 : a constant
ΣSi² : sum of the variances of the scores of each item
St² : total variance
ΣSi² is obtained with the formula below. For example, if the essay test whose reliability is being determined consists of 5 items, ΣSi² is obtained by adding the variances of items 1 to 5:

ΣSi² = Si1² + Si2² + Si3² + Si4² + Si5²

Each item variance Si1², Si2², Si3², Si4², and Si5² is found with the formula (shown here for item 1; items 2 to 5 are analogous):

Si1² = ( ΣXi1² − (ΣXi1)² / N ) / N
Next, interpreting the test reliability coefficient (r11) generally uses the following benchmark:
1. If r11 is equal to or greater than 0.70, the achievement test whose reliability is being examined is declared to have high reliability (reliable).
2. If r11 is less than 0.70, the achievement test whose reliability is being examined is declared not yet to have high reliability (unreliable).
Here is an example of how to find the reliability of an essay test consisting of 5 items, taken by five testees, with an ideal maximum score (SMI) of 10 per item.
Step one: sum the scores achieved by the testees on each item (ΣXi1, ΣXi2, ΣXi3, ΣXi4, ΣXi5), find the total score achieved by each testee across the 5 items (Xt), and calculate the square of each total score (Xt²).
| Testee | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Xt | Xt² |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Abdullah | 7 | 5 | 6 | 6 | 5 | 29 | 841 |
| Bin Laden | 6 | 5 | 5 | 4 | 5 | 25 | 625 |
| Chairil | 3 | 3 | 2 | 4 | 3 | 15 | 225 |
| Darulquthni | 5 | 4 | 4 | 4 | 5 | 22 | 484 |
| Ema | 4 | 4 | 3 | 4 | 3 | 18 | 324 |
| N = 5 | ΣXi1 = 25 | ΣXi2 = 21 | ΣXi3 = 20 | ΣXi4 = 22 | ΣXi5 = 21 | ΣXt | ΣXt² |
Step two: calculate the sum of squares of items 1, 2, 3, 4, 5:
JK item1 = 7² + 6² + 3² + 5² + 4² = 49 + 36 + 9 + 25 + 16 = 135
JK item2 = 5² + 5² + 3² + 4² + 4² = 25 + 25 + 9 + 16 + 16 = 91
JK item3 = 6² + 5² + 2² + 4² + 3² = 36 + 25 + 4 + 16 + 9 = 90
JK item4 = 6² + 4² + 4² + 4² + 4² = 36 + 16 + 16 + 16 + 16 = 100
JK item5 = 5² + 5² + 3² + 5² + 3² = 25 + 25 + 9 + 25 + 9 = 93
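The worked example above stops at the sums of squares. A minimal sketch (not part of the original article) of how the remaining steps, the item variances, the total variance, and the Alpha coefficient r11, could be computed from the same five-testee table:

```python
# Sketch of the remaining Alpha-formula steps for the worked example above.
# scores[testee] = scores on items 1-5 (taken from the table of Abdullah ... Ema).
scores = [
    [7, 5, 6, 6, 5],   # Abdullah
    [6, 5, 5, 4, 5],   # Bin Laden
    [3, 3, 2, 4, 3],   # Chairil
    [5, 4, 4, 4, 5],   # Darulquthni
    [4, 4, 3, 4, 3],   # Ema
]
N = len(scores)        # number of testees
n = len(scores[0])     # number of items

def variance(values):
    """S^2 = (sum of X^2 - (sum of X)^2 / N) / N, as in the formulas above."""
    return (sum(v * v for v in values) - sum(values) ** 2 / N) / N

item_vars = [variance([row[i] for row in scores]) for i in range(n)]
total_var = variance([sum(row) for row in scores])

# Alpha formula: r11 = (n / (n - 1)) * (1 - sum of item variances / total variance)
r11 = (n / (n - 1)) * (1 - sum(item_vars) / total_var)
print("sum of item variances:", round(sum(item_vars), 2))
print("total variance       :", round(total_var, 2))
print("r11                  :", round(r11, 2))
```

The resulting r11 can then be judged against the 0.70 benchmark given earlier.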
C. Analysis of the Level of Difficulty
Item analysis is an assessment of the test items in order to obtain questions of adequate quality. There are two types of item analysis: analysis of the level (degree) of difficulty and analysis of discriminating power.
Difficulty level analysis examines the items (questions) in terms of their difficulty and yields the difficulty level of each item in a test. The difficulty level of an item can be determined with the formula:
I = B / N

I : difficulty index of the item concerned
B : the number of students who answered the item correctly
N : the number of students who attempted the item concerned
The criterion used is: the smaller the index obtained, the more difficult the question; conversely, the larger the index, the easier the question. The difficulty index criteria are:
0.00 - 0.30 = difficult
0.31 - 0.70 = moderate
0.71 - 1.00 = easy
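A minimal sketch (a hypothetical helper, not from the original text) of the index I = B/N and the category cut-offs above:

```python
# Sketch of the difficulty index I = B / N with the cut-offs listed above.

def difficulty_index(correct: int, respondents: int) -> float:
    """I = B / N, where B = number of correct answers, N = number of respondents."""
    return correct / respondents

def difficulty_category(index: float) -> str:
    if index <= 0.30:
        return "difficult"
    if index <= 0.70:
        return "moderate"
    return "easy"

# Example: 5 of 10 students answered item 1 correctly (see the table below).
i1 = difficulty_index(5, 10)
print(i1, difficulty_category(i1))   # 0.5 moderate
```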
Example:
| Name of student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | Total score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Agus | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 9 |
| Mind | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 8 |
| Cicik | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 9 |
| Dono | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 6 |
| Endro | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 11 |
| Farhan | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 7 |
| Gandi | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 9 |
| Hadi | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 6 |
| Lyas | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 8 |
| Jatmiko | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 9 |
| Number correct (B) | 5 | 2 | 5 | 6 | 6 | 6 | 10 | 5 | 6 | 3 | 7 | 4 | 8 | 9 | 1 | |
To determine the difficulty level of each question:
1. 5 : 10 = 0.5 = moderate (0.50)
2. 2 : 10 = 0.2 = difficult (0.20)
3. 5 : 10 = 0.5 = moderate (0.50)
4. 6 : 10 = 0.6 = moderate (0.60)
5. 6 : 10 = 0.6 = moderate (0.60)
6. 6 : 10 = 0.6 = moderate (0.60)
7. 10 : 10 = 1.0 = easy (1.00)
8. 5 : 10 = 0.5 = moderate (0.50)
9. 6 : 10 = 0.6 = moderate (0.60)
10. 3 : 10 = 0.3 = difficult (0.30)
11. 7 : 10 = 0.7 = moderate (0.70)
12. 4 : 10 = 0.4 = moderate (0.40)
13. 8 : 10 = 0.8 = easy (0.80)
14. 9 : 10 = 0.9 = easy (0.90)
15. 1 : 10 = 0.1 = difficult (0.10)
There are two opinions on how to treat the difficulty level of the items in a test. The first holds that a good test is one whose items have a moderate difficulty level. According to this opinion, items classified as difficult or easy should not be used in the test. There are three possible actions for such items: 1) the item is discarded; 2) the cause that makes the item difficult or easy is examined, the item is revised, and it is used again; 3) the item is used according to need. The second opinion holds that a good test is one constructed from items whose degrees of difficulty are distributed proportionally. Here the difficulty level is measured from the side of the student, not the teacher. This understanding matters when determining the proportion and criteria of questions in the easy, moderate, and difficult categories. There are two common considerations in setting the proportion of easy, moderate, and difficult questions, as sketched in the code below. 1) A balanced division, with the same number of questions in each of the three categories; for example, with 60 questions, 20 are easy, 20 moderate, and 20 difficult. 2) A division based on the normal curve, in which most questions are in the moderate category and the rest are divided between the easy and difficult categories in a chosen proportion. For example, 3-4-3 means 30% easy, 40% moderate, and 30% difficult; with 60 questions that gives 18 easy, 24 moderate, and 18 difficult. A proportion of 3-5-2 means 30% easy, 50% moderate, and 20% difficult.
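A minimal sketch (illustrative only) of allocating 60 questions according to the proportions just described:

```python
# Sketch: allocating a number of questions across difficulty categories by proportion.
def allocate(total: int, proportions: dict) -> dict:
    """Assumes the proportions sum to 1; simple rounding per category."""
    return {cat: round(total * p) for cat, p in proportions.items()}

print(allocate(60, {"easy": 1/3, "moderate": 1/3, "difficult": 1/3}))   # 20 / 20 / 20
print(allocate(60, {"easy": 0.30, "moderate": 0.40, "difficult": 0.30}))  # 18 / 24 / 18
print(allocate(60, {"easy": 0.30, "moderate": 0.50, "difficult": 0.20}))  # 18 / 30 / 12
```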
Example: a test consists of 10 multiple-choice questions with a composition of 3 easy, 4 moderate, and 3 difficult questions. Its planned arrangement can be described as follows:
| Question no. | Ability measured | Difficulty level |
| --- | --- | --- |
| 1 | Knowledge | Easy |
| 2 | Application | Moderate |
| 3 | Comprehension | Easy |
| 4 | Analysis | Moderate |
| 5 | Evaluation | Difficult |
| 6 | Synthesis | Difficult |
| 7 | Comprehension | Easy |
| 8 | Application | Moderate |
| 9 | Analysis | Moderate |
| 10 | Synthesis | Difficult |
Data obtained after correcting the answers:
| Question no. | Number of students who answered (N) | Number who answered correctly (B) | Index B/N | Category |
| --- | --- | --- | --- | --- |
| 1 | 20 | 10 | 0.50 | Moderate |
| 2 | 20 | 15 | 0.75 | Easy |
| 3 | 20 | 20 | 1.00 | Easy |
| 4 | 20 | 7 | 0.35 | Moderate |
| 5 | 20 | 16 | 0.80 | Easy |
| 6 | 20 | 17 | 0.85 | Easy |
| 7 | 20 | 6 | 0.30 | Difficult |
| 8 | 20 | 13 | 0.65 | Moderate |
| 9 | 20 | 14 | 0.70 | Moderate |
| 10 | 20 | 5 | 0.25 | Difficult |
The data above show that five questions missed their projection: question 1, projected as easy, turned out after the tryout to fall into the moderate category; question 2 shows the opposite; and so on. On this basis, these five questions must be revised to match their projections.
Another criterion for judging questions is to use the teacher's judgment (decision), based on considerations such as:
- the ability measured by the question. For example, in the cognitive domain, the aspects of knowledge (recall) and comprehension fall into the easy category, application and analysis into the moderate category, while synthesis and evaluation are categorized as difficult.
- the nature of the material being tested or asked about. For example, facts are easy, concepts and principles or laws are moderate, and generalization (drawing conclusions) falls into the difficult category.
- the content of the material asked, in relation to the field of study, both in breadth and in depth. In this regard, the teacher must determine what counts as easy, moderate, and difficult.
- the form of the question. Among objective tests, for example, true-false items are easier than multiple choice, and matching items are more difficult than multiple choice.
Once this judgment has been made by the teacher, the questions are tried out and analyzed to see whether the judgment was appropriate or not, for example whether a question judged easy also turns out easy in the results of the analysis.
D. Discriminating Power Analysis
Discriminating power analysis examines the test items in order to find out their ability to distinguish between high-achieving students and weaker students. That is, when the question is given to able students it should show high results, and vice versa; such an item is said to have discriminating power. The method used here is the table (criteria) of Ross and Stanley: SR − ST, where SR is the number of students in the low group who answered the item wrongly and ST is the number of students in the high group who answered the item wrongly.
The steps to be taken are:
1. Mark all the answers of the students taking the test.
For example, the results of the marking are as follows:
| Name of student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | Total score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Agus | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 9 |
| Mind | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 8 |
| Cicik | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 9 |
| Dono | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 6 |
| Endro | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 11 |
| Farhan | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 7 |
| Gandi | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 9 |
| Hadi | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 6 |
| Lyas | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 8 |
| Jatmiko | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 9 |
| Intention | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 11 |
| Linda | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 8 |
| Minul | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 7 |
| Ninik | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 11 |
| Opik | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 9 |
| Whistle | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 8 |
| Qira | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 9 |
| Rita | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 4 |
| Susi | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 6 |
| Tutut | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 9 |
| Udin | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 10 |
| Vina | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 8 |
| Xanana | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 11 |
| Yani | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 9 |
| Zidan | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 8 |
| Amri | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 7 |
| Care | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 8 |
| Cica | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 12 |
| Dora | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 10 |
| Echo | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 6 |
| Number correct | 17 | 15 | 17 | 18 | 18 | 17 | 20 | 16 | 17 | 14 | 16 | 16 | 20 | 19 | 13 | |
||||||||
2. Make a list of the test results ranked from the highest to the lowest score achieved:
| Name of student | Score |
| --- | --- |
| Cica | 12 |
| Endro | 11 |
| Xanana | 11 |
| Ninik | 11 |
| Intention | 11 |
| Udin | 10 |
| Dora | 10 |
| Agus | 9 |
| Cicik | 9 |
| Gandi | 9 |
| Jatmiko | 9 |
| Opik | 9 |
| Qira | 9 |
| Tutut | 9 |
| Yani | 9 |
| Mind | 8 |
| Lyas | 8 |
| Linda | 8 |
| Whistle | 8 |
| Vina | 8 |
| Zidan | 8 |
| Care | 8 |
| Farhan | 7 |
| Minul | 7 |
| Amri | 7 |
| Dono | 6 |
| Hadi | 6 |
| Susi | 6 |
| Echo | 6 |
| Rita | 4 |
3. Take 27% of the total number of participants for the group of proficient students (ST, the high group) and 27% for the group of weaker students (SR, the low group). With 30 participants, each group contains 8 students:
| Name of student | Score | Group |
| --- | --- | --- |
| Cica | 12 | ST |
| Endro | 11 | ST |
| Xanana | 11 | ST |
| Ninik | 11 | ST |
| Intention | 11 | ST |
| Udin | 10 | ST |
| Dora | 10 | ST |
| Agus | 9 | ST |
| Cicik | 9 | |
| Gandi | 9 | |
| Jatmiko | 9 | |
| Opik | 9 | |
| Qira | 9 | |
| Tutut | 9 | |
| Yani | 9 | |
| Mind | 8 | |
| Lyas | 8 | |
| Linda | 8 | |
| Whistle | 8 | |
| Vina | 8 | |
| Zidan | 8 | |
| Care | 8 | |
| Farhan | 7 | SR |
| Minul | 7 | SR |
| Amri | 7 | SR |
| Dono | 6 | SR |
| Hadi | 6 | SR |
| Susi | 6 | SR |
| Echo | 6 | SR |
| Rita | 4 | SR |
4. Analyze the items, that is, count the number of students who answered each item wrongly, both in the clever group (ST) and in the weaker group (SR).
| Question no. | Number who answered wrongly in SR | Number who answered wrongly in ST |
| --- | --- | --- |
| 1 | 5 | 4 |
| 2 | 7 | 3 |
| 3 | 5 | 3 |
| 4 | 3 | 2 |
| 5 | 3 | 5 |
| 6 | 4 | 3 |
| 7 | 4 | 3 |
| 8 | 6 | 0 |
| 9 | 7 | 2 |
| 10 | 4 | 3 |
| 11 | 5 | 3 |
| 12 | 5 | 1 |
| 13 | 4 | 1 |
| 14 | 4 | 0 |
| 15 | 5 | 3 |
5. Calculate the difference between the number of students who answered wrongly in the low group and in the high group (SR − ST).
| Question no. | Wrong in SR | Wrong in ST | SR − ST |
| --- | --- | --- | --- |
| 1 | 5 | 4 | 1 |
| 2 | 7 | 3 | 4 |
| 3 | 5 | 3 | 2 |
| 4 | 3 | 2 | 1 |
| 5 | 3 | 5 | -2 |
| 6 | 4 | 3 | 1 |
| 7 | 4 | 3 | 1 |
| 8 | 6 | 0 | 6 |
| 9 | 7 | 2 | 5 |
| 10 | 4 | 3 | 1 |
| 11 | 5 | 3 | 2 |
| 12 | 5 | 1 | 4 |
| 13 | 4 | 1 | 3 |
| 14 | 4 | 0 | 4 |
| 15 | 5 | 3 | 2 |
6. Compare the difference obtained with the value in the Ross and Stanley table.
| Question no. | Wrong in SR | Wrong in ST | SR − ST | Table limit value |
| --- | --- | --- | --- | --- |
| 1 | 5 | 4 | 1 | 5 |
| 2 | 7 | 3 | 4 | 5 |
| 3 | 5 | 3 | 2 | 5 |
| 4 | 3 | 2 | 1 | 5 |
| 5 | 3 | 5 | -2 | 5 |
| 6 | 4 | 3 | 1 | 5 |
| 7 | 4 | 3 | 1 | 5 |
| 8 | 6 | 0 | 6 | 5 |
| 9 | 7 | 2 | 5 | 5 |
| 10 | 4 | 3 | 1 | 5 |
| 11 | 5 | 3 | 2 | 5 |
| 12 | 5 | 1 | 4 | 5 |
| 13 | 4 | 1 | 3 | 5 |
| 14 | 4 | 0 | 4 | 5 |
| 15 | 5 | 3 | 2 | 5 |
7. Determine whether each item has discriminating power, using the criterion: an item has discriminating power if the difference between the number of students who answered it wrongly in the low group and in the high group (SR − ST) is equal to or greater than the table value.
Items that lack discriminating power are sometimes too easy and sometimes too difficult. Ideally, every item should have both discriminating power and an appropriate level of difficulty.
| Question no. | Wrong in SR | Wrong in ST | SR − ST | Table limit value | Decision |
| --- | --- | --- | --- | --- | --- |
| 1 | 5 | 4 | 1 | 5 | Rejected |
| 2 | 7 | 3 | 4 | 5 | Rejected |
| 3 | 5 | 3 | 2 | 5 | Rejected |
| 4 | 3 | 2 | 1 | 5 | Rejected |
| 5 | 3 | 5 | -2 | 5 | Rejected |
| 6 | 4 | 3 | 1 | 5 | Rejected |
| 7 | 4 | 3 | 1 | 5 | Rejected |
| 8 | 6 | 0 | 6 | 5 | Accepted |
| 9 | 7 | 2 | 5 | 5 | Accepted |
| 10 | 4 | 3 | 1 | 5 | Rejected |
| 11 | 5 | 3 | 2 | 5 | Rejected |
| 12 | 5 | 1 | 4 | 5 | Rejected |
| 13 | 4 | 1 | 3 | 5 | Rejected |
| 14 | 4 | 0 | 4 | 5 | Rejected |
| 15 | 5 | 3 | 2 | 5 | Rejected |
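A minimal sketch (not part of the original article) that strings steps 2 to 7 together for a 0/1 score matrix. The 27% group size and the fixed table value of 5 follow the example above; the score dictionary is a placeholder to be filled with the data from step 1:

```python
# Sketch of the SR - ST discriminating-power procedure described in steps 1-7.
# `answers` should map each student's name to a list of 0/1 scores per item,
# filled in from the correction table above; the two rows here are placeholders.
answers = {
    "Student A": [1, 0, 1, 1, 0],
    "Student B": [0, 0, 1, 0, 0],
    # ... one row per test taker
}
TABLE_LIMIT = 5   # Ross and Stanley table value used in the example above

# Steps 2-3: rank by total score and take the top and bottom 27%.
ranked = sorted(answers, key=lambda s: sum(answers[s]), reverse=True)
k = round(0.27 * len(ranked))
st_group = ranked[:k]          # high (clever) group, ST
sr_group = ranked[-k:]         # low group, SR

n_items = len(next(iter(answers.values())))
for item in range(n_items):
    # Steps 4-5: count wrong answers in each group and take the difference.
    sr_wrong = sum(1 for s in sr_group if answers[s][item] == 0)
    st_wrong = sum(1 for s in st_group if answers[s][item] == 0)
    diff = sr_wrong - st_wrong
    # Steps 6-7: an item discriminates if SR - ST >= the table value.
    verdict = "accepted" if diff >= TABLE_LIMIT else "rejected"
    print(f"item {item + 1}: SR-ST = {diff} -> {verdict}")
```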
E. The Function of Distractors
A distractor is an option (alternative) that accompanies the answer key. Its function is to act as a decoy: its purpose is to tempt test takers away from the key. The more test takers are drawn to a distractor, the better it performs its function; in other words, a distractor functions properly if it has drawing power. A distractor is considered good if it is chosen by at least 5% of the test takers. Example:
| Question no. | A | B | C | D | E | Information |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 4 | 6 | 5 | (30) | 5 | |
| 2 | 1 | (44) | 2 | 1 | 2 | |
| 3 | 1 | 1 | (10) | 1 | 37 | |

( ) = answer key
From the example above it can be said that the distractors in item 1 perform their function, namely drawing test takers away from the key. Option A was chosen by 4 participants (4/50 × 100 = 8), which means 8% of participants; option B was chosen by 6 participants (6/50 × 100 = 12), which means 12%; and options C and E were each chosen by 10% of the test takers.
The distractors in item 2, by contrast, do not function, because the proportion of test takers choosing them is below the minimum limit: option A was chosen by only 2% of participants (1/50 × 100 = 2), as was option D, while options C and E were each chosen by 4% (2/50 × 100 = 4). Likewise, options A, B, and D in item 3 were each chosen by only 2% of participants, whereas option E functions strongly, being chosen by 74% of participants.
The description of distractor functioning above also illustrates how greatly the items differ in the drawing power their options exert.
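A minimal sketch (not part of the original article) of the 5% rule applied to the option counts above, with the answer keys taken from the bracketed cells:

```python
# Sketch of the 5% distractor rule applied to the option counts above (50 test takers).
items = {
    1: {"A": 4, "B": 6, "C": 5, "D": 30, "E": 5},    # key: D
    2: {"A": 1, "B": 44, "C": 2, "D": 1, "E": 2},    # key: B
    3: {"A": 1, "B": 1, "C": 10, "D": 1, "E": 37},   # key: C
}
keys = {1: "D", 2: "B", 3: "C"}
N = 50   # total number of test takers

for no, counts in items.items():
    for option, chosen in counts.items():
        if option == keys[no]:
            continue                      # skip the answer key itself
        pct = chosen / N * 100
        status = "functions" if pct >= 5 else "does not function"
        print(f"item {no}, option {option}: {pct:.0f}% -> {status}")
```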