Survey Classification

Data Generation Process Workshop

Yue Hu

Tsinghua University

Goal

Download the pool of surveys from the DCPOtools website
Have a look at the downloaded spreadsheet
- Select Freeze Panes from Window
- Start from the cross-national surveys (country_var is empty)

Search keywords relating to your topic
- e.g., for gender egalitarianism, you can search for “wom,” “husband,” “wife,” etc.
Look at the questions around the ones you found
Look over the index of questions
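
The keyword search above can also be done in code once a codebook is in tabular form. The following is a minimal sketch, not part of DCPOtools: the function name `find_questions`, the field names, and the sample rows are all hypothetical and invented for illustration.

```python
import re

def find_questions(rows, keywords, text_field="question_text"):
    """Return the rows whose question text contains any keyword stem.

    Matching is case-insensitive and on substrings, so the stem "wom"
    catches both "woman" and "women".
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [row for row in rows if pattern.search(row.get(text_field) or "")]

# Hypothetical codebook rows, invented for illustration only.
rows = [
    {"variable": "q55", "question_text": "How often do you follow politics in the news?"},
    {"variable": "q56", "question_text": "Men make better political leaders than women do."},
    {"variable": "q57", "question_text": "A husband and wife should both contribute to household income."},
]

hits = find_questions(rows, ["wom", "husband", "wife"])
print([row["variable"] for row in hits])
```

Substring matching is deliberately loose, so it may also return unrelated questions; every hit still needs to be read by a person.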

Download the template from the DCPOtools website.

survey: The survey as named in the surveys_data spreadsheet;
variable: The question index, e.g., “q56,” “v122”;
question_text: The complete sentences read to the people taking the survey, or as close to that as you can find;
response_categories: The number and the label of each of the options, e.g., “1. Strongly agree, 2. Agree, 3. Neither agree nor disagree, 4. Disagree, 5. Strongly disagree”.

If you’re sure there are no relevant questions in the survey, enter the survey, put “NA” under variable, and move on.
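
As an illustration of what a filled-in template might look like, here is a small sketch that writes two rows with the four columns listed above. The survey names and the question text are hypothetical, and leaving the remaining cells empty in the “NA” row is an assumption rather than a documented rule.

```python
import csv
import io

# The four template columns described above.
FIELDS = ["survey", "variable", "question_text", "response_categories"]

entries = [
    {
        "survey": "example_survey_a",  # hypothetical survey name
        "variable": "q56",
        "question_text": "Men make better political leaders than women do.",  # invented text
        "response_categories": (
            "1. Strongly agree, 2. Agree, 3. Neither agree nor disagree, "
            "4. Disagree, 5. Strongly disagree"
        ),
    },
    {
        # A survey with no relevant questions: only "NA" under variable.
        "survey": "example_survey_b",  # hypothetical survey name
        "variable": "NA",
        "question_text": "",  # assumption: left blank
        "response_categories": "",  # assumption: left blank
    },
]

# Write to an in-memory buffer; replace with open("template.csv", "w", newline="") to save a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(entries)
template_csv = buffer.getvalue()
print(template_csv)
```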

Three people per group, one group per topic.

Read through the questions (each person covers at least 1/3 of an archive)
Categorize them into three topics
- Use a term to represent each topic
- Mark the topic for each question
Talk with your partners to justify or modify your categorization system until it is consistent
Mark all the questions
Measure the intercoder reliability (ICR) with Fleiss’ κ to determine if you need to categorize them again
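
For reference, Fleiss’ κ compares the observed agreement among coders with the agreement expected by chance. The sketch below is a plain-Python implementation of the standard formula, written for illustration; the function name `fleiss_kappa`, the topic labels, and the example ratings are hypothetical, and it assumes every question was marked by the same number of coders.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Compute Fleiss' kappa.

    ratings: one list per question, holding the topic label chosen by each coder.
    Every question must have the same number of coders (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each question needs the same number (>= 2) of coders")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing coder pairs, averaged over questions.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category shares.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_expected = sum((c / (n_items * n_raters)) ** 2 for c in totals.values())

    if p_expected == 1.0:
        return float("nan")  # undefined when everyone always picks one category
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical marks from three coders on six questions (labels invented).
example = [
    ["family", "family", "family"],
    ["work", "work", "work"],
    ["politics", "politics", "politics"],
    ["family", "family", "work"],
    ["work", "work", "politics"],
    ["politics", "politics", "family"],
]
print(round(fleiss_kappa(example), 3))  # about 0.5
```

In this made-up example the coders fully agree on half of the questions and split two-to-one on the rest, which gives a κ of about 0.5.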

Make sure to record the full sentence of each question

Following Landis & Koch (1977), let’s aim for a κ of 0.8.

A high κ is not the ultimate goal, i.e., no! fake! consistency!

Communicate with your partners twice:

After you are familiar with the data, to nail down the categorization system
After calculating the ICR, to figure out the problem if κ is low.

Make sure you record the data in the same way.