The University Writing Center provides a starting point for assessing written or oral communication. Contact Dr. Valerie Balester (email@example.com) for a consultation. It is highly recommended that any assessment plan be carefully constructed and that professionals such as those available in the Office of Research Compliance, the Office of Institutional Effectiveness and Evaluation, or Data and Research Services be consulted.
The following guiding principles apply to assessment of written or oral communication and are important to consider in developing a plan:
Language is best assessed within a contextual framework that considers the sociopolitical context, rhetorical situation (audience, genre, purpose), and the composing process, as well as the finished product.
Assessment should be undertaken in the spirit of scholarship—to know and understand a process (composing) or product (a speech, essay, etc.) in order to continuously improve and perfect it.
Assessment should lead to appropriate instruction for each student and should never be used to exclude or deny opportunity.
Assessment should be tailored to local contexts, values, and attitudes.
Authentic performances are preferred over artificial tasks that ask students to perform with no intrinsic motivation.
Results should be made available in language accessible to all stakeholders.
Some Considerations When Assessing Communication Skills
While learning to speak is inherently different from learning to write, learning the formal oral and written communication we teach in the university is based on some common theories of literacy and language acquisition. Literacy, understood broadly as rhetorical performance, both spoken and written, is learned through a process of socialization within a discourse community. Thus, when we hold students to a standard of literacy, we expect more than knowledge of conventions such as those which govern edited American English or spoken standard English—we expect conformity to specific genres, understanding of specific conventions that arise within a community of discourse.
Students who fail to conform to our expected discourse conventions may be doing so for many reasons besides cognitive deficiency or a lack of knowledge, including, for example, resistance to conformity or a misunderstanding about expectations. Error may be more than a sign of poor skills; it may signal growth, since learning to perform occurs in situ, often as a process of trial and error, not as a set of skills mastered in a particular order.
Thus, we advocate when possible using authentic data: “To get an accurate sense of students’ strengths and weaknesses as language users, assessors need to allow students to engage in authentic language use” (O’Neill and others, 41). Besides examining oral and written discourse for communicative effectiveness, we can examine them for evidence of student learning of course material. Written documents such as essay tests, papers, and reports, and oral presentations can provide evidence of deep learning that may be more significant than scores on a standardized test.
Planning an Assessment
Banta (2004) advocates getting all stakeholders involved in creating a written assessment plan so that their concerns are addressed and their support in collecting and disseminating findings is more likely. A plan requires a schedule with reasonable deadlines, assignment of responsibilities, a realistic budget, and clearly stated and agreed-upon objectives. The objectives must be broken down into measurable actions, often stated as learning outcomes. “Effective oral or written communication skills” is an outcome often associated with writing or speaking, but it is broad and difficult to measure; there are many forms and genres of writing and speaking, all requiring different kinds of knowledge or practice.
Banta (2004) uses action verbs to state learning objectives. Ask what you want students to do, know, or believe. For example, you might want them to do one of the following (also demonstrating knowledge of a process):
evaluate an argument in writing using properly documented and cited sources in MLA format;
present a research proposal with presentation slides to a technical audience, using tone and content geared to the audience's level of knowledge;
report research findings in a collaboratively written report that requires them to find sources through the electronic databases from University Libraries.
Examples of what students believe usually involve determining attitudes: for example, that writing is a process that requires feedback from readers and at least one revision of a draft; that engineers will have to write; or that it is not a waste of time to practice an oral presentation more than once. Beliefs can also affect behavior, so an assessment might determine whether students have changed or adopted behaviors like increasing the number of drafts they write or spending more time on revision.
Selecting the Appropriate Measures
Once learning outcomes have been stated, explain how they will be measured. Include how data will be collected, who is responsible for collecting data and conducting each measurement, and when each measurement will be completed. Finally, explain how results will be analyzed and disseminated, and, if necessary, used for improvement.
Learning outcomes can be measured either directly, by looking at products or performances, or indirectly, by ascertaining attitudes, behaviors, habits, or beliefs. Direct measures are preferred and should whenever possible be included in an assessment of writing or speaking quality. If, however, the assessment aims for information about attitudes or behaviors only, direct measures may be unnecessary.
Always consider the validity of the measure. There are a number of ways to consider a test’s validity, all important. If a test accurately represents the ability or trait being measured, it is said to have construct validity. Also, the results of a test must be reasonably applied to what is being tested; it should be established that a score from a standardized test is appropriate in a local context. Often, simple measures such as a test of grammar don’t really get at the complexity of writing or speaking within a specific context.
Cronbach and Messick argue for a comprehensive theory of test validity that considers not only how well measures on one test correlate with similar measures on another test (for example, whether scores on a writing test and a grammar test are similar) but also consequences for stakeholders and implications of the local testing environment. So if a test has no educational value, it is invalid. For example, a multiple-choice test of grammar measures something other than writing ability, since such a test is decontextualized from actual writing. It measures the ability to answer questions about grammar, perhaps, but not the ability to apply that knowledge in writing.
A timed writing test might be a better measure of writing quality. However, a timed test measures a specific sort of writing ability—that required to respond to a prompt with little or no opportunity for revision. A timed test may tell you more about how a writer performs under pressure than about writing ability. Some writers may do well on a first draft, but others may not, even though they are quite capable of high quality work under more natural conditions. Likewise, a writer who can produce a fairly good first draft on an impromptu test may not have the skills to produce a finely polished or more sustained piece.
In considering the use of standardized tests such as the SAT test of writing, it is imperative to consider validity: does this particular test really measure what your plan intends to measure? Validity, however, has a broader meaning as well. A valid assessment is one whose results can be shown, through evidence, to be responsibly interpreted and applied in a specific situation. So, for example, a perfectly good test of editing skills may be invalid if it is used to place students in a class where editing skills are not taught.
Machine scoring also should be mentioned. Tests like Accuplacer or Criterion boast high rates of reliability; however, what they actually measure is limited. They might be used for placement, but will not generally help establish a baseline unless the learning objectives for assessment mirror the conditions under which the tests were administered. For example, Criterion measures performance on a short impromptu essay on a general topic, something like a five-paragraph theme. It’s a stretch to claim students who can write well for this type of test have a deep understanding of synthesizing material from disciplinary literature or that they can write a cogent and concise business memo in the appropriate professional tone.
A word on the limitations of pre-test/post-test designs for assessing communication skills is in order. Because learning to communicate effectively in speaking or writing occurs within highly specific contexts and often over extended periods of time and requires complex cognitive processes, a pre-test/post-test model may be difficult to design. For example, students who have an adequate grasp of punctuation may slip in their punctuation skills when presented with the demands of writing in a new genre for a new audience; a pre-test may show adequate skills, and a post-test within a semester’s framework may show deterioration that is attributable to divided attention and is corrected once the new genre is learned.
If possible, use multiple measures so that you can consider the data from different angles and have a richer understanding of results. While examining student papers may show that documentation is a problem area, it would be necessary to examine the course syllabus to understand why. (Was documentation taught, or was there an assumption that it was learned in a prior course?) An interview, focus group, or survey of students or the instructor might round out the picture. (Did the instructor think documentation had been taught previously? Did students think it did not count in the grade and thus ignore it?)
Embedding assessment in classes is a good practice because the products assessed are intrinsically meaningful and students were presumably motivated to produce them. Data can be collected from in-class exercises, essays, projects, or presentations captured on video. Personal information should be redacted or at least protected, and care should be taken to disassociate the results from student grades or advancement.
Measures to consider include the following:
Timed tests of impromptu writing such as the SAT essay test
Pre-recorded or live oral presentations
Portfolios of work collected over time, often with a reflective introductory essay that explains the portfolio contents and gives information about the editing and composing process; a portfolio is a collection of work by a single student
Work produced in class for a grade that is being used outside class for independent assessment; the set should include the work in the whole class, not cherry-picked for worst or best samples, and might include homework, essay tests, papers, journals, oral presentations, or capstone projects
Embedded assignments, meaning a common assignment completed under similar conditions for the same grade percentage across courses (in a program, or for a multi-section course, for example)
Observations of writing in progress, often accompanied by an oral protocol, that is, the writer vocalizes the composing process as the writing occurs
Observations of teaching and learning, for example of student behavior in a classroom or writing tutorial
Performance on national examinations such as those used for licensure
Reflections written by students on their communication processes or on their learning
Discourse-based interviews in which a writer/speaker is asked to reflect on the composing or delivery of a particular document or presentation
Curriculum mapping, in which desired skills are mapped onto a course of study; the map will show where particular skills are introduced and where they are reinforced
Surveys (for example, of students, faculty, former students, employers)
Focus groups (for example, of students, faculty, former students, employers)
Interviews (for example, of students, faculty, former students, employers and including exit interviews)
Percentage of students going to professional or graduate school or statistics on job placement
Tests of grammar, vocabulary, or other skills related to written or oral communication (usually multiple choice)
Reflections on learning produced in class or after class that may include a “minute paper,” a journal entry, or the open-ended response to a survey, anything that is open-ended and that provides information about attitudes, behaviors, or processes
Scores on an assessment embedded in classes, such as those used in assessing an oral or written performance
The Inventory of Processes in College Composition (developed in 2001 by Ellen Lavelle and Nancy Zuercher and available from the University Writing Center) measures college students' attitudes and strategies as linked to their approaches to writing tasks. Other measures suggested by Angelo and Cross (1993), including opinion polls, double-entry journals, and self-confidence surveys, can be easily worked into a classroom context.
Sources of information used for other purposes should also be considered:
External review committee reports
Reports from accreditation agencies
Collecting the Data
It is advisable to obtain permission from the Institutional Review Board at Texas A&M University (Office of Research Compliance) before undertaking an assessment so that results can be published. The IRB also provides a good check on practices such as confidentiality, consent, and storage of data. There are a number of ways to collect student performances or papers, projects, and portfolios. One efficient method is to ask students to turn in two copies of their work, with one omitting any identifying information; if students turn in work electronically, or in Blackboard, copies are easy to obtain as well. Another option is to videotape presentations. Turnitin, or any similar software that provides a report, can also provide data about student performance.
Analyzing the Data
Perhaps the most common way to assess the products of writing or speaking is with rubrics or metrics. For a discussion of how to create a rubric for communication assessment, see Grading and Commenting. When scoring is used, an important consideration is inter-rater reliability. One rater is not considered sufficient to ensure consistency, so multiple raters, usually two per product, must be used. To obtain reliability between raters (inter-rater reliability), training (also called norming or calibration) is required. Scoring should occur at a common place and time so that training can occur; in other words, everyone should work together so that discussion is possible. You will need someone at the scoring session to conduct training and to keep track of score discrepancies. Once raters have scored a few samples and discussed their scores, they are ready to score the actual data. Make sure they cannot see each other’s scores. Create a unique ID for each rater and at least two score sheets for every item scored. If scores vary widely on a given product, a third scorer can be used. Check the scores for reliability as you go so that you can gauge the need to recalibrate. Recalibration is usually a good idea after three hours of scoring. Depending on the parameters of your testing, different methods can be used to analyze scores and determine inter-rater reliability.
It is advisable to determine the method of analysis in the planning stages because the number of raters, types of items rated, and other factors can be pertinent. For help with determining inter-rater reliability, consult with Data and Research Services.
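As a rough illustration of how two raters' scores might be checked during a scoring session, the sketch below computes simple percent agreement, Cohen's kappa (one common chance-corrected agreement statistic for two raters), and a list of items whose scores differ enough to go to a third scorer. The function names, the four-point scale, the sample scores, and the one-point discrepancy threshold are all assumptions made for this example; they are not taken from any particular assessment plan, and Data and Research Services may recommend a different statistic for your design.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b, tolerance=0):
    """Share of items where the two scores differ by at most `tolerance` points."""
    pairs = list(zip(scores_a, scores_b))
    agree = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return agree / len(pairs)


def cohens_kappa(scores_a, scores_b):
    """Exact agreement between two raters, corrected for chance agreement."""
    n = len(scores_a)
    observed = percent_agreement(scores_a, scores_b)
    counts_a = Counter(scores_a)
    counts_b = Counter(scores_b)
    # Chance agreement: probability both raters pick the same score at random,
    # given how often each rater used each score.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters gave every item the same single score
    return (observed - expected) / (1 - expected)


def needs_third_reader(scores_a, scores_b, max_gap=1):
    """Positions of items whose two scores differ by more than `max_gap` points."""
    return [i for i, (a, b) in enumerate(zip(scores_a, scores_b))
            if abs(a - b) > max_gap]


# Hypothetical scores from two raters on eight papers, using a 1-4 rubric.
rater_1 = [4, 3, 2, 4, 1, 3, 2, 4]
rater_2 = [4, 3, 3, 4, 1, 1, 2, 3]

print("Exact agreement:   ", percent_agreement(rater_1, rater_2))
print("Adjacent agreement:", percent_agreement(rater_1, rater_2, tolerance=1))
print("Cohen's kappa:     ", cohens_kappa(rater_1, rater_2))
print("Send to 3rd reader:", needs_third_reader(rater_1, rater_2))
```

With these made-up scores, the raters agree exactly on five of eight papers, and only the sixth paper (position 5, counting from zero) shows a gap of more than one point. Running a check like this partway through a session is one way to decide whether recalibration is needed.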
Some forms of assessment can be done without multiple scores, although reliability still needs to be established, such as by comparing placement decisions with subsequent performance in classes. Moss argues that expert opinion may not always be reflected in scores that agree on a rubric. The rubric itself may not capture the most salient features of the work, and critical differences are common among experts who bring their own expertise to a judgment. Smith demonstrates that expert opinion using a holistic scoring guide has been successful in placement: teachers familiar with a specific class or program read portfolios or other samples of student work and decide on an appropriate placement. The assumption is that teachers have the required experience and expertise to make such a call and that teachers have a unified understanding of program standards and institutional context.
The Office of Institutional Assessment (OIA) is interested in collecting any assessment data generated at Texas A&M University and will help departments track their projects and data in WEAVE Online. The advantage of using WEAVE Online is that it helps you develop a good plan. The OIA provides assistance with planning and writing up the results of an assessment.
The annual Texas A&M Assessment Conference is one venue for disseminating results. Also check your professional journals and associations for conferences and publications related to pedagogy. Higher education publications include the Journal of the Scholarship of Teaching and Learning and the International Journal for the Scholarship of Teaching and Learning.
Angelo, Thomas A. and K. Patricia Cross. Classroom Assessment Techniques: A Handbook for College Teachers. 2nd ed. San Francisco, CA: Jossey-Bass, 1993.
Banta, Trudy, Ed. Portfolio Assessment: Uses, Cases, Scoring, and Impact. Assessment Update Collections. San Francisco, CA: John Wiley & Sons, 2003.
Banta, Trudy, Ed. Hallmarks of Effective Outcomes Assessment. Assessment Update Collections. San Francisco, CA: John Wiley & Sons, 2004.
Cronbach, Lee J. “Construct Validation After Thirty Years.” In Ed. R. L. Linn. Intelligence: Measurement, Theory and Public Policy: Proceedings of a Symposium in Honor of L. G. Humphreys. Urbana and Chicago: University of Illinois Press, 1989.
Lavelle, Ellen and Nancy Zuercher. “The Writing Approaches of University Students.” Higher Education: The International Journal of Higher Education and Educational Planning 42.3 (2001): 373-91.
Messick, Samuel. “Meaning and Value in Test Validation: The Science and Ethics of Assessment.” Educational Researcher 18.2 (1989): 5-11.
Moss, Pamela A. “Joining the Dialogue on Validity Theory in Educational Research.” In Ed. O’Neill, Peggy. Blurring Boundaries: Developing Researchers, Writers, and Teachers. Cresskill, NJ: Hampton Press, 2007. 91-100.
O’Neill, Peggy, Cindy Moore and Brian Huot. A Guide to College Writing Assessment. Logan, UT: Utah State University Press, 2009.
Smith, William L. “The Importance of Teacher Knowledge in College Composition Placement Testing.” In Eds. John R. Hayes et al. Reading Empirical Research Studies: The Rhetoric of Research. Norwood, NJ: Ablex, 1992. 289-316.
The University Writing Center offers consultations on communication assessment. Contact Dr. Valerie Balester (firstname.lastname@example.org) for a consultation.
National Council of Teachers of English Position Statements on Assessment and Testing
National Communication Association assessment resources
VALUE: Valid Assessment of Learning in Undergraduate Education
Texas A&M University Office of Institutional Effectiveness and Evaluation
Texas A&M University Data and Research Services
Texas A&M University Center for Teaching Excellence
Texas A&M University Office of Research Compliance
Oral Communication Rubric from Schreyer Institute for Teaching Excellence, The Pennsylvania State University
“Running a Grade-Norming Session” by Pamela Flash (University of Minnesota).