Documentation literacy as a metacognitive skill in computer programming
Dominic Bordelon, Research Data Librarian
University of Pittsburgh Library System
October 6, 2022
Quick terminology note
I will say “function” often in this presentation, e.g., “documentation tells us how a function works.” For the purposes of this argument, it could be interchanged with “method” and “class.”
The (anecdotal) search behavior of novice programmers
Scenario: a learner recognizes an information need because they don’t know how to do something (which function or method should they use?)
Straight to Google
Blog posts
Unpleasant and distracting UX; crucial version info may be unclear or missing (e.g., Python 2.x vs. 3.x)
Tutorials
Same problems as blog posts; info presented in a dribble and may not go deep enough to answer the user’s question
(if they’re lucky) Stack Overflow posts
large forum with trusted results
They don’t know how to google for their programming information needs.
What they typically don’t do: go to the Help menu, press F1, or search the official docs
As instructors, we often model mature ways of reasoning and effective strategies for writing code. But we should also model how to effectively formulate and answer questions about coding using API documentation.
What should we do instead?
Teach learners how to read and use API documentation and help systems.
Why does this matter?
A poor information pathway contributes to extraneous cognitive load, which is already substantial while programming.
How to read documentation
Features of (good) API documentation:
names of arguments and their default values
data types of arguments \(\rightarrow\) what kind of object does the function expect as input?
return value \(\rightarrow\) what kind of object will the function return to us?
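As a minimal sketch of these three features, the Python snippet below reads them off a function's signature. The function mean_age() and its digits argument are hypothetical, invented here only for illustration; inspect.signature() is part of the Python standard library.

```python
import inspect
from typing import Sequence


def mean_age(ages: Sequence[float], digits: int = 1) -> float:
    """Return the arithmetic mean of ages, rounded to `digits` decimal places.

    (Hypothetical example function, not part of any library.)
    """
    return round(sum(ages) / len(ages), digits)


signature = inspect.signature(mean_age)

# Argument names, expected data types, and default values,
# the same facts a documentation page lists for each argument.
for name, param in signature.parameters.items():
    has_default = param.default is not inspect.Parameter.empty
    default = repr(param.default) if has_default else "(required)"
    print(f"{name}: type={param.annotation}, default={default}")

# The return annotation answers "what kind of object will the function return?"
print("returns:", signature.return_annotation)
```

Reading the output line by line mirrors reading the Arguments and Value sections of a help page.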
We are inured to [what we imagine to be] unhelpful help systems, and they feel particularly obsolete with the Internet.
But
running help() on your function might provide the answer you need more quickly than Google
alt-tabbing to your browser and searching the Web adds extraneous cognitive load
Googling requires the user to 1) switch applications, 2) run their search, 3) assess results with varying authorship and format, and then within a web page to 4) isolate content
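A small Python sketch of the in-session alternative: the documentation for an object can be pulled up without leaving the interpreter. Interactively one would simply run help(str.join); here inspect.getdoc() returns the same docstring as a string so it can be printed or inspected. The choice of str.join and len as lookup targets is only illustrative.

```python
import inspect

# help(str.join) displays this docstring (with a short header) in the console;
# inspect.getdoc() returns the docstring text itself.
doc_text = inspect.getdoc(str.join)
print(doc_text)

# The same lookup works for any object already available in the session.
len_doc = inspect.getdoc(len)
print(len_doc)
```

No application switch, search query, or assessment of third-party results is needed for this kind of question.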
Scherer, Siddiq, and Sánchez Viveros (2020) categorize the existing literature on teaching and learning computer programming as dealing with:
effectiveness of programming interventions per se (e.g., effects of learning programming on math or problem-solving)
effectiveness of visualization or physicality (e.g., Scratch, Arduino)
effectiveness of instructional approaches (e.g., pair programming, learner reflection)
“Teaching programming through metacognition seems effective, and the metacognitive skills acquired during instruction may ultimately impact students’ problem-solving performance and success.” Scherer, Siddiq, and Sánchez Viveros (2020)
Rum and Zolkepli (2018) applied metacognitive strategies such as planning and organizing, making a project timeline, troubleshooting issues, linking learning to prior knowledge in discussion, and self-reflection and self-assessment, and found a correlation with student success in teaching and learning computer programming.
Documentation literacy is another metacognitive strategy or skill we could impart to students to support their learning and doing of computer programming.
We can also think of documentation literacy as a specific kind of information literacy from a library science perspective.
Teaching example: Learner exercise
A handy function is help(), which queries R’s documentation system. Most commonly, you’ll look up functions. You can search by running help(topic) or ?topic, e.g., ?sqrt. Notice that the result will appear in the Help pane.
There is confusion among some R users about what the “c” in the function c() stands for. Using the help system, what does c() do? What do you think “c” stands for?
Create a vector of arbitrary patient ages and store it as an object called ages.
# answer code goes here
What do you estimate is the mean age? Calculate it using mean().
# answer code goes here
What is the mean value of the Size variable in tg? How about numAge in cvdr?
# answer code goes here
Teaching example: Problem set section about missing values
In R, a missing value (equivalent to an empty cell in Excel—NOT to zero) is represented with NA (not available). You can’t use it in calculations because its uncertainty taints any numbers it interacts with. Many times, the presence of NA is expected and fine; some variables are empty sometimes.
Try the code below:
my_values <- c(1, 0, 3, 4)
# predict: what will sum() and mean() of my_values be?
# calculate them below:

some_values <- c(1, NA, 3, 4)
# predict: what will sum() and mean() of some_values be?
# calculate them below:

# why does this happen?
# how can we fix this problem?
# (hint: run ?sum or ?mean and look in the Arguments section)

# can you fix the sum() and mean() function calls for some_values?
# (hint: if you're unsure of the syntax, run ?sum and check the Examples)
How can you check whether a vector has any NAs? Use the anyNA() function:
anyNA(my_values)
[1] FALSE
anyNA(some_values)
[1] TRUE
There is a help file for missing values: ?NA
Limitations of this approach
These ideas are based on experience and observation, but no hypotheses have been tested
I have a hunch this can be effective, but is it?
API documentation is most usable when the reader already knows the name of the function they want to look up.
API documentation is not always well written or up to date. (But we should still show learners how to use the manual, even if we don’t think it’s an ideal manual.)
Table 1. Each approach can be advantageous for different information-need cases.
| API documentation might be better for: | Web search might be better for: |
| --- | --- |
| “What order do the arguments take?” | “How do you turn a list of items into a single string?” (A novice will not think to search for str.join() and search functionality |
| “What are the names of the arguments?” | “What does this error mean?” (copy/paste it) |
| “What is the default value of this argument? What is the default behavior of this function?” | “Functions A and B appear to do the same thing. Is that right? If yes, is there any advantage to one or the other?” |
| “Can str.join() only be used with lists, or also other kinds of iterables?” | “Is recent development x a known issue with function A?” |
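The str.join() question in Table 1 is one the official documentation answers directly, and a learner can confirm the answer with a quick experiment. The sketch below uses only built-in Python behavior; the variable names are illustrative.

```python
# str.join() accepts any iterable of strings, not only lists.
from_list = ", ".join(["a", "b", "c"])
from_tuple = ", ".join(("a", "b", "c"))
from_generator = ", ".join(str(n) for n in range(3))

print(from_list)
print(from_tuple)
print(from_generator)

# Every item must already be a string; otherwise join() raises TypeError.
try:
    ", ".join([1, 2, 3])
    non_string_error = None
except TypeError as err:
    non_string_error = err

print("joining integers failed:", non_string_error is not None)
```

The experiment only becomes possible once the learner knows the function's name, which is why the "how do you...?" phrasing of the same need sits in the web search column.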
Conclusions
Let’s emphasize the API documentation when we’re teaching programming languages and tools (people work hard on it!)
As part of our computer programming instruction, let’s model effective web search practices (query formation, assessment of results, navigation within a document) and favor official API documentation where appropriate (e.g., once the name of the needed function is known).
By walking novices through our own thought processes and strategies, which have formed from experience as practitioners, we can hope to transfer some of our skills and knowledge to them.
R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Rossum, Guido van, and Fred L. Drake. The Python Language Reference. Release 3.0.1 [Repr.]. Python Documentation Manual / Guido van Rossum; Fred L. Drake [Ed.], Pt. 2. Hampton, NH: Python Software Foundation, 2010.
RStudio Team (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA. URL http://www.rstudio.com/.
Rum, Siti Nurulain Mohd, and Maslina Zolkepli. “Metacognitive Strategies in Teaching and Learning Computer Programming.” International Journal of Engineering & Technology 7, no. 4.38 (December 3, 2018): 788–94. https://doi.org/10.14419/ijet.v7i4.38.27546.
Scherer, Ronny, Fazilat Siddiq, and Bárbara Sánchez Viveros. “A Meta-Analysis of Teaching and Learning Computer Programming: Effective Instructional Approaches and Conditions.” Computers in Human Behavior 109 (August 2020): 106349. https://doi.org/10.1016/j.chb.2020.106349.
Sweller, John, Paul Ayres, and Slava Kalyuga. Cognitive Load Theory. 1st ed. Explorations in the Learning Sciences, Instructional Systems and Performance Technologies. New York Dordrecht Heidelberg London: Springer, 2011.
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.