CS376: Operating Systems - File I/O (100 Points)
Assignment Goals
The goals of this assignment are:- Create and modify files using the C standard I/O interface
Background Reading and References
Please refer to the following readings and examples offering templates to help get you started:The Assignment
In this assignment, you will open a text file, get its size, malloc
a buffer, and read the contents of the file into that buffer.
To read a string from the console, you can use the line fgets(buf, 1024, stdin);
from stdio.h
, assuming that buf
is malloc
‘ed to at least 1025
bytes (I am leaving room for a null terminator in case you wish to print the buffer!). Using this, prompt the user to enter a word to search the text file for.
Loop over each line of the file contents by tokenizing it by the newline \n
character. For each line, tokenize that line by spaces. Then, loop over this list of tokens and check if the token matches that line. If it does, print the entire line to the screen.
Lastly, create a new file using O_CREAT
and write these matching lines to the new text file (so that lines that do not match are skipped). Be sure to open the file with 0666
permissions and with O_RDWR
so that you can both read and write to the file.
The filename should be given via argv
command line parameters. The filename to read will be in argv[1]
and the filename to write will be in argv[2]
(recall that the program name is argv[0]
). Be careful to check argc
before accessing these values. If argc
is not at least 3, print an error message to the screen and exit
or return
a nonzero token value to indicate that an error has occurred.
close
the file and free
any memory when done.
What to Do
-
Begin by
open
ing the file given byargv[1]
. Usefstat
to get the size of the file, andmalloc
a buffer equal to that size plus1
. This buffer will hold the contents of your file. Set the last index of this buffer to\0
for the null terminator. -
Now,
read
this file contents into your buffer. You’ll read a number of bytes equal to the size of the file, from the file descriptor you got when you calledopen
, and into the buffer you just created. -
Next, tokenize the buffer by newline. This will return a
char**
that represents an array of lines of text in your file. A tokenize function that usesstrtok
is given below. -
Now, loop over that array. My
tokenize
function accepts acount
paramter by reference that will contain the size of the array, so you know how far to loop! You want to check if yourargv[2]
search string is in this line, so you will want to tokenize each line by space. Do that now, and created a nested loop that iterates over the words that you get back. -
Using
strcmp
, check if each word matches your search string. If it does, print the line, and break out of this inner loop (since you don’t need to print it twice). -
Finally, instead of printing to the screen, modify this program to open another file, and write the matching lines to the file.
Helpful Utility Functions
Tokenizing a String
You can use the strtok
function to tokenize a string. Here is a function that dynamically allocates (and re-allocates) an array of tokens given an input string, a string of possible delimiters, and an integer representing the number of tokens found (passed as a pointer so that updates are seen in the main
function).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char** tokenize(char* input, const char* delimiter, int* tokenCount) {
char** tokens = NULL;
int capacity = 10; // Initial capacity for the array of tokens
*tokenCount = 0; // Initialize the token count
// Allocate initial memory for tokens
tokens = (char**)malloc(capacity * sizeof(char*));
if (tokens == NULL) {
fprintf(stderr, "Memory allocation failed\n");
return NULL;
}
// Duplicate the input string to avoid modifying the original string
char* inputDup = strdup(input);
if (inputDup == NULL) {
free(tokens);
fprintf(stderr, "Memory allocation failed\n");
return NULL;
}
// Tokenize the input string
char* token = strtok(inputDup, delimiter);
while (token != NULL) {
if (*tokenCount >= capacity) {
// Increase capacity
capacity *= 2;
char** temp = (char**)realloc(tokens, capacity * sizeof(char*));
if (temp == NULL) {
// Cleanup and error handling
for (int i = 0; i < *tokenCount; ++i) {
free(tokens[i]);
}
free(tokens);
free(inputDup);
fprintf(stderr, "Memory allocation failed\n");
return NULL;
}
tokens = temp;
}
// Duplicate token and add to the array
tokens[*tokenCount] = strdup(token);
if (tokens[*tokenCount] == NULL) {
// Cleanup and error handling
for (int i = 0; i < *tokenCount; ++i) {
free(tokens[i]);
}
free(tokens);
free(inputDup);
fprintf(stderr, "Memory allocation failed\n");
return NULL;
}
(*tokenCount)++;
token = strtok(NULL, delimiter);
}
free(inputDup); // Cleanup the duplicated string
return tokens;
}
int main() {
char input[] = "This is a test string, with several delimiters.";
const char* delimiter = " ,."; // Tokenize based on space, comma, and period
int tokenCount = 0;
char** tokens = tokenize(input, delimiter, &tokenCount);
if (tokens != NULL) {
for (int i = 0; i < tokenCount; i++) {
printf("%d: %s\n", i + 1, tokens[i]);
free(tokens[i]); // Free each token
}
free(tokens); // Free the array of tokens
}
return 0;
}
Comparing Two Strings
You can compare two strings using the strcmp
function, which returns 0
when the strings are equal. Here is an example:
#include <stdio.h>
#include <string.h>
int main() {
// Define pairs of strings to compare
char* pairs[][2] = {
{"apple", "apple"}, // Equal strings
{"apple", "banana"}, // Lexicographically less
{"banana", "apple"}, // Lexicographically greater
{"apple", "Apple"}, // Case sensitivity, 'a' > 'A' in ASCII
{"", ""}, // Both strings are empty
{"apple", ""}, // Non-empty string compared with an empty string
{"apple", "apple"} // The strings are equal
};
// Calculate the number of pairs
int numPairs = sizeof(pairs) / sizeof(pairs[0]);
for (int i = 0; i < numPairs; i++) {
// Compare the strings
int result = strcmp(pairs[i][0], pairs[i][1]);
// Print the result of the comparison
printf("Comparing \"%s\" and \"%s\": ", pairs[i][0], pairs[i][1]);
if (result < 0) {
printf("The first string is less than the second string.\n");
} else if (result > 0) {
printf("The first string is greater than the second string.\n");
} else {
printf("The strings are equal; result == 0.\n");
}
}
return 0;
}
Trimming Whitespace from a String
Sometimes, tokenizing a string will leave the newline character or a stray space at the end of your string. You can remove these using a loop over the array:
#include <string.h>
void trim(char* s) {
for(int i = 0; i < strlen(s); i++) {
if(isspace(s[i])) { // this is equivalent to checking if s[i] == ' ' || s[i] == '\n' ... (and other blank characters)
s[i] = '\0';
}
}
}
Using the CSAPP Reliable I/O Functions
To deal with short counts, you can use the built-in loops of rio_readn
and rio_writen
in the csapp.c
and csapp.h
file. The rio_readn
and rio_writen
functions are part of the Robust I/O (RIO) package provided by the “Computer Systems: A Programmer’s Perspective” (CS:APP) textbook. These functions are designed to handle the complexities and potential partial read/write issues associated with Unix I/O operations known as short counts. They provide a more reliable way to perform reading and writing operations on files, network sockets, and other I/O channels. They are designed to mirror the standard read
and write
functions, and work as follows:
rio_readn
: This function attempts to read n bytes from a file descriptor into a user buffer. It deals with the partial read problem by continuing to read until all n bytes have been read or an end-of-file (EOF) is reached.rio_writen
: Conversely, this function writes n bytes from a user buffer to a file descriptor. It addresses the partial write issue by continuing to write until all n bytes have been written.
Using rio_readn
#include "csapp.h"
int main() {
int fd = open("myfile.txt", O_RDONLY);
if (fd < 0) {
perror("open");
return 1;
}
ssize_t n;
char buf[1024];
n = rio_readn(fd, buf, sizeof(buf));
if (n < 0) {
perror("read");
return 1;
}
printf("Read %zd bytes\n", n);
close(fd);
return 0;
}
Using rio_writen
#include "csapp.h"
int main() {
int fd = open("myfile.txt", O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
if (fd < 0) {
perror("open");
return 1;
}
const char* buf = "Hello, world!";
ssize_t n = rio_writen(fd, (void *)buf, strlen(buf));
if (n != strlen(buf)) {
perror("write");
return 1;
}
printf("Wrote %zd bytes\n", n);
close(fd);
return 0;
}
Compiling the Program
Save the csapp.c
and csapp.h
files into your project directory and compile them with the following command (assuming your program is called myprogram.c
and consists of only that single file otherwise):
gcc -lpthread -o myprogram myprogram.c csapp.c
On some systems, the correct parameter is -pthread
instead of -lpthread
.
Makefile
Be sure to include a Makefile with your submission that builds and tests your program. If you prefer, you can include a shell script of test cases that you execute via the Makefile.
Submission
In your submission, please include answers to any questions asked on the assignment page in your README file. If you wrote code as part of this assignment, please describe your design, approach, and implementation in your README file as well. Finally, include answers to the following questions:- Describe what you did, how you did it, what challenges you encountered, and how you solved them.
- Please answer any questions found throughout the narrative of this assignment.
- If collaboration with a buddy was permitted, did you work with a buddy on this assignment? If so, who? If not, do you certify that this submission represents your own original work?
- Please identify any and all portions of your submission that were not originally written by you (for example, code originally written by your buddy, or anything taken or adapted from a non-classroom resource). It is always OK to use your textbook and instructor notes; however, you are certifying that any portions not designated as coming from an outside person or source are your own original work.
- Approximately how many hours it took you to finish this assignment (I will not judge you for this at all...I am simply using it to gauge if the assignments are too easy or hard)?
- Your overall impression of the assignment. Did you love it, hate it, or were you neutral? One word answers are fine, but if you have any suggestions for the future let me know.
- Using the grading specifications on this page, discuss briefly the grade you would give yourself and why. Discuss each item in the grading specification.
- Any other concerns that you have. For instance, if you have a bug that you were unable to solve but you made progress, write that here. The more you articulate the problem the more partial credit you will receive (it is fine to leave this blank).
Assignment Rubric
Description | Pre-Emerging (< 50%) | Beginning (50%) | Progressing (85%) | Proficient (100%) |
---|---|---|---|---|
Program Correctness (70%) | Code does not compile or execute, or produces critical errors. | Code compiles and executes but contains significant logic errors, leading to incorrect results. | Code compiles, executes, and produces correct results in most cases but may have minor issues. | Code is correct, produces accurate results, and handles edge cases effectively. |
Makefile and Compilation (15%) | No Makefile provided or compilation issues. | Basic Makefile provided but with errors or missing targets. | Functional Makefile, minor issues with targets or dependencies. | Well-structured Makefile, enabling seamless compilation without errors. |
Documentation (README) (15%) | No README or severely lacking in information. | Basic README provided but lacking essential information or clarity. | Good README, covers most essential aspects with minor omissions or clarity issues. | Comprehensive, clear, and detailed README, providing complete instructions on program usage. |
Please refer to the Style Guide for code quality examples and guidelines.