CS376: Operating Systems - File I/O (100 Points)

Assignment Goals

The goals of this assignment are:
  1. Create and modify files using the C standard I/O interface

Background Reading and References

Please refer to the following readings and examples offering templates to help get you started:

The Assignment

In this assignment, you will open a text file, get its size, malloc a buffer, and read the contents of the file into that buffer.

To read a string from the console, you can use the line fgets(buf, 1024, stdin); from stdio.h, assuming that buf is malloc‘ed to at least 1025 bytes (I am leaving room for a null terminator in case you wish to print the buffer!). Using this, prompt the user to enter a word to search the text file for.

Loop over each line of the file contents by tokenizing it by the newline \n character. For each line, tokenize that line by spaces. Then, loop over this list of tokens and check if the token matches that line. If it does, print the entire line to the screen.

Lastly, create a new file using O_CREAT and write these matching lines to the new text file (so that lines that do not match are skipped). Be sure to open the file with 0666 permissions and with O_RDWR so that you can both read and write to the file.

The filename should be given via argv command line parameters. The filename to read will be in argv[1] and the filename to write will be in argv[2] (recall that the program name is argv[0]). Be careful to check argc before accessing these values. If argc is not at least 3, print an error message to the screen and exit or return a nonzero token value to indicate that an error has occurred.

close the file and free any memory when done.

What to Do

  1. Begin by opening the file given by argv[1]. Use fstat to get the size of the file, and malloc a buffer equal to that size plus 1. This buffer will hold the contents of your file. Set the last index of this buffer to \0 for the null terminator.

  2. Now, read this file contents into your buffer. You’ll read a number of bytes equal to the size of the file, from the file descriptor you got when you called open, and into the buffer you just created.

  3. Next, tokenize the buffer by newline. This will return a char** that represents an array of lines of text in your file. A tokenize function that uses strtok is given below.

  4. Now, loop over that array. My tokenize function accepts a count paramter by reference that will contain the size of the array, so you know how far to loop! You want to check if your argv[2] search string is in this line, so you will want to tokenize each line by space. Do that now, and created a nested loop that iterates over the words that you get back.

  5. Using strcmp, check if each word matches your search string. If it does, print the line, and break out of this inner loop (since you don’t need to print it twice).

  6. Finally, instead of printing to the screen, modify this program to open another file, and write the matching lines to the file.

Helpful Utility Functions

Tokenizing a String

You can use the strtok function to tokenize a string. Here is a function that dynamically allocates (and re-allocates) an array of tokens given an input string, a string of possible delimiters, and an integer representing the number of tokens found (passed as a pointer so that updates are seen in the main function).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char** tokenize(char* input, const char* delimiter, int* tokenCount) {
    char** tokens = NULL;
    int capacity = 10; // Initial capacity for the array of tokens
    *tokenCount = 0; // Initialize the token count

    // Allocate initial memory for tokens
    tokens = (char**)malloc(capacity * sizeof(char*));
    if (tokens == NULL) {
        fprintf(stderr, "Memory allocation failed\n");
        return NULL;
    }

    // Duplicate the input string to avoid modifying the original string
    char* inputDup = strdup(input);
    if (inputDup == NULL) {
        free(tokens);
        fprintf(stderr, "Memory allocation failed\n");
        return NULL;
    }

    // Tokenize the input string
    char* token = strtok(inputDup, delimiter);
    while (token != NULL) {
        if (*tokenCount >= capacity) {
            // Increase capacity
            capacity *= 2;
            char** temp = (char**)realloc(tokens, capacity * sizeof(char*));
            if (temp == NULL) {
                // Cleanup and error handling
                for (int i = 0; i < *tokenCount; ++i) {
                    free(tokens[i]);
                }
                free(tokens);
                free(inputDup);
                fprintf(stderr, "Memory allocation failed\n");
                return NULL;
            }
            tokens = temp;
        }

        // Duplicate token and add to the array
        tokens[*tokenCount] = strdup(token);
        if (tokens[*tokenCount] == NULL) {
            // Cleanup and error handling
            for (int i = 0; i < *tokenCount; ++i) {
                free(tokens[i]);
            }
            free(tokens);
            free(inputDup);
            fprintf(stderr, "Memory allocation failed\n");
            return NULL;
        }

        (*tokenCount)++;
        token = strtok(NULL, delimiter);
    }

    free(inputDup); // Cleanup the duplicated string
    return tokens;
}

int main() {
    char input[] = "This is a test string, with several delimiters.";
    const char* delimiter = " ,."; // Tokenize based on space, comma, and period
    int tokenCount = 0;

    char** tokens = tokenize(input, delimiter, &tokenCount);
    if (tokens != NULL) {
        for (int i = 0; i < tokenCount; i++) {
            printf("%d: %s\n", i + 1, tokens[i]);
            free(tokens[i]); // Free each token
        }
        free(tokens); // Free the array of tokens
    }

    return 0;
}

Comparing Two Strings

You can compare two strings using the strcmp function, which returns 0 when the strings are equal. Here is an example:

#include <stdio.h>
#include <string.h>

int main() {
    // Define pairs of strings to compare
    char* pairs[][2] = {
        {"apple", "apple"}, // Equal strings
        {"apple", "banana"}, // Lexicographically less
        {"banana", "apple"}, // Lexicographically greater
        {"apple", "Apple"}, // Case sensitivity, 'a' > 'A' in ASCII
        {"", ""}, // Both strings are empty
        {"apple", ""}, // Non-empty string compared with an empty string
        {"apple", "apple"} // The strings are equal
    };

    // Calculate the number of pairs
    int numPairs = sizeof(pairs) / sizeof(pairs[0]);

    for (int i = 0; i < numPairs; i++) {
        // Compare the strings
        int result = strcmp(pairs[i][0], pairs[i][1]);

        // Print the result of the comparison
        printf("Comparing \"%s\" and \"%s\": ", pairs[i][0], pairs[i][1]);
        if (result < 0) {
            printf("The first string is less than the second string.\n");
        } else if (result > 0) {
            printf("The first string is greater than the second string.\n");
        } else {
            printf("The strings are equal; result == 0.\n");
        }
    }

    return 0;
}

Trimming Whitespace from a String

Sometimes, tokenizing a string will leave the newline character or a stray space at the end of your string. You can remove these using a loop over the array:

#include <string.h>

void trim(char* s) {
  for(int i = 0; i < strlen(s); i++) { 
      if(isspace(s[i])) { // this is equivalent to checking if s[i] == ' ' || s[i] == '\n' ... (and other blank characters)
          s[i] = '\0';
       }
  }
}

Using the CSAPP Reliable I/O Functions

To deal with short counts, you can use the built-in loops of rio_readn and rio_writen in the csapp.c and csapp.h file. The rio_readn and rio_writen functions are part of the Robust I/O (RIO) package provided by the “Computer Systems: A Programmer’s Perspective” (CS:APP) textbook. These functions are designed to handle the complexities and potential partial read/write issues associated with Unix I/O operations known as short counts. They provide a more reliable way to perform reading and writing operations on files, network sockets, and other I/O channels. They are designed to mirror the standard read and write functions, and work as follows:

  • rio_readn: This function attempts to read n bytes from a file descriptor into a user buffer. It deals with the partial read problem by continuing to read until all n bytes have been read or an end-of-file (EOF) is reached.
  • rio_writen: Conversely, this function writes n bytes from a user buffer to a file descriptor. It addresses the partial write issue by continuing to write until all n bytes have been written.

Using rio_readn

#include "csapp.h"

int main() {
    int fd = open("myfile.txt", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    ssize_t n;
    char buf[1024];
    n = rio_readn(fd, buf, sizeof(buf));
    if (n < 0) {
        perror("read");
        return 1;
    }

    printf("Read %zd bytes\n", n);
    close(fd);
    return 0;
}

Using rio_writen

#include "csapp.h"

int main() {
    int fd = open("myfile.txt", O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    const char* buf = "Hello, world!";
    ssize_t n = rio_writen(fd, (void *)buf, strlen(buf));
    if (n != strlen(buf)) {
        perror("write");
        return 1;
    }

    printf("Wrote %zd bytes\n", n);
    close(fd);
    return 0;
}

Compiling the Program

Save the csapp.c and csapp.h files into your project directory and compile them with the following command (assuming your program is called myprogram.c and consists of only that single file otherwise):

gcc -lpthread -o myprogram myprogram.c csapp.c

On some systems, the correct parameter is -pthread instead of -lpthread.

Makefile

Be sure to include a Makefile with your submission that builds and tests your program. If you prefer, you can include a shell script of test cases that you execute via the Makefile.

Submission

In your submission, please include answers to any questions asked on the assignment page in your README file. If you wrote code as part of this assignment, please describe your design, approach, and implementation in your README file as well. Finally, include answers to the following questions:
  • Describe what you did, how you did it, what challenges you encountered, and how you solved them.
  • Please answer any questions found throughout the narrative of this assignment.
  • If collaboration with a buddy was permitted, did you work with a buddy on this assignment? If so, who? If not, do you certify that this submission represents your own original work?
  • Please identify any and all portions of your submission that were not originally written by you (for example, code originally written by your buddy, or anything taken or adapted from a non-classroom resource). It is always OK to use your textbook and instructor notes; however, you are certifying that any portions not designated as coming from an outside person or source are your own original work.
  • Approximately how many hours it took you to finish this assignment (I will not judge you for this at all...I am simply using it to gauge if the assignments are too easy or hard)?
  • Your overall impression of the assignment. Did you love it, hate it, or were you neutral? One word answers are fine, but if you have any suggestions for the future let me know.
  • Using the grading specifications on this page, discuss briefly the grade you would give yourself and why. Discuss each item in the grading specification.
  • Any other concerns that you have. For instance, if you have a bug that you were unable to solve but you made progress, write that here. The more you articulate the problem the more partial credit you will receive (it is fine to leave this blank).

Assignment Rubric

Description Pre-Emerging (< 50%) Beginning (50%) Progressing (85%) Proficient (100%)
Program Correctness (70%) Code does not compile or execute, or produces critical errors. Code compiles and executes but contains significant logic errors, leading to incorrect results. Code compiles, executes, and produces correct results in most cases but may have minor issues. Code is correct, produces accurate results, and handles edge cases effectively.
Makefile and Compilation (15%) No Makefile provided or compilation issues. Basic Makefile provided but with errors or missing targets. Functional Makefile, minor issues with targets or dependencies. Well-structured Makefile, enabling seamless compilation without errors.
Documentation (README) (15%) No README or severely lacking in information. Basic README provided but lacking essential information or clarity. Good README, covers most essential aspects with minor omissions or clarity issues. Comprehensive, clear, and detailed README, providing complete instructions on program usage.

Please refer to the Style Guide for code quality examples and guidelines.