Announcements

  • This checklist page now reverts to its original purpose: listing what you need to turn in for the assignment. That way, we won't spoil the experience of working through the thought process of writing the program. We'll continue to post any changes to the assignment here, along with reference solutions so you can check your work. If you don't need any help with the assignment, you can look here and quickly get all the information you need.

  • We'll continue to provide hints on the assignment for students who wish to use them. Your readme file should indicate whether or not you used the hints. However, no points will be deducted for using the hints.

Assignment Goals

  • Solve a pattern matching problem that arises in computational biology. You might find the following demo helpful if you don't understand the statement of the problem.

  • Learn to use strings.

  • More practice with arrays.

Sample Inputs and Reference Solutions

  • Various test protein and genetic input files are located at /u/cs126/files/gene/. You may wish to use these to test your code.

  • The solution for the example data in prot.1 and gene.1 is gene.1.ans.

  • The solution for the example data in prot.3 and gene.3 is gene.3.ans.

  • The solution for the example data with prot.3 and gene.2 is "NOT FOUND". Your program should behave properly even if there is no match.

Submission and readme

  • Use the following submit command:
    /u/cs126/bin/submit 7 readme gene.c
    

  • To get the full 10 points, your code should avoid hardwiring unnecessary constants, e.g., 3, 4, and 64. If you did hardwire them, modify your code so that it relies only on the #define'd values. If you're struggling and have spent too much time already, there's no shame in conceding 1 point.

    Rationale: you wouldn't hardwire mathematical constants like PI = 3.14..., would you? Even though these are biological constants, you may still want to reuse the code for other purposes. In other applications, you might want to extend the functions code and decode to index arrays with longer strings over arbitrary alphabets, instead of just strings of length 3 over the genetic alphabet.

  • The readme file should contain:

  • Name, precept number, high-level description of code, any problems encountered, and whatever help (if any) you received.

  • Describe how you implemented code() and decode().
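
To illustrate the point about avoiding hardwired constants, here is a minimal sketch of one way code() and decode() might be written. It is not the reference solution. The names ALPHABET, RADIX, CODON_LEN, and NUM_CODONS, the symbol order "acgt", the function signatures, and the -1 error value are all assumptions made for this example; your own #define'd names and conventions may differ.

```c
#include <assert.h>
#include <string.h>

/* All names and values below are assumptions made for this sketch. */
#define RADIX      4    /* number of symbols in the alphabet */
#define CODON_LEN  3    /* symbols per encoded string */
#define NUM_CODONS 64   /* RADIX raised to the power CODON_LEN */

static const char ALPHABET[] = "acgt";   /* assumed symbol order */

/* Treat the first CODON_LEN symbols of s as a base-RADIX number and
   return its value in [0, NUM_CODONS). Returns -1 if s is too short
   or contains a symbol that is not in ALPHABET. */
int code(const char *s)
{
    int i, value = 0;
    for (i = 0; i < CODON_LEN; i++) {
        /* strchr would match the terminating '\0', so rule it out first */
        const char *p = (s[i] != '\0') ? strchr(ALPHABET, s[i]) : NULL;
        if (p == NULL)
            return -1;
        value = value * RADIX + (int)(p - ALPHABET);
    }
    return value;
}

/* Inverse of code(): write the CODON_LEN symbols for index into out,
   which must have room for CODON_LEN + 1 characters. */
void decode(int index, char *out)
{
    int i;
    assert(index >= 0 && index < NUM_CODONS);
    for (i = CODON_LEN - 1; i >= 0; i--) {
        out[i] = ALPHABET[index % RADIX];   /* least significant symbol last */
        index /= RADIX;
    }
    out[CODON_LEN] = '\0';
}
```

Because nothing in the function bodies mentions 3, 4, or 64 directly, switching to longer strings or a different alphabet would only require changing the definitions at the top.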

Enrichment Links

  • The genetic data is actually cDNA (the coding region of DNA), not DNA. The mapping is similar to that of RNA, with t replaced by u, if you wish to compare with your biology textbook or with the following amino acid table borrowed from EBB 320.

  • The genetic data is taken from the National Center for Biotechnology Information.
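
If you want to compare a cDNA string against an RNA codon table, the t-to-u substitution mentioned above can be sketched as follows. The function name cdna_to_rna is hypothetical and is not part of the assignment; the handling of uppercase letters is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrite a cDNA string in RNA notation, in place,
   by replacing every 't' with 'u' (and 'T' with 'U').
   Returns the number of symbols replaced. */
int cdna_to_rna(char *s)
{
    int replaced = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == 't') {
            *s = 'u';
            replaced++;
        } else if (*s == 'T') {
            *s = 'U';
            replaced++;
        }
    }
    return replaced;
}
```

The string is modified in place, so pass a writable array rather than a string literal.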



  • Kevin Wayne