Description
1 Introduction
1.1 Getting The Homework Assignment
The starter files for the homework assignment have been distributed through our git repos- itory, as usual.
1.2 Submitting The Homework Assignment
Submissions will be handled through Autolab, at
https://autolab.cs.cmu.edu
In preparation for submission, your hw/07 directory should contain a file named exactly
hw07.pdf containing your written solutions to the homework.
To submit your solutions, run make from the hw/07 directory (that contains a code folder and a file hw07.pdf). This should produce a file hw07.tar, containing the files that should be handed in for this homework assignment. Open the Autolab web site, find the page for this assignment, and submit your hw07.tar file via the “Handin your work” link.
The Autolab handin script does some basic checks on your submission: making sure that the file names are correct; making sure that no files are missing; making sure that your code compiles cleanly. Note that the handin script is not a grading script—a timely submission that passes the handin script will be graded, but will not necessarily receive full credit. You can view the results of the handin script by clicking the number (usually either 0.0 or 1.0) corresponding to the “check” section of your latest handin on the “Handin History” page. If this number is 0.0, your submission failed the check script; if it is 1.0, it passed.
Remember that your written solutions must be submitted in PDF format—we do not accept MS Word files or other formats.
Your hw07.sml file must contain all the code that you want to have graded for this assignment, and must compile cleanly. If you have a function that happens to be named the same as one of the required functions but does not have the required type, it will not be graded.
1.3 Due Date
.
1.4 Methodology
You must use the five step methodology discussed in class for writing functions, for every
function you write in this assignment. Recall the five step methodology:
- In the first line of comments, write the name and type of the function.
- In the second line of comments, specify via a REQUIRES clause any assumptions about the arguments passed to the function.
- In the third line of comments, specify via an ENSURES clause what the function com- putes (what it returns).
- Implement the function.
- Provide testcases, generally in the format
val <return value> = <function> <argument value>. For example, for the factorial function presented in lecture:
(* fact : int -> int
* REQUIRES: n >= 0
* ENSURES: fact(n) ==> n!
*)
fun fact (0 : int) : int = 1
| fact (n : int) : int = n * fact(n-1) (* Tests: *)
val 1 = fact 0
val 720 = fact 6
2 Regular Expressions
In class, we introduced six different constructors to describe regular expressions. We will be extending this definition with two new constructors: Whatever and Both. The extended datatype is given below.
datatype regexp = Zero
| One
| Char of char
| Plus of regexp * regexp
| Times of regexp * regexp
| Star of regexp
| Whatever
| Both of regexp * regexp
The new constructors have the following definitions:
- Whatever is a string wildcard that accepts whatever, that is any finite list of characters:
L (Whatever) = {L | L is a list of characters}.
- Both(R1, R2) accepts a string if and only if it is in both L (R1) and L (R2):
L (Both(R1 , R2)) = {L | L ∈ L (R1) and L ∈ L (R2 )}.
The regular expression matcher match from class is included in the support code for the assignment. We have extended the datatype definition of regexp to include the new constructors Whatever and Both. Your job is to extend match to deal with these new constructors, and prove the correctness of your implementation. We define correctness as follows:
Theorem 1 (Correctness). For R : regexp, let P (R) be the following statement: For all values L : char list and total functions p : char list → bool,
- match R L p = true if there exist L1, L2 : char list such that L = L1@L2, L1 ∈ L (R), and p(L2) = true,
- match R L p = false otherwise.
Then, match is correct if for all R : regexp, P(R) holds.
In each of the following coding tasks, we strongly recommend that you think through the correctness spec when you are writing the code. If you’re stuck on the implementation, try doing the proof first—this will guide you to the answer.
Task 2.1 (8%). Implement the case of match for Whatever, the string wildcard.
Task 2.2 (10%). Implement the case of match for Both(R1 , R2).
Task 2.3 (7%). We give part (b) of the correctness proof of Times(R1, R2) below, with certain parts left out. Fill in the seven sections of blanks with the appropriate statements. Note that some of the sections have more than one blank, and some of the sections are used more than once. Each numbered section is worth 1 point.
Proof. Assume L, p such that match (Times(R1, R2)) L p = true. We need to show that
∃L1, L2 such that L1@L2 = L with L1 ∈ L (Times(R1 , R2)) and p (L2) = true.
Inductive Hypotheses Assume P (R1) and P (R2), i.e.
- ∀L1 : char list, p1 : char list → bool s.t. p1 total, if match R1 L1 p1 = true,
then ∃L11, L21 s.t. L11 @L21 = L1, L11 ∈ L (R1 ), p1 L21 = true
- ∀L2 : char list, p2 : char list → bool s.t. p2 total, if match R2 L2 p2 = true,
then ∃L12, L22 s.t.L12 @L22 = L2, L12 ∈ L (R2 ), p2 L22 = true.
By stepping:
match (Times(R1, R2 )) L p
= case Times(R1,R2 ) of … | Times(R1,R2) => … | …
= (1)
By assumption, match (Times(R1, R2)) L p = true. So by transitivity
(i) (1) = true
Thus, by (2) , taking p1 to be
(fn L’ => match R2 L’ p),
L1 to be L, and using (i) to satisfy the premise, we know that ∃L11 , L21 such that (3) ,
, and .
By stepping
(fn L’ => match R2 L’ p)L21
= match R2 L21 p
Thus, by transitivity,
(ii) match R2 L21 p = true
By (4) , taking p2 to be p and L2 to be L21 and using (ii) to satisfy the premise, we know that ∃L12 , L22 such that (5) , , and
.
Since L11 ∈ L (R1 ) and L12 ∈ L (R2 ), (6) ∈ L (Times(R1, R2 )) by (7) . Now, take L1 to be (6) and L2 to be L22 . Then, L1 = (6) ∈
L (Times(R1, R2 )), and L1@L2 = (6) @L22 = L11@L21 = L, and p L2 =
p L22 =true. So, the theorem holds for this case.
Task 2.4 (15%). Do part (b) of the correctness proof of Both(R1, R2). Your proof should follow the format of the proof given above. You should assume P (R1 ), P (R2), and that match is total. Then, proving (b) is the same as proving the following theorem:
For all values L : char list and total functions p : char list → bool,
If match (Both(R1, R2 )) L p = true, then there exist L1 , L2 such that L =
L1 @L2, L1 ∈ L (Both(R1 , R2)), and p(L2) = true.
Do this proof carefully! There is a plausible-looking, but incorrect, implemen- tation of Both; this case of the proof will fail if your code has this bug.
3 Irregular Expressions
Turns out, our implementation of a regular expression matcher can recognize more than just regular languages1. In this section, we will explore this idea further.
Task 3.1 (10%). Write the function halfmatch which takes two regular expressions, R1 and R2, a character list, L, and evaluates to true if and only if there exist L1, L2 : char list such that:
- L = L1 @ L2
- length L1 = length L2
- L1 ∈ L (R1) and L2 ∈ L (R2).
For this task, we want to use the power of the regular expression system we have in place; you should define your answer in terms of regular expressions and match. Your solution should not be recursive, and you should not define any helper functions. Do not use list functions apart from length.
Task 3.2 (0%). Now, let irregular = halfmatch (Star(Char #“0”)) (Star(Char #“1”)). Then, L (irregular) = {0n1n | n ∈ N}. This language turns out to be irregular. (You may prove this fact in higher level courses like 15-251 or 15-453.)
1 It is actually Turing complete.