Walkthrough: Input Validator Lab

This guide walks through the thinking process for building this project. It does NOT give you the complete solution. For that, see SOLUTION.md.

Before reading this

Try the project yourself first. Spend at least 20 minutes. If you have not tried yet, close this file and open the project README.

Understanding the problem

You need to build a system that validates different types of input: email addresses, phone numbers, and zip codes. The input comes from a file where each line has the format type: value (e.g., email: user@example.com). Your program reads each line, figures out what type of validation to apply, runs the right validator, and reports whether the input passed or failed with a clear reason.

The sample input file looks like this:

email: user@example.com
email: bad-email-no-at
phone: 555-123-4567
zip: 90210

Planning before code

flowchart TD
    A[Read lines from file] --> B[For each line]
    B --> C[Parse 'type: value']
    C --> D{Which type?}
    D -->|email| E[validate_email]
    D -->|phone| F[validate_phone]
    D -->|zip| G[validate_zip_code]
    D -->|unknown| H[Return error]
    E --> I[Collect result: PASS or FAIL + reasons]
    F --> I
    G --> I
    H --> I
    I --> J[Print summary + save JSON]

Four layers to build:

- Individual validators -- one function per input type, each returning a structured result
- A dispatcher -- parse the line, route to the correct validator
- File processing -- read lines, skip blanks, collect results
- Output -- print a table and save JSON
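
The last two layers can be tried out before any real validator exists. The following is a minimal sketch, not the project's solution: `process_lines`, `summarize`, and the stand-in `fake_validate` are hypothetical names, and the input is an in-memory list of lines rather than a real file so the snippet runs anywhere.

```python
import json


def fake_validate(line: str) -> dict:
    """Hypothetical stand-in for the dispatcher: passes any value containing '@'."""
    value = line.split(":", maxsplit=1)[-1].strip()
    ok = "@" in value
    return {"value": value, "valid": ok, "errors": [] if ok else ["no @ found"]}


def process_lines(lines, validate) -> list:
    """Run `validate` on every non-blank line and collect the results."""
    results = []
    for line in lines:
        if not line.strip():
            continue  # skip blank or whitespace-only lines
        results.append(validate(line))
    return results


def summarize(results: list) -> dict:
    """Count passes and failures for the final report."""
    passed = sum(1 for r in results if r.get("valid"))
    return {"total": len(results), "passed": passed, "failed": len(results) - passed}


sample = ["email: user@example.com", "", "email: bad-email-no-at", "   "]
results = process_lines(sample, fake_validate)
summary = summarize(results)

# A simple aligned table: status, value, reasons
for r in results:
    status = "PASS" if r["valid"] else "FAIL"
    print(f"{status:<5} {r['value']:<25} {', '.join(r['errors'])}")

# json.dumps produces the text you would write to the output file
report = json.dumps({"summary": summary, "results": results}, indent=2)
```

Passing the validator in as an argument keeps the file-processing layer testable on its own. With a real file, the same loop works on an open file handle, since iterating over it yields one line at a time.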

Step 1: Email validation

Start with the most complex validator. An email needs:

- Exactly one @ symbol
- Text before the @
- A dot in the domain part (after @)
- No spaces

def validate_email(email: str) -> dict:
    email = email.strip()
    errors = []

    if " " in email:
        errors.append("contains spaces")
    if email.count("@") != 1:
        errors.append("must contain exactly one @")
    else:
        local, domain = email.split("@")
        if not local:
            errors.append("nothing before @")
        if not domain or "." not in domain:
            errors.append("domain must contain a dot")

    return {"value": email, "type": "email", "valid": len(errors) == 0, "errors": errors}

The pattern here is to accumulate errors: start with an empty list, append each problem you find, and treat the input as valid only if the list is still empty at the end. This is better than returning at the first error because it tells the user everything that is wrong at once.
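
To see the difference in practice, the sketch below runs one bad input through a validator that behaves like the one above and through a fail-fast variant. `first_error_only` is a hypothetical function written only for this comparison; it is not part of the project.

```python
from typing import Optional


def validate_email(email: str) -> dict:
    """Accumulating version: behaves like the walkthrough's validator."""
    email = email.strip()
    errors = []
    if " " in email:
        errors.append("contains spaces")
    if email.count("@") != 1:
        errors.append("must contain exactly one @")
    else:
        local, domain = email.split("@")
        if not local:
            errors.append("nothing before @")
        if not domain or "." not in domain:
            errors.append("domain must contain a dot")
    return {"value": email, "type": "email", "valid": not errors, "errors": errors}


def first_error_only(email: str) -> Optional[str]:
    """Hypothetical fail-fast variant: stops at the first problem it sees."""
    email = email.strip()
    if " " in email:
        return "contains spaces"
    if email.count("@") != 1:
        return "must contain exactly one @"
    return None


bad = "a b@@example.com"
accumulated = validate_email(bad)["errors"]  # both problems reported
fail_fast = first_error_only(bad)            # only the first one
print(accumulated)
print(fail_fast)
```

With the fail-fast variant, the user fixes the space, resubmits, and only then learns about the extra @. The accumulating version reports both in one pass.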

Predict before you scroll

What does validate_email("user@example.com") return? What about validate_email("@example.com")? Trace through the conditions.

Step 2: Phone validation

Phone validation is simpler: extract only the digits, then check that there are exactly 10.

def validate_phone(phone: str) -> dict:
    phone = phone.strip()
    digits = ""
    for char in phone:
        if char.isdigit():
            digits += char

    errors = []
    if len(digits) != 10:
        errors.append(f"expected 10 digits, got {len(digits)}")

    return {"value": phone, "type": "phone", "valid": len(errors) == 0, "errors": errors}

By stripping out everything except digits, this function accepts multiple formats: 555-123-4567, (555) 123-4567, and 5551234567 all produce the same 10 digits.
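
The digit-extraction step can be checked in isolation. This is a small sketch using a hypothetical helper, `extract_digits`, which does the same work as the loop above with `str.join` and a generator expression.

```python
def extract_digits(text: str) -> str:
    """Keep only the digit characters, in their original order."""
    return "".join(ch for ch in text if ch.isdigit())


formats = ["555-123-4567", "(555) 123-4567", "5551234567"]
extracted = [extract_digits(f) for f in formats]
print(extracted)

# The same leniency also lets through input that merely contains 10 digits
odd = extract_digits("5-5-5 1.2.3 4567")
```

The trade-off of this approach is that any punctuation is ignored, so an oddly formatted string still passes as long as it contains exactly 10 digits.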

Predict before you scroll

What happens if the user enters 12345? How many digits does the function extract, and what error does it report?

Step 3: Zip code validation with regex

Zip codes have a strict format: exactly 5 digits, optionally followed by a dash and 4 more digits. This is a good case for a regular expression:

import re

def validate_zip_code(zipcode: str) -> dict:
    zipcode = zipcode.strip()
    errors = []

    pattern = r"^\d{5}(-\d{4})?$"
    if not re.match(pattern, zipcode):
        errors.append("must be 5 digits or 5+4 format (12345-6789)")

    return {"value": zipcode, "type": "zip", "valid": len(errors) == 0, "errors": errors}

Breaking down the regex r"^\d{5}(-\d{4})?$":

- ^ -- start of string
- \d{5} -- exactly 5 digits
- (-\d{4})? -- optionally: a dash followed by 4 digits
- $ -- end of string
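
One way to build confidence in a pattern is to run it against a handful of inputs that should and should not match. The sketch below does that for the zip pattern; the names `ZIP_PATTERN`, `cases`, and `matches` are made up for this example.

```python
import re

ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")

cases = ["90210", "90210-1234", "9021", "902101", "90210-12", "90210 1234"]
matches = {case: bool(ZIP_PATTERN.match(case)) for case in cases}

for case, ok in matches.items():
    print(f"{case!r:<14} {'match' if ok else 'no match'}")

# Subtlety: `$` also matches just before a trailing newline,
# while re.fullmatch requires the whole string to match.
dollar_accepts_newline = bool(ZIP_PATTERN.match("90210\n"))
fullmatch_accepts_newline = bool(re.fullmatch(r"\d{5}(-\d{4})?", "90210\n"))
```

Only the first two cases match. The trailing-newline difference does not affect the validator above, because it calls `.strip()` before matching, but `re.fullmatch` is an alternative that makes the anchors unnecessary.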

Step 4: The dispatcher

The dispatcher parses the type: value format and routes to the right validator:

def validate_input(line: str) -> dict:
    if ":" not in line:
        return {"raw": line.strip(), "error": "Expected format: type: value"}

    input_type, value = line.split(":", maxsplit=1)
    input_type = input_type.strip().lower()
    value = value.strip()

    validators = {
        "email": validate_email,
        "phone": validate_phone,
        "zip": validate_zip_code,
    }

    if input_type not in validators:
        return {"raw": line.strip(), "error": f"Unknown type: {input_type}"}

    return validators[input_type](value)

Notice the validators dictionary maps type names to functions. This is the dispatch table pattern -- it is cleaner than a long if/elif chain and easier to extend. To add a new type, you just add one entry to the dictionary.
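
The sketch below shows what "add one entry" looks like. Everything in it is a simplified stand-in: `dispatch` and `VALIDATORS` are hypothetical names, the zip validator is cut down to a length check, and `validate_url` is deliberately incomplete (it only checks the scheme prefix), so it is not a solution to the URL exercise.

```python
def validate_zip_code(value: str) -> dict:
    """Simplified stand-in for the walkthrough's zip validator."""
    ok = len(value) == 5 and value.isdigit()
    errors = [] if ok else ["must be 5 digits"]
    return {"value": value, "type": "zip", "valid": ok, "errors": errors}


def validate_url(value: str) -> dict:
    """Hypothetical and incomplete: only checks for an http(s) prefix."""
    ok = value.startswith(("http://", "https://"))
    errors = [] if ok else ["must start with http:// or https://"]
    return {"value": value, "type": "url", "valid": ok, "errors": errors}


VALIDATORS = {"zip": validate_zip_code}


def dispatch(input_type: str, value: str) -> dict:
    """Look up the validator by name; the routing code never changes."""
    validator = VALIDATORS.get(input_type)
    if validator is None:
        return {"raw": value, "error": f"Unknown type: {input_type}"}
    return validator(value)


before = dispatch("url", "https://example.com")  # not registered yet
VALIDATORS["url"] = validate_url                 # the only change needed
after = dispatch("url", "https://example.com")
print(before)
print(after)
```

The routing function is untouched between the two calls. With an if/elif chain, the same change would mean editing the control flow itself.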

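Also notice maxsplit=1 in the .split() call. It splits only on the first colon, so everything after that colon stays in the value. A line like time: 12:30:00 yields the value 12:30:00 instead of four pieces that cannot be unpacked into two variables.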
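
A quick experiment makes the difference concrete. The variable names here are chosen only for this sketch.

```python
line = "time: 12:30:00"

naive = line.split(":")                # splits on every colon
limited = line.split(":", maxsplit=1)  # splits on the first colon only

# Unpacking the naive result into two names fails
try:
    input_type, value = naive
except ValueError as exc:
    unpack_error = str(exc)

# The limited split always has exactly two parts when a colon is present
input_type, value = limited
value = value.strip()

print(naive)
print(limited)
print(unpack_error)
```

`str.partition(":")` is another option: it always returns a three-item tuple (text before, the separator, text after), so it never raises on unpacking.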

Common mistakes

| Mistake | Why it happens | How to fix |
| --- | --- | --- |
| Email validator accepts `user@domain` (no dot) | Forgot to check for a dot in the domain part | Add `"." not in domain` check |
| Phone validator rejects `(555) 123-4567` | Checking format instead of just extracting digits | Strip all non-digits first, then count |
| `line.split(":")` breaks on values containing `:` | Default split splits on every colon | Use `split(":", maxsplit=1)` |
| Returning just `True/False` instead of a dict | Simpler but less useful | Return `{"valid": bool, "errors": [...]}` so the caller knows WHY it failed |

Testing your solution

Run the tests from the project directory:

pytest -q

The seven tests check:

- Valid email passes
- Email without @ fails
- Valid phone number (with dashes) passes
- Short phone number fails
- Both zip formats (90210 and 90210-1234) pass
- Short zip fails
- The dispatcher routes email: test@test.com to the email validator
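
For a sense of what such tests can look like, here is a sketch of two of them in pytest style. The test names are guesses, not the project's actual ones, and the validator is defined inline only so the snippet is self-contained; in the project you would import it from your own module instead.

```python
import re


def validate_zip_code(zipcode: str) -> dict:
    """Same logic as the walkthrough's zip validator, repeated for self-containment."""
    zipcode = zipcode.strip()
    errors = []
    if not re.match(r"^\d{5}(-\d{4})?$", zipcode):
        errors.append("must be 5 digits or 5+4 format (12345-6789)")
    return {"value": zipcode, "type": "zip", "valid": len(errors) == 0, "errors": errors}


def test_both_zip_formats_pass():
    # pytest collects any function whose name starts with "test_"
    for zipcode in ("90210", "90210-1234"):
        result = validate_zip_code(zipcode)
        assert result["valid"], f"{zipcode} should pass"
        assert result["errors"] == []


def test_short_zip_fails():
    result = validate_zip_code("9021")
    assert not result["valid"]
    assert result["errors"]  # at least one reason is reported
```

Checking the `errors` list as well as the `valid` flag confirms that a failure comes with a reason, which is the point of returning a dict rather than a bare boolean.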

What to explore next

- Add a new validation type: "url" that checks for http:// or https:// prefix and a dot in the domain
- Add a --strict flag that rejects emails where the TLD (the part after the last dot) is fewer than 2 characters