Digital Forensics Project 3: which of these files are lying to you

Update: fixed a typo in the Windows example code in the project.

Summary: You will write a program that finds every file in a directory tree whose actual file type does not match its extension, and copies those files to a destination folder for closer inspection.


Install python-magic (not filemagic, which appears to have been superseded by python-magic).
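
A minimal sketch of asking python-magic what a file really is, assuming the package is installed under the import name magic together with the libmagic library it wraps. The helper name describe_file is hypothetical; magic.from_file is the library call.

```python
def describe_file(path):
    """Return (description, mime_type) for one file, as reported by python-magic."""
    # Assumption: python-magic is installed; it is imported here, inside the
    # function, so that this sketch still loads on a machine without it.
    import magic

    description = magic.from_file(path)           # human-readable type description
    mime_type = magic.from_file(path, mime=True)  # MIME type string
    return description, mime_type
```

Either value can then be compared with what the file's extension claims.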

Once you are using file or magic:

  1. Ask the user for the directory they want to use.
  2. Search that directory tree for files whose extension does not match their file type.
  3. Make sure you catch any exceptions that occur during this procedure. An uncaught exception during grading will result in a significantly lowered grade.
  4. When you find such a file, copy it to a folder for suspicious files (you might want to get this folder from the user too). Do not worry about files that report as 'data'; that means it is a binary file saved from an arbitrary program. Also don't worry about files that report as ASCII text. Lots of programming language files will report as ASCII text but have a .py, .c, .h, or other extension.
  5. A sample file has been uploaded to Moodle. It is a zipped folder containing a variety of subfolders and file types. (I'll be testing your program on a much larger directory tree; some of you clearly tested your earlier programs on too small a data set.)
  6. Comment your program
    1. with documentation comments and any additional comments that are needed
  7. Add a readme.txt with
    1. your name
    2. which technique you used
    3. anything I need to know to run your program on my machine (such as how I select the folder to examine and where I can find the output)
    4. anything you left undone.
  8. Zip up your project directory, containing your readme and your Python program. The directory name should contain enough of your name to be unique in the class so I don't have your files overwriting each other.
  9. Submit on Moodle.
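
The search-and-copy steps above can be sketched as follows. This is one possible approach, not the required one: it assumes the type implied by an extension can be looked up with the standard mimetypes module and compared with the MIME type python-magic reports. The names magic_detect, is_suspicious, and copy_suspicious are hypothetical, and because the two MIME tables do not always agree on naming, expect some false positives.

```python
import mimetypes
import os
import shutil


def magic_detect(path):
    """Return (description, mime_type) from python-magic for one file."""
    import magic  # assumption: python-magic is installed; imported lazily
    return magic.from_file(path), magic.from_file(path, mime=True)


def is_suspicious(path, detect=magic_detect):
    """Guess whether the extension of path disagrees with its content."""
    expected, _encoding = mimetypes.guess_type(path)  # type implied by the extension
    if expected is None:
        return False  # no extension, or one mimetypes does not know
    description, actual = detect(path)
    # The assignment says to ignore files reported as 'data' or ASCII text.
    if description == "data" or description.startswith("ASCII text"):
        return False
    return actual != expected


def copy_suspicious(root, dest, detect=magic_detect):
    """Copy every suspicious file under root into dest; return the source paths."""
    os.makedirs(dest, exist_ok=True)
    dest_abs = os.path.abspath(dest)
    copied = []
    for folder, subdirs, names in os.walk(root):
        # Do not descend into the output folder if it lives inside root.
        subdirs[:] = [d for d in subdirs
                      if os.path.abspath(os.path.join(folder, d)) != dest_abs]
        for name in names:
            path = os.path.join(folder, name)
            try:
                if is_suspicious(path, detect):
                    # Prefix a counter so same-named files do not overwrite each other.
                    target = os.path.join(dest, f"{len(copied)}_{name}")
                    shutil.copy2(path, target)
                    copied.append(path)
            except Exception as error:  # broad on purpose: one bad file must not stop the scan
                print(f"skipped {path}: {error}")
    return copied
```

For example, copy_suspicious(input("Directory to scan: "), input("Folder for suspicious files: ")) would take both folders from the user.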