Friday, July 4, 2014

Autophile: Automatically Sort Files into Folders by Name

I like to mark my students' reports electronically rather than marking paper copies. I type faster than I write, and I can copy and paste verbose comments when students do similar things. The major drawback is getting the work back to the students; it's always a painstaking process to get each file into an email or a folder. To reduce that grunt work, I wrote a program that moves files into folders with similar names. I call it Autophile.


The idea is to get your students to put their student ID in the file name so that the software can sort each student's work into their own folder on the network drive. Choose the files you want to sort, select the folders you want to sort them into, adjust the settings, and click "Sort".


A confirmation window will pop up telling you where the program thinks each file should go. You can make adjustments or cancel the process. Clicking "Apply Sort" will move the files.

Confirmation Window
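The review-then-apply flow can be sketched in Python. This is an illustration under assumptions, not Autophile's code (which was C#): `build_plan`, `print_report`, `apply_plan`, and the toy scoring lambda in `demo` are hypothetical, and the real confirmation window also lets you change a file's destination before applying.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical scoring hook: lower score = better match (e.g. a weighted D-L score).
Score = Callable[[str, str], float]


def build_plan(files: List[Path], folders: List[Path], score: Score) -> Dict[Path, Path]:
    """Pair every file with the folder whose name scores best against the file name."""
    return {f: min(folders, key=lambda d: score(f.name, d.name)) for f in files}


def print_report(plan: Dict[Path, Path]) -> None:
    """Show where each file would go so the user can review it before anything moves."""
    for src, dest in plan.items():
        print(f"{src.name} -> {dest.name}")


def apply_plan(plan: Dict[Path, Path]) -> None:
    """Move each file into its chosen folder (only after the user has confirmed)."""
    for src, dest in plan.items():
        shutil.move(str(src), str(dest / src.name))


def demo() -> List[str]:
    """Build, report, and apply a plan inside a throwaway directory; return final file paths."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        folders = [root / "12345", root / "67890"]
        for d in folders:
            d.mkdir()
        files = [root / "essay_67890.docx", root / "lab_12345.pdf"]
        for f in files:
            f.write_text("draft")
        # Toy score for the demo only: 0 if the folder name appears in the file name, else 1.
        plan = build_plan(files, folders, lambda fn, dn: 0 if dn in fn else 1)
        print_report(plan)
        apply_plan(plan)
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    print(demo())
```

Note that `shutil.move` does not add a "(2)" suffix when the destination folder already holds a file with the same name; depending on the platform it may silently replace that file or raise an error, so a real tool would want to check for that first.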
The program finds the best match between folder and file using something called the Damerau–Levenshtein distance. Basically, it's the number of letters you would have to substitute, remove, insert, or swap (adjacent letters only) to make a portion of the file name look like the folder name.
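To make the distance concrete, here is a minimal Python sketch (not the Autophile source). It implements the restricted "optimal string alignment" form of Damerau–Levenshtein, which counts swaps of adjacent letters as one edit. The helper `best_portion_distance` is hypothetical: it approximates "a portion of the file name" by sliding a window the length of the folder name along the file name. Both the variant and the windowing are assumptions.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent letters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def best_portion_distance(file_name: str, folder_name: str) -> int:
    """Smallest distance between the folder name and any same-length slice of the file name."""
    n = len(folder_name)
    if len(file_name) <= n:
        return osa_distance(file_name, folder_name)
    return min(
        osa_distance(file_name[i:i + n], folder_name)
        for i in range(len(file_name) - n + 1)
    )


print(osa_distance("ca", "ac"))                               # 1: one adjacent swap
print(best_portion_distance("Report_12354.docx", "12345"))    # 1: the ID has two digits swapped
```

A fixed-length window is a simplification: it can't line up a portion that is one letter longer or shorter than the folder name, which a full approximate-substring search would handle.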

I was having problems with very short folder names getting picked up as the best match, so I modified the result a little to favor longer, less accurate matches over shorter, accurate ones. I weighted the D-L distance at 100% and the match length at 90% when doing the comparison. For example, I wanted to favor a match that was 5 letters long with one letter wrong over a match that was 3 letters long with no errors.
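The post doesn't give the exact formula, so the following is only one plausible reading of that weighting: lower scores are better, the D-L distance is added at full weight, and the match length is subtracted at 90% weight. `match_score` and the two weight constants are hypothetical names.

```python
# Hypothetical weights following the post's description: distance at 100%, length at 90%.
DISTANCE_WEIGHT = 1.0
LENGTH_WEIGHT = 0.9


def match_score(distance: int, match_length: int) -> float:
    """Lower is better: each error adds to the score, each matched letter subtracts from it."""
    return DISTANCE_WEIGHT * distance - LENGTH_WEIGHT * match_length


long_with_one_error = match_score(distance=1, match_length=5)  # 1 - 4.5 = -3.5
short_exact = match_score(distance=0, match_length=3)          # 0 - 2.7, about -2.7
print(long_with_one_error < short_exact)  # True: the longer match wins
```

Under this reading, the length bonus only outweighs a small number of mistakes: a 5-letter match with two errors (2 - 4.5 = -2.5) already loses to the exact 3-letter match.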

You can download the program here or the source code here.

21 comments:

  1. I am a lawyer in Brazil, and that helped me a lot in the organization of my files, thank you

  2. it is a really great help for me...thank you

  3. Hello, I am a slow learner and it's usually tough for me to learn new software. Your software is by far the best and easiest to use; this was exactly what I was looking for. You saved me so much time and effort. Thanks a bunch!

  4. Wow... This is rather easy. Thank you for this, sir!

  5. Hello. Really good software. Are you open to taking this one step further?

  6. This comment has been removed by the author.

  7. Hello, I recently found your program and decided to give it a try at my company, where it sorts up to 300 documents into a few thousand folders. I was wondering if there is any way to sort my files by their file name, since at the moment it seems to sort them randomly.

    Replies
    1. Hi Hazmat,

      That's what the program is supposed to do. It's supposed to find the best fit between folder name and file name. I'd be interested to see what the Sort Report window tells you. However, I can't really promise any help; I've moved on from this project. The source code is available in the post, and if anyone at your company wants to pick it up, they are welcome to.

      Sorry I can't be more help,
      David

  8. Hi David,

    This software is exactly what I need and it is simple too.

    I have one question: when a file with the same name is already in the folder, is it possible for the software to automatically rename it to name (2), as Windows does when we copy it manually?

    Replies
    1. Hi Unknown,

      That's a good point; it really should be copying without replacing. I haven't looked at this program in years, though. If I get a chance I'll make that update, but I wouldn't hold your breath ;)

      In the meantime, the source code is available; if you want to make an update, I have no objections.

      Source: https://drive.google.com/file/d/0B-1eAQYKZo_5bTlMWGVXX2M2Ykk/edit?usp=sharing

  9. THANK YOU!!! You saved me!!! -last minute guy

  10. Where on Earth did you learn this? That is such a handy algorithm, and your logic flow seems anything but hobby level. You're working in C# using VS pro 2015? Are you interested in custom work? And have you done any projects since? Lot of questions - you can point me to something or other.. Many thanks for turning me on to this.

    Replies
    1. You're welcome; glad it was helpful.

      The truth is, since you're curious, I learned to code by reading the manual of my TI-83 graphing calculator! I've progressed from a total amateur making all the classic mistakes, learning from them bit by bit. I still have a lot more to learn! I've been detailing some of my projects on this blog if you want to browse around.

      I did complete this project in C# using Visual Studio (2015 edition sounds about right). If you're interested in picking up a modern language for Windows programs I think that's a pretty good choice.

      I'm pretty busy with my full-time teaching job but I'm always open to talk about new opportunities. There is a group I organize called Coding for Good. There are several talented students in that group that might be willing to take on a project if it's for a good cause. (http://coding-for-good.s3-website-ap-southeast-1.amazonaws.com/). There's a contact form on that website if you want to drop us a line.

      Cheers,
      David

  11. This is a great program! The only thing that would make it better is if you could also add a general PDF file that doesn't have their name (e.g. a general instruction sheet) to each folder as well.

    Replies
    1. That's a good idea. Thanks for your feedback!

  12. Hi, it seems to be good software, but I haven't been able to get it to work. Sorting many files causes it to become unresponsive. Then there is a glitch when choosing both folders.

    Replies
    1. Hi the galian,

      Good point. I hadn't learned about threading yet when I built this program, so everything runs on a single thread. The software becomes unresponsive because if you have a lot of files or a lot of folders, the computation takes a long time to complete. While that computation is running on the same thread as the user interface, the window can't respond to anything you do.

      The computation time grows linearly with the number of files you're trying to sort, but every file also has to be compared against every folder you are sorting into, so the total work is roughly the number of files times the number of folders, and each of those comparisons gets slower as the file and folder names get longer. Unfortunately I don't think there's a way around that for this kind of application. You would have to find a faster way to identify matches between names.

      As for choosing two folders, I don't think the program supports sorting a single file into more than one folder.

      Cheers,
      David
