Discussion: Speeding up file deletion
ThomasMc07
2009-06-06 12:47:08 UTC
Hello,

I periodically need to delete 10 million small files from a single NTFS
directory on a hardware-supported RAID 5 array on Windows Server 2003.

I have tried the normal del *.* console command as well as running the
Win32 ::DeleteFile function on multiple threads. These work, but the task
takes one to two weeks to complete, which is way too long. The fastest I can
get it to run on multiple threads is about 90,000 deletions per hour
(roughly 25 per second, or one every 40 ms).
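
For context, the threaded version is roughly the sketch below. This is a
simplified illustration only; the directory path, thread count and file
naming are placeholders, not the real values. One way to structure it is to
enumerate the directory once and then have each worker call ::DeleteFile on
its share of the names.

// Simplified sketch: enumerate the directory once, then let N worker
// threads each delete their slice of the collected file names.
#include <windows.h>
#include <string>
#include <thread>
#include <vector>

int main()
{
    const std::wstring dir = L"D:\\data\\files";  // placeholder path
    const unsigned int nThreads = 4;              // placeholder thread count

    // One pass over the directory to collect the file names.
    std::vector<std::wstring> files;
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW((dir + L"\\*").c_str(), &fd);
    if (h != INVALID_HANDLE_VALUE) {
        do {
            if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
                files.push_back(dir + L"\\" + fd.cFileName);
        } while (FindNextFileW(h, &fd));
        FindClose(h);
    }

    // Each worker thread deletes every nThreads-th entry of the list.
    std::vector<std::thread> workers;
    for (unsigned int t = 0; t < nThreads; ++t) {
        workers.emplace_back([&files, nThreads, t] {
            for (size_t i = t; i < files.size(); i += nThreads)
                DeleteFileW(files[i].c_str());
        });
    }
    for (std::thread& w : workers)
        w.join();
    return 0;
}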

What is happening under the covers during these deletions? Is there a way to
speed them up? Is there a way to configure NTFS for faster deletes? Would a
specialized filter driver help? Is there another forum more specific to NTFS
internals?

Assistance very much appreciated.

Thomas McLeod
Pegasus [MVP]
2009-06-06 14:12:10 UTC
Post by ThomasMc07
Hello,
I periodically need to delete 10 million small files from a single NTFS
directory on a hardware-supported RAID 5 array on Windows Server 2003.
I have tried the normal del *.* console command as well as running the
Win32 ::DeleteFile function on multiple threads. These work, but the task
takes one to two weeks to complete, which is way too long. The fastest I can
get it to run on multiple threads is about 90,000 deletions per hour
(roughly 25 per second, or one every 40 ms).
What is happening under the covers during these deletions? Is there a way to
speed them up? Is there a way to configure NTFS for faster deletes? Would a
specialized filter driver help? Is there another forum more specific to NTFS
internals?
Assistance very much appreciated.
Thomas McLeod
The problem most likely lies somewhere else. File access times increase
exponentially when you keep more than around 10,000 files in any one folder.
As an example: a moment ago I deleted 20,000 files from my humble IDE disk
inside a 5-year-old laptop. Here are the figures:

Your RAID5 array: 800 seconds (based on your figures)
My IDE disk: 32 seconds (using del /q *.*)
My IDE disk: 25 seconds (using rd /s /q)

In other words, my old clapped-out laptop does the job about 30 times faster
than your Server 2003 RAID 5 array. You should see a huge improvement if you
cut down on the number of files per folder.
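
If you want to reproduce that comparison on your own hardware, a quick test
along the lines of the sketch below should do it (the scratch folder and the
file count are only placeholders; pick a drive you can scribble on). It fills
a folder with tiny files, then times the per-file deletes; run it with a few
different counts and compare the throughput.

// Sketch: create n empty files in a scratch folder, then time how long
// deleting them one by one with ::DeleteFile takes.
#include <windows.h>
#include <cstdio>
#include <string>

int main()
{
    const std::wstring dir = L"C:\\deltest";   // placeholder scratch folder
    const int n = 20000;                       // vary this and compare

    CreateDirectoryW(dir.c_str(), NULL);

    // Populate the folder with n empty files.
    for (int i = 0; i < n; ++i) {
        std::wstring path = dir + L"\\f" + std::to_wstring(i) + L".tmp";
        HANDLE h = CreateFileW(path.c_str(), GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h != INVALID_HANDLE_VALUE)
            CloseHandle(h);
    }

    // Time the per-file deletes.
    DWORD start = GetTickCount();
    for (int i = 0; i < n; ++i) {
        std::wstring path = dir + L"\\f" + std::to_wstring(i) + L".tmp";
        DeleteFileW(path.c_str());
    }
    DWORD elapsedMs = GetTickCount() - start;

    printf("%d deletes in %lu ms (%.0f files per second)\n",
           n, (unsigned long)elapsedMs,
           elapsedMs ? n * 1000.0 / elapsedMs : 0.0);
    return 0;
}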
ThomasMc07
2009-06-06 16:47:01 UTC
Unfortunately, I do not have the code for the app generating the files, and I
cannot configure it to use multiple directories.

I was interested to know whether there is some low-level (possibly off-line)
code that could run the deletes.

Thomas
Post by Pegasus [MVP]
Post by ThomasMc07
Hello,
I periodically need to delete 10 million small files from a single NTFS
directory on a hardware-supported RAID 5 array on Windows Server 2003.
I have tried the normal del *.* console command as well as running the
Win32 ::DeleteFile function on multiple threads. These work, but the task
takes one to two weeks to complete, which is way too long. The fastest I can
get it to run on multiple threads is about 90,000 deletions per hour
(roughly 25 per second, or one every 40 ms).
What is happening under the covers during these deletions? Is there a way to
speed them up? Is there a way to configure NTFS for faster deletes? Would a
specialized filter driver help? Is there another forum more specific to NTFS
internals?
Assistance very much appreciated.
Thomas McLeod
The problem most likely lies somewhere else. File access times increase
exponentially when you keep more than around 10,000 files in any one folder.
As an example: a moment ago I deleted 20,000 files from my humble IDE disk
inside a 5-year-old laptop. Here are the figures:
Your RAID5 array: 800 seconds (based on your figures)
My IDE disk: 32 seconds (using del /q *.*)
My IDE disk: 25 seconds (using rd /s /q)
In other words, my old clapped-out laptop does the job about 30 times faster
than your Server 2003 RAID 5 array. You should see a huge improvement if you
cut down on the number of files per folder.
Pegasus [MVP]
2009-06-06 17:09:45 UTC
Sometimes, when you cannot solve a problem, you may be able to work around
it. In your case it may be possible to move some files into a different
folder when their number exceeds a certain count. You would need to post
more details about the mechanism that populates your folder and what type of
access is required to those numerous files.
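
As a rough illustration of that idea, a scheduled task could run something
along the lines of the sketch below. This is only an outline; the folder
path, the 20,000-file threshold and the bucket naming are assumptions, and
you would first need to confirm that the application tolerates having its
files moved.

// Sketch of the work-around: when the hot folder holds more than
// maxPerFolder files, sweep the surplus into numbered sub-folders so
// that no single directory grows without bound.
#include <windows.h>
#include <string>
#include <vector>

int main()
{
    const std::wstring hot = L"D:\\data\\incoming";  // placeholder hot folder
    const size_t maxPerFolder = 20000;               // assumed threshold

    // Collect the names of the plain files currently in the hot folder.
    std::vector<std::wstring> names;
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW((hot + L"\\*").c_str(), &fd);
    if (h == INVALID_HANDLE_VALUE)
        return 1;
    do {
        if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
            names.push_back(fd.cFileName);
    } while (FindNextFileW(h, &fd));
    FindClose(h);

    if (names.size() <= maxPerFolder)
        return 0;   // nothing to do yet

    // Sweep everything beyond the threshold into bucket sub-folders of
    // at most maxPerFolder files each. The tick count keeps bucket names
    // from colliding across runs.
    const std::wstring runTag = std::to_wstring(GetTickCount());
    std::wstring bucketDir;
    size_t moved = 0;
    for (size_t i = maxPerFolder; i < names.size(); ++i, ++moved) {
        if (moved % maxPerFolder == 0) {
            bucketDir = hot + L"\\batch" + runTag + L"-" +
                        std::to_wstring(moved / maxPerFolder);
            CreateDirectoryW(bucketDir.c_str(), NULL);
        }
        MoveFileW((hot + L"\\" + names[i]).c_str(),
                  (bucketDir + L"\\" + names[i]).c_str());
    }
    return 0;
}

Anything beyond the threshold gets swept into fresh sub-folders, so the hot
directory itself stays at a manageable size.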

I do not think that there is a low-level "delete" method other than
formatting the partition. Remember that such a method would need to fully
comply with the file system (NTFS) design. If it does not, you're likely
to suffer from file system corruption on a massive scale.
Post by ThomasMc07
Unfortunately, I do not have the code for the app generating the files, and I
cannot configure it to use multiple directories.
I was interested to know whether there is some low-level (possibly off-line)
code that could run the deletes.
Thomas
Post by Pegasus [MVP]
Post by ThomasMc07
Hello,
I periodically need to delete 10 million small files from a single NTFS
directory on a hardware-supported RAID 5 array on Windows Server 2003.
I have tried the normal del *.* console command as well as running the
Win32 ::DeleteFile function on multiple threads. These work, but the task
takes one to two weeks to complete, which is way too long. The fastest I can
get it to run on multiple threads is about 90,000 deletions per hour
(roughly 25 per second, or one every 40 ms).
What is happening under the covers during these deletions? Is there a way to
speed them up? Is there a way to configure NTFS for faster deletes? Would a
specialized filter driver help? Is there another forum more specific to NTFS
internals?
Assistance very much appreciated.
Thomas McLeod
The problem most likely lies somewhere else. File access times increase
exponentially when you keep more than around 10,000 files in any one folder.
As an example: a moment ago I deleted 20,000 files from my humble IDE disk
inside a 5-year-old laptop. Here are the figures:
Your RAID5 array: 800 seconds (based on your figures)
My IDE disk: 32 seconds (using del /q *.*)
My IDE disk: 25 seconds (using rd /s /q)
In other words, my old clapped-out laptop does the job about 30 times faster
than your Server 2003 RAID 5 array. You should see a huge improvement if you
cut down on the number of files per folder.
Pegasus [MVP]
2009-06-07 16:21:18 UTC
Post by ThomasMc07
Thanks for taking the time to respond. In this particular case I was asked to
research an explanation and report back to the stakeholder. This is one
issue going into a larger decision of whether to 1) do more development
work, 2) split the workload across multiple machines, or 3) use a more robust
machine and/or drive array.
I believe that File System Filter Drivers do have access to interfaces lower
than Win32; I'm just not an expert in this area and was looking for a forum.
See
http://www.microsoft.com/whdc/driver/filterdrv/default.mspx
Thanks again.
Thomas
Sorry, I know nothing at all about Filter Drivers.

I think I gave you an answer that you can pass back to the stakeholder:
Deleting files takes far too long because the stakeholder keeps a grossly
excessive number of files in his folder(s).

I also suggested a possible work-around: Limit the number of files per
folder to around 20,000. This could be achieved with a scheduled task, but
without more detailed information it is not possible to be more specific.
ThomasMc07
2009-06-07 15:58:01 UTC
Thanks for taking the time to respond. In this particular case I was asked to
research an explanation and report back to the stakeholder. This is one
issue going into a larger decision of whether to 1) do more development
work, 2) split the workload across multiple machines, or 3) use a more robust
machine and/or drive array.

I believe that File System Filter Drivers do have access to interfaces lower
than Win32; I'm just not an expert in this area and was looking for a forum.
See

http://www.microsoft.com/whdc/driver/filterdrv/default.mspx

Thanks again.

Thomas
Post by Pegasus [MVP]
Sometimes, when you cannot solve a problem, you may be able to work around
it. In your case it may be possible to move some files into a different
folder when their number exceeds a certain count. You would need to post
more details about the mechanism that populates your folder and what type of
access is required to those numerous files.
I do not think that there is a low-level "delete" method other than
formatting the partition. Remember that such a method would need to fully
comply with the file system (NTFS) design. If it does not, you're likely
to suffer from file system corruption on a massive scale.
Post by ThomasMc07
Unfortunately, I do not have the code for the app generating the files, and I
cannot configure it to use multiple directories.
I was interested to know whether there is some low-level (possibly off-line)
code that could run the deletes.
Thomas
Post by Pegasus [MVP]
Post by ThomasMc07
Hello,
I periodically need to delete 10 million small files from a single NTFS
directory on a hardware-supported RAID 5 array on Windows Server 2003.
I have tried the normal del *.* console command as well as running the
Win32 ::DeleteFile function on multiple threads. These work, but the task
takes one to two weeks to complete, which is way too long. The fastest I can
get it to run on multiple threads is about 90,000 deletions per hour
(roughly 25 per second, or one every 40 ms).
What is happening under the covers during these deletions? Is there a way to
speed them up? Is there a way to configure NTFS for faster deletes? Would a
specialized filter driver help? Is there another forum more specific to NTFS
internals?
Assistance very much appreciated.
Thomas McLeod
The problem most likely lies somewhere else. File access times increase
exponentially when you keep more than around 10,000 files in any one folder.
As an example: a moment ago I deleted 20,000 files from my humble IDE disk
inside a 5-year-old laptop. Here are the figures:
Your RAID5 array: 800 seconds (based on your figures)
My IDE disk: 32 seconds (using del /q *.*)
My IDE disk: 25 seconds (using rd /s /q)
In other words, my old clapped-out laptop does the job about 30 times faster
than your Server 2003 RAID 5 array. You should see a huge improvement if you
cut down on the number of files per folder.
Agent Sylar - First Class Hero
2010-01-06 14:45:01 UTC
Hi
Has this problem been resolved?
--
mind has no limit
Post by ThomasMc07
Hello,
I periodically need to delete 10 million small files from a single NTFS
directory on a hardware-supported RAID 5 array on Windows Server 2003.
I have tried the normal del *.* console command as well as running the
Win32 ::DeleteFile function on multiple threads. These work, but the task
takes one to two weeks to complete, which is way too long. The fastest I can
get it to run on multiple threads is about 90,000 deletions per hour
(roughly 25 per second, or one every 40 ms).
What is happening under the covers during these deletions? Is there a way to
speed them up? Is there a way to configure NTFS for faster deletes? Would a
specialized filter driver help? Is there another forum more specific to NTFS
internals?
Assistance very much appreciated.
Thomas McLeod