
Solve : Bandwidth Question USB 2.0 & Multiple Copy Streams to External Drive?

Answer

So everyone knows that USB 2.0 maxes out at 480 Mbits/s (which, converted to bytes, is 60 MB/s).

I looked online for specs on my internal DVD-ROM, which lists figures of 22.16 MB/s when reading a DVD at 16x speed and 7.2 MB/s for CDs at 48x. As referenced here: http://www.newegg.com/Product/Product.aspx?Item=N82E16827136259R

I have 2 of these DVD-RW drives in the one system, so if both were reading data off DVDs, the maximum combined bandwidth would be 44.32 MB/s, leaving 15.68 MB/s open.

Now the question: while there appears to be plenty of bandwidth for the data stream to the 3TB external hard drive over USB 2.0, with nothing else on the USB 2.0 bus except an optical mouse, how much of the bandwidth does Windows 7 use to process the handshaking, create the files/folders, and verify they are created at the destination? It is not just a constant, unmanaged stream of data.

*The reason I am asking is that I have a 3TB external hard drive, and I am just starting the long process of going through stacks of backups to place all the data onto this drive as an archive, so I can find data quickly when I am searching for something from, say, 2006. Currently it's a process of trying to remember when the data may have been used and then searching the backup discs from that time period to find that one file or project. It's a big mess!

I wrote a program in C++ that processes the discs and places their data on this external drive using the XCOPY routine. It increments the folder names and keeps track of the last one processed by writing the info to a data file. It creates folders at E:\ such as Disc1, Disc2, Disc3, etc., and it works well; however, it took almost 3 hours to transfer 11 DVDs, and I have hundreds of them to transfer to this drive at a maximum of 4.7GB per DVD (most 4.5GB and smaller per disc), as well as many CD-Rs with data backed up from before the DVD-R backup era.
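The core logic is roughly along these lines (a simplified sketch, not my exact source; the D: and E: drive letters and the counter.dat file name are just for illustration):

    #include <cstdlib>
    #include <fstream>
    #include <string>

    int main() {
        // Read the next disc number from the state file (defaults to 1 on first run).
        int disc = 1;
        std::ifstream in("counter.dat");
        if (in) in >> disc;
        in.close();

        // Destination folder on the external, e.g. E:\Disc12
        std::string dest = "E:\\Disc" + std::to_string(disc);

        // /E copies subdirectories (including empty ones), /I treats the
        // destination as a directory, /Y suppresses overwrite prompts.
        std::string cmd = "xcopy D:\\*.* \"" + dest + "\" /E /I /Y";
        int rc = std::system(cmd.c_str());

        // Only advance the counter if xcopy reported success (exit code 0).
        if (rc == 0) {
            std::ofstream out("counter.dat", std::ios::trunc);
            out << disc + 1;
        }
        return rc;
    }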

So if both drives can transfer to this external at the same time, I will edit my source code into an even and an odd version of the program: the even version would create folders named Disc2, Disc4, Disc6... and the odd version would create folders Disc1, Disc3, Disc5... That way one transfer will not step on top of the other's directory structure, and the data is kept separate.

However, I have concerns that I may not really have the bandwidth the math suggests, because of the unknown of what Windows 7 uses to manage creating the data structure at the destination. I am also not sure whether having two XCOPY data streams to the single external at the same time could be problematic; could the USB/SATA controller be overworked and cause issues with data coming from more than one source?

** I was thinking about cracking open the external and installing this 3TB drive internally, where the bandwidth concerns would be gone over SATA II; however, I don't want to void the warranty on this brand-new drive, and I want it to stay portable rather than mounted in a computer tower.

This is the external that I went with for the $99.99 Labor Day deal: http://www.newegg.com/Product/Product.aspx?Item=N82E16822152408

You are really taking on a big project.
Do you know about external SATA? Why not use it?
http://www.bing.com/images/search?q=extrernal+sata+drive+kit&qpvt=extrernal+sata+drive+kit&FORM=IGRE

Or even eSATAp.

Also, maybe you need to read up on database management.

You don't have USB 3.0, right?

In regards to Geek-9pm's
Quote

You are really taking on a big project.
Do you know about external SATA? Why not use it?
http://www.bing.com/images/search?q=extrernal+sata+drive+kit&qpvt=extrernal+sata+drive+kit&FORM=IGRE

Or even eSATAp.

Also, maybe you need to read up on database management.

I don't want to void the warranty of the external by moving the drive into an eSATA enclosure, and the 3TB drive I have is all I have to work with right now. Money is tight because my wife works for the school system and only just started working again, and the $100 price tag for the external 3TB drive was one I couldn't turn down.

I am also confused by the database management statement, so I am LOOKING for clarification on that. I know a database could be created to index all the files' data, if that is what you were referring to; however, I just use the Windows search function for finding the data I need, so that I am not reinventing the data index and search wheel.

And Soybean's suggestion for using/upgrading to USB 3.0:

I don't have any systems with USB 3.0 yet, but I see your point about far more bandwidth with 3.0 vs 2.0. USB 3.0 cards require an available PCIe slot; I might have a 1x slot available in my new computer build that is waiting for the AMD FX 8350 CPU, but all my other systems are too old to support a USB 3.0 upgrade, since it needs the PCI Express bandwidth to reach 3.0 speeds.

More info on last night's testing, and the past history of this data mess:

Last night I compiled my source into an even and an odd version, each with a counter that steps by +2, and created a file for each one that records the next counter value to be used, so that if you stop the program and start it back up, it picks up where it left off with a destination folder name of DiscX, where X is an even or odd integer depending on which version created the folder on the external. I tested it out and it seems to be working okay, although I haven't yet timed a large file transfer from one DVD-ROM to the external against two DVD-ROMs at the same time to see if I can measure any lag when running two at once. I figured that since the data transfer should be transactional, it should be fine to send data to this external drive from two sources at the same time, with the only risk being the transfer slowing on one of the drives if there is congestion caused by traffic from the second drive.
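The counter handling is roughly like this (again a simplified sketch, not my exact source; the state-file names and the START define are illustrative, with the odd build compiled with -DSTART=1 and the even build with -DSTART=2):

    #include <fstream>
    #include <iostream>
    #include <string>

    #ifndef START
    #define START 1                  // 1 = odd version, 2 = even version
    #endif

    int main() {
        // Each version keeps its own state file so the two never collide.
        const std::string state_file = (START % 2) ? "odd.dat" : "even.dat";

        // Resume from the recorded counter, or begin at START on first run.
        int disc = START;
        std::ifstream in(state_file);
        if (in) in >> disc;

        std::cout << "Next folder this run will create: Disc" << disc << "\n";

        // ... the xcopy to E:\DiscN would happen here, as in the earlier sketch ...

        // Step by 2 so the odd and even programs never share a folder name.
        std::ofstream out(state_file, std::ios::trunc);
        out << disc + 2;
    }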

So far I have processed about 70GB of data from these discs, most of them the older data CDs. The next step, after everything is on this external and a mirrored copy of this project is made to my 1.5TB drive, will be to process all this data to find redundancy and remove it. For example, I found multiple copies of data burned to disc because I backed up all data week to week or month to month to CDs and DVDs, depending on the time period, and each time a backup ran, another REDUNDANT copy was created of the same data that is on about 6 other discs. In some cases the location of the data moved, from say a folder named MyMusic to the same data later being placed into subdirectories used as playlists, to which new music was added as I converted my audio CDs to MP3s, so it's a big mess.

A long time ago, when burning these data backups, I figured some day I would be in for a mess to clean up, and I put it off. CDs and DVDs are cheap to burn, so I just burned them at each backup for the last 13 years or so, and now I have a big mess to clean up. I don't want to throw away my data, and I dislike that when I am looking for old data, it is somewhere, on some disc, and I have to guess the time period it was created or used, and sometimes even which computer held the data, since for the last 25 years I have had multiple computers at the same time and no central file server to keep everything well managed.

Back in 2002 I went through a process of getting rid of hundreds of 5.25" and 3.5" floppy disks that took up space. It was a similar process: extracting all the data off the floppies using multiple computers, manually creating folders on each with subdirectories named 1, 2, 3, 4, 5, etc. under primary directories named PC1, PC2, PC3, and PC4, one per computer. When all the disks were done, I used these computers on the LAN to copy all the data to the one more modern computer that had a CD-RW drive, burned about 300+ floppies' worth to a single CD, and made 3 copies of that CD in case any of them ever got lost or damaged.

In the data I processed last night I found this CD, so all the floppy disks I had, from the 1980s forward to 2002, just became part of this soon-to-be massive store of data accessible on a single drive, the same way the single CD-R was 11 years ago. The only floppies I kept were original software and OS disks I wanted to retain, so now I have fewer than 30 floppies. At some point I will probably destroy and dispose of these discs, the same as I did with the hundreds of floppies. I am not sure if this plastic can be recycled rather than going to a landfill, so I will ask the local recycling center to try to be green about it rather than sending heavy trash bags full of them to a landfill.

Also adding to the redundancy is the fact that back in 2008, to CUT down on the space used to retain my data, I started merging CDs onto DVDs with the DVD-RW drive, so that ( 1 ) DVD-R would take the place of ( 6 ) CD-Rs. I processed some CDs for that project and stopped short of completion, because the 6-to-1 consolidation wasn't that much savings, and I was going to hold out for Blu-ray to do 25-to-1: get rid of 25 CD-Rs and replace them with 1 Blu-ray disc, and also squeeze 5 DVD-Rs onto 1 BD-R disc. But Blu-ray burners have only recently dropped in price, and BD-R discs are still expensive at around $1.00 per disc for 25GB of storage, plus the cost of the burner. With the 3TB external drive at $100, storing 1TB of data for $33.33 gave me the push to merge all this data to a single location, same as I did back in 2002 with the floppies.

I am hoping I can do this cleanup with some sort of tool that finds like copies of data and deletes them, so that in the end I have an archive drive that shrinks as the unneeded redundancy is removed; I still have the original discs as my redundant copy, as well as the mirrored copy I will make before I mess around with any redundant-data deletion tools, so that I don't destroy the archive that is going to take days to create. Once I know the data is thinned out to single instances of files with the same date/time stamps and file names, I will replicate this data to my 1.5TB drive and have one drive to work with and the other in a fire safe.

Question regarding redundant file removal solution:

Does anyone have suggestions for a BATCH routine or software to use to clean up redundant data? Maybe I should open a new thread for this request, but I figured I'd keep it here since it's related to this project. What I have in mind is matching files with the same name, size, and date/time stamp and flagging the groups for manual review; a rough C++17 sketch of that matching rule is below (the E:\ root path is just an example, and it only reports candidates, it deletes nothing):
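    #include <cstdint>
    #include <filesystem>
    #include <iostream>
    #include <map>
    #include <string>
    #include <tuple>
    #include <vector>

    namespace fs = std::filesystem;

    int main() {
        // Key: (file name, size, last-write time) -> all paths that match.
        using Key = std::tuple<std::string, std::uintmax_t, fs::file_time_type>;
        std::map<Key, std::vector<fs::path>> groups;

        for (const auto& entry : fs::recursive_directory_iterator(
                 "E:\\", fs::directory_options::skip_permission_denied)) {
            if (!entry.is_regular_file()) continue;
            Key k{entry.path().filename().string(),
                  entry.file_size(),
                  entry.last_write_time()};
            groups[k].push_back(entry.path());
        }

        // Print any group with more than one member as a dedupe candidate.
        for (const auto& [key, paths] : groups) {
            if (paths.size() < 2) continue;
            std::cout << "Possible duplicates of " << std::get<0>(key) << ":\n";
            for (const auto& p : paths) std::cout << "  " << p.string() << "\n";
        }
    }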

