mysql - Optimizing PHP command line scripts to process large flat files
For the downvoters... I know that PHP is the wrong language for this, but it's what I have to work with. Given that:
I have a large flat file that I need to process in PHP, converting it into a normalized MySQL database. There are several million lines in the flat file.
I originally tried to use an ORM system while importing the flat file. That design had a massive PHP memory leak problem, even with careful freeing of objects. And even when I made sure there was enough memory, the script would have taken about 25 days to run on my desktop.
I stripped out the overhead and rewrote the script to build MySQL commands directly. I removed AUTO_INCREMENT from my design, because it forced me to ask MySQL what the last inserted id was in order to create relations between data points. I use a global counter for database ids instead, and I never do any lookups, just inserts.
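A minimal sketch of that idea, assuming PHP 7.4+; the `person` table, its columns, and the two sample rows are my own illustration, not from the question:

```php
<?php
// Assign ids from a global counter instead of AUTO_INCREMENT, so no
// LAST_INSERT_ID() round trip is ever needed to relate data points.
$nextId = 1;
function nextId(): int {
    global $nextId;
    return $nextId++;
}

// Bundle many rows into one multi-row INSERT: fewer statements means
// far less per-query overhead on a multi-million-line import.
function buildInsert(string $table, array $rows): string {
    $values = array_map(
        fn(array $r) => '(' . implode(', ', $r) . ')',
        $rows
    );
    return "INSERT INTO $table (id, name) VALUES " . implode(', ', $values) . ';';
}

$batch = [];
foreach (["'alice'", "'bob'"] as $name) {  // stand-in for parsed flat-file fields
    $batch[] = [nextId(), $name];
}
echo buildInsert('person', $batch), "\n";
// INSERT INTO person (id, name) VALUES (1, 'alice'), (2, 'bob');
```

Because every id comes from the counter, parent rows can hand their ids to child rows immediately, with no read-back from the database.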
I use the Unix split command to create many smaller files instead of one big one, because there is a memory overhead associated with using the same file pointer over and over.
Using these optimizations (I hope they help someone else), I got the import script to run in about 6 hours.
I rented a virtual instance with 5 times more RAM and 5 times more processor power than my desktop, and it ran at exactly the same speed. The server runs the process but has CPU cycles and RAM to spare. Perhaps the limiting factor is disk speed, but I have lots of RAM. Should I try loading the files into memory somehow? Any suggestions for further optimization of PHP command line scripts processing large files are welcome!
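For the "load the files into memory" idea, the two options in PHP look roughly like this; a sketch only, with an illustrative temp file standing in for the real chunk files:

```php
<?php
// (a) Stream line by line: constant memory, fgets() reads through an
// internal buffer, so only one line is held in RAM at a time.
function countLinesStreaming(string $path): int {
    $n = 0;
    $fh = fopen($path, 'r');
    while (fgets($fh) !== false) {
        $n++;
    }
    fclose($fh);
    return $n;
}

// (b) Slurp the whole file into RAM first: file() returns one array
// element per line. Trades memory for fewer reads; only sensible when
// the chunk comfortably fits in memory.
function countLinesInMemory(string $path): int {
    return count(file($path));
}

// Tiny demonstration with a throwaway file.
$tmp = tempnam(sys_get_temp_dir(), 'flat');
file_put_contents($tmp, "row1\nrow2\nrow3\n");
echo countLinesStreaming($tmp), ' ', countLinesInMemory($tmp), "\n"; // 3 3
unlink($tmp);
```

If the process is genuinely disk-bound, though, slurping the file does not reduce the total bytes read; it mostly changes *when* they are read.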
You won't like it, but... it sounds like you're using the wrong language for the task. If you want some massive leaps in speed, then a port to a compiled language would be the next step. Compiled languages run much, much faster than scripting languages, so you'd see your processing time drop.
Additionally, you may be able to dump the data into the DB using a built-in command. Postgres has one (COPY? LOAD? something like that) that reads a tab-delimited text file whose columns match up with the table's columns. That would let you concentrate on getting a text file into the correct format, then spit it into the DB with one command, and let the database handle the optimization rather than doing it yourself.
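Since the question is about MySQL: MySQL's analogue of that Postgres command is LOAD DATA INFILE. A sketch of driving it from PHP, where the file path, table name, and connection details are all hypothetical:

```php
<?php
// Build MySQL's bulk-load statement for a tab-delimited file whose
// columns line up with the target table's columns.
function buildLoadData(string $path, string $table): string {
    return "LOAD DATA LOCAL INFILE '$path' INTO TABLE $table"
         . " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n';";
}

$sql = buildLoadData('/tmp/people.tsv', 'person');  // hypothetical path and table
echo $sql, "\n";

// To actually run it you need a live connection and the local_infile
// option enabled on both client and server, e.g.:
// $db = new mysqli('localhost', 'user', 'pass', 'import_db');
// $db->query($sql);
```

This moves all parsing and insertion into the server's own bulk loader, which is typically much faster than issuing millions of individual INSERT statements from PHP.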
You've done the right thing by hitting the ORM on the head. You shouldn't need to split the file, though, because your text file reader should use a buffer internally, so splitting "shouldn't" matter; but I'm not a *nix guy, so I could be wrong on that front.
We do something similar with a .NET app that churns through 20 GB files every morning, running a regex against each line, keeping an in-memory hash for unique records, and then poking the new ones into the DB. After that we spit out 9000+ JS files using a Ruby script for ease (this is the slowest part). The importer used to be written in Ruby as well, and the whole thing took 3+ hours; the rewrite to .NET runs the whole process in roughly 30-40 minutes, and 20 of those are the slow Ruby script (no longer worth optimizing, though it does the job well enough).