![]() |
![]() |
||||||
|
|||||||
| Tags: |
![]() |
|
|||
|
I?ve got a script that reads a large CSV spreadsheet and parses the data into a
list of the form [[A1,B1,C1], [A2,B2,C2], [A3,B3,C3]] and a second list of the form [#A1:B1,#A2:B2,#A3:B3] etc? The actual spreadsheet is about 10 columns x 10,000 rows. Reading the file string goes fast enough, the parsing starts off fast but slows to a crawl after about 500 rows (I put the row count on the stage to check progress). Does anyone know if the getaProp, addProp, and append methods are sensitive to the size of the list? A sample of one of the parsing loops is below. I?m aware all interactivity will stop as this is executed. This script is strictly for internal use, it crunches the numbers in two spreadsheets and merges the results to a new CSV file. The program is intended to run overnight and the new file harvested in the morning. writeLine("File 2 Data Parsing" & RETURN) myOrderColumn = myHeaders2.getOne("OrderNum") myChargesColumn = myHeaders2.getOne("Cost") myFile2data = [] mergedFedExCharges = [:] repeat with rowCount = 2 to file2string.line.count myLineData = [] repeat with i = 1 to file2string.line[rowCount].item.count myItem = file2string.line[rowCount].item[i] if i = 1 then myItem = chars(myItem,2,myItem.length) end if myLineData.append(myItem) end repeat if myLineData.count = myHeaders2.count then myFile2data.append(myLineData) myOrderSymbol = symbol("s" & myLineData[myOrderColumn]) myCurrentValue = getaProp(mergedFedExCharges,myOrderSymbol) if voidP(myCurrentValue) then mergedFedExCharges.addProp(myOrderSymbol,0.00) end if mergedFedExCharges[myOrderSymbol] = mergedFedExCharges[myOrderSymbol] + value(myLineData[myChargesColumn]) writeUpdate(myLineData[1]) else writeError("file 2 : " & string(myLineData) & RETURN) end if end repeat |
| Sponsored Links |
|
|||
|
> Does anyone know if the getaProp, addProp, and
> append methods are sensitive to the size of the list? Is this a trick question? Sure they are. All of them. Addprop and append are quite fast (due to the list object scalable preallocating memory as required), so i doubt that they are the cause of the problem. GetAProp will search each item in the list, therefore, if you are searching for the last item, or if the item is not in the list, the more the items, the slower the command. Didn't go through all your code but I noticed - this: repeat with rowCount = 2 to file2string.line.count Big no-no! Line counting is a very slow operation for it to be a evaluated in a loop. - and this: myFile2data.append(myLineData) String operations like this require memory reallocation, and therefore are very slow. If you do conclude that such an operation causes the problem, consider using a preallocated buffer (create a big string in advance) and then use mydata.char.[currentoffset..(currentoffset+newstr.length)]=newstr This can make code run even hundreds times faster, compared to the append method. Applied CD wrote: > I?ve got a script that reads a large CSV spreadsheet and parses the data into a > list of the form [[A1,B1,C1], [A2,B2,C2], [A3,B3,C3]] and a second list of the > form [#A1:B1,#A2:B2,#A3:B3] etc? The actual spreadsheet is about 10 columns x > 10,000 rows. Reading the file string goes fast enough, the parsing starts off > fast but slows to a crawl after about 500 rows (I put the row count on the > stage to check progress). Does anyone know if the getaProp, addProp, and > append methods are sensitive to the size of the list? > > A sample of one of the parsing loops is below. I?m aware all interactivity > will stop as this is executed. This script is strictly for internal use, it > crunches the numbers in two spreadsheets and merges the results to a new CSV > file. The program is intended to run overnight and the new file harvested in > the morning. > > writeLine("File 2 Data Parsing" & RETURN) > myOrderColumn = myHeaders2.getOne("OrderNum") > myChargesColumn = myHeaders2.getOne("Cost") > myFile2data = [] > mergedFedExCharges = [:] > repeat with rowCount = 2 to file2string.line.count > myLineData = [] > repeat with i = 1 to file2string.line[rowCount].item.count > myItem = file2string.line[rowCount].item[i] > if i = 1 then > myItem = chars(myItem,2,myItem.length) > end if > myLineData.append(myItem) > end repeat > if myLineData.count = myHeaders2.count then > myFile2data.append(myLineData) > myOrderSymbol = symbol("s" & myLineData[myOrderColumn]) > myCurrentValue = getaProp(mergedFedExCharges,myOrderSymbol) > if voidP(myCurrentValue) then > mergedFedExCharges.addProp(myOrderSymbol,0.00) > end if > mergedFedExCharges[myOrderSymbol] = mergedFedExCharges[myOrderSymbol] + > value(myLineData[myChargesColumn]) > writeUpdate(myLineData[1]) > else > writeError("file 2 : " & string(myLineData) & RETURN) > end if > end repeat > |
|
|||
|
Lol, I figured they?d be sensitive, I guess I was really wondering ?how?
sensitive. My real fear was that getaProp implicitly searches the list from start to until a match is found everytime it?s called, from what you?re describing I guess it does. I eliminated the repetitive line and item counting and got a speed boost but it?s still slow. As for the append statements I?m not sure how I can substitute a string operation. The append statements are being used to build a list representation of the original spreadsheet, so in the example myFile2data.append(myLineData), myLineData is a linear list representing a single spreadsheet row, all cells being treated as string elements in the list. myLineData[] is then append to myFile2data[] so for example cell C5 can be accessed by the construct myFile2data[5][3] |
|
|||
|
I also noticed you are creating symbols on the fly.. How many might they be?
Cause if we are talking about a large number of them, then that's what is causing the issue. Symbols are kept in an internal list (after some hashing is applied on the symbol's string). So, the more the symbols, the slower the lookup, and therefore, the slower ALL director's commands that use symbols. If you go past a certain number of symbols, performing symbols lookup will be slower than string comparisons. It's quite technical to explain, but, in simple words, from what I can tell, director's (internal) symbols creation code wasn't built with creating symbols on the fly in mind. So, it's not a scalable one. And though the number of symbols created does not affect compiled scripts, it does greatly affect compilation time and the string to symbol command. To test if this is the case in your script, try benchmarking the symbol("somestring") command once before you run your code, and once after you have run it - and the symbols have been created. Applied CD wrote: > Lol, I figured they?d be sensitive, I guess I was really wondering ?how? > sensitive. My real fear was that getaProp implicitly searches the list from > start to until a match is found everytime it?s called, from what you?re > describing I guess it does. I eliminated the repetitive line and item counting > and got a speed boost but it?s still slow. As for the append statements I?m not > sure how I can substitute a string operation. The append statements are being > used to build a list representation of the original spreadsheet, so in the > example myFile2data.append(myLineData), myLineData is a linear list > representing a single spreadsheet row, all cells being treated as string > elements in the list. myLineData[] is then append to myFile2data[] so for > example cell C5 can be accessed by the construct myFile2data[5][3] > |
|
|||
|
Applied CD wrote:
> I?ve got a script that reads a large CSV spreadsheet and parses the > data into a list of the form [[A1,B1,C1], [A2,B2,C2], [A3,B3,C3]] and > a second list of the form [#A1:B1,#A2:B2,#A3:B3] etc? The actual > spreadsheet is about 10 columns x 10,000 rows. Reading the file > string goes fast enough, the parsing starts off fast but slows to a > crawl after about 500 rows (I put the row count on the stage to check > progress). If you want it to work faster, you could look into using Perl, or, for less of a learning curve, Visual Studio Express editions are free and you'd have a choice of language. Processing time could be reduced by a factor substantially better than ten (maybe down to seconds). Anyway, rather than writing to a file in the loop, store it in a string and then write that to file, and don't write more than a few lines to the message window. Andrew |
|
|||
|
Thanks everyone. The program ran overnight and produced the finished file
before I got in so now the speed issue is more academic than practical, I don?t think it will ever be fast enough to run in real time and wait for the result. I agree another language would process the data faster (I?m rusty at vBasic but probably could get it done), the problem is that this little utility will only be run a few times and then be trashed or forgotten, it would be tough to justify the time to work in an unfamiliar language. What I?m really doing is helping someone reconcile a huge database table from their online order management system with another huge table generated by FedEx for their shipping costs. The consolidated table required the aggregation of certain values that seemed impossible in SQL but simple programmatically. Turns out their offer for ?free? shipping on orders over $200 is actually costing them money, lots of money. Once they build the shipping cost back into the price structure this program is done. As for on the fly symbol generation, I haven?t run a bench mark test yet, but you?re right, I create a ton of them (about 6000) on the fly. I actually thought it was an efficient solution, I had one column with Order Number, another column with Price. Every item in the shopping cart gets its own row, so there were about 6000 unique Order Numbers repeated over 13,000 line items. I needed the total Price for each unique Order Number so I converted the Order Number to a symbol and created a property list of the form: [#OrderNum:PriceTotal]. This avoided explicit loops to discover if the current Order Number was already in the list, or was it a new Order Number. Simply use getaProp(#OrderNum) and if its void the order number is new and added to the list, if it?s not void just add the current price to the property value. |
![]() |
| Thread Tools | |
| Display Modes | |
|
|
- Contact Us
-|-
Adobe Dreamweaver Forums -|-
Archive -|-
Top -|-Rules/Disclaimer-|-Help/Support-|-Advertise