Adobe Dreamweaver Forums



#1  11-20-2008, 02:56 AM
Applied CD
Performance issues when working with huge lists

I've got a script that reads a large CSV spreadsheet and parses the data into a
list of the form [[A1,B1,C1], [A2,B2,C2], [A3,B3,C3]] and a second list of the
form [#A1:B1, #A2:B2, #A3:B3], etc. The actual spreadsheet is about 10 columns x
10,000 rows. Reading the file string goes fast enough. The parsing starts off
fast but slows to a crawl after about 500 rows (I put the row count on the
stage to check progress). Does anyone know if the getaProp, addProp, and
append methods are sensitive to the size of the list?

A sample of one of the parsing loops is below. I'm aware all interactivity
will stop while this is executed. This script is strictly for internal use: it
crunches the numbers in two spreadsheets and merges the results into a new CSV
file. The program is intended to run overnight, with the new file harvested in
the morning.

writeLine("File 2 Data Parsing" & RETURN)
myOrderColumn = myHeaders2.getOne("OrderNum")
myChargesColumn = myHeaders2.getOne("Cost")
myFile2data = []
mergedFedExCharges = [:]
repeat with rowCount = 2 to file2string.line.count
  myLineData = []
  repeat with i = 1 to file2string.line[rowCount].item.count
    myItem = file2string.line[rowCount].item[i]
    if i = 1 then
      myItem = chars(myItem,2,myItem.length)
    end if
    myLineData.append(myItem)
  end repeat
  if myLineData.count = myHeaders2.count then
    myFile2data.append(myLineData)
    myOrderSymbol = symbol("s" & myLineData[myOrderColumn])
    myCurrentValue = getaProp(mergedFedExCharges,myOrderSymbol)
    if voidP(myCurrentValue) then
      mergedFedExCharges.addProp(myOrderSymbol,0.00)
    end if
    mergedFedExCharges[myOrderSymbol] = mergedFedExCharges[myOrderSymbol] + value(myLineData[myChargesColumn])
    writeUpdate(myLineData[1])
  else
    writeError("file 2 : " & string(myLineData) & RETURN)
  end if
end repeat



#2  11-20-2008, 04:03 AM
alchemist
Re: Performance issues when working with huge lists

> Does anyone know if the getaProp, addProp, and
> append methods are sensitive to the size of the list?


Is this a trick question? Sure they are. All of them.
addProp and append are quite fast (the list object preallocates memory in
growing chunks as required), so I doubt that they are the cause of the
problem.
getaProp will search each item in the list. So if you are searching for the
last item, or if the item is not in the list, the more items there are, the
slower the command.

I didn't go through all your code, but I noticed
- this: repeat with rowCount = 2 to file2string.line.count
Big no-no! Line counting is a very slow operation to have evaluated in a
loop condition.
- and this: myFile2data.append(myLineData)
String operations like this require memory reallocation, and therefore
are very slow. If you do conclude that such an operation causes the
problem, consider using a preallocated buffer (create a big string in
advance) and then use
mydata.char[currentoffset..(currentoffset + newstr.length - 1)] = newstr
This can make code run even hundreds of times faster, compared to the
append method.
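
The line-counting point can be sketched roughly as follows. This is a minimal, untested sketch: the handler name parseRows and its local variable names are invented for illustration, and it assumes the item delimiter is still the default comma. It counts the lines once before the loop and pulls each row into a local string, so each chunk expression is evaluated once per row rather than once per cell.

```lingo
-- Hypothetical handler: turns a CSV string into a list of row lists.
on parseRows file2string
  rows = []
  lineTotal = file2string.line.count         -- counted once, before the loop
  repeat with rowCount = 2 to lineTotal      -- row 1 holds the headers
    thisLine = file2string.line[rowCount]    -- fetched once per row
    itemTotal = thisLine.item.count          -- counted once per row
    cells = []
    repeat with i = 1 to itemTotal
      cells.append(thisLine.item[i])         -- chunking a short string, not the whole file
    end repeat
    rows.append(cells)
  end repeat
  return rows
end
```

Fetching file2string.line[rowCount] may itself get slower as rowCount grows if Director has to scan from the start of the string to find the line. That is an assumption, not something confirmed here, so it is worth timing separately.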



#3  11-20-2008, 05:04 AM
Applied CD
Re: Performance issues when working with huge lists

Lol, I figured they'd be sensitive; I guess I was really wondering "how"
sensitive. My real fear was that getaProp implicitly searches the list from
the start until a match is found every time it's called, and from what you're
describing I guess it does. I eliminated the repetitive line and item counting
and got a speed boost, but it's still slow. As for the append statements, I'm not
sure how I can substitute a string operation. The append statements are being
used to build a list representation of the original spreadsheet. So in the
example myFile2data.append(myLineData), myLineData is a linear list
representing a single spreadsheet row, with all cells treated as string
elements in the list. myLineData[] is then appended to myFile2data[] so that, for
example, cell C5 can be accessed by the construct myFile2data[5][3].

#4  11-20-2008, 06:19 AM
alchemist
Re: Performance issues when working with huge lists

I also noticed you are creating symbols on the fly. How many might there be?
If we are talking about a large number of them, then that's what
is causing the issue.
Symbols are kept in an internal list (after some hashing is applied to
the symbol's string). So the more symbols there are, the slower the lookup,
and therefore the slower ALL of Director's commands that use symbols.
If you go past a certain number of symbols, symbol lookup
will be slower than string comparison.
It's quite technical to explain but, in simple words, from what I can
tell, Director's internal symbol creation code wasn't built with
on-the-fly symbol creation in mind, so it doesn't scale. And
though the number of symbols created does not affect compiled scripts,
it does greatly affect compilation time and the string-to-symbol command.
To test if this is the case in your script, try benchmarking the
symbol("somestring") command once before you run your code, and once
after you have run it and the symbols have been created.
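
The before-and-after test described above might look something like this. It is only a sketch: the handler name timeSymbolLookup and the 10,000-iteration count are arbitrary choices for illustration, not anything taken from the original script.

```lingo
-- Hypothetical helper: times repeated string-to-symbol conversion.
on timeSymbolLookup
  startTime = the milliseconds
  repeat with n = 1 to 10000
    -- same string every pass, so the symbol is created at most once
    -- and the loop mostly measures the lookup
    s = symbol("somestring")
  end repeat
  return the milliseconds - startTime   -- elapsed time in milliseconds
end
```

Typing put timeSymbolLookup() in the message window once before the parsing script runs and once after it has finished gives two figures to compare. A much larger second figure would point at the symbol table.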



#5  11-20-2008, 10:04 AM
Andrew Morton
Re: Performance issues when working with huge lists

Applied CD wrote:
> I've got a script that reads a large CSV spreadsheet and parses the
> data into a list of the form [[A1,B1,C1], [A2,B2,C2], [A3,B3,C3]] and
> a second list of the form [#A1:B1, #A2:B2, #A3:B3], etc. The actual
> spreadsheet is about 10 columns x 10,000 rows. Reading the file
> string goes fast enough. The parsing starts off fast but slows to a
> crawl after about 500 rows (I put the row count on the stage to check
> progress).


If you want it to work faster, you could look into using Perl, or, for less
of a learning curve, Visual Studio Express editions are free and you'd have
a choice of language. Processing time could be reduced by a factor
substantially better than ten (maybe down to seconds).

Anyway, rather than writing to a file in the loop, store it in a string and
then write that to file, and don't write more than a few lines to the
message window.

Andrew


#6  11-20-2008, 03:44 PM
Applied CD
Re: Performance issues when working with huge lists

Thanks everyone. The program ran overnight and produced the finished file
before I got in, so now the speed issue is more academic than practical. I don't
think it will ever be fast enough to run in real time and wait for the result.

I agree another language would process the data faster (I'm rusty at vBasic
but probably could get it done). The problem is that this little utility will
only be run a few times and then be trashed or forgotten, so it would be tough to
justify the time to work in an unfamiliar language. What I'm really doing is
helping someone reconcile a huge database table from their online order
management system with another huge table generated by FedEx for their shipping
costs. The consolidated table required the aggregation of certain values that
seemed impossible in SQL but simple programmatically. Turns out their offer of
"free" shipping on orders over $200 is actually costing them money, lots of
money. Once they build the shipping cost back into the price structure, this
program is done.

As for on-the-fly symbol generation, I haven't run a benchmark test yet, but
you're right, I create a ton of them (about 6000) on the fly. I actually
thought it was an efficient solution. I had one column with Order Number and
another column with Price. Every item in the shopping cart gets its own row, so
there were about 6000 unique Order Numbers repeated over 13,000 line items. I
needed the total Price for each unique Order Number, so I converted the Order
Number to a symbol and created a property list of the form
[#OrderNum:PriceTotal]. This avoided explicit loops to discover whether the
current Order Number was already in the list or was a new Order Number. Simply use
getaProp(#OrderNum): if it's void, the order number is new and gets added to the
list; if it's not void, just add the current price to the property value.
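
If the symbol table does turn out to be the bottleneck, one variation is to keep the same property-list running total but key it on the order number string itself, so that no symbols are created. The sketch below is untested. The handler names addCharge and demoCharges, the parameter names, and the order number "10452" are all made up for illustration, and sorting the list first rests on the assumption that Director looks up properties faster in a sorted list.

```lingo
-- Hypothetical helper: adds amount to the running total stored under orderNum.
-- orderNum is a string used directly as the property, so symbol() is never called.
on addCharge chargesList, orderNum, amount
  currentTotal = getaProp(chargesList, orderNum)
  if voidP(currentTotal) then
    -- first time this order number is seen
    addProp(chargesList, orderNum, amount)
  else
    setaProp(chargesList, orderNum, currentTotal + amount)
  end if
end

-- Hypothetical usage: two line items for the same (invented) order number.
on demoCharges
  charges = [:]
  sort(charges)   -- assumption: a sorted list is searched faster
  addCharge(charges, "10452", 12.50)
  addCharge(charges, "10452", 3.25)
  return charges
end
```

Lists are passed by reference in Lingo, so addCharge changes the caller's list in place. Timing this against the symbol version on the real data would be the only way to know whether it actually helps.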
