8 Replies Latest reply: Mar 10, 2012 6:04 AM by Michael Powe

How much data can you extract via REST (On demand) before the browser / Excel stops responding?

Nikolaj Bardram Newbie

Hi All,

 

I find it very difficult to find out where the limit is on how much data you can extract, so I would like to do some experience sharing on this.

 

How much data I can extract before my browser or Excel stops responding varies a lot.

 

For example, the Pages report has both title and URL; extracting the top 10,000 pages to Excel is just not possible.

 

What do the rest of you do with big extracts?

  • How much data can you extract via REST (On demand) before the browser / Excel stops responding?
    Paul Lawbaugh Novice

    Performance is a really important topic for us.  We've been looking deeper into this and the team would like more details.

     

    Can you give a typical set of queries you have in a spreadsheet?  And a rough idea of how many rows each query represents?

     

    Thanks so much for the help.

    • How much data can you extract via REST (On demand) before the browser / Excel stops responding?
      Li-Kim Lee Newbie

      Nikolaj: We find that the amount of data we can extract via REST is exactly what is shown in the Analytics interface, i.e. the table limits for any specific period. We try to circumvent that by breaking the calls up across multiple smaller time periods and then combining the data (where possible). So, in theory, you could retrieve the top 10,000 pages by running the same query over multiple rolling time periods and then combining the results via lookups and/or pivots. Also, when performing repeated calls via the REST DX API, some of us have seen a non-Webtrends error in Excel related to the way Microsoft handles web queries; Webtrends advised that it could be reduced or eliminated by increasing our cache size to hold all our query requests and/or clearing the cache before running our set of queries.
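
      For example, in PowerShell the period-splitting might look roughly like this. It is only a sketch: the URL, the period values, and the credentials are placeholders, not the exact v3 REST syntax, so adjust them to your own profile and report.

        # Sketch only: the URL, periods, and credentials below are placeholders.
        $client = New-Object System.Net.WebClient
        $client.Credentials = New-Object System.Net.NetworkCredential("account\user", "password")

        $periods = @("2012m01", "2012m02", "2012m03")   # several small periods instead of one big pull
        $pieces  = @()
        foreach ($p in $periods) {
            $url     = "https://<webtrends-host>/v3/Reporting/profiles/<id>/reports/<id>/?period=$p&format=json"
            $pieces += $client.DownloadString($url)      # combine the pieces afterwards via lookups/pivots
        }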

       

      Paul: I think I know what the answer might be, but extracting full data sets via REST instead of batch downloads via DataScheduler would be great. We have to archive the DS full extracts on our own FTP server. Also, we find the "trend" option in v3 useful, although I am never sure how much data is returned or how the limits are set. If you would like Excel samples of the sort of queries we are running to assess the speed and data set size, I can send my files to you... the files are confidential so it would be off-forum.

    • How much data can you extract via REST (On demand) before the browser / Excel stops responding?
      Nikolaj Bardram Newbie

      Hi Paul,

       

      Currently I am building a news dashboard for our intranet. It holds 5 different reports, each with a 10,000-row report limit. Depending on the timeframe, of course, I often hit that limit.

       

      The real issue happens when I deal with two-dimensional reports.

       

      I have one displaying the news title and the publishing date. That of course gives me a lot of extra rows (as described in my other post about Blank rows that are useless).

      So potentially this report could give me 20,000 rows, but I have issues with it even when I only try fetching data that covers around 4,000 different news items, which leads to 8,000 rows. That still causes problems even though it is less than the 10,000 rows I extract from the other reports, which leads me to think there is a difference when the report is two-dimensional.

      (Some of my reports have close to 20 measures; that might also cause issues.)

       

      Furthermore, I have a report which is based on the news title and our internal organisation (which is done with a translation file).

      That has potentially 100 rows on the second dimension, so in theory that could be 10,000 x 100 rows, and I fully understand that could be a problem. My plan is to get support to lower the second-dimension report limit to 10.

       

      However, even when I try something that doesn't give me much more than 4,000 rows in total, it still doesn't work half the time.

       

      Did that help at all? Otherwise feel free to ask; I would certainly like to find a way around it.

  • How much data can you extract via REST (On demand) before the browser / Excel stops responding?
    Michael Powe Newbie

    Hello,

     

    The Web Query mechanism in Excel is not built for big data pulls across the network.  This is an Excel issue.  I recently posted a means of pulling data via PowerShell, saving it off to HTML and then loading it into Excel via the web query.  Any method of pulling data into a local file and then loading that file will dramatically improve Excel performance.  I speak from experience.  I think the issue for Excel is that it has to first pull the data, then build a map of it, then insert it.
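
    A minimal sketch of that pull-then-load idea, assuming a placeholder URL and credentials (the real request string depends on your profile, report and period):

      # Sketch only: the URL and credentials are placeholders, not the exact v3 REST syntax.
      $client = New-Object System.Net.WebClient
      $client.Credentials = New-Object System.Net.NetworkCredential("account\user", "password")
      $html = $client.DownloadString("https://<webtrends-host>/v3/Reporting/...&format=html")
      # Save the table locally; Excel's web query then reads this file instead of the network.
      [System.IO.File]::WriteAllText("C:\reports\pages.html", $html)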

     

    My experience with data retrievals is that you can get somewhere between 10MB and 25MB of data in a single pull, but it seems to depend somewhat on the connection speed.  If the pull takes too long, the server resets the connection.  Only very occasionally can I get over 20MB; more usually, 10-12MB is the sweet spot. 

     

    Thanks.

     

    mp

    • Re: How much data can you extract via REST (On demand) before the browser / Excel stops responding?
      Nikolaj Bardram Newbie

      Hi Michael,

       

      Where do you see how many MB of data you are extracting?

       

      And by the way, I have absolutely no idea how PowerShell works; also, I would prefer to be able to send my dashboards around the organisation without including extra files.

       

      That was one of the things I liked most about REST: not having to include ODBC connections in the dashboards.

      • How much data can you extract via REST (On demand) before the browser / Excel stops responding?
        Michael Powe Newbie

        Hello,

         

        After repeated attempts to get any work done with Excel, I wrote a PowerShell function and used that to download and save the data to an HTML file.  I then imported the file into Excel using the web query form.  For big data pulls, even the PS function takes a dump, usually with the message "the server reset the connection."  I then chunk it up by weeks and insert the weekly files.
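
        For what it's worth, the week-by-week chunking can be scripted the same way. This is purely hypothetical: the dates, the start/end parameters in the URL, and the file names are illustrative, not the exact v3 syntax.

          # Hypothetical weekly chunking; dates, URL parameters, and file names are illustrative only.
          $client = New-Object System.Net.WebClient
          $start  = [datetime]"2012-01-01"
          $end    = [datetime]"2012-03-01"
          while ($start -lt $end) {
              $stop = $start.AddDays(6)
              $url  = "https://<webtrends-host>/v3/Reporting/...&start_period=$($start.ToString('yyyy-MM-dd'))&end_period=$($stop.ToString('yyyy-MM-dd'))"
              $client.DownloadString($url) | Out-File ("week_{0}.html" -f $start.ToString('yyyyMMdd'))
              $start = $start.AddDays(7)
          }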

         

        I'm referring to the size of the files created using my PS function (which are actually just HTML tables, not complete pages).

         

        Strictly speaking, you could do this kind of chunking in VBA in Excel by writing a class for your workbook to do the same data retrieval.  However, it's twice as much work and the debugging is worse than a tooth extraction without anaesthetic. 

         

        REST is useful in most cases but it's primitive and limited in many respects.  The obvious case here is that with an ODBC connection, you can count records so you know how much data is being sent before trying to insert it. 

         

        Are your users refreshing the data in the dashboards themselves?  If not, I think my approach would be to write a tool that creates the workbooks dynamically, rather than trying to suck the data directly into Excel and then fiddle with it.

         

        IOW, in VB or C#, it's about 3 lines of code to create a workbook and add a sheet to it.  You can then call your REST data in an efficient manner, automatically handle errors and restart the data extraction if necessary, and at the end it's a couple more lines of code to save the workbook. 
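
        A rough PowerShell equivalent of those few lines, via Excel COM automation (Excel has to be installed; the cell contents and save path here are just illustrative):

          # Rough sketch via Excel COM automation; contents and path are illustrative only.
          $excel = New-Object -ComObject Excel.Application
          $wb    = $excel.Workbooks.Add()
          $ws    = $wb.Worksheets.Item(1)
          $ws.Cells.Item(1, 1) = "Pages"              # drop the REST results in from here
          $wb.SaveAs("C:\reports\dashboard.xlsx")
          $excel.Quit()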

         

        It is not a trivial exercise if you have to apply templates, do formatting and such.  But that is all up-front work, and once it's done, you just run the app and the dashboards are automatically created.

         

        Also, as long as I'm rambling on a Saturday morning ... it's not a "dashboard" if you are putting 10,000 line items in it.  There are times when you have to apply the brakes and tell business users that they have to make the case for having data on every single event that occurred on the site.  Because usually, they can't.  Now, sometimes the business needs information about particular clients or documents or other content.  But setting fire to the entire field in order to kill a single mole is not the best approach.  Possibly, a more efficient approach would be to extract all the data to an XML or CSV file and use the dashboard to present the "top 100" or some such.  Anyone who needs to drill down can load the CSV into Excel and torture themselves pivoting.  You, meanwhile, will be sitting on the beach in the warm sun, having a cold drink.  ;-)
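
        A sketch of that extract-then-summarise idea (the CSV URL and the file paths are placeholders, not exact v3 syntax):

          # Illustrative only: pull the full report as CSV (placeholder URL), keep the top 100 rows for the dashboard.
          $client = New-Object System.Net.WebClient
          $client.DownloadFile("https://<webtrends-host>/v3/Reporting/...&format=csv", "C:\reports\full_export.csv")
          Import-Csv "C:\reports\full_export.csv" |
              Select-Object -First 100 |
              Export-Csv "C:\reports\top100.csv" -NoTypeInformation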

         

        mp