|
|
|
"Time To Slay Another Myth"
|
Time to slay another myth (1 of 9), Read 181 times |
|
|
Conf: |
|
|
From: |
|
|
Date: |
Monday, May 14, 2001 5:30 PM |
Alright. We have all heard of "always analyze at greater than 25-30 %
sample or you will get bad statistics". Shoot I think I started it. Anyway,
now I see some folks saying that a 40-50 ROW sample for any size table will get
you good statistics (within 2 standard deviations of the "correct"
statistics.) Anybody checked this out? I don't have any large databases to play
on right now...
Michael R. Ault
Training, Consulting, Monitoring
www.dbagroup.net
|
Topic: |
Time to slay another myth (2 of 9), Read 109 times |
|
Conf: |
|
|
From: |
|
|
Date: |
Monday, May 14, 2001 6:03 PM |
I believe it depends to a large degree on how skewed your data is. In our
(and most likely other) GL Peoplesoft application, for example, account and
department use in journals and ledgers is rather skewed: to use the old 80/20
metaphor, 20% of the accounts and departments are being used in 80% of the journals. If I use a too small sample then the sampling estimates the total nr of
accounts and departments so low that the cbo "thinks" they are not
very selective and shuns indexes in favor of FTS. Not a good idea on a 9
million row table.
Wolfgang Breitling
Oracle7, 8, 8i OCP DBA
|
Topic: |
Time to slay another myth (3 of 9), Read 101 times |
|
Conf: |
|
|
From: |
|
|
Date: |
Tuesday, May 15, 2001 4:32 PM |
For my really large tables, I usually sample about 30 to 35% of the table. I
always compute my indexes and look at the skew of the data in the tables for
generating histograms (which I compute where histograms are needed). I have
found some cases where sampling the 35% of a table where the data is sorted will
cause some incorrect stats to be generated between user_histograms and
user_tab_columns and I will bump up the sample size. Smaller tables (dimensions)
and materialized views I usually compute.
Outside of the occasional Oracle "feature" (i.e. bug), this has given
me a acceptable balance of time required to update statistics and achieving
great query performance.
Mark M.
|
Topic: |
Time to slay another myth (4 of 9), Read 102 times, 1 File Attachment |
|
Conf: |
|
|
From: |
|
|
Date: |
Wednesday, May 16, 2001 8:04 AM |
Yep...tests confirm this. Here is a spread sheet showing my test results.
Michael R. Ault
Training, Consulting, Monitoring
www.dbagroup.net
|
Topic: |
Time to slay another myth (5 of 9), Read 99 times |
|
Conf: |
|
|
From: |
|
|
Date: |
Wednesday, May 16, 2001 2:08 PM |
Since this is related, What's the advantages of sampling of rows and sampling of blocks ??
|
Topic: |
Time to slay another myth (6 of 9), Read 102 times |
|
Conf: |
|
|
From: |
|
|
Date: |
Wednesday, May 16, 2001 2:54 PM |
I would say percent is better since it will automatically adjust the number
of rows sampled based on the size of the table.
Michael R. Ault
Training, Consulting, Monitoring
www.dbagroup.net
|
Topic: |
Time to slay another myth (7 of 9), Read 82 times |
|
Conf: |
|
|
From: |
|
|
Date: |
Thursday, May 17, 2001 2:02 PM |
Yes Mike !
I would say percent is better since it will automatically adjust the number of
rows sampled based on the size of the table.
But the problem in percentage is for some cases it would go for COMPUTE
irrespective of sampling percentage..
Best Regards,
K Gopalakrishnan
Bangalore ,INDIA
|
Topic: |
Time to slay another myth (8 of 9), Read 78 times |
|
Conf: |
|
|
From: |
|
|
Date: |
Thursday, May 17, 2001 3:35 PM |
If you specify higher than 50%.
Michael R. Ault
Training, Consulting, Monitoring
www.dbagroup.net
|
Topic: |
Time to slay another myth (9 of 9), Read 72 times |
|
Conf: |
|
|
From: |
|
|
Date: |
Friday, May 18, 2001 8:32 AM |
Mike
If you specify higher than 50%.
And If your HWM is less than 100 Blocks (or db_file_multiblock_read_count..).
There are lot others.. I am not able to recollect at this moment.:)
Best Regards,
K Gopalakrishnan
Bangalore ,INDIA