How many records in a table?

Posted by ChUIMonster on 09-Feb-2018 08:23

I vaguely, and perhaps incorrectly, remember seeing an interesting alternative method of estimating the number of records in a table.

It was NOT any of the usual FOR EACH, SELECT *, OPEN QUERY, etc. approaches and I don't think it was "run tabanalys and parse the output (or use the csv)" either.

I'm thinking it had something to do with looking at index blocks.  Maybe.

Does this ring any bells?

All Replies

Posted by James Palmer on 09-Feb-2018 08:56

Sounds like the sort of thing [mention] would be playing around with!

Posted by George Potemkin on 09-Feb-2018 08:56

Since V11.6, proutil viewB2 shows (instantly) the number of blocks allocated per object in type 2 storage areas.

It's also possible to quickly get the number of levels in an index tree, which can be used as a very rough estimate of a table's size.

Posted by Alon Blich on 09-Feb-2018 09:10

Yes, idxblockreport. There's slib/utils/fast-rec-cnt.p in the standard libraries project at oehive.org.

It uses proutil idxblockreport to calculate the number of records. It's not exact, but it's pretty close.

I timed it years ago at roughly 500,000,000 records per second.

define variable i as integer no-undo.

/* fast-rec-cnt.p returns an approximate record count for the given table */
run slib/utils/fast-rec-cnt.p ( "<physical dbname>", "table", output i ).

message i.

Posted by George Potemkin on 09-Feb-2018 09:19

idxblockreport is unsupported and has some issues.

As an alternative we can use idxfix option 2 (scan indexes) without recid validation. It will be as fast as idxblockreport, and its report includes the key count per index:

Index 15 (PUB.Customer, Name): 83 keys.
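
A minimal sketch of turning that output into a number in ABL, assuming the idxfix report has been captured to a file (the file name idxfix.out and the exact line format are assumptions based on the example above):

/* Pick the key count out of a captured idxfix "scan indexes" report.  */
/* Expected line format: "Index 15 (PUB.Customer, Name): 83 keys."     */
define variable cLine as character no-undo.
define variable iKeys as int64     no-undo.

input from value("idxfix.out").
repeat:
    import unformatted cLine.
    if cLine begins "Index " and index(cLine, " keys") > 0 then
        /* the number sits between the colon and the word "keys" */
        iKeys = int64(entry(1, trim(entry(2, cLine, ":")), " ")).
end.
input close.

message "Approximate record count:" iKeys view-as alert-box.

Since each (non-word) index holds one key per record, the count from any single index of the table is the record count; with several indexes in the report you would match the one you want rather than keeping the last matching line, as this sketch does.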

Posted by Alon Blich on 09-Feb-2018 09:23

Thank you, George! I'll have a look at updating the code to use idxfix.

Posted by ChUIMonster on 09-Feb-2018 10:15

[quote user="George Potemkin"]

Since V11.6 viewB2 shows (instantly) the number of blocks allocated per object in type 2 storage areas.

[/quote]

That does make it tempting to put everything in -B2 ;)

It would be great to be able to get that same info for B1.

[quote] 

Also it's possible to get quickly the number of levels in an index tree. It can be used as a very rough estimation of table's size.

[/quote]
I think that's what I am vaguely recalling.
Is that what you are talking about with regards to running idxfix?  Or is there a different method?
Once you have the number of levels how do you calculate the estimated table size?

Posted by George Potemkin on 09-Feb-2018 11:09

> It would be great to be able to get that same info for B1.

viewB2 reports the objects in B1 as well as in B2.

Example:

proutil sports -C viewB2

Area "Customer/Order Area":8 - Primary Buffer Pool

Object Enablement    Size     Type   Object Name
-----------------  -------- -------  ------------
Default                   7  Master   Area.Control-Object:0
Default                   8   Table   PUB.Customer:2
Default                  16   Table   PUB.Order:4
Default                  32   Table   PUB.Order-Line:5
Default                   8   Index   PUB.Customer.Sales-Rep:16
Default                   8   Index   PUB.Order.Cust-Order:21
                   --------
                         79

prostrct statistics sports

  Database Block Usage for Area: Customer/Order Area

  Active blocks: 79
...
  Records/Block: 32
   Cluster size: 8
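
Multiplying the two outputs gives a rough upper bound: PUB.Customer above occupies 8 blocks at up to 32 records per block, i.e. at most ~256 records (the idxfix scan earlier reported 83 keys for its Name index; cluster allocation and partially filled blocks make the block-based figure an overestimate).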

> Once you have the number of levels how do you calculate the estimated table size?

Number of records ~ 1000 ^ levels

The number of levels in the index tree (1, 2, 3, 4, 5, 6) is a kind of size gradation for the table: "very small", "small", "medium", "large" and "huge".
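
As a sketch of that rule of thumb in ABL (the ~1000 entries per level is George's approximation; the real branching factor depends on key size and block size):

/* Rough order-of-magnitude record count from the B-tree level count, */
/* assuming roughly 1000 index entries per block.                     */
define variable iLevels   as integer no-undo initial 3.
define variable dEstimate as decimal no-undo.

dEstimate = exp(1000, iLevels).   /* 1000 ^ levels */

message "An index with" iLevels "levels suggests roughly"
    dEstimate "records." view-as alert-box.

So three levels suggests something on the order of a billion records; treat it as a gauge of "small" versus "huge", not a count.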

Posted by Rob Fitzpatrick on 10-Feb-2018 16:10

Here's another approach, using the cardinality stats.  It would be accurate once you have run "UPDATE TABLE STATISTICS" for each table.  It is lightly tested on sports, giving the same totals as tabanalys and ABL.  Caveat: it would have to be rewritten to work on a partitioned DB.

define temp-table table-stat no-undo
  field file-number as integer   format "->>>>9"          label "Tbl#"
  field file-name   as character format "x(32)"           label "Table name"
  field rec-count   as int64     format ">>>,>>>,>>>,>>>" label "Record count"
  field stats-dt    as datetime                           label "Update Stats date"
  index file-number as unique file-number
  index file-name file-name
.

for each dictdb._file no-lock where _file._file-number > 0 and _file._file-number < 32768,
  each dictdb._systblstat no-lock where _systblstat._tblid = _file._file-number:

  find table-stat no-lock where table-stat.file-number = _file._file-number no-error.

  if not available( table-stat ) then
  do:

    create table-stat.
    assign
      table-stat.file-number = _file._file-number
      table-stat.file-name   = _file._file-name
    .
  end.

  /* _systblstat._property values: 2 = update-statistics timestamp, 4 = record count */
  case _systblstat._property:

    when 2 then table-stat.stats-dt  = _systblstat._val_ts.
    when 4 then table-stat.rec-count = _systblstat._value.

  end case.

end.

for each table-stat no-lock use-index file-name:
  display table-stat.
end.
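
A note on the prerequisite: "UPDATE TABLE STATISTICS" is an OpenEdge SQL statement, typically run per table through a SQL client such as sqlexp, and _systblstat only reflects the state as of the last run (that is what the _val_ts timestamp above captures), so the counts drift as data changes.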

Posted by Dmitri Levin on 15-Feb-2018 14:40

Rob, it works very well on a 1.5 TB database, giving results matching tabanalys and ABL.

I would sort the temp-table by record count descending instead of by file-name. With 2000 tables that would be a bit better.

define temp-table table-stat no-undo

.....

index rec-count rec-count descending.

.......

for each table-stat no-lock use-index rec-count:

Posted by gus bjorklund on 15-Feb-2018 14:58

this question of how many rows there are in a table seems to come up fairly often.

if you are one of those who care, could you please tell me /why/ you care. why is it important for you to know this number? and why can't you get it from a table analysis report from last week or last month?

-gus

Posted by Peter Judge on 15-Feb-2018 15:08

I’ll throw in a variant – I want to know how many rows result from a query. Mainly for UI affordance (where am I in a set of data). The query filters are typically unknown in advance.
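
A sketch of one ABL way to get that number (PRESELECT builds the entire result list when the query opens, so NUM-RESULTS is exact, at the cost of a full pass over the matching rows; the filter below is hypothetical):

define query qCust for Customer scrolling.

/* PRESELECT materializes the result list up front, unlike a plain OPEN QUERY. */
open query qCust preselect each Customer no-lock
    where Customer.Country = "USA".

message num-results("qCust") "rows match." view-as alert-box.

close query qCust.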

Posted by Rob Fitzpatrick on 15-Feb-2018 15:17

Gus,

A couple of examples come to mind, there are probably others.

- D&L: after the fact, I want to compare record counts in source and target.  I use tabanalys for this but it can be time-consuming on large DBs.  And compared to others here, my "large" DBs are quite small.

- looking at CRUD stats.  I was doing this today.  Looked at a test DB and found a huge number of reads on a table.  I tracked down the program and found an unbracketed query.  I wanted to know whether, given the number of times the program had been run, this would account for the reads I saw.  So I wanted to know how many records were in the table, to see the cost of each table scan.  In cases like this an approximate count is good enough.
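
For illustration (a hypothetical example, not the actual program): an unbracketed query is one whose WHERE clause no index can bracket, so every execution scans the entire table.

/* A leading wildcard defeats index bracketing on Customer.Name, */
/* so this reads every record in the table on each run.          */
for each Customer no-lock
    where Customer.Name matches "*smith*":
end.

Multiply the table's record count by the number of runs and you can check whether that accounts for the reads in the CRUD stats.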

Posted by Dmitri Levin on 15-Feb-2018 16:45

One of the reasons I need to know the approximate record count is to estimate how long it will take to add an index to the table. Of course tabanalys will work too. I run dbanalys every 5 days, parse it and publish it in HTML on our internal web portal, so every developer can look at it before coming to me for a schema change. It is helpful.

Here Rob presented another way. As we learned in math, two solutions are better than one.

Posted by George Potemkin on 16-Feb-2018 00:59

1) CRUD stats

2) Before I suggest to a customer how to check for corruption in their database, I should understand how large the objects in the database are.

Of course, I can ask them to send me a recent dbanalys, but I prefer that my script quickly gather all the information I may need. That is why viewB2 is a solution for me. Though I would like it if viewB2 (or another tool) also reported the length of each object's chains.

Posted by sfgaarthuis on 16-Feb-2018 02:13

Hi Rob,

concerning the record counts during the D&L:

When you perform a binary dump & load, record counts are given. It is quite easy to compare the dump and load record counts by scraping the log files. I have some simple *nix shell scripts to handle this. They will compare the number of records dumped against the number of records loaded (don't load twice ;-)). If you like, I can send them to you.

Simon

Posted by George Potemkin on 16-Feb-2018 02:52

> When you perform a binary dump & load, record counts are given.

It's the number of indexed records, not the number of all records that exist in the table.

Posted by sfgaarthuis on 18-Feb-2018 13:44

Thanks George, indeed, you are fully right. ;-)
