Review of Piranhaweb version 1.5 from Thomson Financial
by Henrik Mathiesen, 18/3/2001
Introduction
Piranhaweb is one of several web-based information retrieval systems for
corporate information that Thomson Financial offers to financial analysts, portfolio managers,
and, increasingly, academics (www.piranhaweb.com).
This review looks at the system from an academic point of view and therefore
focuses on issues of particular relevance to academics doing
empirical research on firms. Much of the review is presented through video
screen-captures, because it is difficult to understand the benefits and problems
of an information retrieval system until it is seen in action.
Which data are available?
The following databases are accessible through Piranhaweb with a subscription to the
entire system:
- The Worldscope Database contains
standardized financial, statistical and market information for over 23,000
active global companies from over 55 countries worldwide. Worldscope covers
up to 15 years of data and currently has more than 1000 data items or
variables. View sample sheet.
- The SEC Database contains standardized
financial, statistical and market information for over 11,000 actively traded
U.S. companies. The data cover up to 15 years, and there are currently about 700
data items. View sample sheet.
- The Compustat Database contains
standardized financial, statistical and market information for over 10,000
actively traded U.S. companies, as well as over 10,900 inactive U.S.
companies. The data go back 20 years, and there are currently about 500 variables.
View sample sheet.
- The Extel Database contains “as reported”
fundamental financial data for approximately 15,000 quoted companies from over
55 countries worldwide. Extel provides 15 years of historical data from
the balance sheet, profit/loss, and cash flow statements and currently has
about 1100 variables. View sample sheet.
- The IBES Database contains current
I/B/E/S Summary and Detail Estimate coverage for over 18,000 companies in 60
countries worldwide. More than 850 firms, from the largest global houses to
regional and local brokers, contribute data to I/B/E/S. Estimates for EPS, DPS,
sales, CPS, and more are provided for up to 5 years forward. The database
currently has about 200 variables. View sample sheets: current and
history.
- The US Pricing Database, provided by
Interactive Data Corporation (IDC), contains security pricing, dividends and
earnings data for 32,000 US equity securities. Updated daily after the
market close, the data cover up to 10 years of “rolling” history. About 50
variables are currently available. View sample
sheet.
Although many data items or variables are listed for each database, this does
not mean that they are available for every firm in the database and for every
year. Whether an item is
available depends on what the law in a particular country or jurisdiction
requires the firm to publish. Furthermore, some items are relevant for some
industries but not for others, and many data items are calculated from
other variables. The point is that the databases are not as voluminous as
they appear at first glance. On the other hand, the financial
databases are not the only data available in the Piranhaweb system. The system
also provides access to
hundreds of thousands of publicly available filings from tens of thousands of
firms, plus numerous articles from various financial newspapers and bulletins.
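Because a listed variable is not necessarily populated for every firm and year, it can pay to measure coverage before building an analysis on a variable. The following is a minimal Python sketch, assuming the downloaded data have been exported as one record per firm with missing values stored as None; the field names and numbers are hypothetical placeholders, not Piranhaweb data items.

```python
from typing import Optional

# Hypothetical export format: one dict per firm, None marks a missing value.
Row = dict[str, Optional[float]]


def coverage(rows: list[Row], variables: list[str]) -> dict[str, float]:
    """Share of firms that have a non-missing value for each variable."""
    if not rows:
        return {v: 0.0 for v in variables}
    return {
        v: sum(1 for r in rows if r.get(v) is not None) / len(rows)
        for v in variables
    }


# Usage with made-up placeholder values (not real Piranhaweb data).
sample = [
    {"total_assets": 120.0, "rd_expense": 4.5},
    {"total_assets": 80.0, "rd_expense": None},
    {"total_assets": 35.0, "rd_expense": None},
    {"total_assets": None, "rd_expense": 1.2},
]
print(coverage(sample, ["total_assets", "rd_expense"]))
# → {'total_assets': 0.75, 'rd_expense': 0.5}
```

A low coverage share is a hint that the variable may only be reported in some jurisdictions or industries.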
How are data accessed?
The data can be accessed in two different ways: with a standard web
browser or with the Piranhaweb toolbar plug-in for Microsoft
Excel. Click the links to view video screen-captures showing how to capture data: 1)
data by Excel toolbar; 2) data by browser.
The most exciting feature of the Piranhaweb system is the Piranhaweb toolbar
(version 1.6.0.0) for Excel, which makes it intuitive and easy to download
data directly to an Excel spreadsheet. A one-time installation and configuration
of the toolbar is necessary before it can be used. The toolbar has a button that
launches a wizard, which guides you through a few simple steps defining which
data variables, from which firms, should be downloaded to the Excel
spreadsheet and how. No knowledge of the underlying data query language is
required: the wizard automatically generates all the complicated code that
makes it possible for Excel to retrieve data from the Piranha web servers. This
simplicity does not come at a great cost in functionality. The wizard allows you
to specify which database to download from, the portfolio of firms, the
data variables, the dates of the data variables, the headings of the data variables, the
currency translations (if any), whether to correct for stock splits and
other capital changes, the measurement scale, and the data input location in the
spreadsheet. The data can be downloaded either as time series or as
cross-sectional data (Click to view sample spreadsheets of
cross-sectional or time-series (pooled)
data made by the Piranhaweb toolbar). One of the major benefits
of the Piranhaweb toolbar is that it is easy to use. After a few hours you will
know enough about the toolbar to download data to Excel in a format
that is easy to import into statistical programs such as the SAS System from
SAS Institute (Click to view code for SAS System
version 8 that enables automatic import of data from an Excel spreadsheet
created by the Piranhaweb toolbar; use it as you please).
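Statistical programs usually expect pooled data in "long" form, one row per firm-year, whereas a cross-sectional sheet holds one row per firm with a column per variable and year. The sketch below shows that reshaping step in Python under a loudly stated assumption: the column headings are hypothetically named VARIABLE_YEAR (for example SALES_1999), which is not necessarily how the toolbar labels them. It is an illustration of the reshape, not a replacement for the author's SAS code.

```python
from collections import defaultdict


def wide_to_long(rows: list[dict], id_col: str = "firm") -> list[dict]:
    """Reshape one-row-per-firm data into one row per firm-year.

    Assumes (hypothetically) that every data column is named VARIABLE_YEAR,
    e.g. "SALES_1999"; the real toolbar headings may differ.
    """
    panel: dict[tuple, dict] = defaultdict(dict)  # (firm, year) -> {variable: value}
    for row in rows:
        firm = row[id_col]
        for col, value in row.items():
            if col == id_col:
                continue
            # rpartition keeps underscores inside the variable name intact.
            variable, _, year = col.rpartition("_")
            panel[(firm, int(year))][variable] = value
    # Sort by firm, then year, so the output order is stable.
    return [{id_col: firm, "year": year, **values}
            for (firm, year), values in sorted(panel.items())]


wide = [{"firm": "A", "SALES_1999": 10.0, "SALES_1998": 9.0, "TOTAL_ASSETS_1999": 50.0}]
for r in wide_to_long(wide):
    print(r)
# → {'firm': 'A', 'year': 1998, 'SALES': 9.0}
# → {'firm': 'A', 'year': 1999, 'SALES': 10.0, 'TOTAL_ASSETS': 50.0}
```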
System limitations and system stability
Although it only takes a few hours to learn how to get data into Excel in a
format that is ready for import into statistical programs like SAS, it
takes weeks of usage to learn the limitations of the Piranhaweb solution. To test
these limitations I downloaded more than 15 million data entries during three
weeks using the Piranhaweb toolbar (one data entry corresponds to a number
or a text in one cell of the Excel spreadsheet). Academic research on
firms is typically based on the selection of a representative portfolio of firms
whose characteristics (e.g. financial or contractual) are statistically
analyzed in order to validate various economic theories about the relations
among those characteristics (Click to see
examples
of relations that are relevant for research in corporate governance). I used
the search feature of Piranhaweb to create a portfolio of 1938 firms from the New
York Stock Exchange. Specifically, they were all firms on the NYSE in 1999 that had
strictly positive values of the total assets variable in the SEC database
(Click to view a video screen-capture on how to
create portfolios
of firms using the Piranhaweb search facility). This portfolio
was used to produce several Excel spreadsheets containing all 1034 variables
from the Worldscope database, all variables from the SEC database, and all
variables from the Compustat database. Other spreadsheets were produced using
selected variables from the databases IBES, IBES history, Extel, Currency, and
Pricing. All these spreadsheets were made as cross-sectional data for the years
1999, 1998 and 1997. Other spreadsheets were made as 10-year time series back from
1999, using selected variables from selected databases but still containing data
from all 1938 NYSE firms. All in all, about 15 million data entries, or 200 MB of data, were processed and downloaded.
Such an exercise reveals the
limitations of the system. One limitation is that the Piranha system
allows a maximum of 5000 firms in one portfolio. This was not a problem,
since my portfolio was well below that limit with only 1938 firms. The 5000-firm
limit compares very well with other systems that I have seen, which
typically have a limit of 500 to 1000 firms per portfolio. Another limitation is
the processing power of the PC that is used to download the data.
For some reason the download process consumes an awful lot of PC processing
power. If you use the Piranhaweb toolbar to start a job that will download
60,000 data entries, it will take about one hour to complete on a PC with a 500 MHz
Pentium III. During this time it is recommended not to use other PC programs,
because the PC will be very slow. The bottleneck is neither the Piranhaweb system
nor the speed of your internet connection; it is the speed of your PC. If the PC is
twice as fast, you will be able to get the data in half the time. To speed up the
job I used three computers to get the data (two Windows 2000 servers and one Windows
2000 workstation with a Microsoft Terminal Services client to remote-control the
two servers).
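The timing figures above lend themselves to a back-of-the-envelope estimate. This Python sketch takes the review's rate of about 60,000 entries per hour on a 500 MHz Pentium III and assumes, following the author's "twice as fast, half the time" observation, that throughput scales linearly with clock speed; treating clock speed as the measure of PC speed and assuming equally fast machines are simplifications.

```python
# Figures taken from the review: about 60,000 entries per hour on a
# 500 MHz Pentium III. Linear scaling with clock speed is an assumption based
# on the author's "twice as fast, half the time" remark; machines are
# assumed to be equally fast and the work evenly split.
BASE_RATE_PER_HOUR = 60_000
BASE_MHZ = 500.0


def estimated_hours(entries: int, cpu_mhz: float = BASE_MHZ, machines: int = 1) -> float:
    """Rough wall-clock hours for a download split evenly across machines."""
    rate_per_machine = BASE_RATE_PER_HOUR * cpu_mhz / BASE_MHZ
    return entries / (rate_per_machine * machines)


print(estimated_hours(15_000_000))              # → 250.0
print(estimated_hours(15_000_000, machines=3))  # about 83.3 hours
print(estimated_hours(60_000, cpu_mhz=1000))    # → 0.5
```

At this rate, 15 million entries amount to roughly 250 machine-hours, which is why spreading the work over several computers helps.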
The first problematic issue that I detected in the Piranhaweb system
itself was that it becomes unstable as the size of the data-acquisition job
increases. In particular, I cannot recommend starting a job that
downloads more than 60,000 data entries on a 500 MHz PC. I did actually manage
to complete a job with 147,000 entries on such a machine, but I must have been
lucky, because all subsequent attempts to execute similarly sized jobs failed. It
is quite disappointing when a job fails, because even if it fails in the last
minute before completion, all your data are lost and you have to start the entire
job again. A job fails when the internet connection to
the Piranha web servers is lost. This happens for various reasons, but in most
of the cases that I experienced it was because the Piranhaweb servers were down,
sometimes for an hour or two. I ran numerous one-hour jobs of 60,000 entries, and
roughly one out of seven failed to execute. This suggests that the
Piranhaweb servers are down two or three times a day, probably deliberately,
in order to update the databases. Fortunately, Piranhaweb runs at four
different locations (e.g. www.piranhaweb.com
and www.piranhawebuk.com), and I never
experienced all of the locations being down at the same time. So whenever a
job failed I simply changed the location and restarted the job. Still, I believe
Thomson Financial should invest some time in eliminating this instability (and
end-user inconvenience) of the Piranhaweb system. This could be done either by
adding a recovery function to the Piranhaweb toolbar or by using a more redundant
web server architecture.
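Until such a recovery function exists, one user-side mitigation is to keep every job under the size that proved reliable, so that a failure costs at most one small job. The following Python sketch plans such batches, assuming the review's definition of a data entry (one cell, i.e. one firm times one variable times one year) and the 60,000-entry ceiling suggested for a 500 MHz PC; it is a hypothetical planning helper, not a feature of the toolbar, and it ignores heading cells.

```python
def plan_batches(firms: list[str], n_variables: int, n_years: int,
                 max_entries: int = 60_000) -> list[list[str]]:
    """Split a portfolio into jobs of at most max_entries cells each.

    One cell = one firm x variable x year, following the review's definition
    of a data entry; heading cells are ignored in this rough count.
    """
    per_firm = n_variables * n_years
    if per_firm <= 0:
        raise ValueError("n_variables and n_years must be positive")
    if per_firm > max_entries:
        raise ValueError("one firm alone exceeds max_entries; split the variables instead")
    firms_per_job = max_entries // per_firm
    return [firms[i:i + firms_per_job] for i in range(0, len(firms), firms_per_job)]


portfolio = [f"FIRM{i}" for i in range(1938)]  # placeholder identifiers
jobs = plan_batches(portfolio, n_variables=100, n_years=3)
print(len(jobs), len(jobs[-1]))  # → 10 138
```

With 100 variables over 3 years, each firm contributes 300 cells, so at most 200 firms fit in one job.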
Data validation, data documentation, and support
With regard to data validation, Piranhaweb offers two different ways to
check that the data are what they are supposed to be. One is to compare the
same data variables from different databases: if they differ significantly,
something is wrong. The other is to check suspicious database entries against data
from the actual reports that have been filed. In particular, Piranhaweb makes
available, as pdf image-files, copies of most of the filings that the legislation
of various jurisdictions requires firms to publish. In this way the Piranhaweb
system stores hundreds of thousands of files that can be used for validation or further inquiry. Apart from
your own ability to check the quality of the data, Piranhaweb also runs
various tests of its own to check that the data are correct, such as checking
that basic accounting identities add up. However, such tests cannot catch
everything, and I have been able to find figures that did not make
sense, such as firms with negative insider ownership or ownership above 100%.
In such cases one can delete the observation, substitute a similar
figure from another database, or take a look at the pdf file of the relevant
filing. In terms of productive customer feedback, I believe Thomson Financial
will benefit from its academic accounts, because I imagine that
academics are more careful about data validation, and have more time to report
errors to Piranhaweb, than business people such as portfolio managers.
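The checks described above (range checks on ownership, an accounting identity, and comparing the same variable across databases) can be automated before any manual look at the pdf filings. The following is a minimal Python sketch; the field names are hypothetical, the tolerances are arbitrary choices, and the identity is simplified to assets = liabilities + equity, whereas a real database may include further items such as minority interest.

```python
from typing import Optional

Record = dict[str, Optional[float]]  # hypothetical firm-year record


def validation_flags(record: Record, tolerance: float = 0.01) -> list[str]:
    """Warnings for one firm-year record; the field names are hypothetical."""
    flags = []
    own = record.get("insider_ownership_pct")
    if own is not None and not 0.0 <= own <= 100.0:
        flags.append(f"insider ownership out of range: {own}")
    assets = record.get("total_assets")
    liabilities = record.get("total_liabilities")
    equity = record.get("total_equity")
    # Simplified balance-sheet identity: assets = liabilities + equity.
    if assets is not None and liabilities is not None and equity is not None:
        gap = abs(assets - (liabilities + equity))
        if gap > tolerance * abs(assets):
            flags.append(f"assets differ from liabilities + equity by {gap:g}")
    return flags


def sources_disagree(a: float, b: float, rel_tol: float = 0.05) -> bool:
    """True if the same variable from two databases differs by more than rel_tol."""
    return abs(a - b) > rel_tol * max(abs(a), abs(b))


print(validation_flags({"insider_ownership_pct": -3.0}))
print(validation_flags({"total_assets": 100.0, "total_liabilities": 60.0, "total_equity": 30.0}))
print(sources_disagree(100.0, 110.0))  # → True
```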
One issue that was a bit disappointing was that the Piranhaweb system seems
to have a default configuration that gives you the most recent data available if
you ask for data in a particular year for which the variable is not available. I
discovered this flaw because I downloaded variables for several years.
Click to see an example of this error on a time series of insider ownership from
the SEC database. This flaw generally means that you have to download at
least two years to check that you have the data for the year you queried. It should be possible to reconfigure the Piranhaweb system to return an N/A message
in the relevant Excel cell when you ask for a variable for a year that is not
present. It should be mentioned that this error is probably limited to
ownership variables in the SEC database.
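Downloading two or more years makes it possible to screen for this problem automatically. The Python sketch below is a heuristic, not a fix: it flags years whose value is identical to that of the next available, more recent year. A repeated value can be genuine, so flagged years are candidates for checking against the filings rather than proven errors; the series format (year mapped to value, None for missing) is an assumption.

```python
from typing import Optional


def suspect_carry_forward(series: dict[int, Optional[float]]) -> list[int]:
    """Years whose value equals that of the next available (more recent) year.

    A heuristic only: a repeated value can be genuine, but it may also mean the
    latest figure was returned for a year with no data. Needs at least two
    years per firm to say anything.
    """
    years = sorted(y for y, v in series.items() if v is not None)
    return [y for y, nxt in zip(years, years[1:]) if series[y] == series[nxt]]


print(suspect_carry_forward({1997: 12.5, 1998: 12.5, 1999: 12.5}))  # → [1997, 1998]
print(suspect_carry_forward({1997: 10.0, 1998: 11.0, 1999: 12.5}))  # → []
```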
With regard to documentation of the data variables, Piranhaweb offers some manuals
that can be either downloaded as Microsoft Word files from the Piranhaweb help
web site or ordered in paperback from the Piranhaweb support office. These
manuals are fine and answer most questions. However, documentation can never be
too good, and it would be nice if Piranhaweb had more numerical examples of how
the ‘synthetic’ variables are calculated. By synthetic I mean variables that
are calculated from other variables, such as growth rates or averages of EPS or
sales. The documentation should be detailed enough to make it easy to
reproduce such measures for the sake of data verification. Although Piranhaweb
is rich in data variables, one synthetic measure is missing
whose presence would add considerable value for academics (and eventually for
financial analysts). This is the performance measure Tobin’s Q, which is theoretically
defined as the market value of the firm’s outstanding financial claims
divided by the market cost of replacing all the assets represented by the
firm’s financial claims. Equity investors can use Tobin’s Q as an indicator
of whether or not it will be profitable to finance expansion plans. Tobin’s Q
can be approximated using measures that are available through Piranhaweb (in
particular the Compustat database), but this is a very complicated and
time-consuming job. Instead of having each researcher do this individually, the
job should be centralized so that different versions of Tobin’s Q would be
available through Piranhaweb.
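Because replacement costs are rarely observable, researchers typically substitute book values. The following Python sketch shows one widely used simple proxy: market value of equity plus book value of debt (approximated as total assets minus book equity), divided by book total assets. This is an assumption chosen for illustration, not a Piranhaweb measure or the author's method, and the input names are hypothetical; more elaborate versions adjust for inflation and the market value of debt, which is where the complexity described above comes in.

```python
def approx_tobins_q(market_value_equity: float, total_assets: float,
                    book_value_equity: float) -> float:
    """One simple Tobin's Q proxy (an assumption, not a Piranhaweb measure).

    Numerator: market value of equity plus book value of debt, with debt
    approximated as total assets minus book equity.
    Denominator: book value of total assets, standing in for replacement cost.
    """
    if total_assets <= 0:
        raise ValueError("total assets must be strictly positive")
    return (market_value_equity + total_assets - book_value_equity) / total_assets


print(approx_tobins_q(market_value_equity=150.0, total_assets=200.0,
                      book_value_equity=80.0))  # → 1.35
```

The strictly positive total assets requirement mirrors the filter used to build the NYSE portfolio earlier in the review.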
Finally, it should also be mentioned that Piranhaweb is backed by a support
staff that can be reached by phone. I believe they are available 24 hours a day,
because Thomson Financial has clients all around the world. I never had any problems
getting someone to help me, and they were always competent and polite.
Conclusion
Piranhaweb is the best information retrieval system for corporate information
that I have seen so far. It has thousands of data variables for tens of
thousands of the largest firms in the world. Moreover, the system offers the
ability to validate data by comparing the same variables collected from
different databases. Alternatively, data can be validated directly by searching
Piranhaweb’s huge web-based archive of pdf image-files of the various filings
that firms are required to publish. The feature I found most impressive was the
Piranhaweb toolbar for Microsoft Excel. In less than 2 hours I knew exactly how
to get corporate data into a spreadsheet in a format that is ready for import into
a statistical program such as the SAS System from SAS Institute. The Piranha
system scales well, and I was able to gather more than 15 million data entries, or
200 MB of data, during three weekends. However, two issues
could have been better. The Piranha web servers are down too often, causing
inconvenient loss of data during downloads. Furthermore, it is unfortunate that
the system sometimes returns current values when you query for a
variable in a year for which it is not available (this is probably only a problem for
the SEC ownership variables). Nevertheless, the overall impression of the
Piranhaweb system is very convincing, and on a scale from 0 to 6, with 6 being the
highest mark, it gets 5.