Scientometric Indicators as a Way to Classify Brands for Customer’s Information

Iris TUSA², Mihaela PAUN¹,²,*

¹ Louisiana Tech University, Ruston, Louisiana, 71272

² National Institute of Research and Development for Biological Sciences, Splaiul Independentei 296, Bucharest, Romania, 060031

Abstract. The paper proposes a novel approach for classification of different brands that commercialize similar products, for customer information. The approach is tested on electronic shopping records found on Amazon.com, by quantifying customer behavior and comparing the results with classifications of the same brands found online through search engines. The indicators proposed for the classification are currently used scientometric measures that can be easily applied to marketing classification.

Keywords: customer satisfaction, scientometric indicator, classification

JEL classification: D1, C8

1. Introduction

In the last decade, internet shopping has become more and more prominent for different reasons, among them convenience, reliability and cost. One of the most important reasons, however, is product diversity, which is hard to achieve in almost any single retail store. Although many people still prefer to see, feel and touch the products they are buying, these actions are in many cases not sufficient to reach a purchase decision. One may go and see the actual product in a store, but eventually return to online shopping to make the decision, most importantly after some comparative analysis (in a broad sense) has been performed, based on the product evaluations, reviews and classifications that are available online. For a buyer who relies on product reviews and classifications, the important issue is to trust the validity of the source that provides the information and, in particular, to understand the criteria at the core of these classifications. In the purchase decision process, the search for the right product is mostly motivated by the customer’s ability to acquire all the relevant information that can address the uncertainty associated with the purchase (Murray, 1991). A prospective buyer is generally influenced by the geographic location and cultural diversity of the area where the buyer lives. When there is a choice to be made, for many products the buyer is influenced by what is available in his region and by the brands that his network of friends and colleagues use, relying on word-of-mouth information (Bansal, 2000). Even when the buyer starts looking for the products that he has heard about, he is sometimes not aware of the diversity of brands and products that exist. A classification of the brands will not only indicate the best brands from which the buyer can choose, but will also provide information about all the brands that are actually available.

* Corresponding author. Tel.: + 40 021-220.77.80 ext. 275 fax: +40 21-220.76 95.

E-mail address: mihaela.paun@dbio.ro.

As is the case with most classifications and performance indices, any measure or criterion that one can come up with will have some drawbacks. However, having one or several criteria to analyze, drawbacks or not, is better than having none.

2. Scientometric Measures in Economics

2.1 H index

The idea of this paper is to redefine and apply in the economic context a known measure that is used to characterize the scientific output of a researcher, the Hirsch index. The h-index was introduced in the literature by Jorge E. Hirsch (2005). The research published by an individual and the citation record of those publications provide useful information, but this information is evaluated at various times by different people, from different backgrounds and with different sets of criteria. Hirsch therefore proposed a single number, the so-called “h index”, defined as the number of papers with citation number ≥ h, as a simple and useful measure to quantify the scientific output of a researcher. The definition is indeed very simple: “a scientist has index h, if h of his or her N papers have at least h citations each and the other (N-h) papers have ≤ h citations each” (Hirsch, 2005).

The idea was simple and was widely adopted in the academic community. The Thomson Reuters Web of Knowledge (formerly ISI Web of Knowledge), a research platform for finding, analyzing and sharing information in the sciences, social sciences, arts and humanities, reports the citations and the h-index of each researcher. The present study makes the case that such a simple measure can be used to evaluate and classify similar products from different brands on the market. The idea is to take different brands that commercialize different products, in different sizes and combinations, and to rank these brands, so that the prospective buyer is presented with a suggestion: a generated list of evaluated and ranked brands, that is, a classification of the available brands that manufacture the products under consideration. Nowadays, with electronic retail and the amount of information gathered from consumers after purchase, such a study can be conducted easily and readily. The classification is done solely for the benefit of the prospective buyer, not for the seller.

When the h-index is considered, it is agreed that citation impact is a very important metric. Having papers that are highly cited is very important, as is having many papers and a high total number of citations. In the brand-product-consumer triad, this translates into the following: having many brands is important, having different products per brand is also very important, and so is having many buyers for the products that are commercialized.

With the h-index there were concerns that citations can be gamed: self-citation and citation-by-progeny inflate the total number of citations, while some authors refuse to cite relevant papers by competitors for a variety of reasons. This phenomenon is also present in the brand-product-consumer dimension, where there are consumers who refuse to buy or recommend a product because it comes from a brand that, for different reasons, they do not like, and consumers who purchase a product just because it is what their parents, friends or relatives buy. Even a person’s preference for a brand has a counterpart in the academic community, namely the preference an author has for a specific journal in which he or she publishes.

A new framework is created in the brand-product-consumer dimension, in which brands can be quantified, compared and evaluated in a manner similar to the way the academic community uses the h-index. For this we need a database with the recorded number of buyers (and also the buyers’ recommendations or reviews). In this case study, we have used public data collected from Amazon.com (the data was collected on April 18, 2012).

The new h-index is defined in this environment as: “A brand has index h if h of its N products have at least h reviews (recorded buyers) each and the other (N – h) products have ≤ h reviews (recorded buyers) each.”

The index provides a classification of brands: brand i has index h_i if at least h_i types of its products were bought (possibly more, but consumer feedback is available only for h_i of them) and each of these product types had at least h_i recorded buyers.
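The definition above can be illustrated with a minimal sketch. The function name `h_index` and the review counts are hypothetical and are not taken from the Amazon.com sample; the sketch only assumes that each product of a brand is described by its number of recorded reviews.

```python
def h_index(review_counts):
    """Return the largest h such that h products have at least h reviews each."""
    # Rank the products of the brand by number of reviews, highest first.
    ranked = sorted(review_counts, reverse=True)
    h = 0
    for position, reviews in enumerate(ranked, start=1):
        if reviews >= position:
            h = position  # the top `position` products all have >= position reviews
        else:
            break
    return h


# Hypothetical review counts for the products of one brand (not study data).
example_brand = [40, 12, 7, 5, 3, 1]
print(h_index(example_brand))  # prints 4
```

In this hypothetical case four products have at least four reviews each, while the fifth product has only three, so the brand has index 4.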

2.2 G Index

One of the big drawbacks of the h-index in the academic environment is that it ignores the number of citations beyond the h cut-off value: once a paper is included in the h-set of articles, its actual citation count has no effect on the h-index (Egghe, 2008).

What do we really know about a scientist who has h index 5? Is this a scientist who has 5 papers with 5 citations each? Or is this a scientist who has 5 papers with 100 citations each and another 10 papers with 4 citations each? From the point of view of the h index, both of them have index 5. If we are only interested in scientists with h index at least 5, both of them meet the minimum criterion to be on the list of interest. But if we want to differentiate between them, we need a new measure that assigns more weight to highly cited publications. To compensate for this drawback, in 2006 Leo Egghe defined another measure to complement and be used with the h-index, the g-index (Egghe, 2006).

The g-index is calculated based on the distribution of citations received by a given researcher’s publications: “Given a set of articles ranked in decreasing order of the number of citations that they received, the g-index is the (unique) largest number such that the top g articles received (together) at least g² citations”. The g index therefore aims to improve on the h-index by giving more weight to highly cited articles. Egghe (2006) has shown that g ≥ h and that a large difference between these indices indicates that the top h papers were cited far more than the cut-off of h citations. The two indices do not substitute for each other but complement each other.
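The two scientists in the example above can be compared with a minimal sketch. The function names are hypothetical, and the sketch assumes the variant of the g-index in which g cannot exceed the number of items; the paper does not state which variant was used for the brand rankings.

```python
from itertools import accumulate


def h_index(citations):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for position, c in enumerate(ranked, start=1) if c >= position)


def g_index(citations):
    """Largest g (assumed capped at the number of items) such that the
    top g items together received at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    g = 0
    for position, running_total in enumerate(accumulate(ranked), start=1):
        if running_total >= position * position:
            g = position
    return g


scientist_a = [5] * 5               # 5 papers with 5 citations each
scientist_b = [100] * 5 + [4] * 10  # 5 papers with 100 citations, 10 with 4

print(h_index(scientist_a), g_index(scientist_a))  # prints 5 5
print(h_index(scientist_b), g_index(scientist_b))  # prints 5 15
```

Both scientists have h index 5, but the g-index separates them: the highly cited papers of the second scientist lift the g-index up to the total number of papers.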

As in the academic environment, if we want a list of ranked brands that satisfy some predefined personal minimum criteria, one might say that a brand has been tested and is good enough for the customer’s consideration if it has at least h products that were purchased (have recorded reviews) by at least h customers each. But if the buyer is also interested in the magnitude of the sales for the products of a brand, or if the buyer has to choose between several brands that share the same h-index rank (and price or availability are not factored into the decision), the g index provides a supplementary ranking that helps the buyer in the decision-making process.

2.3 A-index

With the A-index, Jin (2007) pursues a goal similar to that of the g-index defined above, namely to correct for the fact that the original h-index does not take into account the exact number of citations of the articles included in the h-core. The A-index is defined simply as the average number of citations received by the publications included in the Hirsch core, A = (1/h) Σ cit_j, where the sum runs over the h publications in the core. As with the h-index, the publications are first ranked in decreasing order of their number of citations.
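A minimal sketch of the A-index, applied to review counts instead of citations, is given below. The function name `a_index` and the review counts are hypothetical and do not come from the study data.

```python
def a_index(review_counts):
    """Average number of reviews over the h-core (the top h products)."""
    ranked = sorted(review_counts, reverse=True)
    # h is the number of ranked products whose count reaches their position.
    h = sum(1 for position, c in enumerate(ranked, start=1) if c >= position)
    if h == 0:
        return 0.0
    return sum(ranked[:h]) / h


# Hypothetical review counts for the products of one brand (not study data).
example_brand = [40, 12, 7, 5, 3, 1]
print(a_index(example_brand))  # prints 16.0
```

Here the h-core contains the four products with 40, 12, 7 and 5 reviews, so the A-index is 64 / 4 = 16.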

In the brand-product-consumer dimension one is looking at the magnitude of the sales as well as the number of products within each brand that is evaluated. In a sense the customer is interested to know also how popular a product is, and to have that factored into the classification.

3. Results and Discussion

The example in this present study is used to support and exemplify the decision to buy “the best” baby bottles. After the 2008 study was publicly released, (“Bisphenol A (BPA) Information for Parents”, 2008), great concern was expressed regarding the safe bottles that can be used to feed babies. A 2010 report from the United States Food and Drug Administration (FDA), (“U.S. Food and Drug Administration”, 2010) raised further concerns regarding exposure of fetuses, infants, and young children to the harmful, toxic materials used to make plastics.

Over the years, mothers came to prefer plastic bottles over glass ones, because they are convenient and less likely to break. But once plastic bottles were shown to contain potentially harmful chemicals that can leach into the formula or milk, there was avid interest in the alternatives on the market. Within a matter of months, the market was full of established brand names in newborn and baby products and of many new brand names (like Green To Grow, Thinkbaby, Wee-Go) that all of a sudden were commercializing plastic baby bottles that were BPA-free (Bisphenol A free). The new plastic bottles were made of polypropylene, which does not contain the toxic substance BPA. In 2010, Canada was the first country to declare BPA a toxic substance (“Order Adding a Toxic Substance to Schedule 1 to the Canadian Environmental Protection Act”, 2010) and (“Canada first to declare Bisphenol A toxic”, 2010), followed in 2011 by the European Community, which banned the use of BPA in baby bottles (“EU to ban Bisphenol A in baby bottles in 2011”, 2011). Even though not all countries have banned the use of BPA in baby bottles, the offer of BPA-free bottles is at the moment more substantial than that of regular (clear) plastic bottles.

Some consumers started to prefer glass or metal bottles, when age appropriate. A large part of the consumers, however, still prefer plastic, BPA-free bottles, for the very same reasons described above. Some decisions regarding what is best to buy, such as the purchase of baby bottles, are more important than others, and there is not much room for error in a decision like this. Buying the wrong type of product, a harmful one, can have serious repercussions in the long run. The decision one makes should be informed and well documented. A list that provides a rigorous classification and details about all the available choices may be needed, since people are sometimes aware only of the products that their network uses or that are commercialized in their area.

The question that arises is which brand to choose, which one is safer and more durable, and which brand to trust, since many of them are quite new on the market. The internet, as always, lends a helping hand, in the sense that a simple Google search returns several rankings of the top brands of baby bottles. If only things were that simple! The conflict begins when one realizes that, although several classifications come up, they all differ to a smaller or larger degree and no reasoning or explanation is given for the way the classification was made. The study in this manuscript was conducted only on plastic, BPA-free bottles and did not consider glass or metal baby bottles. At the time the study was conducted, the following classifications were readily available. Table 1 provides three different classifications of baby bottles, none of which details the analysis that generated the rankings. In the web classifications some of the brands are very new on the market. Green to Grow and Thinkbaby appeared on the market after the 2008 report about BPA, and in a matter of years they made it into the top 5 brands in some of the classifications one finds on the internet. People sometimes tend not to trust new brands that they do not recognize, unless some solid reasoning stands behind the classification. The data presented in Table 1 was recorded on April 18, 2012 from the following sources: Source 1 (“BPA-Free Baby Bottles”, 2011), Source 2 (“Baby Bottles: Reviews”, 2011), Source 3 (“The Five Best Baby Bottles”, 2010).

Table 1: Best baby bottles internet classification

Classification by Source 1	Classification by Source 2	Classification by Source 3
Green to Grow	BornFree	Adiri Nurser
Adiri Nurser	Playtex	Born Free
Born Free	Adiri Nurser	Dr. Brown's
Thinkbaby	Dr. Brown's	Green to Grow
Medela	Wee-Go	Tommee Tippee

As mentioned previously, the data for the case study was gathered on April 18, 2012 from Amazon.com, recording the buyers for all types of BPA-free plastic bottles that the Amazon website listed. The data was collected on 17 brands that commercialize BPA-free plastic bottles, including brands with a tradition in baby bottles as well as brands new on the market. Some of the brands are manufactured in Europe, in countries like the UK, Austria and Germany, and are better known in Europe than in the United States.

Table 2: Ranking by the H-index

Rank	Brand	h index
1	Dr. Brown’s	9
2	Medela	8
3	MAM	7
4	Gerber, Nuby	6
5	Adiri Natural Nurser, Philips Avent, Playtex, The First Years	5
6	Green to Grow, Tommee Tippee, Evenflo, Thinkbaby	4
7	Born Free, Nuk, Wee-Go, Nurtria	3

Table 2 provides the classification obtained after applying the h-index. Where several brands share the same rank, they are placed in the same cell of Table 2. A brand received index h if it had h products with at least h buyers (recorded reviews) each. Table 3 provides the classification obtained after applying the A-index. The A-index considers not only the h-index of a brand but also the average number of customers over the h products in the h-core, that is, the top h products purchased from the brand.

Table 3: Ranking by the A-index

Rank	Brand	A-index
1	Born Free	87.67
2	Dr. Brown’s	77.89
3	Playtex	66.60
4	The First Years	62.20
5	Philips Avent	52.40
6	Adiri Natural Nurser	38.40
7	MAM	36.40
8	Medela	32.00
9	Nuby	17.83
10	Nuk	17.70
11	Gerber	16.20
12	Evenflo	16.00
13	Wee-Go	15.00
14	Thinkbaby	14.75
15	Green to Grow	10.25
16	Tommee Tippee	9.75
17	Nurtria	5.00

The rating provided by the A-index places only one brand in each position. If the brands are grouped by deciles, the adjusted A-index rating presented in Table 4 is obtained.

Table 4: Ranking by the adjusted A-index

Rank	Brand
1	Born Free
2	Dr. Brown’s
3	Playtex, The First Years
4	Philips Avent
5	Adiri Natural Nurser, MAM, Medela
6	Nuby, Nuk, Gerber, Evenflo, Wee-Go, Thinkbaby, Green to Grow
7	Tommee Tippee, Nurtria
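The grouping step is not spelled out in detail, so the following is only one possible reading: brands are placed in bins of width 10 on the A-index scale (0 to 10, 10 to 20, and so on) and the non-empty bins are ranked from highest to lowest. This binning rule, the function name `adjusted_rank` and the parameter `width` are assumptions; the A-index values are those of Table 3.

```python
from itertools import groupby

# A-index values from Table 3.
a_index = {
    "Born Free": 87.67, "Dr. Brown's": 77.89, "Playtex": 66.60,
    "The First Years": 62.20, "Philips Avent": 52.40,
    "Adiri Natural Nurser": 38.40, "MAM": 36.40, "Medela": 32.00,
    "Nuby": 17.83, "Nuk": 17.70, "Gerber": 16.20, "Evenflo": 16.00,
    "Wee-Go": 15.00, "Thinkbaby": 14.75, "Green to Grow": 10.25,
    "Tommee Tippee": 9.75, "Nurtria": 5.00,
}


def adjusted_rank(scores, width=10):
    """Group brands into bins of the given width (an assumed rule) and
    rank the non-empty bins from the highest to the lowest."""
    def bin_of(brand):
        return int(scores[brand] // width)

    # Sorting by score keeps brands of the same bin next to each other.
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [
        (rank, list(brands))
        for rank, (_, brands) in enumerate(groupby(ordered, key=bin_of), start=1)
    ]


for rank, brands in adjusted_rank(a_index):
    print(rank, ", ".join(brands))
```

Under this assumption the sketch yields seven groups with the same membership as Table 4, which suggests, but does not prove, that this is how the adjusted ranking was built.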

Table 5 provides the classification of brands obtained after applying the g-index. Like the A-index, the g-index considers both the number of products from a brand that are used in the classification and the cumulative number of recorded customers. Where several brands share the same rank, they are placed in the same cell of Table 5.

Table 5: Ranking by the g-index

Rank	Brand	g index
1	MAM	16
2	Dr. Brown’s	12
3	Medela, Nuby	11
4	Tommee Tippee	10
5	Gerber, Green to Grow, Evenflo, Philips Avent	8
6	Nuk, Adiri Natural Nurser	7
7	Born Free, Thinkbaby, The First Years, Playtex	6
8	Wee-Go	5
9	Nurtria	4

It is interesting to observe in Tables 2 and 5 that the h and g indices place the same brands, MAM, Dr. Brown’s and Medela, in the top three positions, although in a slightly different order. These classifications were based on the number of buyers recorded on Amazon as customers for the different products of the different baby bottle brands. The sample used for the analysis contains 2744 buyers who purchased products from the 17 brands of baby bottles. The classifications in Table 1 found on the internet might be based on more information about the number of buyers, but the criterion for the ranking is not specified. A prospective buyer only has access to the information that is provided online. The metrics discussed in this paper for the classification of products are very easy to implement. They can be turned into an automated tool that any prospective buyer can use to generate a rapid classification for any type of product of interest.

Dr. Brown’s is a brand name that appears in the top 3 places of the three classifications this paper provides, and it can also be found in two of the internet classifications as one of the top 5 brand names suggested. From the analysis provided in this study, Dr. Brown’s is the brand with the highest number of customers (717) and the brand that offers the largest number of products (12 were considered in the g index, and 9 in the h and A indices), with an average of 77.89 buyers for its top 9 products. Three of this brand’s products are listed by Amazon in positions 1, 3 and 5 among the most relevant results when searching for baby bottles.

Up to this point, the rankings provided by the measures introduced in this paper looked only at the number of customers sampling a product. We now provide an additional ranking, based on an analysis of the customers’ satisfaction with the product. We investigate the correlation between customer satisfaction and the number of customers, the average price of the brand’s products, and the origin of the product (whether or not it is a US product).

The attention therefore goes not only to the number of reviews (customers) but also to the type of feedback the customers left, and a ranking is constructed based on customer satisfaction with the product. A new variable, Answer_Diversity, is introduced. It is measured using the Gini index (Gini, 1909) and takes a value between 0 and 1, where 0 means that the answers are distributed perfectly equally over the number of stars. On Amazon.com a product can be rated on a scale of 1 to 5 stars, where 5 stars represent the best rating (highest satisfaction with a product). For example, Answer_Diversity = 0 for a brand with 20 customers means that there are 4 customers at each position of the 1 to 5 star scale. Another variable, Product_Satisfaction, is defined as the percentage of customers who rated the product with 4 or 5 stars.
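The two variables can be sketched as follows, assuming that Answer_Diversity is the Gini coefficient of the five star counts (computed here through the mean absolute difference) and that the counts are ordered from 1 star to 5 stars. The paper does not give the exact formula used, so this is one plausible reading; the function names and the star counts are hypothetical.

```python
def gini(counts):
    """Gini coefficient of a list of non-negative counts (0 = perfect equality)."""
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs of categories.
    abs_diff_sum = sum(abs(a - b) for a in counts for b in counts)
    return abs_diff_sum / (2 * n * total)


def product_satisfaction(star_counts):
    """Percentage of customers who gave 4 or 5 stars.

    star_counts[0] is the number of 1-star ratings, star_counts[4] the
    number of 5-star ratings."""
    total = sum(star_counts)
    if total == 0:
        return 0.0
    return 100 * sum(star_counts[3:]) / total


# Hypothetical star distributions for 20 customers (not study data).
uniform = [4, 4, 4, 4, 4]    # 4 customers at each star level
skewed = [1, 1, 2, 6, 10]    # most customers gave 4 or 5 stars

print(gini(uniform), product_satisfaction(uniform))  # prints 0.0 40.0
print(product_satisfaction(skewed))                  # prints 80.0
```

With this definition, a distribution concentrated on the high ratings has both a larger Gini value and a larger satisfaction percentage than the uniform one.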

Table 6: Correlation matrix for the variables participating in the study

	Reviews/Customers	Answer_Diversity	Average Rating Amazon	US_product	Average brand_price	Product_Satisfaction
Reviews/Customers	1.00000
Answer_Diversity	-0.04257	1.00000
Average Rating Amazon	-0.05347	0.67943	1.00000
US_product	0.03264	-0.28601	-0.20454	1.00000
Average brand_price	-0.1138	0.436441	0.423870	-0.23079	1.00000
Product_Satisfaction	0.1152	0.937903	0.685385	-0.30083	0.51226	1.00000

Some very interesting conclusions can be drawn from the correlation matrix. There is a very high positive correlation, 0.937903, between the Product_Satisfaction and Answer_Diversity variables. As Answer_Diversity increases, Product_Satisfaction increases; that is, when the ratings are spread more unevenly across the star levels, the imbalance comes from more customers being satisfied with the product. The correlation between Product_Satisfaction and Average Rating Amazon is natural: the higher the product satisfaction, the higher the Amazon average rating. Also very interesting, although not very strong, are the negative correlations between US_product and Product_Satisfaction and between US_product and Answer_Diversity. These imply that for US manufactured products there are some instances where Answer_Diversity and Product_Satisfaction decrease. The US products are among the products with the most reviews, and it was also observed that the median Answer_Diversity for the US products is smaller than the median for the non-US products.

A new variable, Satisfied Customers, was created, giving the number of customers who were satisfied with the product. A new ranking, referred to as the C-ranking in Table 7, was then built on Satisfied Customers and sorted by decreasing Answer_Diversity. It is expected that a large number of Satisfied Customers arises because many customers rated the products high and few customers rated the same products low, hence a high Answer_Diversity.
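For readers who want to reproduce such a matrix on their own data, a minimal sketch follows. It assumes that the entries of Table 6 are Pearson correlation coefficients, which the paper does not state explicitly; the function name and the per-brand values below are hypothetical toy data, not the study sample.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-brand values (toy data for illustration only).
variables = {
    "Answer_Diversity": [0.30, 0.45, 0.50, 0.62],
    "Product_Satisfaction": [55.0, 70.0, 74.0, 90.0],
    "US_product": [1, 1, 0, 0],
}

# Lower-triangular correlation matrix, laid out like Table 6.
names = list(variables)
for i, row in enumerate(names):
    cells = [f"{pearson(variables[row], variables[col]):.5f}" for col in names[: i + 1]]
    print(row, *cells, sep="\t")
```

A binary indicator such as US_product can be used directly in this formula, in which case the coefficient is the point-biserial correlation.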

Table 7: Brand rating with product satisfaction and answer diversity

C - Rank	Brand
1	Dr. Brown’s
2	Playtex
3	Medela
4	MAM
5	Philips Avent
6	The First Years
7	Born Free
8	Adiri Natural Nurser
9	Gerber
10	Nuby
11	Nuk
12	Wee-Go
13	Green to Grow
14	Thinkbaby
15	Tommee Tippee
16	Evenflo
17	Nurtria
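The construction of the C-ranking is described only briefly, so the following sketch rests on assumptions: Satisfied Customers is taken as the number of reviews multiplied by the Product_Satisfaction percentage, brands are ordered by decreasing Satisfied Customers, and ties are broken by decreasing Answer_Diversity. The brand names, field names and numbers are hypothetical.

```python
# Hypothetical per-brand summaries (not study data).
brands = [
    {"brand": "Brand A", "reviews": 200, "satisfaction_pct": 80.0, "answer_diversity": 0.55},
    {"brand": "Brand B", "reviews": 100, "satisfaction_pct": 90.0, "answer_diversity": 0.60},
    {"brand": "Brand C", "reviews": 320, "satisfaction_pct": 50.0, "answer_diversity": 0.30},
]


def satisfied_customers(entry):
    """Number of customers who rated the brand's products with 4 or 5 stars."""
    return round(entry["reviews"] * entry["satisfaction_pct"] / 100)


def c_ranking(entries):
    """Order brands by satisfied customers, then by answer diversity (both descending)."""
    ordered = sorted(
        entries,
        key=lambda e: (satisfied_customers(e), e["answer_diversity"]),
        reverse=True,
    )
    return [e["brand"] for e in ordered]


print(c_ranking(brands))  # prints ['Brand A', 'Brand C', 'Brand B']
```

Brand A and Brand C both have 160 satisfied customers in this toy example, and Brand A is placed first because of its higher answer diversity.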

4. Conclusion

After investigating the ratings provided by Table 7 and comparing them with Tables 2, 4 and 5, we observe that the top 5 brands in Table 7 all appear within the top five rank positions of Table 2, the h-index ranking. Table 8 summarizes all the results.

Table 8: Comparison of Top 5 brands by different rankings provided in the study

Ranking by H	C-ranking	g-index	Adjusted A-index
Dr. Brown’s	Dr. Brown’s	MAM	Born Free
Medela	Playtex	Dr. Brown’s	Dr. Brown’s
MAM	Medela	Medela, Nuby	The First Years, Playtex
Gerber, Nuby	MAM	Tommee Tippee	Philips Avent
Adiri Natural Nurser, Philips Avent, Playtex, The First Years	Philips Avent	Philips Avent, Gerber, Green to Grow, Evenflo	Adiri Natural Nurser, MAM, Medela

We can conclude that the h-index offers a very good classification of the brands, taking into consideration the number of buyers and the number of products the brand has. The classification provided by the h index is strongly related to the C-ranking, which also considers the product satisfaction and the diversity of the ratings the customers reported. Both the g-index and the A-index reported similar rankings (with the g-index highlighting the products at the top of a brand, the ones that were purchased more) that largely agree with the h-ranking and the C-ranking. Largely the same brands appear in the top 5 positions, although in slightly different positions. When choosing a brand, one can decide based on the criteria that stand behind each of the classifications. Since the classifications largely agree on the names of the top 5 brands, a buyer who is not concerned with exactly which brand to buy, as long as it is among the top 5, may use other considerations, such as the price or the availability of the product.

In summary, we conclude that the classifications provided in this paper are supported by sound reasoning and should be considered before other classifications that do not explain the criteria that stand behind them. The indices used in this paper can be applied to classify different brands or products; they are not particular to this case study. Since the information used for the analysis is nowadays publicly available, the analysis is easy to perform and can be updated on a daily basis, not only on sites like Amazon but also on other sites such as Best Buy, Buy.com, Newegg etc.

Acknowledgements

This work was partially supported by the Romanian National Authority for Scientific Research, National Research Program PN 09-36, BIODIV.

References

[1] L. Egghe. Theory and practice of the g-index. Scientometrics. 2006, 69(1): pp. 131-152, DOI: 10.1007/s11192-006-0144-7.

[2] L. Egghe. The influence of transformations on the h-index and the g-index. Journal of the American Society for Information Science and Technology, 2008, 59(8): pp. 1304-12.

[3] C. Gini. Concentration and dependency ratios (in Italian). English translation in Rivista di Politica Economica, 1909, 87 (1997): 769-789.

[4] J. E. Hirsch. An index to quantify an individual's scientific research output. Proc. Nat. Acad. Sci., 2005, 102(46): pp. 16569–16572, DOI: 10.1073/pnas.0507655102.

[5] H. S. Bansal, P. A. Voyer. Word-of-Mouth Processes within a Services Purchase Decision Context, Journal of Service Research, 2000, 3(2): pp. 166-177.

[6] B. Jin. The AR-index: complementing the h-index, ISSI Newsletter, 2007, 3(1), p.6.

[7] K. Murray. A Test of Services Marketing Theory: Consumer Information Acquisition Activities, Journal of Marketing, 1991, 55(1).

[8] Bisphenol A (BPA) Information for Parents [Online], Last Accessed April 18, 2012. Available at http://www.hhs.gov/safety/bpa/

[9] U.S. Food and Drug Administration. Update on Bisphenol A for Use in Food Contact Applications: January 2010; Updated March 30, 2012. Last Accessed April 18, 2012. Available at http://www.fda.gov/newsevents/publichealthfocus/ucm064437.htm .

[10] Order Adding a Toxic Substance to Schedule 1 to the Canadian Environmental Protection Act 1999, Canada Gazette Part II. 13 October 2010; 144(21):1806–18. Last Accessed April 18, 2012. Available at: http://www.gazette.gc.ca/rp-pr/p2/2010/2010-10-13/pdf/g2-14421.pdf

[11] M. Mittelstaedt. Canada first to declare Bisphenol A toxic. Globe and Mail Canada, 2010. Available at http://www.theglobeandmail.com/news/national/canada-first-to-declare-bisphenol-a-toxic/article1755272/

[12] EU to ban Bisphenol A in baby bottles in 2011. Last Accessed April 18, 2012. Available at http://ec.europa.eu/dgs/health_consumer/dyna/consumervoice/create_cv.cfm?cv_id=716

[13] BPA-Free Baby Bottles, Source: http://www.parents.com/baby/feeding/bottlefeeding/bpa-free-baby-bottles/, last accessed April 18, 2012.

[14] Baby Bottles: Reviews, Source: http://www.consumersearch.com/baby-bottles, last accessed April 18, 2012.

[15] The Five Best Baby Bottles, Source: http://www.lilsugar.com/Five-Best-Baby-Bottles-7515262, last accessed April 18, 2012.