14.1.4 Training Set Size, Analysis, Selection

Training Set.

Baum, L.E., Petrie, T., Soules, G., Weiss, N.,
A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains,
AMS(41), No. 1, 1970, pp. 164-171. BibRef 7000

Kanal, L.[Laveen], Chandrasekaran, B.,
On dimensionality and sample size in statistical pattern classification,
PR(3), No. 3, October 1971, pp. 225-234.
WWW Version. 0309
BibRef

Wilson, D.L.,
Asymptotic properties of nearest neighbor rules using edited data,
SMC(2), 1972, pp. 408-421. Remove from the training set those items misclassified by the chosen rules. BibRef 7200

Jain, A.K.[Anil K.], Dubes, R.C.[Richard C.],
Feature definition in pattern recognition with small sample size,
PR(10), No. 2, 1978, pp. 85-97.
WWW Version. 0309
BibRef

Kalayeh, H.M., Muasher, M.J., and Landgrebe, D.A.,
Feature Selection When Limited Numbers of Training Samples are Available,
GeoRS(21), No. 4, October 1983, pp. 434-438.
IEEE Top Reference. BibRef 8310

Muasher, M.J., and Landgrebe, D.A.,
The K-L Expansion as an Effective Feature Ordering Technique for Limited Training Sample Size,
GeoRS(21), No. 4, October 1983, pp. 438-441.
IEEE Top Reference. BibRef 8310

Kalayeh, H.M., Landgrebe, D.A.,
Predicting the Required Number of Training Samples,
PAMI(5), No. 6, November 1983, pp. 664-666. BibRef 8311

Muasher, M.J., and Landgrebe, D.A.,
A Binary Tree Feature Selection Technique for Limited Training Set Size,
RSE(16), No. 3, December 1984, pp. 183-194. BibRef 8412

Landgrebe, D.A., and Malaret, E.R.,
Noise in Remote Sensing Systems: Effect on Classification Accuracy,
GeoRS(24), No. 2, March 1986, pp. 294-299.
IEEE Top Reference. BibRef 8603

Shahshahani, B.M.[Behzad M.], and Landgrebe, D.A.[David A.],
The Effect of Unlabeled Samples in Reducing the Small Sample Size Problem and Mitigating the Hughes Phenomenon,
GeoRS(32), No. 5, September 1994, pp. 1087-1095.
IEEE Abstract. IEEE Top Reference.
IEEE DOI Link
PDF Version. BibRef 9409

Hoffbeck, J.P.[Joseph P.], Landgrebe, D.A.,
Covariance-Matrix Estimation and Classification with Limited Training Data,
PAMI(18), No. 7, July 1996, pp. 763-767.
IEEE Abstract. IEEE Top Reference.
WWW Version. 9608
PDF Version. BibRef

Herbst, K.[Klaus],
Pattern recognition by polynomial canonical regression,
PR(17), No. 3, 1984, pp. 345-350.
WWW Version. 0309
BibRef

Wharton, S.W.[Stephen W.],
An analysis of the effects of sample size on classification performance of a histogram based cluster analysis procedure,
PR(17), No. 2, 1984, pp. 239-244.
WWW Version. 0309
BibRef

Djouadi, A., Snorrason, O., and Garber, F.D.,
The Quality of Training-Sample Estimates of the Bhattacharyya Coefficient,
PAMI(12), No. 1, January 1990, pp. 92-97.
IEEE Abstract. IEEE Top Reference.
WWW Version. BibRef 9001

Hong, Z.Q.[Zi-Quan], Yang, J.Y.[Jing-Yu],
Optimal discriminant plane for a small number of samples and design method of classifier on the plane,
PR(24), No. 4, 1991, pp. 317-324.
WWW Version. 0401
BibRef

Rachkovskij, D.A., Kussul, E.M.,
Datagen: A Generator of Datasets for Evaluation of Classification Algorithms,
PRL(19), No. 7, May 1998, pp. 537-544. 9808
BibRef

Larsen, R.[Rasmus], Nielsen, A.A.[Allan Aasbjerg], Flesche, H.[Harald],
Sensitivity study of a semi-automatic training set generator,
PRL(21), No. 13-14, December 2000, pp. 1175-1182. 0011
BibRef
Earlier:
Sensitivity Study of a Semi-automatic Supervised Classifier Applied to Minerals from X-Ray Mapping Images,
SCIA99(Statistical Methods). BibRef

Larsen, R.[Rasmus], Hilger, K.B.[Klaus Baggesen],
Probabilistic Generative Modelling,
SCIA03(861-868).
WWW Version. 0310
BibRef

Hilger, K.B., Nielsen, A.A., Larsen, R.,
A Scheme for Initial Exploratory Data Analysis of Multivariate Image Data,
SCIA01(O-Tu4A). 0206
BibRef

Sánchez, J.S., Barandela, R., Marqués, A.I., Alejo, R., Badenas, J.,
Analysis of new techniques to obtain quality training sets,
PRL(24), No. 7, April 2003, pp. 1015-1022.
Elsevier DOI Link
HTML Version. 0301
BibRef

Chen, D.M.[Dong-Mei], Stow, D.[Douglas],
The Effect of Training Strategies on Supervised Classification at Different Spatial Resolutions,
PhEngRS(68), No. 11, November 2002, pp. 1155-1162. Three different training strategies often used for supervised classification are compared for six image subsets containing a single land-use/land-cover component and at five different spatial resolutions.
WWW Version. 0304
BibRef

Beiden, S.V.[Sergey V.], Maloof, M.A.[Marcus A.], Wagner, R.F.[Robert F.],
A general model for finite-sample effects in training and testing of competing classifiers,
PAMI(25), No. 12, December 2003, pp. 1561-1569.
IEEE Abstract. IEEE Top Reference. 0401
Considers more than the size of the sample set. BibRef

Inoue, M.[Masashi], Ueda, N.[Naonori],
Exploitation of unlabeled sequences in hidden Markov models,
PAMI(25), No. 12, December 2003, pp. 1570-1581.
IEEE Abstract. IEEE Top Reference. 0401
How to use unlabeled data in learning. BibRef

Sánchez, J.S.,
High training set size reduction by space partitioning and prototype abstraction,
PR(37), No. 7, July 2004, pp. 1561-1564.
WWW Version. 0405
BibRef

Wang, H.C.[Hai-Chuan], Zhang, L.M.[Li-Ming],
Linear generalization probe samples for face recognition,
PRL(25), No. 8, June 2004, pp. 829-840.
WWW Version. 0405
Generate probe sets using constrained linear subspace of the original probes. BibRef

Prudêncio, R.B.C.[Ricardo B. C.], Ludermir, T.B.[Teresa B.], de Carvalho, F.A.T.[Francisco A. T.],
A Modal Symbolic Classifier for selecting time series models,
PRL(25), No. 8, June 2004, pp. 911-921.
WWW Version. 0405
BibRef

Kuo, B.C., Chang, K.Y.,
Feature Extractions for Small Sample Size Classification Problem,
GeoRS(45), No. 3, March 2007, pp. 756-764.
IEEE DOI Link 0703
BibRef

Angiulli, F.[Fabrizio],
Condensed Nearest Neighbor Data Domain Description,
PAMI(29), No. 10, October 2007, pp. 1746-1758.
IEEE DOI Link 0710
Distinguish between normal and abnormal data to find the minimal subset of consistent data. BibRef

Farhangfar, A.[Alireza], Kurgan, L.A.[Lukasz A.], Dy, J.[Jennifer],
Impact of imputation of missing values on classification error for discrete data,
PR(41), No. 12, December 2008, pp. 3692-3705.
WWW Version. 0810
Missing values; Classification; Imputation of missing values; Single imputation; Multiple imputations. For databases. Studies the effect of missing data imputation using five single imputation methods (a mean method, a Hot deck method, a Naive-Bayes method, and the latter two methods with a recently proposed imputation framework) and one multiple imputation method (a polytomous regression based method) on classification accuracy for six popular classifiers (RIPPER, C4.5, K-nearest-neighbor, support vector machine with polynomial and RBF kernels, and Naive-Bayes) on 15 datasets. BibRef

Koikkalainen, J., Tolli, T., Lauerma, K., Antila, K., Mattila, E., Lilja, M., Lotjonen, J.,
Methods of Artificial Enlargement of the Training Set for Statistical Shape Models,
MedImg(27), No. 11, November 2008, pp. 1643-1654.
IEEE DOI Link 0811
Heart images. BibRef

Peres, R.T., Pedreira, C.E.,
Generalized Risk Zone: Selecting Observations for Classification,
PAMI(31), No. 7, July 2009, pp. 1331-1337.
IEEE DOI Link 0905
Select key observations in sample set. BibRef


Eaton, R., Lowell, J., Snorrason, M., Irvine, J.M., Mills, J.,
Rapid training of image classifiers through adaptive, multi-frame sampling method,
AIPR08(1-7).
IEEE DOI Link 0810
BibRef

Christoudias, C.M.[C. Mario], Urtasun, R.[Raquel], Kapoor, A.[Ashish], Darrell, T.J.[Trevor J.],
Co-training with noisy perceptual observations,
CVPR09(2844-2851).
IEEE DOI Link 0906
BibRef

Wang, H.[Hai], Wang, S.H.[Shou-Hong],
Visualization of the Critical Patterns of Missing Values in Classification Data,
Visual07(267-274).
Springer DOI Link 0706
BibRef

Lapedriza, À.[Àgata], Masip, D.[David], Vitrià, J.[Jordi],
A Hierarchical Approach for Multi-task Logistic Regression,
IbPRIA07(II: 258-265).
Springer DOI Link 0706
Small number of samples for training. BibRef

Sugiyama, M.[Masashi], Blankertz, B.[Benjamin], Krauledat, M.[Matthias], Dornhege, G.[Guido], Müller, K.R.[Klaus-Robert],
Importance-Weighted Cross-Validation for Covariate Shift,
DAGM06(354-363).
Springer DOI Link 0610
The distribution of training points differs from that of the test data. BibRef

Kim, S.W.[Sang-Woon],
On Using a Dissimilarity Representation Method to Solve the Small Sample Size Problem for Face Recognition,
ACIVS06(1174-1185).
Springer DOI Link 0609
BibRef

Ren, J.L.[Jun-Ling],
A Pattern Selection Algorithm Based on the Generalized Confidence,
ICPR06(II: 824-827).
WWW Version. 0609
Selecting the patterns that matter in training. BibRef

Levi, D.[Dan], Ullman, S.[Shimon],
Learning Model Complexity in an Online Environment,
CRV09(260-267).
IEEE DOI Link 0905
BibRef

Levi, D.[Dan], Ullman, S.[Shimon],
Learning to classify by ongoing feature selection,
CRV06(1-1).
IEEE DOI Link 0607
Continuous updating of the clustering based on new inputs. BibRef

Cazes, T.B., Feitosa, R.Q., Mota, G.L.A.,
Automatic Selection of Training Samples for Multitemporal Image Classification,
ICIAR04(II: 389-396).
WWW Version. 0409
BibRef

Yang, C.B.[Chang-Bo], Dong, M.[Ming], Fotouhi, F.[Farshad],
Learning the Semantics in Image Retrieval: A Natural Language Processing Approach,
MMDE04(137).
IEEE DOI Link 0406
BibRef

Yang, C.B.[Chang-Bo], Dong, M.[Ming], Fotouhi, F.[Farshad],
Image Content Annotation Using Bayesian Framework and Complement Components Analysis,
ICIP05(I: 1193-1196).
IEEE DOI Link 0512
BibRef

Vázquez, F.D.[Fernando D.], Salvador Sánchez, J., Pla, F.[Filiberto],
Learning and Forgetting with Local Information of New Objects,
CIARP08(261-268).
Springer DOI Link 0809
BibRef

Vázquez, F.D.[Fernando D.], Salvador-Sánchez, J., Pla, F.[Filiberto],
A Stochastic Approach to Wilson's Editing Algorithm,
IbPRIA05(II:35).
Springer DOI Link 0509
See also Asymptotic properties of nearest neighbor rules using edited data. BibRef

Angelova, A.[Anelia], Abu-Mostafa, Y.[Yaser], Perona, P.[Pietro],
Pruning Training Sets for Learning of Object Categories,
CVPR05(I: 494-501).
IEEE DOI Link 0507
BibRef

Franco, A., Maltoni, D., Nanni, L.,
Reward-punishment editing,
ICPR04(IV: 424-427).
IEEE DOI Link 0409
Editing: remove patterns in the training set that are not classified correctly. See also Asymptotic properties of nearest neighbor rules using edited data. BibRef

Kuhl, A., Kruger, L., Wohler, C., Kressel, U.,
Training of classifiers using virtual samples only,
ICPR04(III: 418-421).
IEEE DOI Link 0409
BibRef

Juszczak, P., Duin, R.P.W.,
Selective sampling based on the variation in label assignments,
ICPR04(III: 375-378).
IEEE DOI Link 0409
BibRef

Sprevak, D., Azuaje, F., Wang, H.,
A non-random data sampling method for classification model assessment,
ICPR04(III: 406-409).
IEEE DOI Link 0409
BibRef

Levin, A., Viola, P.A., Freund, Y.,
Unsupervised improvement of visual detectors using co-training,
ICCV03(626-633).
IEEE DOI Link 0311
Train detectors with limited data, then use them to label more data. Train two classifiers at once. Apply to vehicle tracking. BibRef

Kim, D.S.[Dong Sik], Lee, K.Y.[Kir-Yung],
Training sequence size in clustering algorithms and averaging single-particle images,
ICIP03(II: 435-438).
IEEE Abstract. IEEE Top Reference. 0312
BibRef

Franc, V.[Vojtech], Hlavác, V.[Václav],
Greedy Algorithm for a Training Set Reduction in the Kernel Methods,
CAIP03(426-433).
WWW Version. 0311
BibRef

Johnson, A.Y., Sun, J.[Jie], Bobick, A.F.,
Using similarity scores from a small gallery to estimate recognition performance for larger galleries,
AMFG03(100-103).
IEEE Abstract. IEEE Top Reference. 0311
BibRef

Paredes, R., Vidal, E., Keysers, D.,
An evaluation of the WPE algorithm using tangent distance,
ICPR02(IV: 48-51).
IEEE DOI Link 0211
Weighted Prototype Editing. BibRef

Veeramachaneni, S.[Sriharsha], Nagy, G.[George],
Classifier Adaptation with Non-representative Training Data,
DAS02(123 ff.).
HTML Version. 0303
BibRef

Maletti, G., Ersbøll, B.K., Conradsen, K., Lira, J.,
An Initial Training Set Generation Scheme,
SCIA01(P-W3B). 0206
BibRef

Fursov, V.A.,
Training in Pattern Recognition from a Small Number of Observations Using Projections Onto Null-space,
ICPR00(Vol II: 785-788).
IEEE DOI Link
HTML Version. 0009
BibRef

Miyamoto, T., Mitani, Y., Hamamoto, Y.,
Use of Bootstrap Samples in Quadratic Classifier Design,
ICPR00(Vol II: 789-792).
IEEE DOI Link
HTML Version. 0009
BibRef

Mayer, H.A.[Helmut A.], Huber, R.[Reinhold],
ERC: Evolutionary Resample and Combine for Adaptive Parallel Training Data Set Selection,
ICPR98(Vol I: 882-885).
IEEE DOI Link 9808
BibRef

Takacs, B.[Barnabas], Sadovnik, L.[Lev], Wechsler, H.[Harry],
Optimal Training Set Design for 3D Object Recognition,
ICPR98(Vol I: 558-560).
IEEE DOI Link 9808
BibRef

Nedeljkovic, V., Milosavljevic, M.,
On the influence of the training set data preprocessing on neural networks training,
ICPR92(II:33-36).
IEEE DOI Link 9208
BibRef

Ferri, F.J., Vidal, E.,
Small sample size effects in the use of editing techniques,
ICPR92(II:607-610).
IEEE DOI Link 9208
BibRef



Last update: Nov 16, 2009 at 19:35:14