-?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIInit.cpp:139: cart_rank 0 mNodePos(0,0,0) dims(1,2,1) left -1 right 1 top -1 bottom -1 top-right -1 bot-right -1 bottom-left -1 top-left -1 Rank: 0 out of: 2 from: deep2.stanford.edu using device: 0 Grid Layout: deep2.stanford.edu:0 deep3.stanford.edu:0 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- CommandLineParser.cpp:11: Parsed parameter: layers = 1 -?- CommandLineParser.cpp:11: Parsed parameter: epochs = 1 -?- CommandLineParser.cpp:11: Parsed parameter: datadir = /scail/group/deeplearning/speech/awni/kaldi-stanford/kaldi-trunk/egs/swbd/s5/exp/nn_data_fbank_train_nn/feats -?- CommandLineParser.cpp:11: Parsed parameter: filecount = 1 -?- CommandLineParser.cpp:11: Parsed parameter: step = 1e-5 -?- CommandLineParser.cpp:11: Parsed parameter: cost1 = 1 -?- CommandLineParser.cpp:11: Parsed parameter: step1 = 1 -?- CommandLineParser.cpp:11: Parsed parameter: lambda1 = .1 -?- CommandLineParser.cpp:11: Parsed parameter: alpha1 = .1 -?- CommandLineParser.cpp:11: Parsed parameter: optim = mo Image data: 96 1,31,24 --- GpuArray.h:361: Allocating device 0 memory ([1] -> 3.8147e-06 MB) total memory used 3.8147e-06 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIInit.cpp:139: cart_rank 1 mNodePos(0,1,0) dims(1,2,1) left 0 right -1 top -1 bottom -1 top-right -1 bot-right -1 bottom-left -1 top-left -1 Rank: 1 out of: 2 from: deep3.stanford.edu using device: 0 --- GpuArray.h:361: Allocating device 0 memory ([1] -> 3.8147e-06 MB) total memory used 3.8147e-06 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- CommandLineParser.cpp:11: Parsed parameter: layers = 1 -?- CommandLineParser.cpp:11: Parsed parameter: epochs = 1 -?- CommandLineParser.cpp:11: Parsed parameter: datadir = /scail/group/deeplearning/speech/awni/kaldi-stanford/kaldi-trunk/egs/swbd/s5/exp/nn_data_fbank_train_nn/feats -?- CommandLineParser.cpp:11: Parsed parameter: filecount = 
1 -?- CommandLineParser.cpp:11: Parsed parameter: step = 1e-5 -?- CommandLineParser.cpp:11: Parsed parameter: cost1 = 1 -?- CommandLineParser.cpp:11: Parsed parameter: step1 = 1 -?- CommandLineParser.cpp:11: Parsed parameter: lambda1 = .1 -?- CommandLineParser.cpp:11: Parsed parameter: alpha1 = .1 -?- CommandLineParser.cpp:11: Parsed parameter: optim = mo -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- GpuArray.h:361: Allocating device 0 memory ([1] -> 3.8147e-06 MB) total memory used 7.62939e-06 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- TrainExperiment.cu:127: grid size 1,2,1 rank 1 pos 0,1,0 -?- Layer.h:153: Global array size: 96,1,31,24 -?- Layer.h:158: outputArray args. nodePos: 0,1,0 gridDim: 1,2,1 inputArray: 0,0,0,0 arrSize: 96,1,31,24 -?- BlockLayer.h:47: inSize: 1,31,24 mInputCube: 1,8,8 -?- Layer.h:153: Global array size: 96,8,96,68 -?- Layer.h:158: outputArray args. nodePos: 0,1,0 gridDim: 1,2,1 inputArray: 96,1,31,24 arrSize: 96,8,96,68 -?- BlockLayer.h:47: inSize: 8,96,68 mInputCube: 1,8,8 --- GpuArray.h:361: Allocating device 0 memory ([1] -> 3.8147e-06 MB) total memory used 7.62939e-06 MB --- TrainExperiment.cu:127: grid size 1,2,1 rank 0 pos 0,0,0 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- Layer.h:153: Global array size: 96,1,31,24 -?- Layer.h:158: outputArray args. 
nodePos: 0,0,0 gridDim: -?- Layer.h:153: Global array size: 96,8,92,64 -?- Layer.h:158: outputArray args. nodePos: 0,1,0 gridDim: 1,2,1 inputArray: 96,8,96,68 arrSize: 96,8,92,64 -?- Layer.h:153: Global array size: 1,1,1,1 -?- Layer.h:158: outputArray args. nodePos: 0,1,0 gridDim: 1,2,1 inputArray: 96,8,92,64 arrSize: 1,1,1,1 -?- Layer.h:204: Input pos = 0,46,0 size = 8,46,64 -?- Layer.h:205: Output pos = 0,0,--- GpuArray.h:361: Allocating device 0 memory (0 size = 1,1,1 -?- Layer.h:206: Valid pos = 0,0,0 size = 0,0,0 [1,1,1,1] -> 3.8147e-06 MB) total memory used 1.14441e-05 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1--- GpuArray.h:361: Allocating device 0 memory ([1,1,1,1] -> 3.8147e-06 MB) total memory used 1.14441e-05 MB ,46,64 -?- Layer.h:205: Output pos = 0,0,0 size = 1,1,1 -?- Layer.h:206: Valid pos = 0,0,0 size = 0,0,0 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- GpuArray.h:361: Allocating device 0 memory ([1,1,1,1] -> 3.8147e-06 MB) total memory used 1.52588e-05 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- GpuArray.h:361: Allocating device 0 memory ([1,1,1,1] -> 3.8147e-06 MB) total memory used 1.52588e-05 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- GpuArray.h:361: Allocating device 0 memory ([96,8,46,64] -> 8.625 MB) total memory used 8.62502 MB -?- Layer.h:204: Input pos = 0,48,0 size = 8,48,68 -?- Layer.h:205: Output pos = 0,48,0 size = 8,44,64 -?- Layer.h:206: Valid pos = 0,46,0 size = 8,46,64 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- GpuArray.h:361: Allocating device 0 memory ([96,8,48,64] -> 9 MB) total memory used 9.00002 MB -?- Layer.h:204: Input pos = 0,0,0 size = 8,52,68 -?- Layer.h:205: Output pos = 0,0,0 size = 8,48,64 -?- Layer.h:206: 
Valid pos = 0,0,0 size = 8,46,64 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 1 creating a recv message with an overlap size of 0,2,64 at a local position of 0,0,0 subL 0,0,0,0 subR 0,0,2,0 subD 0,0,0,1 arraySize 96,8,46,64 --- GpuArray.h:361: Allocating device 0 memory ([96,8,46,64] -> 8.625 MB) total memory used 17.25 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 0 creating a send message with an overlap size of 0,2,64 at a local position of 0,46,0 subL 0,0,46,0 subR 0,0,48,0 subD 0,0,46,1 arraySize 96,8,48,64 --- GpuArray.h:361: Allocating device 0 memory ([96,8,48,64] -> 9 MB) total memory used 18 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 1 creating a send message with an overlap size of 0,2,64 at a local position of 0,0,0 subL 0,0,0,0 subR 0,0,2,0 subD 0,0,0,1 arraySize 96,8,46,64 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- Layer.h:153: Global array size: 96,1,31,24 -?- Layer.h:158: outputArray args. nodePos: 0,1,0 gridDim: 1,2,1 inputArray: 96,8,96,68 arrSize: 96,1,--- MPIArray.h:335: Rank 0 creating a recv message with an overlap size of 0,2,64 at a local position of 0,46,0 subL 0,0,46,0 subR 0,0,48,0 subD 0,0,46,-?- cudaUtil.cpp:62: Setting device to 0 stream -1 31,24 -?- Layer.h:153: Global array size: 1,1,1,1 -?- Layer.h:158: outputArray args. 
nodePos: 0,1,0 gridDim: 1,2,1 inputArray: 96,1,31,24 arrSize: 1,1,1,1 -?- Layer.h:204: Input pos = 0,16,0 size = 1,15,24 -?- Layer.h:205: Output pos = 0,0,0 size = 1,1,1 -?- Layer.h:206: Valid pos = 0,--- GpuArray.h:361: Allocating device 0 memory ([1,1,1,1] -> 3.8147e-06 MB) total memory used 17.25 MB 0,0 size = 0,0,0 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- GpuArray.h:361: Allocating device 0 memory ([1,1,1,1] -> 3.8147e-06 MB) total memory used 17.25 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- Layer.h:204: Input pos = 0,48,0 size = 8,48,68 -?- Layer.h:205: Output pos = 0,12,0 size = 1,19,24 -?- Layer.h:206: Valid pos = --- GpuArray.h:361: Allocating device 0 memory ([96,1,19,24] -> 0.166992 MB) total memory used 17.417 MB 0,16,0 size = 1,15,24 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- GpuArray.h:361: Allocating device 0 memory ([1,1,1,1] -> 3.8147e-06 MB) total memory used 18 MB --- GpuArray.h:361: Allocating device 0 memory ([96,1,19,24] -> 0.166992 MB) total memory used 18.167 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- Layer.h:204: Input pos = 0,0,0 size = 8,48,68 -?- Layer.h:205: Output pos = 0,0,0 size = 1,19,24 -?- Layer.h:206: Valid pos = 0,0,0 size = 1,16,24 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 1 creating a send message with an overlap size of 0,4,24 at a local position of 0,0,0 subL 0,0,0,0 subR 0,0,4,0 subD 0,0,0,1 arraySize 96,1,19,24 --- MPIArray.h:335: Rank 1 creating a recv message with an overlap size of 0,3,24 at a local position of 0,4,0 subL 0,0,4,0 subR 0,0,7,0 subD 0,0,4,1 arraySize 96,1,19,24 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- GpuArray.h:361: Allocating device 0 memory ([96,1,19,24] -> 0.166992 MB) total memory used 17.584 MB --- 
MPIArray.h:335: Rank 1 creating a send message with an overlap size of 0,3,24 at a local position of 0,4,0 subL 0,0,4,0 subR 0,0,7,0 subD 0,0,4,1 arraySize 96,1,19,24 --- MPIArray.h:335: Rank 1 creating a recv message with an overlap size of 0,4,24 at a local position of 0,0,0-?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 subL 0,0,0,0 subR 0,0,4,0 subD 0,0,0,1 arraySize 96,1,19,24 --- GpuArray.h:361: Allocating device 0 memory ([96,8,48,68] -> 9.5625 MB) total memory used 27.1465 MB -?- Layer.h:204: Input pos = 0,12,0 size = 1,19,24 -?- Layer.h:205: Output pos = 0,48,0 size = 8,48,68 -?- Layer.h:206: Valid pos = 0,48,0 size = 8,48,68 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 0 creating a send message with an overlap size of 0,3,24 at a local position of 0,16,0 subL 0,0,16,0 subR 0,0,19,0 subD 0,0,16,1 arraySize 96,1,19,24 --- MPIArray.h:335: Rank 0 creating a recv message with an overlap size of 0,4,24 at a local position of 0,12,0 subL 0,0,12,0 subR 0,0,16,0 subD 0,0,12,1 arraySize 96,1,19,24 --- GpuArray.h:361: Allocating device 0 memory ([96,1,19,24] -> 0.166992 MB) total memory used 18.334 MB --- MPIArray.h:335: Rank 0 creating a send message with an overlap size of 0,4,24 at a local position of 0,12,0 subL 0,0,12,0 subR 0,0,16,0 subD 0,0,12,1 arraySize 96,1,19,24 --- MPIArray.h:335: Rank 0 creating a recv message with an overlap size of 0,3,24 at a local position of 0,16,0 subL 0,0,16,0 subR 0,0,19,0 subD 0,0,16,1 arraySize 96,1,19,24-?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 1 creating a send message with an overlap size of 0,4,68 at a local position of 0,0,0 subL 0,0,0,0 subR 0,0,4,0 subD 0,0,0,1 --- GpuArray.h:361: Allocating device 0 memory ([96,8,52,68] -> 10.3594 MB) total memory used 28.6934 MB -?- Layer.h:204: Input pos = 0,0,0 size 
= 1,19,24 -?- Layer.h:205: Output pos = 0,0,0 size = 8,48,68 -?- Layer.h:206: Valid pos = 0,0,0 size = 8,52,68 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 arraySize 96,8,48,68 --- GpuArray.h:361: Allocating device 0 memory ([96,8,48,68] -> 9.5625 MB) total memory used 36.709 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 0 creating a recv message with an overlap size of 0,4,68 at a local position of 0,48,0 subL 0,0,48,0 subR 0,0,52,0 subD 0,0,48,1 arraySize 96,8,52,68 --- GpuArray.h:361: Allocating device 0 memory ([96,8,52,68] -> 10.3594 MB) total memory used 39.0528 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 0 creating a send message with an overlap size of 0,4,68 at a local position of 0,48,0 subL 0,0,48,0 subR 0,0,52,0 subD 0,0,48,1 arraySize 96,8,52,68 --- GpuArray.h:361: Allocating device 0 memory ([8,4,4,1,8,8,12,17] -> 6.375 MB) total memory used 45.4278 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 1 creating a recv message with an overlap size of 0,4,68 at a local position of 0,0,0 subL 0,0,0,0 subR 0,0,4,0 subD 0,0,0,1 arraySize 96,8,48,68 --- GpuArray.h:361: Allocating device 0 memory ([8,4,4,1,8,8,12,17] -> 6.375 MB) total memory used 43.084 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- FilterBank.h:288: Filter bank size: 8,4,4,1,8,8,12,17 --- GpuArray.h:361: Allocating device 0 memory ([-?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- FilterBank.h:288: Filter bank size: 8,4,4,1,8,8,12,17 --- GpuArray.h:361: Allocating device 0 memory ([128,204] -> 0.0996094 MB) total memory used 45.5274 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 128,204] -> 0.0996094 MB) total memory used 43.1836 MB --- GpuArray.h:361: Allocating device 0 memory ([8,48,68] -> 0.0996094 MB) total 
memory used 43.2832 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- GpuArray.h:361: Allocating device 0 memory ([8,48,68] -> 0.0996094 MB) total memory used 45.627 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- ArrayReduce.h:171: 1.67117e+06 OPs, 6.47461 MB in 0.000133991 seconds. 12.8613 GOPS, 48.6606 GB/s -?- BsxFun.h:180: 26112 OPs, 0.199219 MB in 0.000135899 seconds. 0.193501 GOPS, 1.4417 GB/s -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- ArrayReduce.h:171: 1.67117e+06 OPs, 6.47461 MB in 0.000144958 seconds. 11.8402 GOPS, 44.7973 GB/s -?- BsxFun.h:180: 26112 OPs, 0.199219 MB in 0.000138044 seconds. 
0.190472 GOPS, 1.41913 GB/s % L0_filters_dim{1} = [8,4,4,1,8,8,12,17]; --- GpuArray.h:361: Allocating device 0 memory ([1,8,48,68] -> 0.0996094 MB) total memory used 43.3828 MB % L0_filters_dim{2} = [8,4,4,1,8,8,12,17]; -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- Layer.h:204: Input pos = 0,0--- GpuArray.h:361: Allocating device 0 memory ([96,1,19,24] -> 0.166992 MB) total memory used 43.5498 MB ,0 size = 0,0,0 -?- Layer.h:205: Output pos = 0,16,0 size = 1,15,24 -?- Layer.h:206: Valid pos = 0,12,0 size = 1,19,24 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 1 creating a send message with an overlap size of 0,3,24 at a local position of 0,4,0 subL 0,0,4,0 subR 0,0,7,0 subD 0,0,4,1 arraySize 96,1,19,24 --- MPIArray.h:335: Rank 1 creating a recv message with an overlap size of 0,4,24 at a local position of 0,0,0 subL 0,0,0,0 subR 0,0,4,0 subD 0,0,0,1 arraySize 96,1,19,24 --- GpuArray.h:361: Allocating device 0 memory ([96,1,19,24] -> 0.166992 MB) total memory used 43.7168 MB -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 1 creating a send message with an overlap size of 0,4,24 at a local position of 0,0,0 subL 0,0,0,0 subR 0,0,4,0 subD 0,0,0,1 arraySize 96,1,19,24 --- MPIArray.h:335: Rank 1 creating a recv message with an overlap size of 0,3,24 at a local position of 0,4,0 subL 0,0,4,0 subR 0,0,7,0 subD 0,0,4,1 arraySize 96,1,19,24 --- GpuArray.h:361: Allocating device 0 memory ([1,8,48,68] -> 0.0996094 MB) total memory used 45.7266 MB --- GpuArray.h:361: Allocating device 0 memory ([96,1,19,24] -> 0.166992 MB) total memory used 45.8936 MB --- MPIArray.h:335: Rank 0 creating a send message with an overlap size of 0,4,24 at a local position of 0,12,0 
subL 0,0,12,0 subR 0,0,16,0 subD 0,0,12,1 arraySize 96,1,19,24 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- Layer.h:204: Input pos = 0,0,0 size = 0,0,0 -?- Layer.h:205: Output pos = 0,0,0 size = 1,16,24 -?- Layer.h:206: Valid pos = 0,0,0 size = 1,19,24 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- MPIArray.h:335: Rank 0 creating a recv message with an overlap size of 0,3,24 at a local position of 0,16,0 subL 0,0,16,0 subR 0,0,19,0 subD 0,0,16,1 arraySize 96,1,19,24 --- GpuArray.h:361: Allocating device 0 memory ([96,1,19,24] -> 0.166992 MB) total memory used 46.0606 MB --- MPIArray.h:335: Rank 0 creating a send message with an overlap size of 0,3,24 at a local position of 0,16,0 subL 0,0,16,0 subR 0,0,19,0 subD 0,0,16,1 arraySize 96,1,19,24 --- MPIArray.h:335: Rank 0 creating a recv message with an overlap size of 0,4,24 at a local position of 0,12,0 subL 0,0,12,0 subR 0,0,16,0 subD 0,0,12,1 arraySize 96,1,19,24 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 -?- cudaUtil.cpp:62: Setting device to 0 stream -1 --- TrainExperiment.cu:139: Creating paramOptInterface --- DataAudioFbankLayer.h:573: Rank 1 producer: waiting on file mutex--- TrainExperiment.cu:139: ---Creating paramOptInterface DataAudioFbankLayer.h:575: Rank 1 producer: done waiting on file mutex --- TrainExperiment.cu:150: Setting opt: 0x7374616566 opt name: mo --- DataAudioFbankLayer.h:573: Rank 0 producer: waiting on file mutex --- DataAudioFbankLayer.h:575: Rank 0 producer: done waiting on file mutex --- TrainExperiment.cu:150: Setting opt: --- DataAudioFbankLayer.h:577: Producer: Waiting for files 0 opt name: mo -!- Experiment.h:92: Assertion failed: mOpt==0 && opt != 0 [deep2.stanford.edu:mpi_rank_0][error_sighandler] Caught error: Aborted (signal 6) 
[deep2.stanford.edu:mpi_rank_0][print_backtrace] 0: ./bin/saeModelParallelMPI(print_backtrace+0x1e) [0x5b1a9e]
[deep2.stanford.edu:mpi_rank_0][print_backtrace] 1: ./bin/saeModelParallelMPI(error_sighandler+0x59) [0x5b1ba9]
[deep2.stanford.edu:mpi_rank_0][print_backtrace] 2: /lib64/libpthread.so.0() [0x358520f500]
[deep2.stanford.edu:mpi_rank_0][print_backtrace] 3: /lib64/libc.so.6(gsignal+0x35) [0x3584a328a5]
[deep2.stanford.edu:mpi_rank_0][print_backtrace] 4: /lib64/libc.so.6(abort+0x175) [0x3584a34085]
[deep2.stanford.edu:mpi_rank_0][print_backtrace] 5: ./bin/saeModelParallelMPI(_ZN10Experiment6setOptEP9Optimizer+0x64) [0x4c4780]
[deep2.stanford.edu:mpi_rank_0][print_backtrace] 6: ./bin/saeModelParallelMPI(_ZN15TrainExperimentC1ERK17CommandLineParser+0x1944) [0x4a4c6a]
[deep2.stanford.edu:mpi_rank_0][print_backtrace] 7: ./bin/saeModelParallelMPI(main+0xa6) [0x4a089e]
[deep2.stanford.edu:mpi_rank_0][print_backtrace] 8: /lib64/libc.so.6(__libc_start_main+0xfd) [0x3584a1ecdd]
[deep2.stanford.edu:mpi_rank_0][print_backtrace] 9: ./bin/saeModelParallelMPI() [0x4a0669]
[deep2.stanford.edu:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[deep2.stanford.edu:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[deep2.stanford.edu:mpispawn_0][child_handler] MPI process (rank: 0, pid: 20919) terminated with signal 6 -> abort job
[deep1.stanford.edu:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node deep2 aborted: Error while reading a PMI socket (4)
[deep3.stanford.edu:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 6. MPI process died?
[deep3.stanford.edu:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 6. MPI process died?
[deep3.stanford.edu:mpispawn_1][handle_mt_peer] Error while reading PMI socket. MPI process died?