Gesture recognition is an open problem in machine vision, the field of computer science concerned with enabling systems to emulate human vision. It has many applications in human-computer interaction. One of them is sign language translation, in which a video sequence of symbolic hand gestures is translated into natural language.
Dataset
The dataset format is patterned to match closely with the classic MNIST. Each training and test case carries a label (0–25) that maps one-to-one to a letter A–Z (with no cases for 9=J or 25=Z, because those signs involve motion). The training data (27,455 cases) and test data (7,172 cases) are approximately half the size of the standard MNIST but otherwise similar: a header row of label, pixel1, pixel2, …, pixel784, followed by rows that each represent a single 28x28 pixel image with grayscale values between 0 and 255.
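To make the label scheme concrete, here is a small sketch of the mapping between integer labels and letters. The helper name `label_to_letter` and the constant `MOTION_LABELS` are hypothetical (they are not part of the original code); the sketch only assumes that label 0 is A and label 25 is Z, as described above.

```python
import string

# Labels 9 (J) and 25 (Z) have no cases in the data because those signs involve motion.
MOTION_LABELS = {9, 25}

def label_to_letter(label: int) -> str:
    """Map an integer label in 0..25 to its letter (0 -> 'A', 25 -> 'Z')."""
    if not 0 <= label <= 25:
        raise ValueError(f"label out of range: {label}")
    return string.ascii_uppercase[label]

# Letters that actually appear in the dataset (the 24 static signs).
present_letters = [label_to_letter(i) for i in range(26) if i not in MOTION_LABELS]

print(len(present_letters))   # 24
print(label_to_letter(2))     # C
```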
Data Preprocessing
Because the dataset already provides the images as CSV pixel values, little preprocessing is needed. If the images were in a raw format, we would have to convert them to pixel arrays before doing any further operations. Still, we perform the following steps:
- Separate the features (784 pixel columns) from the output (the label)
- Reshape the features
- One-hot encode the labels
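import numpy as np
# `train` and `test` are the pandas DataFrames loaded from the dataset's CSV files.
# Separate the label column from the 784 pixel columns.
y_train = train['label']
y_test = test['label']
X_train = train.drop(['label'], axis=1)
X_test = test.drop(['label'], axis=1)
# Reshape each flat row of 784 pixels into a 28x28 image.
X_train = np.array(X_train).reshape(-1, 28, 28)
X_test = np.array(X_test).reshape(-1, 28, 28)
# One-hot encode the labels: row i of the identity matrix encodes class i.
num_classes = 26
y_train = np.array(y_train).reshape(-1)
y_test = np.array(y_test).reshape(-1)
y_train = np.eye(num_classes)[y_train]
y_test = np.eye(num_classes)[y_test]
# Add the single grayscale channel dimension expected by the CNN.
X_train = X_train.reshape((-1, 28, 28, 1))
X_test = X_test.reshape((-1, 28, 28, 1))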
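The one-hot encoding and reshaping steps can be checked on toy data. This is a minimal sketch with made-up labels and an all-zero pixel array standing in for the real dataset; only NumPy is assumed.

```python
import numpy as np

num_classes = 26
labels = np.array([0, 3, 24])          # toy labels: A, D, Y

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing np.eye with the label array encodes every label at once.
one_hot = np.eye(num_classes)[labels]

print(one_hot.shape)            # (3, 26)
print(one_hot[1].argmax())      # 3

# A toy batch of two flat 784-pixel rows, reshaped to
# (samples, height, width, channels).
pixels = np.zeros((2, 784))
images = pixels.reshape(-1, 28, 28, 1)
print(images.shape)             # (2, 28, 28, 1)
```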
Model
We will use Keras to build a simple CNN (Convolutional Neural Network).
There are 7 layers in total:
- 1st Convolutional Layer with relu
- 1st Max Pooling
- 2nd Convolutional Layer with relu
- 2nd Max Pooling
- Flattening
- Fully connected layer with relu
- Output Layer with sigmoid
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense

def model():
    classifier = Sequential()
    # 1st convolutional layer: 8 filters of 3x3 over the 28x28x1 input
    classifier.add(Convolution2D(filters=8,
                                 kernel_size=(3,3),
                                 strides=(1,1),
                                 padding='same',
                                 input_shape=(28,28,1),
                                 activation='relu',
                                 data_format='channels_last'))
    classifier.add(MaxPooling2D(pool_size=(2,2)))
    # 2nd convolutional layer: 16 filters of 3x3
    classifier.add(Convolution2D(filters=16,
                                 kernel_size=(3,3),
                                 strides=(1,1),
                                 padding='same',
                                 activation='relu'))
    classifier.add(MaxPooling2D(pool_size=(4,4)))
    # Flatten the feature maps and classify into the 26 label slots
    classifier.add(Flatten())
    classifier.add(Dense(128, activation='relu'))
    classifier.add(Dense(26, activation='sigmoid'))
    classifier.compile(optimizer='adam',
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])
    return classifier
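To see how the image shrinks through the network, the layer output shapes and parameter counts can be worked out by hand. The sketch below is plain Python arithmetic, not Keras code. The helper names are hypothetical, and it assumes the Keras defaults for `MaxPooling2D` (stride equal to the pool size, no padding).

```python
def conv_same(h, w, c_in, filters, k=3):
    """'same' padding with stride 1 keeps height and width; returns (shape, params)."""
    params = k * k * c_in * filters + filters   # weights + biases
    return (h, w, filters), params

def max_pool(h, w, c, pool):
    """Non-overlapping pooling without padding floors the division."""
    return (h // pool, w // pool, c)

def dense(n_in, n_out):
    """Fully connected layer; returns (output units, params)."""
    return n_out, n_in * n_out + n_out

shape, p1 = conv_same(28, 28, 1, 8)      # (28, 28, 8)
shape = max_pool(*shape, 2)              # (14, 14, 8)
shape, p2 = conv_same(*shape, 16)        # (14, 14, 16)
shape = max_pool(*shape, 4)              # (3, 3, 16)
flat = shape[0] * shape[1] * shape[2]    # 144
units, p3 = dense(flat, 128)
units, p4 = dense(units, 26)

total_params = p1 + p2 + p3 + p4
print(shape, flat, total_params)         # (3, 3, 16) 144 23162
```

Under these assumptions the second pooling layer leaves only a 3x3 grid per filter, so the fully connected layer sees 144 inputs and holds most of the roughly 23 thousand trainable parameters.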
Then fit the model on the training set and check the accuracy on the test set.
classifier = model()
classifier.fit(X_train, y_train, batch_size=100, epochs=100)
y_pred = classifier.predict(X_test)
Note that y_pred holds an array of 26 values for each test example. To get the predicted class, we take the index of the maximum value in each row and rebuild y_pred from those indices.
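A minimal sketch of that step, using NumPy's `argmax` on toy arrays with 3 classes instead of 26. The helper names `to_labels` and `accuracy` are hypothetical, and the toy arrays stand in for `y_test` and `y_pred`.

```python
import numpy as np

def to_labels(scores):
    """Return the index of the highest score in each row."""
    return np.argmax(scores, axis=1)

def accuracy(y_true_one_hot, y_scores):
    """Fraction of rows where the predicted class matches the true class."""
    return float(np.mean(to_labels(y_scores) == to_labels(y_true_one_hot)))

# Toy stand-ins for y_test (one-hot) and y_pred (per-class scores).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_scores = np.array([[0.9, 0.1, 0.2],
                     [0.2, 0.7, 0.1],
                     [0.6, 0.3, 0.4],   # wrong: predicts class 0, truth is class 2
                     [0.1, 0.8, 0.3]])

print(to_labels(y_scores))         # [0 1 0 1]
print(accuracy(y_true, y_scores))  # 0.75
```

With the real arrays, the same two calls would be applied to `y_pred` and the one-hot `y_test`.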
Result
- Training Set Accuracy: 96.06%
- Test Set Accuracy: 87.77%
Complete Code with Dataset
blackbird71SR / Small-Deep-Learning-Projects
Small projects with Deep Learning magic! - Predicting Customer Churn in Banking, Predict tags on Stack Overflow, Sign Language Recognition
Neural Networks
Send a pull request for any suggestions and errors…