Abstract: This article explores the application of OpenCV and machine learning algorithms to computer vision recognition. It first covers environment setup and dataset preparation, then works through two hands-on projects: pedestrian detection and flower classification. For pedestrian detection, OpenCV handles image preprocessing and HOG feature extraction, and an SVM model is trained and evaluated for detection. For flower classification, OpenCV extracts color and texture features, and decision tree and random forest models are trained and assessed. Complete code illustrates every step, helping readers master the combination of the two technologies, sharpen their skills on computer vision recognition tasks, and advance applications in related fields.


Hands-On Deep Dive: OpenCV + Machine Learning for Computer Vision Recognition (with Complete Code)

1. Introduction

In today's digital era, computer vision recognition has become a core driving force across many fields: precise identification of abnormal behavior in security surveillance, real-time perception of road environments in autonomous driving, and computer-aided diagnosis in medical image analysis, to name a few. OpenCV, the open-source gem of the computer vision world, provides a rich set of efficient image processing functions, while machine learning algorithms contribute powerful pattern recognition and data analysis capabilities. Their deep integration opens up broad prospects for tackling complex computer vision recognition tasks. This article takes a close look at how OpenCV and machine learning algorithms are applied to object detection and image classification, walking through the workflows with complete code.

2. Environment Setup and Preparation

2.1 Installing Python

Make sure Python is installed on your system. Python 3.7 or later is recommended; you can download it from the official website (https://www.python.org/downloads/) and follow the installation wizard. During installation, check the "Add Python to PATH" option so that Python can be invoked directly from the command line.

2.2 Installing OpenCV

OpenCV can be installed easily with pip. Run the following in a terminal:

pip install opencv-python

If you also need OpenCV's extra (contrib) modules, use:

pip install opencv-contrib-python

2.3 Installing the Machine Learning Libraries

For the machine learning part we will use the scikit-learn library, also installed via pip:

pip install -U scikit-learn

In addition, install numpy and pandas for data processing and analysis:

pip install numpy pandas

2.4 Dataset Preparation

  1. Object detection dataset (pedestrian detection)
    • A public dataset such as the Caltech Pedestrian Dataset works well. It can be downloaded from the official site (http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/) and contains a large number of pedestrian images captured in different scenes, along with annotations (pedestrian locations and outlines).
    • After downloading, unpack the dataset and organize it into a suitable directory structure. For example, create a folder named "caltech_pedestrian" containing an "images" subfolder for the image files and an "annotations" subfolder for the annotation files. This article assumes the annotations have been converted to Pascal VOC-style XML files containing pedestrian bounding boxes; note that the dataset's native format is seq video files with vbb annotations.
  2. Image classification dataset (flower classification)
    • Commonly used flower datasets include Oxford Flowers 17 and Oxford Flowers 102. Taking Oxford Flowers 17 as an example, it can be downloaded from the official site (https://www.robots.ox.ac.uk/~vgg/data/flowers/17/).
    • After unpacking, sort the images into subfolders by flower category: under a "flowers_17" folder, create 17 subfolders named after the categories, such as "daisy" and "tulip", and place the corresponding images inside.

3. Object Detection in Practice: Pedestrian Detection

3.1 Data Preprocessing

  1. Image reading and grayscale conversion
    • Read the image with OpenCV's cv2.imread and convert it from color to grayscale with cv2.cvtColor.
import cv2

def read_and_grayscale(image_path):
    image = cv2.imread(image_path)
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return gray_image
  2. Image denoising
    • To reduce noise interference, apply Gaussian blur via cv2.GaussianBlur.
def denoise_image(image, kernel_size=(5, 5)):
    denoised_image = cv2.GaussianBlur(image, kernel_size, 0)
    return denoised_image
  3. HOG feature extraction
    • Use OpenCV's cv2.HOGDescriptor class to extract HOG features. First initialize the HOG descriptor with parameters such as window size, block size, and cell size.
import numpy as np

def extract_hog_features(image):
    win_size = (64, 128)
    block_size = (16, 16)
    block_stride = (8, 8)
    cell_size = (8, 8)
    nbins = 9

    hog = cv2.HOGDescriptor(win_size, block_size, block_stride, cell_size, nbins)
    features = hog.compute(image)
    features = np.array(features).reshape(-1)
    return features

3.2 Dataset Loading and Preprocessing

  1. Loading the Caltech Pedestrian Dataset
    • Write a function that collects the image file paths and the corresponding annotation files, and extracts the annotation information (pedestrian bounding boxes).
import xml.etree.ElementTree as ET
import os

def load_caltech_pedestrian_dataset(dataset_path):
    image_paths = []
    labels = []

    image_dir = os.path.join(dataset_path, 'images')
    annotation_dir = os.path.join(dataset_path, 'annotations')

    for filename in os.listdir(image_dir):
        if filename.endswith('.jpg'):
            image_path = os.path.join(image_dir, filename)
            annotation_path = os.path.join(annotation_dir, filename.replace('.jpg', '.xml'))

            if os.path.exists(annotation_path):
                image_paths.append(image_path)
                tree = ET.parse(annotation_path)
                root = tree.getroot()
                bboxes = []
                for obj in root.findall('object'):
                    bbox = obj.find('bndbox')
                    xmin = int(bbox.find('xmin').text)
                    ymin = int(bbox.find('ymin').text)
                    xmax = int(bbox.find('xmax').text)
                    ymax = int(bbox.find('ymax').text)
                    bboxes.append([xmin, ymin, xmax, ymax])
                labels.append(bboxes)

    return image_paths, labels
  2. Dataset preprocessing
    • Turn the loaded dataset into training samples for the SVM. Since the classifier must distinguish pedestrian windows from background windows, each annotated bounding box is cropped and resized as a positive sample (label 1), and random background windows are sampled as negative samples (label 0); every window goes through grayscaling, denoising, and HOG feature extraction.
def preprocess_dataset(image_paths, bbox_lists, win_size=(64, 128), neg_per_image=2):
    features = []
    window_labels = []
    rng = np.random.default_rng(42)
    for image_path, bboxes in zip(image_paths, bbox_lists):
        gray_image = denoise_image(read_and_grayscale(image_path))
        height, width = gray_image.shape[:2]
        # Positive samples: crops around each annotated pedestrian
        for xmin, ymin, xmax, ymax in bboxes:
            crop = cv2.resize(gray_image[ymin:ymax, xmin:xmax], win_size)
            features.append(extract_hog_features(crop))
            window_labels.append(1)
        # Negative samples: random windows (treated as background)
        for _ in range(neg_per_image):
            if width <= win_size[0] or height <= win_size[1]:
                break
            x = int(rng.integers(0, width - win_size[0]))
            y = int(rng.integers(0, height - win_size[1]))
            window = gray_image[y:y + win_size[1], x:x + win_size[0]]
            features.append(extract_hog_features(window))
            window_labels.append(0)
    return np.array(features), np.array(window_labels)

3.3 Training the SVM Model

  1. Data splitting
    • Split the dataset into a training set and a test set, typically 70%/30% or 80%/20%. The split is done at the image level (paths plus annotations), so the held-out images can later be used for end-to-end detection. scikit-learn's train_test_split handles the split.
from sklearn.model_selection import train_test_split

def split_dataset(image_paths, labels, test_size=0.2, random_state=42):
    train_paths, test_paths, train_labels, test_labels = train_test_split(
        image_paths, labels, test_size=test_size, random_state=random_state)
    return train_paths, test_paths, train_labels, test_labels
  2. Training the SVM model
    • Use scikit-learn's SVC class (support vector classifier) for training, setting parameters such as the kernel type (a linear kernel, 'linear', here) and the penalty parameter C.
from sklearn.svm import SVC

def train_svm_model(X_train, y_train, C=1.0):
    model = SVC(kernel='linear', C=C)
    model.fit(X_train, y_train)
    return model
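The interface can be exercised on synthetic data before the real HOG features are ready (a minimal sketch; the numbers are made up and trivially separable):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated 1-D clusters standing in for HOG feature vectors
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel='linear', C=1.0)
model.fit(X_train, y_train)
print(model.predict([[0.5], [11.5]]))  # [0 1]
```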

3.4 Pedestrian Detection and Evaluation

  1. Pedestrian detection
    • Use the trained SVM model to detect pedestrians in test images. Slide a window over the image, extract HOG features from each window, and feed them to the model for prediction. Note that the input is already grayscale, so no further color conversion is needed.
def detect_pedestrians(image, model, win_size=(64, 128), step_size=(8, 8)):
    # `image` is expected to be a single-channel grayscale image
    height, width = image.shape[:2]
    detected_bboxes = []

    for y in range(0, height - win_size[1], step_size[1]):
        for x in range(0, width - win_size[0], step_size[0]):
            window = image[y:y + win_size[1], x:x + win_size[0]]
            denoised_window = denoise_image(window)
            hog_features = extract_hog_features(denoised_window)
            hog_features = hog_features.reshape(1, -1)

            prediction = model.predict(hog_features)
            if prediction[0] == 1:
                detected_bboxes.append([x, y, x + win_size[0], y + win_size[1]])

    return detected_bboxes
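A sliding-window detector typically fires several overlapping windows on each pedestrian. A common cleanup step, not included in the pipeline above, is non-maximum suppression (NMS): keep only the highest-scoring box in each overlapping cluster. The sketch below is a minimal greedy NMS; the scores could come from model.decision_function, and all names here are illustrative:

```python
import numpy as np

def non_max_suppression(bboxes, scores, iou_threshold=0.3):
    """Greedy NMS: repeatedly keep the best box, drop boxes overlapping it."""
    boxes = np.asarray(bboxes, dtype=float)
    order = np.argsort(scores)[::-1]          # indices sorted by score, best first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the best box with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]    # keep only weakly-overlapping boxes
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first heavily (IoU ≈ 0.68) and is suppressed; the third box is disjoint and survives.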
  2. Evaluation
    • Evaluate the detections with metrics such as precision, recall, and F1-score. Whether a detection is correct is decided by the overlap between the predicted and ground-truth bounding boxes, measured as the Intersection over Union (IoU). For example, boxes [0, 0, 10, 10] and [5, 5, 15, 15] overlap in a 5×5 region, so IoU = 25 / (100 + 100 − 25) ≈ 0.14.
def calculate_iou(bbox1, bbox2):
    x1, y1, x2, y2 = bbox1
    x3, y3, x4, y4 = bbox2

    x_overlap = max(0, min(x2, x4) - max(x1, x3))
    y_overlap = max(0, min(y2, y4) - max(y1, y3))

    intersection_area = x_overlap * y_overlap
    bbox1_area = (x2 - x1) * (y2 - y1)
    bbox2_area = (x4 - x3) * (y4 - y3)

    union_area = bbox1_area + bbox2_area - intersection_area

    iou = intersection_area / union_area
    return iou


def evaluate_detection(model, image_paths, bbox_lists, threshold=0.5):
    true_positives = 0
    false_positives = 0
    false_negatives = 0

    for image_path, true_bboxes in zip(image_paths, bbox_lists):
        image = read_and_grayscale(image_path)
        detected_bboxes = detect_pedestrians(image, model)

        for detected_bbox in detected_bboxes:
            match = False
            for true_bbox in true_bboxes:
                if calculate_iou(detected_bbox, true_bbox) > threshold:
                    true_positives += 1
                    match = True
                    break
            if not match:
                false_positives += 1

        for true_bbox in true_bboxes:
            match = False
            for detected_bbox in detected_bboxes:
                if calculate_iou(detected_bbox, true_bbox) > threshold:
                    match = True
                    break
            if not match:
                false_negatives += 1

    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

    return precision, recall, f1

3.5 Complete Code Example

import cv2
import numpy as np
import os
import xml.etree.ElementTree as ET
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def read_and_grayscale(image_path):
    """
    Read an image and convert it to grayscale
    :param image_path: path to the image file
    :return: grayscale image
    """
    image = cv2.imread(image_path)
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return gray_image


def denoise_image(image, kernel_size=(5, 5)):
    """
    Denoise an image with Gaussian blur
    :param image: input image
    :param kernel_size: Gaussian kernel size, default (5, 5)
    :return: denoised image
    """
    denoised_image = cv2.GaussianBlur(image, kernel_size, 0)
    return denoised_image


def extract_hog_features(image):
    """
    Extract HOG features from an image
    :param image: input image
    :return: HOG feature vector
    """
    win_size = (64, 128)
    block_size = (16, 16)
    block_stride = (8, 8)
    cell_size = (8, 8)
    nbins = 9

    hog = cv2.HOGDescriptor(win_size, block_size, block_stride, cell_size, nbins)
    features = hog.compute(image)
    features = np.array(features).reshape(-1)
    return features


def load_caltech_pedestrian_dataset(dataset_path):
    """
    Load the Caltech Pedestrian Dataset
    :param dataset_path: dataset path
    :return: list of image paths and list of corresponding annotations
    """
    image_paths = []
    labels = []

    image_dir = os.path.join(dataset_path, 'images')
    annotation_dir = os.path.join(dataset_path, 'annotations')

    for filename in os.listdir(image_dir):
        if filename.endswith('.jpg'):
            image_path = os.path.join(image_dir, filename)
            annotation_path = os.path.join(annotation_dir, filename.replace('.jpg', '.xml'))

            if os.path.exists(annotation_path):
                image_paths.append(image_path)
                tree = ET.parse(annotation_path)
                root = tree.getroot()
                bboxes = []
                for obj in root.findall('object'):
                    bbox = obj.find('bndbox')
                    xmin = int(bbox.find('xmin').text)
                    ymin = int(bbox.find('ymin').text)
                    xmax = int(bbox.find('xmax').text)
                    ymax = int(bbox.find('ymax').text)
                    bboxes.append([xmin, ymin, xmax, ymax])
                labels.append(bboxes)

    return image_paths, labels


def preprocess_dataset(image_paths, bbox_lists, win_size=(64, 128), neg_per_image=2):
    """
    Build window-level SVM training samples: annotated boxes become positive
    windows (label 1) and random background windows become negatives (label 0);
    every window is grayscaled, denoised, and converted to HOG features
    :param image_paths: list of image paths
    :param bbox_lists: list of bounding-box lists, one per image
    :param win_size: detection window size, default (64, 128)
    :param neg_per_image: number of negative windows sampled per image
    :return: feature matrix and binary window labels
    """
    features = []
    window_labels = []
    rng = np.random.default_rng(42)
    for image_path, bboxes in zip(image_paths, bbox_lists):
        gray_image = denoise_image(read_and_grayscale(image_path))
        height, width = gray_image.shape[:2]
        # Positive samples: crops around each annotated pedestrian
        for xmin, ymin, xmax, ymax in bboxes:
            crop = cv2.resize(gray_image[ymin:ymax, xmin:xmax], win_size)
            features.append(extract_hog_features(crop))
            window_labels.append(1)
        # Negative samples: random windows (treated as background)
        for _ in range(neg_per_image):
            if width <= win_size[0] or height <= win_size[1]:
                break
            x = int(rng.integers(0, width - win_size[0]))
            y = int(rng.integers(0, height - win_size[1]))
            window = gray_image[y:y + win_size[1], x:x + win_size[0]]
            features.append(extract_hog_features(window))
            window_labels.append(0)
    return np.array(features), np.array(window_labels)


def split_dataset(image_paths, labels, test_size=0.2, random_state=42):
    """
    Split the dataset into training and test sets at the image level
    :param image_paths: list of image paths
    :param labels: list of annotations
    :param test_size: fraction held out for testing, default 0.2
    :param random_state: random seed for reproducibility
    :return: training paths, test paths, training annotations, test annotations
    """
    train_paths, test_paths, train_labels, test_labels = train_test_split(
        image_paths, labels, test_size=test_size, random_state=random_state)
    return train_paths, test_paths, train_labels, test_labels


def train_svm_model(X_train, y_train, C=1.0):
    """
    Train an SVM model
    :param X_train: training features
    :param y_train: training labels
    :param C: SVM penalty parameter, default 1.0
    :return: trained SVM model
    """
    model = SVC(kernel='linear', C=C)
    model.fit(X_train, y_train)
    return model


def detect_pedestrians(image, model, win_size=(64, 128), step_size=(8, 8)):
    """
    Detect pedestrians in a grayscale image with the trained SVM model
    :param image: input grayscale image
    :param model: trained SVM model
    :param win_size: sliding window size, default (64, 128)
    :param step_size: sliding window stride, default (8, 8)
    :return: list of detected pedestrian bounding boxes
    """
    height, width = image.shape[:2]
    detected_bboxes = []

    for y in range(0, height - win_size[1], step_size[1]):
        for x in range(0, width - win_size[0], step_size[0]):
            window = image[y:y + win_size[1], x:x + win_size[0]]
            denoised_window = denoise_image(window)
            hog_features = extract_hog_features(denoised_window)
            hog_features = hog_features.reshape(1, -1)

            prediction = model.predict(hog_features)
            if prediction[0] == 1:
                detected_bboxes.append([x, y, x + win_size[0], y + win_size[1]])

    return detected_bboxes


def calculate_iou(bbox1, bbox2):
    """
    Compute the Intersection over Union (IoU) of two bounding boxes
    :param bbox1: first bounding box
    :param bbox2: second bounding box
    :return: IoU value
    """
    x1, y1, x2, y2 = bbox1
    x3, y3, x4, y4 = bbox2

    x_overlap = max(0, min(x2, x4) - max(x1, x3))
    y_overlap = max(0, min(y2, y4) - max(y1, y3))

    intersection_area = x_overlap * y_overlap
    bbox1_area = (x2 - x1) * (y2 - y1)
    bbox2_area = (x4 - x3) * (y4 - y3)

    union_area = bbox1_area + bbox2_area - intersection_area

    iou = intersection_area / union_area
    return iou


def evaluate_detection(model, image_paths, bbox_lists, threshold=0.5):
    """
    Evaluate the pedestrian detector
    :param model: trained SVM model
    :param image_paths: test image paths
    :param bbox_lists: ground-truth bounding boxes, one list per image
    :param threshold: IoU threshold, default 0.5
    :return: precision, recall, F1-score
    """
    true_positives = 0
    false_positives = 0
    false_negatives = 0

    for image_path, true_bboxes in zip(image_paths, bbox_lists):
        image = read_and_grayscale(image_path)
        detected_bboxes = detect_pedestrians(image, model)

        for detected_bbox in detected_bboxes:
            match = False
            for true_bbox in true_bboxes:
                if calculate_iou(detected_bbox, true_bbox) > threshold:
                    true_positives += 1
                    match = True
                    break
            if not match:
                false_positives += 1

        for true_bbox in true_bboxes:
            match = False
            for detected_bbox in detected_bboxes:
                if calculate_iou(detected_bbox, true_bbox) > threshold:
                    match = True
                    break
            if not match:
                false_negatives += 1

    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

    return precision, recall, f1


# Main program
if __name__ == "__main__":
    dataset_path = "caltech_pedestrian"
    image_paths, labels = load_caltech_pedestrian_dataset(dataset_path)
    train_paths, test_paths, train_bboxes, test_bboxes = split_dataset(image_paths, labels)

    X_train, y_train = preprocess_dataset(train_paths, train_bboxes)
    model = train_svm_model(X_train, y_train)

    precision, recall, f1 = evaluate_detection(model, test_paths, test_bboxes)
    print(f"Precision: {precision}")
    print(f"Recall: {recall}")
    print(f"F1-score: {f1}")

    # Example: detect pedestrians in a single image
    test_image_path = "caltech_pedestrian/images/000001.jpg"
    test_image = cv2.imread(test_image_path)
    gray_test_image = read_and_grayscale(test_image_path)
    detected_bboxes = detect_pedestrians(gray_test_image, model)

    for bbox in detected_bboxes:
        x1, y1, x2, y2 = bbox
        cv2.rectangle(test_image, (x1, y1), (x2, y2), (0, 255, 0), 2)

    cv2.imshow("Pedestrian Detection", test_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


4. Image Classification in Practice: Flower Classification

4.1 Image Feature Extraction

  1. Color feature extraction
    • Compute a color histogram as the color feature. Using OpenCV's cv2.calcHist, build a joint histogram over the H (hue), S (saturation), and V (value) channels of the image in HSV color space.
import cv2
import numpy as np


def extract_color_histogram(image, bins=(8, 8, 8)):
    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv_image], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])
    hist = cv2.normalize(hist, hist).flatten()
    return hist
  2. Texture feature extraction
    • Extract texture features with a gray-level co-occurrence matrix (GLCM), which describes texture through the probabilities of pairs of gray levels occurring at a given offset.
def extract_glcm_features(image, distances=[1], angles=[0]):
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    glcm_features = []
    # Index grids used to evaluate the statistics over the whole matrix
    i_idx, j_idx = np.indices((256, 256))
    for distance in distances:
        for angle in angles:
            glcm = np.zeros((256, 256), dtype=np.float32)
            for i in range(gray_image.shape[0] - distance):
                for j in range(distance, gray_image.shape[1] - distance):
                    pixel1 = gray_image[i, j]
                    if angle == 0:
                        pixel2 = gray_image[i, j + distance]
                    elif angle == np.pi / 4:
                        pixel2 = gray_image[i + distance, j + distance]
                    elif angle == np.pi / 2:
                        pixel2 = gray_image[i + distance, j]
                    elif angle == 3 * np.pi / 4:
                        pixel2 = gray_image[i + distance, j - distance]
                    glcm[pixel1, pixel2] += 1
            glcm = glcm / np.sum(glcm)
            contrast = np.sum((i_idx - j_idx) ** 2 * glcm)
            energy = np.sum(glcm ** 2)
            homogeneity = np.sum(glcm / (1 + (i_idx - j_idx) ** 2))
            glcm_features.extend([contrast, energy, homogeneity])
    return np.array(glcm_features)
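The double Python loop above is slow (one interpreted iteration per pixel pair). The same horizontal co-occurrence matrix and the three statistics can be computed in vectorized numpy; this sketch (illustrative names, angle 0 only) follows the same definitions:

```python
import numpy as np

def glcm_horizontal(gray, distance=1, levels=256):
    """Normalized co-occurrence matrix of pixel pairs `distance` apart horizontally."""
    p = gray[:, :-distance].ravel().astype(np.intp)   # left pixel of each pair
    q = gray[:, distance:].ravel().astype(np.intp)    # right pixel of each pair
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (p, q), 1)                        # accumulate pair counts
    return glcm / glcm.sum()

def glcm_stats(glcm):
    """Contrast, energy, and homogeneity from a normalized GLCM."""
    i, j = np.indices(glcm.shape)
    contrast = np.sum((i - j) ** 2 * glcm)
    energy = np.sum(glcm ** 2)
    homogeneity = np.sum(glcm / (1 + (i - j) ** 2))
    return contrast, energy, homogeneity

# Tiny example: every horizontal pair is (0, 1)
gray = np.array([[0, 1], [0, 1]], dtype=np.uint8)
print(glcm_stats(glcm_horizontal(gray, levels=2)))  # (1.0, 1.0, 0.5)
```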
  3. Combined feature extraction
    • Concatenate the color and texture features into a single feature vector for classification.
def extract_comprehensive_features(image):
    color_features = extract_color_histogram(image)
    texture_features = extract_glcm_features(image)
    comprehensive_features = np.concatenate((color_features, texture_features))
    return comprehensive_features

4.2 Dataset Loading and Preprocessing

  1. Loading the Oxford Flowers 17 dataset
    • Write a function that walks the dataset folders, collects the image files, and derives the flower category labels from the folder names. The folder list is sorted so the class-to-label mapping is reproducible across runs.
import os
import numpy as np


def load_flowers_17_dataset(dataset_path):
    image_paths = []
    labels = []
    # Sort so the class -> label mapping is stable across runs
    class_names = sorted(os.listdir(dataset_path))
    class_to_label = {class_name: i for i, class_name in enumerate(class_names)}

    for class_name in class_names:
        class_path = os.path.join(dataset_path, class_name)
        for filename in os.listdir(class_path):
            if filename.endswith('.jpg'):
                image_path = os.path.join(class_path, filename)
                image_paths.append(image_path)
                labels.append(class_to_label[class_name])

    return image_paths, np.array(labels)
  2. Dataset preprocessing
    • Read each image, extract its features, and L2-normalize the resulting feature vectors.
def preprocess_flowers_dataset(image_paths, labels):
    features = []
    for image_path in image_paths:
        image = cv2.imread(image_path)
        comprehensive_features = extract_comprehensive_features(image)
        features.append(comprehensive_features)

    features = np.array(features)
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    return features, labels
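The row-wise division by np.linalg.norm(..., axis=1, keepdims=True) rescales every feature vector to unit length, so comparisons between samples reflect direction rather than magnitude. A small standalone check:

```python
import numpy as np

features = np.array([[3.0, 4.0],
                     [6.0, 8.0]])
normalized = features / np.linalg.norm(features, axis=1, keepdims=True)
print(normalized)                          # both rows become [0.6, 0.8]
print(np.linalg.norm(normalized, axis=1))  # [1. 1.]
```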

4.3 Training the Classification Models

  1. Choosing the classifiers
    • A decision tree classifier (DecisionTreeClassifier) and a random forest classifier (RandomForestClassifier) are compared. Import them from the scikit-learn library.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
  2. Data splitting and model training
    • Split the preprocessed dataset into training and test sets, then train the decision tree and random forest models (a fixed random_state keeps the results reproducible).
from sklearn.model_selection import train_test_split


def train_classification_models(features, labels, test_size=0.2, random_state=42):
    X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=test_size, random_state=random_state)

    decision_tree_model = DecisionTreeClassifier(random_state=random_state)
    decision_tree_model.fit(X_train, y_train)

    random_forest_model = RandomForestClassifier(random_state=random_state)
    random_forest_model.fit(X_train, y_train)

    return decision_tree_model, random_forest_model, X_test, y_test

4.4 Image Classification and Evaluation

  1. Image classification
    • Use the trained models to classify test images.
def classify_image(image_path, model):
    image = cv2.imread(image_path)
    features = extract_comprehensive_features(image)
    features = features / np.linalg.norm(features)
    features = features.reshape(1, -1)
    prediction = model.predict(features)
    return prediction
  2. Evaluation
    • Assess classification performance with metrics such as accuracy, recall, and F1-score.
from sklearn.metrics import accuracy_score, recall_score, f1_score


def evaluate_classification(model, X_test, y_test):
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    return accuracy, recall, f1
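On a toy prediction the metrics behave as expected; with average='weighted', each class's score is weighted by its support (a sketch using made-up labels):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, f1_score

y_test = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])   # one class-0 sample misclassified

accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
print(accuracy, recall, f1)  # accuracy = 0.75, weighted recall = 0.75, weighted F1 ≈ 0.733
```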

4.5 Complete Code Example

import cv2
import numpy as np
import os
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, f1_score


def extract_color_histogram(image, bins=(8, 8, 8)):
    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv_image], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])
    hist = cv2.normalize(hist, hist).flatten()
    return hist


def extract_glcm_features(image, distances=[1], angles=[0]):
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    glcm_features = []
    i_idx, j_idx = np.indices((256, 256))
    for distance in distances:
        for angle in angles:
            glcm = np.zeros((256, 256), dtype=np.float32)
            for i in range(gray_image.shape[0] - distance):
                for j in range(distance, gray_image.shape[1] - distance):
                    pixel1 = gray_image[i, j]
                    if angle == 0:
                        pixel2 = gray_image[i, j + distance]
                    elif angle == np.pi / 4:
                        pixel2 = gray_image[i + distance, j + distance]
                    elif angle == np.pi / 2:
                        pixel2 = gray_image[i + distance, j]
                    elif angle == 3 * np.pi / 4:
                        pixel2 = gray_image[i + distance, j - distance]
                    glcm[pixel1, pixel2] += 1
            glcm = glcm / np.sum(glcm)
            contrast = np.sum((i_idx - j_idx) ** 2 * glcm)
            energy = np.sum(glcm ** 2)
            homogeneity = np.sum(glcm / (1 + (i_idx - j_idx) ** 2))
            glcm_features.extend([contrast, energy, homogeneity])
    return np.array(glcm_features)


def extract_comprehensive_features(image):
    color_features = extract_color_histogram(image)
    texture_features = extract_glcm_features(image)
    comprehensive_features = np.concatenate((color_features, texture_features))
    return comprehensive_features


def load_flowers_17_dataset(dataset_path):
    image_paths = []
    labels = []
    class_names = sorted(os.listdir(dataset_path))   # sort for a stable label mapping
    class_to_label = {class_name: i for i, class_name in enumerate(class_names)}

    for class_name in class_names:
        class_path = os.path.join(dataset_path, class_name)
        for filename in os.listdir(class_path):
            if filename.endswith('.jpg'):
                image_path = os.path.join(class_path, filename)
                image_paths.append(image_path)
                labels.append(class_to_label[class_name])

    return image_paths, np.array(labels)


def preprocess_flowers_dataset(image_paths, labels):
    features = []
    for image_path in image_paths:
        image = cv2.imread(image_path)
        comprehensive_features = extract_comprehensive_features(image)
        features.append(comprehensive_features)

    features = np.array(features)
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    return features, labels


def train_classification_models(features, labels, test_size=0.2, random_state=42):
    X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=test_size, random_state=random_state)

    decision_tree_model = DecisionTreeClassifier(random_state=random_state)
    decision_tree_model.fit(X_train, y_train)

    random_forest_model = RandomForestClassifier(random_state=random_state)
    random_forest_model.fit(X_train, y_train)

    return decision_tree_model, random_forest_model, X_test, y_test


def classify_image(image_path, model):
    image = cv2.imread(image_path)
    features = extract_comprehensive_features(image)
    features = features / np.linalg.norm(features)
    features = features.reshape(1, -1)
    prediction = model.predict(features)
    return prediction


def evaluate_classification(model, X_test, y_test):
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    return accuracy, recall, f1


# Main program
if __name__ == "__main__":
    dataset_path = "flowers_17"
    image_paths, labels = load_flowers_17_dataset(dataset_path)
    features, labels = preprocess_flowers_dataset(image_paths, labels)

    decision_tree_model, random_forest_model, X_test, y_test = train_classification_models(features, labels)

    decision_tree_accuracy, decision_tree_recall, decision_tree_f1 = evaluate_classification(decision_tree_model, X_test, y_test)
    random_forest_accuracy, random_forest_recall, random_forest_f1 = evaluate_classification(random_forest_model, X_test, y_test)

    print("Decision Tree Classifier:")
    print(f"Accuracy: {decision_tree_accuracy}")
    print(f"Recall: {decision_tree_recall}")
    print(f"F1-score: {decision_tree_f1}")

    print("\nRandom Forest Classifier:")
    print(f"Accuracy: {random_forest_accuracy}")
    print(f"Recall: {random_forest_recall}")
    print(f"F1-score: {random_forest_f1}")

    # Example: classify a single image
    test_image_path = "flowers_17/daisy/100080439_68ee295c07.jpg"
    decision_tree_prediction = classify_image(test_image_path, decision_tree_model)
    random_forest_prediction = classify_image(test_image_path, random_forest_model)

    class_names = sorted(os.listdir(dataset_path))   # same order as in the loader
    print(f"Decision Tree Prediction: {class_names[decision_tree_prediction[0]]}")
    print(f"Random Forest Prediction: {class_names[random_forest_prediction[0]]}")

5. Summary and Outlook

Through the walkthroughs above, we have seen how effective the combination of OpenCV and machine learning algorithms can be for computer vision recognition tasks. In the pedestrian detection task, OpenCV's image processing laid the groundwork for HOG feature extraction, and an SVM trained on those features identified pedestrians. In the flower classification task, color and texture features extracted with OpenCV, paired with decision tree and random forest models, classified the flowers successfully.

Technology, however, never stands still. As deep learning continues to evolve, object detection and image classification models based on convolutional neural networks (CNNs) will push accuracy and efficiency further, while OpenCV keeps shipping more capable image processing algorithms and tools. Combining the latest deep learning techniques with OpenCV's strengths promises even more powerful and intelligent vision systems. We hope this article offers computer vision enthusiasts and practitioners a useful reference for continued exploration and innovation in real applications.
