本文演示了如何使用 Shapefile 裁剪 NetCDF 文件,并展示了裁剪前后的结果。了解 NetCDF 文件及其操作对于有效的地理空间分析至关重要。

什么是 NetCDF 文件?

NetCDF(网络通用数据格式)是一种用于存储和管理多维科学数据的文件格式。CSV 和 Excel 文件仅限于简单的表格数据,而 NetCDF 支持 n 维数组,因此非常适合管理包含跨时间和地理位置的多个维度和变量的大型复杂数据集,例如温度、降水量和大气压力。

NetCDF 文件的主要特点:

  • 多维数组:通过多维数组支持具有多个维度的数据,例如时间、纬度和经度。
  • 自描述:文件的元数据描述其内容和数据结构。
  • 高效:设计的数据集具有快速访问数据集和对大数据集的有效压缩。

本教程使用的 NetCDF 样本数据来自印度气象局浦那分局。该数据集包含降水量。

先决条件

在我们开始之前,创建一个 Python 环境并确保安装了以下库:

步骤 1:创建 Python 环境

创建虚拟环境可确保您的项目依赖关系被隔离且易于管理。

<span style="background-color:#e3e2e2"><span style="color:#1f2937">复制<code class="language-python"><span style="color:#696969"># Create a new virtual environment in Anaconda Prompt</span>
conda create --name myenv python=<span style="color:#aa5d00">3.11</span><span style="color:#aa5d00">.9</span></code></span></span>
第 2 步:安装所需的软件包

安装处理和可视化 NetCDF 数据所需的软件包的特定版本:

<span style="background-color:#e3e2e2"><span style="color:#1f2937">复制<code class="language-python"><span style="color:#696969"># Install specific versions of packages</span>
!pip install xarray==<span style="color:#aa5d00">2024.6</span><span style="color:#aa5d00">.0</span>
!pip install geopandas==<span style="color:#aa5d00">0.14</span><span style="color:#aa5d00">.3</span>
!pip install rioxarray==<span style="color:#aa5d00">0.15</span><span style="color:#aa5d00">.7</span>
!pip install cartopy==<span style="color:#aa5d00">0.23</span><span style="color:#aa5d00">.0</span>
!pip install matplotlib==<span style="color:#aa5d00">3.8</span><span style="color:#aa5d00">.4</span>
!pip install numpy==<span style="color:#aa5d00">1.24</span><span style="color:#aa5d00">.3</span>
!pip install netCDF4==<span style="color:#aa5d00">1.7</span><span style="color:#aa5d00">.1</span>.post2</code></span></span>
  • xarray用于处理 NetCDF 文件
  • geopandas用于处理 Shapefile
  • rioxarray用于裁剪数据
  • cartopymatplotlib进行可视化
  • numpy以及netcdf4处理数组

验证已安装的软件包版本:

<span style="background-color:#e3e2e2"><span style="color:#1f2937">复制<code class="language-python"><span style="color:#7928a1">import</span> xarray
<span style="color:#aa5d00">print</span>(xarray.__version__)

<span style="color:#7928a1">import</span> cartopy
<span style="color:#aa5d00">print</span>(cartopy.__version__)

<span style="color:#7928a1">import</span> geopandas
<span style="color:#aa5d00">print</span>(geopandas.__version__)

<span style="color:#7928a1">import</span> rioxarray
<span style="color:#aa5d00">print</span>(rioxarray.__version__)

<span style="color:#7928a1">import</span> matplotlib
<span style="color:#aa5d00">print</span>(matplotlib.__version__)

<span style="color:#7928a1">import</span> numpy
<span style="color:#aa5d00">print</span>(numpy.__version__)</code></span></span>

注意:安装软件包后,不要忘记重启内核。此步骤至关重要,以确保您的环境能够识别新安装的版本。

步骤3:读取NetCDF文件

让我们从读取 NetCDF 文件开始。我们将使用这个xarray库,它非常适合处理带有元数据的 NetCDF 文件。

<span style="background-color:#e3e2e2"><span style="color:#1f2937">复制<code class="language-python"><span style="color:#696969"># reading netcdf</span>
<span style="color:#696969"># Loading Modules</span>
<span style="color:#7928a1">import</span> xarray <span style="color:#7928a1">as</span> xr <span style="color:#696969"># reads and handles netcdf files with metadata</span>
data = xr.open_dataset(<span style="color:green">r'C:\Users\Lenovo\Desktop\medium\rainfall.nc'</span>)
<span style="color:#aa5d00">print</span>(data)</code></span></span>
步骤 4:裁剪前绘图

在执行任何剪辑之前,我们会检查 NetCDF 文件中的数据以了解数据集的分布和覆盖范围。

<span style="background-color:#e3e2e2"><span style="color:#1f2937">复制<code class="language-python"><span style="color:#7928a1">import</span> xarray <span style="color:#7928a1">as</span> xr
<span style="color:#7928a1">import</span> cartopy.crs <span style="color:#7928a1">as</span> ccrs
<span style="color:#7928a1">import</span> cartopy.feature <span style="color:#7928a1">as</span> cfeature
<span style="color:#7928a1">import</span> matplotlib.pyplot <span style="color:#7928a1">as</span> plt
<span style="color:#7928a1">import</span> geopandas <span style="color:#7928a1">as</span> gpd

<span style="color:#696969"># Load the dataset</span>
ds = xr.open_dataset(<span style="color:green">r'C:\Users\Lenovo\Desktop\medium\rainfall.nc'</span>)

<span style="color:#696969"># Compute the yearly mean rainfall</span>
yearly_mean = ds.rain.mean(dim=<span style="color:green">'time'</span>)

<span style="color:#696969"># Load the shapefile</span>
shapefile_path = <span style="color:green">r'C:\Users\Lenovo\Desktop\medium\India_State_Boundary.shp'</span>
gdf = gpd.read_file(shapefile_path)

<span style="color:#696969"># Set up the plot</span>
fig, ax = plt.subplots(figsize=(<span style="color:#aa5d00">10</span>, <span style="color:#aa5d00">8</span>), subplot_kw={<span style="color:green">'projection'</span>: ccrs.PlateCarree()})

<span style="color:#696969"># Add features to the plot</span>
ax.coastlines()
ax.add_feature(cfeature.OCEAN)

<span style="color:#696969"># Add gridlines with labels</span>
gridlines = ax.gridlines(draw_labels=<span style="color:#aa5d00">True</span>, linestyle=<span style="color:green">'--'</span>, color=<span style="color:green">'gray'</span>, alpha=<span style="color:#aa5d00">0.7</span>)
gridlines.top_labels = <span style="color:#aa5d00">False</span>
gridlines.right_labels = <span style="color:#aa5d00">False</span>
gridlines.left_labels = <span style="color:#aa5d00">True</span>
gridlines.bottom_labels = <span style="color:#aa5d00">True</span>

<span style="color:#696969"># Adjust the padding and font size of the labels</span>
gridlines.xlabel_style = {<span style="color:green">'size'</span>: <span style="color:#aa5d00">10</span>, <span style="color:green">'color'</span>: <span style="color:green">'black'</span>}
gridlines.ylabel_style = {<span style="color:green">'size'</span>: <span style="color:#aa5d00">10</span>, <span style="color:green">'color'</span>: <span style="color:green">'black'</span>}

<span style="color:#696969"># Plot the yearly mean rainfall</span>
pc = yearly_mean.plot.pcolormesh(ax=ax, cmap=<span style="color:green">'rainbow_r'</span>, vmin=<span style="color:#aa5d00">0</span>, vmax=<span style="color:#aa5d00">2</span>,
                                cbar_kwargs={<span style="color:green">'orientation'</span>: <span style="color:green">'vertical'</span>, <span style="color:green">'pad'</span>: <span style="color:#aa5d00">0.06</span>, <span style="color:green">'shrink'</span>: <span style="color:#aa5d00">0.5</span>, <span style="color:green">'aspect'</span>: <span style="color:#aa5d00">30</span>})

<span style="color:#696969"># Overlay the shapefile using geopandas</span>
gdf.plot(ax=ax, transform=ccrs.PlateCarree(), edgecolor=<span style="color:green">'black'</span>, facecolor=<span style="color:green">'none'</span>, linewidth=<span style="color:#aa5d00">1</span>)

<span style="color:#696969"># Set the title</span>
ax.set_title(<span style="color:green">'Total Rainfall'</span>)

<span style="color:#696969"># Show the plot</span>
plt.show()</code></span></span>

没有任何

Netcdf 数据印度(图片来自作者)

步骤5:使用shapefile剪切数据

裁剪方法允许我们使用定义感兴趣边界的shapefile文件来聚焦于特定的感兴趣区域。以下是如何对NetCDF数据进行裁剪:

<span style="background-color:#e3e2e2"><span style="color:#1f2937">复制<code class="language-python"><span style="color:#7928a1">import</span> xarray <span style="color:#7928a1">as</span> xr
<span style="color:#7928a1">import</span> rioxarray
<span style="color:#7928a1">import</span> geopandas <span style="color:#7928a1">as</span> gpd

<span style="color:#696969"># Correctly specify the paths to the files</span>
netcdf_file = <span style="color:green">r'C:\Users\Lenovo\Desktop\medium\rainfall.nc'</span>
shapefile = <span style="color:green">r'C:\Users\Lenovo\Desktop\medium\rajasthan.shp'</span>

<span style="color:#696969"># Open the dataset using xarray</span>
data = xr.open_dataset(netcdf_file)

<span style="color:#696969"># Set spatial dimensions and CRS</span>
data = data.rio.set_spatial_dims(x_dim=<span style="color:green">"lon"</span>, y_dim=<span style="color:green">"lat"</span>, inplace=<span style="color:#aa5d00">True</span>)
data = data.rio.write_crs(<span style="color:green">"epsg:4326"</span>, inplace=<span style="color:#aa5d00">True</span>)

<span style="color:#696969"># Load the shapefile using geopandas</span>
gdf = gpd.read_file(shapefile)
gdf = gdf.to_crs(data.rio.crs)

<span style="color:#696969"># Extract geometries from the GeoDataFrame</span>
geometries = gdf.geometry

<span style="color:#696969"># Clip the dataset using the geometries</span>
clipped_data = data.rio.clip(geometries, gdf.crs)

<span style="color:#696969"># Save the clipped data to a new NetCDF file</span>
output_path = <span style="color:green">r'C:\Users\Lenovo\Desktop\medium\clip_rainfall.nc'</span>
clipped_data.to_netcdf(output_path)

<span style="color:#696969"># Display the clipped data</span>
<span style="color:#aa5d00">print</span>(clipped_data)</code></span></span>
步骤 6:裁剪后绘图

数据集被剪辑后,请检查结果以确保剪辑正确完成。

<span style="background-color:#e3e2e2"><span style="color:#1f2937">复制<code class="language-python"><span style="color:#7928a1">import</span> xarray <span style="color:#7928a1">as</span> xr
<span style="color:#7928a1">import</span> cartopy.crs <span style="color:#7928a1">as</span> ccrs
<span style="color:#7928a1">import</span> cartopy.feature <span style="color:#7928a1">as</span> cfeature
<span style="color:#7928a1">import</span> matplotlib.pyplot <span style="color:#7928a1">as</span> plt
<span style="color:#7928a1">import</span> geopandas <span style="color:#7928a1">as</span> gpd

<span style="color:#696969"># Load the clipped dataset</span>
ds_clipped = xr.open_dataset(<span style="color:green">r'C:\Users\Lenovo\Desktop\medium\clip_rainfall.nc'</span>)

<span style="color:#696969"># Compute the yearly mean rainfall</span>
yearly_mean_clipped = ds_clipped.rain.mean(dim=<span style="color:green">'time'</span>)

<span style="color:#696969"># Load the shapefile</span>
shapefile_path = <span style="color:green">r'C:\Users\Lenovo\Desktop\medium\rajasthan.shp'</span>
gdf = gpd.read_file(shapefile_path)

<span style="color:#696969"># Set up the plot</span>
fig, ax = plt.subplots(figsize=(<span style="color:#aa5d00">10</span>, <span style="color:#aa5d00">8</span>), subplot_kw={<span style="color:green">'projection'</span>: ccrs.PlateCarree()})

<span style="color:#696969"># Add features to the plot</span>
ax.coastlines()
ax.add_feature(cfeature.OCEAN)

<span style="color:#696969"># Add gridlines with labels</span>
gridlines = ax.gridlines(draw_labels=<span style="color:#aa5d00">True</span>, linestyle=<span style="color:green">'--'</span>, color=<span style="color:green">'gray'</span>, alpha=<span style="color:#aa5d00">0.7</span>)
gridlines.top_labels = <span style="color:#aa5d00">False</span>
gridlines.right_labels = <span style="color:#aa5d00">False</span>
gridlines.left_labels = <span style="color:#aa5d00">True</span>
gridlines.bottom_labels = <span style="color:#aa5d00">True</span>

<span style="color:#696969"># Adjust the padding and font size of the labels</span>
gridlines.xlabel_style = {<span style="color:green">'size'</span>: <span style="color:#aa5d00">10</span>, <span style="color:green">'color'</span>: <span style="color:green">'black'</span>}
gridlines.ylabel_style = {<span style="color:green">'size'</span>: <span style="color:#aa5d00">10</span>, <span style="color:green">'color'</span>: <span style="color:green">'black'</span>}

<span style="color:#696969"># Plot the yearly mean rainfall</span>
pc = yearly_mean_clipped.plot.pcolormesh(ax=ax, cmap=<span style="color:green">'rainbow_r'</span>, vmin=<span style="color:#aa5d00">0</span>, vmax=<span style="color:#aa5d00">2</span>,
                                         cbar_kwargs={<span style="color:green">'orientation'</span>: <span style="color:green">'vertical'</span>, <span style="color:green">'pad'</span>: <span style="color:#aa5d00">0.06</span>, <span style="color:green">'shrink'</span>: <span style="color:#aa5d00">0.5</span>, <span style="color:green">'aspect'</span>: <span style="color:#aa5d00">30</span>})

<span style="color:#696969"># Overlay the shapefile using geopandas</span>
gdf.plot(ax=ax, transform=ccrs.PlateCarree(), edgecolor=<span style="color:green">'black'</span>, facecolor=<span style="color:green">'none'</span>, linewidth=<span style="color:#aa5d00">1</span>)

<span style="color:#696969"># Set the title</span>
ax.set_title(<span style="color:green">'Total Rainfall (Clipped)'</span>)

<span style="color:#696969"># Show the plot</span>
plt.show()</code></span></span>

没有任何

剪辑的 netcdf 数据感兴趣区域(作者提供的图片)

结论

通过可视化裁剪前后的 NetCDF 数据,您可以有效地评估和分析感兴趣的特定地理区域。这种方法使我们能够将特定感兴趣的区域聚焦到数据集内的空间模式中。NetCDF 文件是理解和解释地理空间数据分析的一项关键技能。

Logo

火山引擎开发者社区是火山引擎打造的AI技术生态平台,聚焦Agent与大模型开发,提供豆包系列模型(图像/视频/视觉)、智能分析与会话工具,并配套评测集、动手实验室及行业案例库。社区通过技术沙龙、挑战赛等活动促进开发者成长,新用户可领50万Tokens权益,助力构建智能应用。

更多推荐