Text Slugify (URL)

Normalize text into a URL-friendly slug, with options for lowercasing, a custom separator, and stop-word removal.

Parameters and input

Input text

Separator: - or _ recommended, 1 to 3 characters long

Convert to lowercase
Remove common stop words (English)

Result

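Why slugify?

🔍 SEO

Keywords in a URL help search engines understand what a page is about, which can help ranking. For example, example.com/blog/how-to-learn-javascript is more descriptive than example.com/blog/123.

👁️ Readability and sharing

Readers can tell what a page contains from its URL alone. A readable URL looks better when shared on social media and is easier to remember and type by hand.

💻 System compatibility

Removing special characters from file names and URLs avoids errors and encoding problems, and keeps names portable across Windows, Linux, and macOS.

🗄️ Database friendly

A slug works as a unique identifier (for user names or tags, for example) and is easy to index and query. Its restricted character set leaves out the quotes and semicolons that SQL injection relies on, although queries should still be parameterized.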

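What is Slugify?

A slug is a short string, derived from text by normalization, that is used in a URL, a file name, or an identifier. Typical processing unifies letter case, removes punctuation, and joins the words with a separator.

ASCII first: strip accents and symbols where possible, keeping only letters, digits, and spaces
Unicode aware: characters from most languages are NFKD-normalized before further processing
URL friendly: the result contains only letters, digits, and the separator, so it can be used directly in a path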
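
The "URL friendly" property can be checked mechanically. The following is a minimal sketch, assuming ASCII-only slugs; the function name is_url_friendly and its separator parameter are illustrative and not part of this tool.

```python
import re

def is_url_friendly(slug: str, separator: str = "-") -> bool:
    """Return True if slug is runs of ASCII letters/digits joined by single separators."""
    sep = re.escape(separator)
    # One or more alphanumeric runs, each pair joined by exactly one separator
    pattern = rf"[A-Za-z0-9]+(?:{sep}[A-Za-z0-9]+)*"
    return re.fullmatch(pattern, slug) is not None

print(is_url_friendly("how-to-learn-javascript"))  # True
print(is_url_friendly("Café au lait"))             # False: accent and spaces
print(is_url_friendly("a--b"))                     # False: consecutive separators
```

A check like this is useful as a guard before a slug is stored or placed in a path.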

Use cases

Blog post URL

How to Learn JavaScript?

how-to-learn-javascript

File naming

Product Requirements v2.0.docx

product-requirements-v2-0.docx
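
The file-naming case differs from the URL case in that the extension has to survive. A rough sketch of one way to do that, assuming an English file name and treating every run of non-alphanumeric characters in the stem as one separator; slugify_filename is a hypothetical helper, not this tool's implementation.

```python
import os
import re
import unicodedata

def slugify_filename(name: str, separator: str = "-") -> str:
    # Split off the extension so its dot is not turned into a separator
    stem, ext = os.path.splitext(name)
    # Decompose accented letters, then drop anything that is not ASCII
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Replace each run of non-alphanumeric characters with one separator
    stem = re.sub(r"[^A-Za-z0-9]+", separator, stem).strip(separator).lower()
    return stem + ext.lower()

print(slugify_filename("Product Requirements v2.0.docx"))
# product-requirements-v2-0.docx
```

Mapping punctuation to the separator, rather than deleting it, is what keeps v2.0 readable as v2-0 instead of v20.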

Database identifier

User-Zhang San

user-zhang-san
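
When a slug serves as a database identifier, two different inputs can normalize to the same slug. One common way to handle this, sketched below, is to append a numeric suffix. The names unique_slug and existing are assumptions for illustration; existing stands in for whatever lookup the real database provides.

```python
def unique_slug(base: str, existing: set, separator: str = "-") -> str:
    """Return base, or base plus the first free numeric suffix (2, 3, ...)."""
    if base not in existing:
        return base
    n = 2
    while f"{base}{separator}{n}" in existing:
        n += 1
    return f"{base}{separator}{n}"

taken = {"user-zhang-san", "user-zhang-san-2"}
print(unique_slug("user-zhang-san", taken))  # user-zhang-san-3
print(unique_slug("user-li-si", taken))      # user-li-si
```

In a real system the uniqueness check and the insert would need to happen atomically, for example through a unique index.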

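FAQ

Q: How are Chinese characters handled?

A: By default, tone marks are removed and the underlying pinyin letters are kept. Chinese characters themselves are not transliterated, so purely Chinese input can come out empty. Convert the text to pinyin first, by hand or with a pinyin conversion tool, and then slugify it.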
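
The difference between pinyin with tone marks and Chinese characters can be seen in a small sketch. It assumes an ASCII-only pipeline like the one described on this page; ascii_slug is an illustrative name.

```python
import re
import unicodedata

def ascii_slug(text: str) -> str:
    # Decompose accented letters into base letter + combining mark
    text = unicodedata.normalize("NFKD", text.lower())
    # Drop every non-ASCII code point, including the combining marks
    text = text.encode("ascii", "ignore").decode("ascii")
    # Join the remaining alphanumeric runs with hyphens
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

print(repr(ascii_slug("Zhāng Sān")))        # 'zhang-san'
print(repr(ascii_slug("张三")))              # ''
print(repr(ascii_slug("学习 JavaScript")))   # 'javascript'
```

Tone marks decompose into a base letter plus a combining mark, so the letter survives. Chinese characters have no ASCII decomposition and are dropped whole.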

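Q: Why is my result empty?

A: The input may consist entirely of punctuation, symbols, or whitespace, or no words may be left after stop-word filtering. Try turning off the stop-word option or changing the input.

Q: Should the separator be - or _?

A: For SEO, use - (hyphen): Google treats it as a word separator, whereas _ (underscore) is treated as joining the words on either side, which hurts word segmentation. For file names, either works.

Q: Is there a limit on slug length?

A: There is no hard technical limit on the slug itself, but keeping it under 50 characters is recommended for URL display and SEO. Search engines may truncate an overly long slug when displaying it.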
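
A length cap is easier to live with if it does not cut a word in half. The sketch below assumes a single-character separator and uses the 50-character guideline as its default; truncate_slug is a hypothetical helper.

```python
def truncate_slug(slug: str, max_length: int = 50, separator: str = "-") -> str:
    if len(slug) <= max_length:
        return slug
    cut = slug[:max_length]
    # If the cut landed in the middle of a word, back up to the previous separator
    if slug[max_length] != separator and separator in cut:
        cut = cut[: cut.rfind(separator)]
    return cut.strip(separator)

print(truncate_slug("how-to-learn-javascript-in-thirty-days", max_length=20))
# how-to-learn
```

A single word longer than the limit is cut hard, since there is no earlier separator to fall back to.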

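Best practices

What to avoid

✗ Do not include sensitive information (such as IDs, email addresses, or passwords)
✗ Do not use special characters (such as @#$%^&*)
✗ Do not keep spaces or consecutive separators
✗ Do not repeat the same word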
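
The last two points can be enforced as a clean-up pass over an existing slug. This is a sketch under the assumption that dropping later repeats of a word is acceptable for the content in question; tidy_slug is an illustrative name.

```python
import re

def tidy_slug(slug: str, separator: str = "-") -> str:
    # Split on runs of the separator and discard the empty pieces at the edges
    parts = re.split(f"(?:{re.escape(separator)})+", slug)
    words = [w for w in parts if w]
    # dict.fromkeys keeps the first occurrence of each word, in order
    return separator.join(dict.fromkeys(words))

print(tidy_slug("best--javascript-tips-javascript-"))
# best-javascript-tips
```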

Technical notes

Unicode normalization:

NFKD decomposition followed by removal of combining marks (\p{M}) turns Café into Cafe. This covers most Latin-script characters.
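
The same two steps can be sketched in Python. The standard re module has no \p{M} class, so this version tests each character's Unicode general category instead; strip_marks is an illustrative name.

```python
import unicodedata

def strip_marks(text: str) -> str:
    # NFKD splits é into e + U+0301 and expands compatibility forms such as ligatures
    decomposed = unicodedata.normalize("NFKD", text)
    # General categories Mn, Mc, and Me together correspond to \p{M}
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )

print(strip_marks("Café"))      # Cafe
print(strip_marks("Ångström"))  # Angstrom
print(strip_marks("Søren"))     # Søren
```

Letters such as ø have no decomposition, so they pass through unchanged. This is why the approach covers most, but not all, Latin-script characters.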

Stop-word list:

Based on common English words (a/an/the/and/or/of/to/in/on/for/at/by/with) and can be extended with custom entries. Chinese stop words need separate handling.
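
A minimal sketch of stop-word filtering with a custom extension, using the word list quoted above. The names DEFAULT_STOP_WORDS, remove_stop_words, and the extra parameter are assumptions for illustration.

```python
DEFAULT_STOP_WORDS = frozenset(
    "a an the and or of to in on for at by with".split()
)

def remove_stop_words(words, extra=()):
    # Merge the built-in list with any caller-supplied words
    stop = DEFAULT_STOP_WORDS | {w.lower() for w in extra}
    return [w for w in words if w.lower() not in stop]

print(remove_stop_words("the lord of the rings".split()))
# ['lord', 'rings']
print(remove_stop_words("how to learn javascript".split(), extra=["how"]))
# ['learn', 'javascript']
print(remove_stop_words("of the and".split()))
# []
```

The last call shows how filtering can leave nothing behind, which is one cause of an empty result.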

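Browser compatibility:

Requires ES6+ and Unicode property escapes in regular expressions (\p{...}, an ES2018 feature). Modern browsers (Chrome 64+, Firefox 78+, Safari 11.1+) all support them.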

How to generate a slug in code?

JavaScript

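function slugify(text) {
  return text
    .toLowerCase()
    // Split accented letters into base letter + combining mark
    .normalize("NFKD")
    // Drop the combining diacritical marks
    .replace(/[\u0300-\u036f]/g, "")
    // Drop everything except word characters, whitespace, and hyphens
    .replace(/[^\w\s-]/g, "")
    .trim()
    // Collapse runs of whitespace, underscores, and hyphens into one hyphen
    .replace(/[\s_-]+/g, "-")
    // Trim leading and trailing hyphens
    .replace(/^-+|-+$/g, "");
}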

PHP

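function slugify($text) {
  $text = mb_strtolower($text, "UTF-8");
  // Transliterate to ASCII; the exact output depends on the iconv implementation and locale
  $text = iconv("UTF-8", "ASCII//TRANSLIT", $text);
  // Drop everything except word characters, whitespace, and hyphens
  $text = preg_replace("/[^\w\s-]/", "", $text);
  // Collapse runs of whitespace, underscores, and hyphens into one hyphen
  $text = preg_replace("/[\s_-]+/", "-", $text);
  return trim($text, "-");
}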

Python

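import re
import unicodedata

def slugify(text):
    text = text.lower()
    # Split accented letters into base letter + combining mark
    text = unicodedata.normalize("NFKD", text)
    # Drop everything that is not ASCII, including the combining marks
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop everything except word characters, whitespace, and hyphens
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace, underscores, and hyphens into one hyphen
    text = re.sub(r"[\s_-]+", "-", text)
    return text.strip("-")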

Go

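import (
    "regexp"
    "strings"

    "golang.org/x/text/unicode/norm"
)

// Compile the patterns once instead of on every call.
var (
    nonSlugChars = regexp.MustCompile(`[^\w\s-]`)
    separators   = regexp.MustCompile(`[\s_-]+`)
)

func Slugify(text string) string {
    text = strings.ToLower(text)
    // NFKD splits accented letters into base letter + combining mark;
    // the ASCII-only \w below then drops the marks.
    text = norm.NFKD.String(text)
    text = nonSlugChars.ReplaceAllString(text, "")
    text = separators.ReplaceAllString(text, "-")
    return strings.Trim(text, "-")
}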

Ruby

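def slugify(text)
  text = text.downcase
  # String#unicode_normalize is built in since Ruby 2.2, so no gem is needed
  text = text.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, "")
  text = text.gsub(/[^\w\s-]/, "")
  text = text.gsub(/[\s_-]+/, "-")
  # \A and \z anchor to the whole string; ^ and $ would match at every line break
  text.gsub(/\A-+|-+\z/, "")
end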

Java

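import java.text.Normalizer;
import java.util.Locale;

public static String slugify(String text) {
    // Locale.ROOT keeps lowercasing independent of the default locale (e.g. Turkish dotless i)
    text = text.toLowerCase(Locale.ROOT);
    text = Normalizer.normalize(text, Normalizer.Form.NFKD);
    // \w is ASCII-only by default, so this also drops the combining marks left by NFKD
    text = text.replaceAll("[^\\w\\s-]", "");
    text = text.replaceAll("[\\s_-]+", "-");
    return text.replaceAll("^-+|-+$", "");
}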