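Overview

In my day-to-day use of MongoDB, everything is nominally CRUD, but in practice, when working by hand, I rarely create or delete anything. Updates come up now and then, a little more often than creates and deletes. Even so, add those three together and multiply the total several times over, and it still probably falls short of the number of queries I run.

The two simplest kinds of query are probably List and Get: to see what data exists, you list it; to see the details of one specific record, you get it. Yet even these two get complicated. Take List: you can list the full data set, but can you actually read all of it? If not, you need pagination, and then you have to decide how to pick the page number and how to set the page size. Also, a record has many fields; do I really want to list all of them? Usually not, so fields have to be filtered as well, which means more options. And so on.

So this post summarizes the MongoDB operations I use day to day, including but not limited to simple queries, conditional filtering, and pagination, and it also touches on data aggregation. Description alone is dull, though, so the whole post works against one data set. Before starting, I will prepare that data, and every later operation runs against it. You can download it from my GitHub: MongoDB demo data

Preparing the data

Since the data lives on GitHub as JSON, you can download it and import it straight into MongoDB: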

[[email protected]]# wget https://raw.githubusercontent.com/liuliqiang/blog-demos/master/data/mongodb/students.json -O /tmp/students.json
[[email protected]]# mongoimport  -d  liqiang-io  -c  students  /tmp/students.json
2019-06-15T18:51:13.629+0800    connected to: localhost:27017
2019-06-15T18:51:13.814+0800    imported 200 documents
[[email protected]]# mongo
MongoDB shell version v4.0.10
connecting to: mongodb://127.0.0.1:27018/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("bb1006c3-5d90-4529-bcd0-44e7d0b57191") }
MongoDB server version: 4.1.5
WARNING: shell and server versions do not match
Server has startup warnings: 
> use liqiang-io;
switched to db liqiang-io
> db.students.count();
200

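Seeing 200 at the end means the import succeeded. Don't exit the Mongo shell; everything that follows happens in this session. Before we really start, we should get clear on what the data looks like, so take a look at one document: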

[[email protected]]> db.students.findOne();
{
    "_id" : 2,
    "name" : "Corliss Zuk",
    "scores" : [
        {
            "score" : 67.03077096065002,
            "type" : "exam"
        },
        {
            "score" : 6.301851677835235,
            "type" : "quiz"
        },
        {
            "score" : 66.28344683278382,
            "type" : "homework"
        }
    ]
}

Simple queries

A fairly routine query is to look at all the records:

[[email protected]] > db.students.find();
{ "_id" : 2, "name" : "Corliss Zuk", "scores" : [ { "score" : 67.03077096065002, "type" : "exam" }, { "score" : 6.301851677835235, "type" : "quiz" }, { "score" : 66.28344683278382, "type" : "homework" } ] }
{ "_id" : 4, "name" : "Zachary Langlais", "scores" : [ { "score" : 78.68385091304332, "type" : "exam" }, { "score" : 90.2963101368042, "type" : "quiz" }, { "score" : 34.41620148042529, "type" : "homework" } ] }
{ "_id" : 5, "name" : "Wilburn Spiess", "scores" : [ { "score" : 44.87186330181261, "type" : "exam" }, { "score" : 25.72395114668016, "type" : "quiz" }, { "score" : 63.42288310628662, "type" : "homework" } ] }
{ "_id" : 3, "name" : "Bao Ziglar", "scores" : [ { "score" : 71.64343899778332, "type" : "exam" }, { "score" : 24.80221293650313, "type" : "quiz" }, { "score" : 42.26147058804812, "type" : "homework" } ] }
... ...
Type "it" for more
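The overview noted that you rarely want every field of every record. As a minimal sketch of how that could look: `find()` accepts a projection document as its second argument, where 1 keeps a field and 0 drops it. The helper name `projectionFor` is hypothetical and exists only so the snippet can also run as plain JavaScript outside the shell.

```javascript
// Build a projection document for find(): 1 keeps a field, 0 drops it.
// `projectionFor` is an illustrative helper, not part of MongoDB.
function projectionFor(fields, keepId = true) {
  const projection = {};
  for (const field of fields) {
    projection[field] = 1;
  }
  if (!keepId) {
    projection._id = 0; // _id is returned by default unless excluded explicitly
  }
  return projection;
}

const nameOnly = projectionFor(["name"], false);
console.log(JSON.stringify(nameOnly)); // {"name":1,"_id":0}

// In the mongo shell the same document is passed as the second argument:
//   db.students.find({}, nameOnly)
```

With this projection each result should carry only the name field, which keeps a listing readable when the scores array is not of interest.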

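Pagination

Notice that the Mongo shell pages the output for us automatically. The default page size here is 20, and the it command moves to the next page. Unfortunately, you can only page forward, not back (are we supposed to scroll the terminal to see the previous page?). So how do we page on our own? Two parameters work together here, limit and skip: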

[[email protected]] > db.students.find().limit(2).skip(0);
{ "_id" : 2, "name" : "Corliss Zuk", "scores" : [ { "score" : 67.03077096065002, "type" : "exam" }, { "score" : 6.301851677835235, "type" : "quiz" }, { "score" : 66.28344683278382, "type" : "homework" } ] }
{ "_id" : 4, "name" : "Zachary Langlais", "scores" : [ { "score" : 78.68385091304332, "type" : "exam" }, { "score" : 90.2963101368042, "type" : "quiz" }, { "score" : 34.41620148042529, "type" : "homework" } ] }
[[email protected]] > db.students.find().limit(2).skip(2);
{ "_id" : 5, "name" : "Wilburn Spiess", "scores" : [ { "score" : 44.87186330181261, "type" : "exam" }, { "score" : 25.72395114668016, "type" : "quiz" }, { "score" : 63.42288310628662, "type" : "homework" } ] }
{ "_id" : 3, "name" : "Bao Ziglar", "scores" : [ { "score" : 71.64343899778332, "type" : "exam" }, { "score" : 24.80221293650313, "type" : "quiz" }, { "score" : 42.26147058804812, "type" : "homework" } ] }
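Choosing a page number then comes down to a small piece of arithmetic: page n (counting from 1) skips the (n - 1) pages before it. The sketch below assumes 1-based page numbers; `pageWindow` is a hypothetical helper name, not a MongoDB function.

```javascript
// Translate a 1-based page number into a skip/limit pair.
// `pageWindow` is an illustrative helper, not part of MongoDB.
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

const secondPage = pageWindow(2, 2);
console.log(secondPage.skip, secondPage.limit); // 2 2

// In the mongo shell:
//   db.students.find().skip(secondPage.skip).limit(secondPage.limit)
```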

Hmm, that looks much better. But why are the IDs in these results out of order? Surely they should at least be ascending?

Sorting

[[email protected]] > db.students.find().limit(2).skip(0).sort({"_id": 1});
{ "_id" : 0, "name" : "aimee Zank", "scores" : [ { "score" : 1.463179736705023, "type" : "exam" }, { "score" : 11.78273309957772, "type" : "quiz" }, { "score" : 35.8740349954354, "type" : "homework" } ] }
{ "_id" : 1, "name" : "Aurelia Menendez", "scores" : [ { "score" : 60.06045071030959, "type" : "exam" }, { "score" : 52.79790691903873, "type" : "quiz" }, { "score" : 71.76133439165544, "type" : "homework" } ] }
[[email protected]] > db.students.find().limit(2).skip(2).sort({"_id": 1});
{ "_id" : 2, "name" : "Corliss Zuk", "scores" : [ { "score" : 67.03077096065002, "type" : "exam" }, { "score" : 6.301851677835235, "type" : "quiz" }, { "score" : 66.28344683278382, "type" : "homework" } ] }
{ "_id" : 3, "name" : "Bao Ziglar", "scores" : [ { "score" : 71.64343899778332, "type" : "exam" }, { "score" : 24.80221293650313, "type" : "quiz" }, { "score" : 42.26147058804812, "type" : "homework" } ] }

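Now it looks right. But do you have a nagging question: does the order of limit/skip and sort matter? Read literally, the calls above look wrong (limit and skip come before sort), yet they work exactly as we hoped. Why is that? For how this is actually executed, keep an eye on my blog, [liqiang.io](https://liqiang.io); I will cover it in a later post.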
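As a short version of the answer, and without going into the execution details: the chained cursor methods are collected before the query is sent, and the server applies the sort first, then the skip, then the limit, whatever order they were chained in. The plain-JavaScript emulation below is only a sketch of that effective order on an in-memory array; `emulateFind` is a hypothetical helper, and the sample holds just the _id values seen earlier.

```javascript
// Emulate the effective order of a find(): sort, then skip, then limit.
// `emulateFind` is an illustrative helper and handles a single sort key only.
function emulateFind(docs, { sortKey, direction = 1, skip = 0, limit = docs.length }) {
  return docs
    .slice() // copy, so the caller's array is not reordered
    .sort((a, b) => {
      if (a[sortKey] < b[sortKey]) return -1 * direction;
      if (a[sortKey] > b[sortKey]) return 1 * direction;
      return 0;
    })
    .slice(skip, skip + limit);
}

// Same natural order as the unsorted output above, plus ids 0 and 1.
const unsortedDocs = [{ _id: 2 }, { _id: 4 }, { _id: 5 }, { _id: 3 }, { _id: 0 }, { _id: 1 }];

const page = emulateFind(unsortedDocs, { sortKey: "_id", skip: 2, limit: 2 });
console.log(page.map((doc) => doc._id)); // ids 2 and 3, matching the sorted second page
```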

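Filter conditions

We are comfortable with simple queries by now, so it is time to filter records and make our queries more targeted.

Field matching

To look up a student named "Aurelia Menendez", query like this: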

[[email protected]] > db.students.find({"name": "Aurelia Menendez"});
{ "_id" : 1, "name" : "Aurelia Menendez", "scores" : [ { "score" : 60.06045071030959, "type" : "exam" }, { "score" : 52.79790691903873, "type" : "quiz" }, { "score" : 71.76133439165544, "type" : "homework" } ] }
{ "_id" : 115, "name" : "Aurelia Menendez", "scores" : [ { "score" : 5.105728872755167, "type" : "exam" }, { "score" : 7.375913405784407, "type" : "quiz" }, { "score" : 92.62414866541212, "type" : "homework" } ] }

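It turns out that two students share the same name. The first one has ID 1 and an exam score of only about 60, barely a pass. Suppose I want to count how many students scored higher than that (higher than 60, which may include this student too); the query looks like this: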

[[email protected]] > db.students.find( {"scores.0.score": {"$gt":  60}}).count();
80

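There are two things to note here: the dotted path scores.0.score reaches into the first element (index 0) of the scores array and reads its score field, and $gt is the comparison operator for "greater than".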
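To make the dotted path concrete, here is a simplified lookup in plain JavaScript. It is only a sketch: MongoDB's own path resolution covers more cases (for example, paths that cross an array without naming an index), and `getPath` is a hypothetical helper. The document is the ID 1 record shown above.

```javascript
// Simplified path lookup: split on dots and walk objects/arrays one step at a time.
// `getPath` is an illustrative helper, not how MongoDB resolves paths internally.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value === null || value === undefined ? undefined : value[key]),
    doc
  );
}

const aurelia = {
  _id: 1,
  name: "Aurelia Menendez",
  scores: [
    { score: 60.06045071030959, type: "exam" },
    { score: 52.79790691903873, type: "quiz" },
    { score: 71.76133439165544, type: "homework" },
  ],
};

console.log(getPath(aurelia, "scores.0.score") > 60); // true
console.log(getPath(aurelia, "scores.0.type")); // exam

// The index 0 only means "exam" because exam is the first entry in this data.
// A position-independent form in the shell would be:
//   db.students.find({scores: {$elemMatch: {type: "exam", score: {$gt: 60}}}}).count()
```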

That only compares numbers. What about names? For example, how do I find every student whose first name is "Aurelia"?

[[email protected]]> db.students.find({name: {$regex: "Aurelia .*"}})
{ "_id" : 1, "name" : "Aurelia Menendez", "scores" : [ { "score" : 60.06045071030959, "type" : "exam" }, { "score" : 52.79790691903873, "type" : "quiz" }, { "score" : 71.76133439165544, "type" : "homework" } ] }
{ "_id" : 115, "name" : "Aurelia Menendez", "scores" : [ { "score" : 5.105728872755167, "type" : "exam" }, { "score" : 7.375913405784407, "type" : "quiz" }, { "score" : 92.62414866541212, "type" : "homework" } ] }

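As you can see, only the two namesakes from before turn up! There are two things to note here as well: $regex matches a string field against a regular expression, and the pattern Aurelia .* reads as "Aurelia", a space, then any characters.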
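One detail worth checking is that the pattern is not anchored, so it can match in the middle of a name. The sketch below evaluates the pattern with JavaScript's RegExp as an approximation (MongoDB uses PCRE, but a pattern this simple behaves the same way in both). The name "Maria Aurelia Lopez" is made up for illustration and is not in the data set.

```javascript
// The pattern from the query, plus an anchored variant for comparison.
const unanchored = new RegExp("Aurelia .*");
const anchored = new RegExp("^Aurelia ");

// "Maria Aurelia Lopez" is a made-up name, not part of the data set.
const sampleNames = ["Aurelia Menendez", "Maria Aurelia Lopez", "aimee Zank"];

// Unanchored: matches "Aurelia " anywhere in the string (first two names).
console.log(sampleNames.filter((name) => unanchored.test(name)));

// Anchored with ^: matches only names that start with "Aurelia ".
console.log(sampleNames.filter((name) => anchored.test(name)));

// Shell form of the anchored query:
//   db.students.find({name: {$regex: "^Aurelia "}})
```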

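Aggregation

Since this is student data, and scores at that, statistics are a must. We saw two students sharing a name earlier, so let's find out which names are shared: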

[[email protected]] > db.students.aggregate([ {$match: {}}, {$group: {_id: "$name", count: {$sum: 1}}}, {$match: {count: {$gt: 1}}} ])
{ "_id" : "Mariela Sherer", "count" : 2 }
{ "_id" : "Terica Brugger", "count" : 2 }
{ "_id" : "Vina Matsunaga", "count" : 2 }
{ "_id" : "Synthia Labelle", "count" : 2 }
{ "_id" : "Ernestine Macfarland", "count" : 2 }
{ "_id" : "Richelle Siemers", "count" : 2 }
{ "_id" : "Mariette Batdorf", "count" : 2 }
{ "_id" : "Lady Lefevers", "count" : 2 }
{ "_id" : "Meagan Oakes", "count" : 2 }
{ "_id" : "Kayce Kenyon", "count" : 2 }
{ "_id" : "Merissa Mann", "count" : 2 }
{ "_id" : "Tonisha Games", "count" : 2 }
{ "_id" : "Tamika Schildgen", "count" : 2 }
{ "_id" : "Aleida Elsass", "count" : 2 }
{ "_id" : "Laureen Salomone", "count" : 2 }
{ "_id" : "Elizabet Kleine", "count" : 2 }
{ "_id" : "Gwyneth Garling", "count" : 2 }
{ "_id" : "aimee Zank", "count" : 2 }
{ "_id" : "Malisa Jeanes", "count" : 2 }
{ "_id" : "Rudolph Domingo", "count" : 2 }
Type "it" for more

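It turns out that a lot of people share names, more than fits on one screen. This query also packs in a lot: the first $match with an empty filter lets every document through, $group buckets the documents by name and counts each bucket with $sum: 1, and the final $match keeps only the names whose count is greater than 1.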
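If the stages feel abstract, the same grouping and counting can be sketched in plain JavaScript on a tiny in-memory sample. `duplicateNames` is a hypothetical helper, and the four sample records carry only the _id and name values seen earlier.

```javascript
// Plain-JavaScript equivalent of: $group by name with $sum: 1, then $match count > 1.
// `duplicateNames` is an illustrative helper, not part of MongoDB.
function duplicateNames(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1); // like {$sum: 1} per group
  }
  return [...counts]
    .filter(([, count]) => count > 1) // like {$match: {count: {$gt: 1}}}
    .map(([name, count]) => ({ _id: name, count }));
}

const sampleStudents = [
  { _id: 0, name: "aimee Zank" },
  { _id: 1, name: "Aurelia Menendez" },
  { _id: 2, name: "Corliss Zuk" },
  { _id: 115, name: "Aurelia Menendez" },
];

// One entry comes back: "Aurelia Menendez" with a count of 2.
console.log(duplicateNames(sampleStudents));
```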

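Once you understand these pieces, computing an average score or ranking students by average becomes straightforward. For the full list of available stages, see this page of the official documentation: Pipeline Aggregation Stages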
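As one possible sketch of the average-and-rank idea: unwind the scores array, group back by student with $avg, then sort. The output field name `average`, the top-five cutoff, and the helper `averageScore` are all choices made for this illustration and are not from the original data set.

```javascript
// Pipeline sketch: average of each student's scores, highest average first.
const averagePipeline = [
  { $unwind: "$scores" }, // one document per score entry
  {
    $group: {
      _id: "$_id", // regroup by student
      name: { $first: "$name" },
      average: { $avg: "$scores.score" },
    },
  },
  { $sort: { average: -1 } }, // rank by average, descending
  { $limit: 5 }, // keep the top five
];

// In the mongo shell:
//   db.students.aggregate(averagePipeline)

// Plain-JavaScript check of what the average means for a single document.
function averageScore(doc) {
  const total = doc.scores.reduce((sum, entry) => sum + entry.score, 0);
  return total / doc.scores.length;
}

console.log(averageScore({ scores: [{ score: 60 }, { score: 70 }, { score: 80 }] })); // 70
```

Treating all three score types equally is a simplification; weighting exam, quiz, and homework differently would need an extra step before the grouping.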

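Summary

Starting from a simple data set, this post walked through the common query operations in MongoDB. They are simple, but there is plenty of room to build on them. For reasons of length and time I could not show more of what MongoDB queries can do, but I will keep this post updated whenever I find something that should have been covered and was not.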
