News - 11 series#
Release 11.1.3 - 2022-01-29#
Improvements#
[snippet] Added support for specifying 32 or more keywords. [GitHub#1313][Patched by Takashi Hashida]
We could not specify 32 or more keywords with snippet
until now. However, we can specify the keyword of 32 or more by this improvement as below.table_create Entries TABLE_NO_KEY column_create Entries content COLUMN_SCALAR ShortText load --table Entries [ {"content": "Groonga is a fast and accurate full text search engine based on inverted index. One of the characteristics of Groonga is that a newly registered document instantly appears in search results. Also, Groonga allows updates without read locks. These characteristics result in superior performance on real-time applications.\nGroonga is also a column-oriented database management system (DBMS). Compared with well-known row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are more suited for aggregate queries. Due to this advantage, Groonga can cover weakness of row-oriented systems.\nThe basic functions of Groonga are provided in a C library. Also, libraries for using Groonga in other languages, such as Ruby, are provided by related projects. In addition, groonga-based storage engines are provided for MySQL and PostgreSQL. These libraries and storage engines allow any application to use Groonga. See usage examples."}, {"content": "In widely used DBMSs, updates are immediately processed, for example, a newly registered record appears in the result of the next query. In contrast, some full text search engines do not support instant updates, because it is difficult to dynamically update inverted indexes, the underlying data structure.\nGroonga also uses inverted indexes but supports instant updates. In addition, Groonga allows you to search documents even when updating the document collection. Due to these superior characteristics, Groonga is very flexible as a full text search engine. Also, Groonga always shows good performance because it divides a large task, inverted index merging, into smaller tasks."} ] select Entries \ --output_columns ' \ snippet(content, \ "groonga", "inverted", "index", "fast", "full", "text", "search", "engine", "registered", "document", \ "results", "appears", "also", "system", "libraries", "for", "mysql", "postgresql", "column-oriented", "dbms", \ "basic", "ruby", "projects", "storage", "allow", "application", "usage", "sql", "well-known", "real-time", \ "weakness", "merging", "performance", "superior", "large", "dynamically", "difficult", "query", "examples", "divides", \ { \ "default_open_tag": "[", \ "default_close_tag": "]", \ "width" : 2048 \ })' [ [ 0, 1643165838.691991, 0.0003311634063720703 ], [ [ [ 2 ], [ [ "snippet", null ] ], [ [ "[Groonga] is a [fast] and accurate [full] [text] [search] [engine] based on [inverted] [index]. One of the characteristics of [Groonga] is that a newly [registered] [document] instantly [appears] in [search] [results]. [Also], [Groonga] [allow]s updates without read locks. These characteristics result in [superior] [performance] on [real-time] [application]s.\n[Groonga] is [also] a [column-oriented] database management [system] ([DBMS]). Compared with [well-known] row-oriented [system]s, such as [MySQL] and [PostgreSQL], [column-oriented] [system]s are more suited [for] aggregate queries. Due to this advantage, [Groonga] can cover [weakness] of row-oriented [system]s.\nThe [basic] functions of [Groonga] are provided in a C library. [Also], [libraries] [for] using [Groonga] in other languages, such as [Ruby], are provided by related [projects]. In addition, [groonga]-based [storage] [engine]s are provided [for] [MySQL] and [PostgreSQL]. 
These [libraries] and [storage] [engine]s [allow] any [application] to use [Groonga]. See [usage] [examples]." ] ], [ [ "In widely used [DBMS]s, updates are immediately processed, [for] example, a newly [registered] record [appears] in the result of the next [query]. In contrast, some [full] [text] [search] [engine]s do not support instant updates, because it is [difficult] to [dynamically] update [inverted] [index]es, the underlying data structure.\n[Groonga] [also] uses [inverted] [index]es but supports instant updates. In addition, [Groonga] [allow]s you to [search] [document]s even when updating the [document] collection. Due to these [superior] characteristics, [Groonga] is very flexible as a [full] [text] [search] [engine]. [Also], [Groonga] always shows good [performance] because it [divides] a [large] task, [inverted] [index] [merging], into smaller tasks." ] ] ] ] ]
[NormalizerNFKC130] Added a new option remove_symbol. This option removes symbols (e.g. #, !, “, &, %, …) from the string that is the target of normalization, as below.
normalize 'NormalizerNFKC130("remove_symbol", true)' "#This & is %% a pen." WITH_TYPES [ [ 0, 1643595008.729597, 0.0005540847778320312 ], { "normalized": "this is a pen", "types": [ "alpha", "alpha", "alpha", "alpha", "others", "others", "alpha", "alpha", "others", "others", "alpha", "others", "alpha", "alpha", "alpha" ], "checks": [ ] } ]
[AlmaLinux] Added support for AlmaLinux 8 on ARM64.
[httpd] Updated bundled nginx to 1.21.5.
[Documentation] Fixed a typo in ruby_eval. [GitHub#1317][Patched by wi24rd]
[Ubuntu] Dropped Ubuntu 21.04 (Hirsute Hippo) support.
Because Ubuntu 21.04 reached EOL on January 20, 2022.
Fixes#
[load] Fixed a crash bug when we load data specifying a nonexistent column.
This bug only occurs when we specify apache-arrow as the input_type argument of load.
Fixed a bug that upgrading Groonga failed because of an upgrade of arrow-libs on which Groonga depends. [groonga-talk,540][Reported by Josep Sanz][Gitter,61eaaa306d9ba23328d23ce1][Reported by shibanao4870][GitHub#1316][Reported by Keitaro YOSHIMURA]
However, if arrow-libs bumps its major version, this problem will occur again. In that case, we will handle it by rebuilding the Groonga package.
Known Issues#
Currently, Groonga has a bug that data may be corrupted when we execute many additions, deletions, and updates against a vector column.
*< and *> are only valid when we use query() on the right side of a filter condition. If we specify them as below, *< and *> work as &&.
'content @ "Groonga" *< content @ "Mroonga"'
Groonga may not return records that should match, caused by GRN_II_CURSOR_SET_MIN_ENABLE.
Thanks#
Takashi Hashida
wi24rd
Josep Sanz
Keitaro YOSHIMURA
shibanao4870
Release 11.1.1 - 2021-12-29#
Improvements#
[select] Added support for near phrase product search.
This feature is a shortcut of '*NP"..." OR *NP"..." OR ...'. For example, we can use *NPP instead of an expression that executes multiple *NP with query
as below.query ("title * 10 || content", "*NP"a 1 x" OR *NP"a 1 y" OR *NP"a 1 z" OR *NP"a 2 x" OR *NP"a 2 y" OR *NP"a 2 z" OR *NP"a 3 x" OR *NP"a 3 y" OR *NP"a 3 z" OR *NP"b 1 x" OR *NP"b 1 y" OR *NP"b 1 z" OR *NP"b 2 x" OR *NP"b 2 y" OR *NP"b 2 z" OR *NP"b 3 x" OR *NP"b 3 y" OR *NP"b 3 z"")
The above expression can be written as *NPP"(a b) (1 2 3) (x y z)" by this feature. In addition, *NPP"(a b) (1 2 3) (x y z)" is faster than '*NP"..." OR *NP"..." OR ...'
.query ("title * 10 || content", "*NPP"(a b) (1 2 3) (x y z)"")
We implemented this feature to improve the performance of near phrase searches like '*NP"..." OR *NP"..." OR ...'
.[select] Added support for order near phrase product search.
This feature is a shortcut of '*ONP"..." OR *ONP"..." OR ...'. For example, we can use *ONPP instead of an expression that executes multiple *ONP with query
as below.query ("title * 10 || content", "*ONP"a 1 x" OR *ONP"a 1 y" OR *ONP"a 1 z" OR *ONP"a 2 x" OR *ONP"a 2 y" OR *ONP"a 2 z" OR *ONP"a 3 x" OR *ONP"a 3 y" OR *ONP"a 3 z" OR *ONP"b 1 x" OR *ONP"b 1 y" OR *ONP"b 1 z" OR *ONP"b 2 x" OR *ONP"b 2 y" OR *ONP"b 2 z" OR *ONP"b 3 x" OR *ONP"b 3 y" OR *ONP"b 3 z"")
The above expression can be written as *ONPP"(a b) (1 2 3) (x y z)" by this feature. In addition, *ONPP"(a b) (1 2 3) (x y z)" is faster than '*ONP"..." OR *ONP"..." OR ...'
.query ("title * 10 || content", "*ONPP"(a b) (1 2 3) (x y z)"")
We implemented this feature to improve the performance of near phrase searches like '*ONP"..." OR *ONP"..." OR ...'.
[request_cancel] Groonga now detects request_cancel more easily while executing a search.
This is because we added more return code checks to detect request_cancel.
[thread_dump] Added a new command thread_dump.
Currently, this command works only on Windows.
When we run this command, it puts backtraces of all threads into the log as NOTICE-level entries.
This feature is useful for investigating problems such as Groonga not returning a response.
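As a minimal sketch, the command takes no arguments and writes its output to the Groonga log rather than to the command response:
# Write backtraces of all threads to the Groonga log at NOTICE level
# (the output goes to the log, not to this command's response).
thread_dump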
[CentOS] Dropped support for CentOS 8.
Because CentOS 8 will reach EOL on 2021-12-31.
Fixes#
Fixed a bug that we can't remove an index column that was created with an invalid parameter. [GitHub#1301][Patched by Takashi Hashida]
For example, we can't remove a table after creating an invalid index column with column_create
as below.table_create Statuses TABLE_NO_KEY column_create Statuses start_time COLUMN_SCALAR UInt16 column_create Statuses end_time COLUMN_SCALAR UInt16 table_create Times TABLE_PAT_KEY UInt16 column_create Times statuses COLUMN_INDEX Statuses start_time,end_time [ [ -22, 1639037503.16114, 0.003981828689575195, "grn_obj_set_info(): GRN_INFO_SOURCE: multi column index must be created with WITH_SECTION flag: <Times.statuses>", [ [ "grn_obj_set_info_source_validate", "../../groonga/lib/db.c", 9605 ], [ "/tmp/d.grn", 6, "column_create Times statuses COLUMN_INDEX Statuses start_time,end_time" ] ] ], false ] table_remove Times [ [ -22, 1639037503.16515, 0.0005414485931396484, "[object][remove] column is broken: <Times.statuses>", [ [ "remove_columns", "../../groonga/lib/db.c", 10649 ], [ "/tmp/d.grn", 8, "table_remove Times" ] ] ], false ]
Known Issues#
Currently, Groonga has a bug that data may be corrupted when we execute many additions, deletions, and updates against a vector column.
*< and *> are only valid when we use query() on the right side of a filter condition. If we specify them as below, *< and *> work as &&.
'content @ "Groonga" *< content @ "Mroonga"'
Groonga may not return records that should match, caused by GRN_II_CURSOR_SET_MIN_ENABLE.
Thanks#
Takashi Hashida
Release 11.1.0 - 2021-11-29#
Improvements#
[load] Added support for ISO 8601 time format. [GitHub#1228][Patched by Takashi Hashida]
load supports the following formats by this modification:
YYYY-MM-ddThh:mm:ss.sZ
YYYY-MM-ddThh:mm:ss.s+10:00
YYYY-MM-ddThh:mm:ss.s-10:00
We can also use the t and z characters instead of T and Z in this syntax. We can also use the / character instead of -
in this syntax. However, note that this is not an ISO 8601 format. This format is present for compatibility.plugin_register functions/time table_create Logs TABLE_NO_KEY column_create Logs case COLUMN_SCALAR ShortText column_create Logs created_at COLUMN_SCALAR Time column_create Logs created_at_text COLUMN_SCALAR ShortText load --table Logs [ {"case": "timezone: Z", "created_at": "2000-01-01T10:00:00Z", "created_at_text": "2000-01-01T10:00:00Z"}, {"case": "timezone: z", "created_at": "2000-01-01t10:00:00z", "created_at_text": "2000-01-01T10:00:00z"}, {"case": "timezone: 00:00", "created_at": "2000-01-01T10:00:00+00:00", "created_at_text": "2000-01-01T10:00:00+00:00"}, {"case": "timezone: +01:01", "created_at": "2000-01-01T11:01:00+01:01", "created_at_text": "2000-01-01T11:01:00+01:01"}, {"case": "timezone: +11:11", "created_at": "2000-01-01T21:11:00+11:11", "created_at_text": "2000-01-01T21:11:00+11:11"}, {"case": "timezone: -01:01", "created_at": "2000-01-01T08:59:00-01:01", "created_at_text": "2000-01-01T08:59:00-01:01"}, {"case": "timezone: -11:11", "created_at": "1999-12-31T22:49:00-11:11", "created_at_text": "1999-12-31T22:49:00-11:11"}, {"case": "timezone hour threshold: +23:00", "created_at": "2000-01-02T09:00:00+23:00", "created_at_text": "2000-01-02T09:00:00+23:00"}, {"case": "timezone minute threshold: +00:59", "created_at": "2000-01-01T10:59:00+00:59", "created_at_text": "2000-01-01T10:59:00+00:59"}, {"case": "timezone omitting minute: +01", "created_at": "2000-01-01T11:00:00+01", "created_at_text": "2000-01-01T11:00:00+01"}, {"case": "timezone omitting minute: -01", "created_at": "2000-01-01T09:00:00-01", "created_at_text": "2000-01-01T09:00:00-01"}, {"case": "timezone: localtime", "created_at": "2000-01-01T19:00:00", "created_at_text": "2000-01-01T19:00:00"}, {"case": "compatible: date delimiter: /", "created_at": "2000/01/01T10:00:00Z", "created_at_text": "2000/01/01T10:00:00Z"}, {"case": "decimal", "created_at": "2000-01-01T11:01:00.123+01:01", "created_at_text": "2000-01-01T11:01:00.123+01:01"} ] select Logs \ --limit -1 \ --output_columns "case, time_format_iso8601(created_at), created_at_text" [ [ 0, 0.0, 0.0 ], [ [ [ 14 ], [ [ "case", "ShortText" ], [ "time_format_iso8601", null ], [ "created_at_text", "ShortText" ] ], [ "timezone: Z", "2000-01-01T19:00:00.000000+09:00", "2000-01-01T10:00:00Z" ], [ "timezone: z", "2000-01-01T19:00:00.000000+09:00", "2000-01-01T10:00:00z" ], [ "timezone: 00:00", "2000-01-01T19:00:00.000000+09:00", "2000-01-01T10:00:00+00:00" ], [ "timezone: +01:01", "2000-01-01T19:00:00.000000+09:00", "2000-01-01T11:01:00+01:01" ], [ "timezone: +11:11", "2000-01-01T19:00:00.000000+09:00", "2000-01-01T21:11:00+11:11" ], [ "timezone: -01:01", "2000-01-01T19:00:00.000000+09:00", "2000-01-01T08:59:00-01:01" ], [ "timezone: -11:11", "2000-01-01T19:00:00.000000+09:00", "1999-12-31T22:49:00-11:11" ], [ "timezone hour threshold: +23:00", "2000-01-01T19:00:00.000000+09:00", "2000-01-02T09:00:00+23:00" ], [ "timezone minute threshold: +00:59", "2000-01-01T19:00:00.000000+09:00", "2000-01-01T10:59:00+00:59" ], [ "timezone omitting minute: +01", "2000-01-01T19:00:00.000000+09:00", "2000-01-01T11:00:00+01" ], [ "timezone omitting minute: -01", "2000-01-01T19:00:00.000000+09:00", "2000-01-01T09:00:00-01" ], [ "timezone: localtime", "2000-01-01T19:00:00.000000+09:00", "2000-01-01T19:00:00" ], [ "compatible: date delimiter: /", "2000-01-01T19:00:00.000000+09:00", "2000/01/01T10:00:00Z" ], [ "decimal", "2000-01-01T19:00:00.123000+09:00", "2000-01-01T11:01:00.123+01:01" ] 
] ] ]
[select] Added a new query_flags value DISABLE_PREFIX_SEARCH.
We can use the prefix search operators ^ and * as search keywords with DISABLE_PREFIX_SEARCH, as below. This feature is useful if we want to search for documents that include ^ and *
.table_create Users TABLE_PAT_KEY ShortText load --table Users [ {"_key": "alice"}, {"_key": "alan"}, {"_key": "ba*"} ] select Users \ --match_columns "_key" \ --query "a*" \ --query_flags "DISABLE_PREFIX_SEARCH" [[0,0.0,0.0],[[[1],[["_id","UInt32"],["_key","ShortText"]],[3,"ba*"]]]]
table_create Users TABLE_PAT_KEY ShortText load --table Users [ {"_key": "alice"}, {"_key": "alan"}, {"_key": "^a"} ] select Users \ --query "_key:^a" \ --query_flags "ALLOW_COLUMN|DISABLE_PREFIX_SEARCH" [[0,0.0,0.0],[[[1],[["_id","UInt32"],["_key","ShortText"]],[3,"^a"]]]]
[select] Added a new query_flags value DISABLE_AND_NOT.
We can use the AND NOT operator - as a search keyword with DISABLE_AND_NOT, as below. This feature is useful if we want to search for documents that include -
.table_create Users TABLE_PAT_KEY ShortText load --table Users [ {"_key": "alice"}, {"_key": "bob"}, {"_key": "cab-"} ] select Users --match_columns "_key" --query "b - a" --query_flags "DISABLE_AND_NOT" [[0,0.0,0.0],[[[1],[["_id","UInt32"],["_key","ShortText"]],[3,"cab-"]]]]
Fixes#
[The browser based administration tool] Fixed a bug that a search query entered in non-administration mode was sent even if we checked the checkbox for the administration mode of a record list. [GitHub#1186][Patched by Takashi Hashida]
Known Issues#
Currently, Groonga has a bug that data may be corrupted when we execute many additions, deletions, and updates against a vector column.
*< and *> are only valid when we use query() on the right side of a filter condition. If we specify them as below, *< and *> work as &&.
'content @ "Groonga" *< content @ "Mroonga"'
Groonga may not return records that should match, caused by GRN_II_CURSOR_SET_MIN_ENABLE.
Thanks#
Takashi Hashida
Release 11.0.9 - 2021-11-04#
Improvements#
[snippet] Added a new option delimiter_regexp for detecting snippet delimiters with a regular expression.
snippet extracts text around search keywords. We call the extracted text a snippet.
Normally, snippet() returns 200 bytes of text around search keywords. However, snippet() pays no attention to sentence delimiters, so a snippet may be composed of multiple sentences.
The delimiter_regexp option is useful if we want to extract only text from the same sentence as the search keywords. For example, we can use \.\s* to extract only text in the target sentence, as below. Note that you need to escape \
in string.table_create Documents TABLE_NO_KEY column_create Documents content COLUMN_SCALAR Text table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto column_create Terms documents_content_index COLUMN_INDEX|WITH_POSITION Documents content load --table Documents [ ["content"], ["Groonga is a fast and accurate full text search engine based on inverted index. One of the characteristics of groonga is that a newly registered document instantly appears in search results. Also, groonga allows updates without read locks. These characteristics result in superior performance on real-time applications."], ["Groonga is also a column-oriented database management system (DBMS). Compared with well-known row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are more suited for aggregate queries. Due to this advantage, groonga can cover weakness of row-oriented systems."] ] select Documents \ --output_columns 'snippet(content, \ { \ "default_open_tag": "[", \ "default_close_tag": "]", \ "delimiter_regexp": "\\\\.\\\\s*" \ })' \ --match_columns content \ --query "fast performance" [ [ 0, 1337566253.89858, 0.000355720520019531 ], [ [ [ 1 ], [ [ "snippet", null ] ], [ [ "Groonga is a [fast] and accurate full text search engine based on inverted index", "These characteristics result in superior [performance] on real-time applications" ] ] ] ] ]
[window_rank] Added a new function window_rank().
We can calculate a rank that includes gaps for each record. Normally, the rank isn't incremented when multiple records have the same order. For example, if the sort key values are 100, 100, 200, then their ranks are 1, 1, 3. The rank of the last record is 3, not 2, because there are two records with rank 1.
This is similar to window_record_number. However, window_record_number pays no attention to gaps.
table_create Points TABLE_NO_KEY column_create Points game COLUMN_SCALAR ShortText column_create Points score COLUMN_SCALAR UInt32 load --table Points [ ["game", "score"], ["game1", 100], ["game1", 200], ["game1", 100], ["game1", 400], ["game2", 150], ["game2", 200], ["game2", 200], ["game2", 200] ] select Points \ --columns[rank].stage filtered \ --columns[rank].value 'window_rank()' \ --columns[rank].type UInt32 \ --columns[rank].window.sort_keys score \ --output_columns 'game, score, rank' \ --sort_keys score [ [ 0, 1337566253.89858, 0.000355720520019531 ], [ [ [ 8 ], [ [ "game", "ShortText" ], [ "score", "UInt32" ], [ "rank", "UInt32" ] ], [ "game1", 100, 1 ], [ "game1", 100, 1 ], [ "game2", 150, 3 ], [ "game2", 200, 4 ], [ "game2", 200, 4 ], [ "game1", 200, 4 ], [ "game2", 200, 4 ], [ "game1", 400, 8 ] ] ] ]
[in_values] Added support for auto cast when we search tables.
For example, if we load UInt32 values into a table whose key type is UInt64, Groonga casts the values to UInt64 automatically when we search the table with in_values(). However, in_values(_key, 10) doesn't work with a UInt64 key table, because 10 is parsed as Int32
.table_create Numbers TABLE_HASH_KEY UInt64 load --table Numbers [ {"_key": 100}, {"_key": 200}, {"_key": 300} ] select Numbers --output_columns _key --filter 'in_values(_key, 200, 100)' --sortby _id [[0,0.0,0.0],[[[2],[["_key","UInt64"]],[100],[200]]]]
[httpd] Updated bundled nginx to 1.21.3.
[AlmaLinux] Added support for AlmaLinux 8.
[Ubuntu] Added support for Ubuntu 21.10 (Impish Indri).
Fixes#
Fixed a bug that Groonga doesn't return a response when an error occurs in a command (e.g. a syntax error in filter).
This bug only occurs when we use --output_type apache-arrow.
Known Issues#
Currently, Groonga has a bug that data may be corrupted when we execute many additions, deletions, and updates against a vector column.
[The browser based administration tool] Currently, Groonga has a bug that a search query entered in non-administration mode is sent even if we check the checkbox for the administration mode of a record list.
*< and *> are only valid when we use query() on the right side of a filter condition. If we specify them as below, *< and *> work as &&.
'content @ "Groonga" *< content @ "Mroonga"'
Groonga may not return records that should match, caused by GRN_II_CURSOR_SET_MIN_ENABLE.
Release 11.0.7 - 2021-09-29#
Improvements#
[load] Added support for casting a string such as “[int, int,…]” to a vector of integers such as [int, int,…].
For example, Groonga handles the value as a vector of integers [1, -2] even if we load it as a string “[1, -2]”, as below.
table_create Data TABLE_NO_KEY column_create Data numbers COLUMN_VECTOR Int16 table_create Numbers TABLE_PAT_KEY Int16 column_create Numbers data_numbers COLUMN_INDEX Data numbers load --table Data [ {"numbers": "[1, -2]"}, {"numbers": "[-3, 4]"} ] dump --dump_plugins no --dump_schema no load --table Data [ ["_id","numbers"], [1,[1,-2]], [2,[-3,4]] ] column_create Numbers data_numbers COLUMN_INDEX Data numbers select Data --filter 'numbers @ -2' [[0,0.0,0.0],[[[1],[["_id","UInt32"],["numbers","Int16"]],[1,[1,-2]]]]]
This feature supports the following types.
Int8
UInt8
Int16
UInt16
Int32
UInt32
Int64
UInt64
[load] Added support for loading a JSON array expressed as a text string as a vector of strings.
For example, Groonga handles the value as a vector that has two elements, [“hello”, “world”], if we load a JSON array expressed as a text string such as “[\"hello\", \"world\"]”, as below.
table_create Data TABLE_NO_KEY [[0,0.0,0.0],true] column_create Data strings COLUMN_VECTOR ShortText [[0,0.0,0.0],true] table_create Terms TABLE_PAT_KEY ShortText --normalizer NormalizerNFKC130 --default_tokenizer TokenNgram [[0,0.0,0.0],true] column_create Terms data_strings COLUMN_INDEX Data strings [[0,0.0,0.0],true] load --table Data [ {"strings": "[\"Hello\", \"World\"]"}, {"strings": "[\"Good-bye\", \"World\"]"} ] [[0,0.0,0.0],2] dump --dump_plugins no --dump_schema no load --table Data [ ["_id","strings"], [1,["Hello","World"]], [2,["Good-bye","World"]] ] column_create Terms data_strings COLUMN_INDEX Data strings select Data --filter 'strings @ "bye"' [ [ 0, 0.0, 0.0 ], [ [ [ 1 ], [ [ "_id", "UInt32" ], [ "strings", "ShortText" ] ], [ 2, [ "Good-bye", "World" ] ] ] ] ]
In earlier versions, Groonga handled the value as a vector that had one element, [“[\"hello\", \"world\"]”], if we loaded a JSON array expressed as a text string such as “[\"hello\", \"world\"]”.
[Documentation] Added documentation about the following items:
[column_create] Added documentation about the WEIGHT_FLOAT32 flag.
[NormalizerNFKC121] Added documentation about NormalizerNFKC121.
[NormalizerNFKC130] Added documentation about NormalizerNFKC130.
[NormalizerTable] Added documentation about NormalizerTable.
Updated the version of Apache Arrow that Groonga requires to 3.0.0. [GitHub#1265][Patched by Takashi Hashida]
Fixes#
Fixed a memory leak when we create a table with a tokenizer with an invalid option.
Fixed a bug that a new entry may not be added to a hash table.
This bug only occurs in Groonga 11.0.6, and it may occur if we add and delete quite a lot of data. If this bug occurs in your environment, you can resolve this problem by executing the following steps:
Upgrade Groonga from 11.0.6 to 11.0.7 or later.
Make a new table that has the same schema as the original table.
Copy the data from the original table to the new table.
[Windows] Fixed a resource leak when Groonga fails to open a new file because of out of memory.
Known Issues#
Currently, Groonga has a bug that data may be corrupted when we execute many additions, deletions, and updates against a vector column.
[The browser based administration tool] Currently, Groonga has a bug that a search query entered in non-administration mode is sent even if we check the checkbox for the administration mode of a record list.
*< and *> are only valid when we use query() on the right side of a filter condition. If we specify them as below, *< and *> work as &&.
'content @ "Groonga" *< content @ "Mroonga"'
Groonga may not return records that should match, caused by GRN_II_CURSOR_SET_MIN_ENABLE.
Thanks#
Takashi Hashida
Release 11.0.6 - 2021-08-29#
Warning
Groonga 11.0.6 has a bug that a new entry may not be added to a hash table.
We fixed this bug in Groonga 11.0.7. This bug only occurs in Groonga 11.0.6. Therefore, if you are using Groonga 11.0.6, we highly recommend that you use Groonga 11.0.7 or later.
Improvements#
Added support for recovering on crash. (experimental)
This is an experimental feature. Currently, this feature is still not stable.
If Groonga crashes, it recovers the database automatically when it opens the database for the first time since the crash. However, this feature can't recover the database automatically in all crash cases. We may need to recover the database manually depending on the timing, even if this feature is enabled.
Groonga writes a WAL (write ahead log) when this feature is enabled. We can dump the WAL with the following tools, but currently, users don't need to use them:
[grndb] dump-wal command.
dump-wal.rb script.
[cache_limit] Groonga removes the cache when we execute cache_limit 0. [GitHub#1224][Reported by higchi]
Groonga stores the query cache in an internal table. The maximum total size of the keys of this table is 4GiB, because this table is a hash table. Therefore, if we execute many huge queries, Groonga may become unable to store the query cache, because the maximum total size of keys may exceed 4GiB. In such cases, we can clear the query cache table by using cache_limit 0, and Groonga can store the query cache again.
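As a minimal sketch of that workaround (restoring the limit afterwards, and the value 100, are assumptions; 100 is believed to be Groonga's default maximum number of cache entries, so adjust it to your own configuration):
# Setting the maximum number of query cache entries to 0 clears the internal cache table.
cache_limit 0
# Restore the limit afterwards so that caching continues to work (assumed default: 100).
cache_limit 100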
Fixes#
Fixed a bug that Groonga doesn't clear locks when some threads open the same object at around the same time.
If some threads open the same object at around the same time, the threads other than the one that opens the object first wait for the target object to be opened. At this time, the waiting threads take locks, but these locks are not released. Therefore, in this case the locks remain until Groonga's process is restarted, and new threads can't open the object at all until Groonga's process is restarted.
However, this bug rarely happens, because the time during which a thread is opening the object is very short.
[query_parallel_or] Fixed a bug that the result may differ from query().
For example, if we used query("tags || tags2", "beginner man"), the following record was a match, but if we used query_parallel_or("tags || tags2", "beginner man"), the following record wasn't a match until now.
{"_key": "Bob", "comment": "Hey!", "tags": ["expert", "man"], "tags2": ["beginner"]}
Even if we use query_parallel_or("tags || tags2", "beginner man"), the above record is now a match by this modification.
Known Issues#
Currently, Groonga has a bug that data may be corrupted when we execute many additions, deletions, and updates against a vector column.
[The browser based administration tool] Currently, Groonga has a bug that a search query entered in non-administration mode is sent even if we check the checkbox for the administration mode of a record list.
*< and *> are only valid when we use query() on the right side of a filter condition. If we specify them as below, *< and *> work as &&.
'content @ "Groonga" *< content @ "Mroonga"'
Groonga may not return records that should match, caused by GRN_II_CURSOR_SET_MIN_ENABLE.
Thanks#
higchi
Release 11.0.5 - 2021-07-29#
Improvements#
[Normalizers] Added support for multiple normalizers.
We can specify multiple normalizers with the new --normalizers option when we create a table since this release. We can also specify them with the existing --normalizer option for compatibility.
We added NormalizerTable for customizing a normalizer in Groonga 11.0.4. We can control the behavior of the normalizer more flexibly by combining NormalizerTable with existing normalizers.
For example, this feature is useful in the following case.
Searching for a telephone number in data imported from handwriting via OCR. If data is handwritten, OCR may confuse a number and a letter (e.g. 5 and S).
The details are as follows.
table_create Normalizations TABLE_PAT_KEY ShortText column_create Normalizations normalized COLUMN_SCALAR ShortText load --table Normalizations [ {"_key": "s", "normalized": "5"} ] table_create Tels TABLE_NO_KEY column_create Tels tel COLUMN_SCALAR ShortText table_create TelsIndex TABLE_PAT_KEY ShortText \ --normalizers 'NormalizerNFKC130("unify_hyphen_and_prolonged_sound_mark", true), \ NormalizerTable("column", "Normalizations.normalized")' \ --default_tokenizer 'TokenNgram("loose_symbol", true, "loose_blank", true)' column_create TelsIndex tel_index COLUMN_INDEX|WITH_SECTION Tels tel load --table Tels [ {"tel": "03-4S-1234"} {"tel": "03-45-9876"} ] select --table Tels \ --filter 'tel @ "03-45-1234"' [ [ 0, 1625227424.560146, 0.0001730918884277344 ], [ [ [ 1 ], [ [ "_id", "UInt32" ], [ "tel", "ShortText" ] ], [ 1, "03-4S-1234" ] ] ] ]
Existing normalizers can't handle such a case, but we can handle it by combining NormalizerTable with existing normalizers since this release.
[query_parallel_or][query] Added support for customizing thresholds for sequential search.
We can customize, per query, the thresholds that decide whether to use sequential search with the following options.
{"max_n_enough_filtered_records": xx}
max_n_enough_filtered_records specifies a number of records. query or query_parallel_or uses sequential search when it seems to narrow down to fewer than this number of records.
{"enough_filtered_ratio": x.x}
enough_filtered_ratio specifies a ratio of the total. query or query_parallel_or uses sequential search when it seems to narrow down to less than this ratio of the whole. For example, if we specify {"enough_filtered_ratio": 0.5}, query or query_parallel_or uses sequential search when it seems to narrow down to half of the whole.
The details are as follows.
table_create Products TABLE_NO_KEY column_create Products name COLUMN_SCALAR ShortText table_create Terms TABLE_PAT_KEY ShortText --normalizer NormalizerAuto column_create Terms products_name COLUMN_INDEX Products name load --table Products [ ["name"], ["Groonga"], ["Mroonga"], ["Rroonga"], ["PGroonga"], ["Ruby"], ["PostgreSQL"] ] select \ --table Products \ --filter 'query("name", "r name:Ruby", {"enough_filtered_ratio": 0.5})'
table_create Products TABLE_NO_KEY column_create Products name COLUMN_SCALAR ShortText table_create Terms TABLE_PAT_KEY ShortText --normalizer NormalizerAuto column_create Terms products_name COLUMN_INDEX Products name load --table Products [ ["name"], ["Groonga"], ["Mroonga"], ["Rroonga"], ["PGroonga"], ["Ruby"], ["PostgreSQL"] ] select \ --table Products \ --filter 'query("name", "r name:Ruby", {"max_n_enough_filtered_records": 10})'
[between][in_values] Added support for customizing thresholds for sequential search.
[between] and [in_values] have a feature that switches to sequential search when the target search records are narrowed down enough.
The value of GRN_IN_VALUES_TOO_MANY_INDEX_MATCH_RATIO / GRN_BETWEEN_TOO_MANY_INDEX_MATCH_RATIO is used as the threshold that decides whether Groonga executes a sequential search or a search with indexes in such a case.
Until now, this behavior could only be customized with the following environment variables.
in_values():
# Don't use auto sequential search
GRN_IN_VALUES_TOO_MANY_INDEX_MATCH_RATIO=-1
# Set threshold to 0.02
GRN_IN_VALUES_TOO_MANY_INDEX_MATCH_RATIO=0.02
between():
# Don't use auto sequential search
GRN_BETWEEN_TOO_MANY_INDEX_MATCH_RATIO=-1
# Set threshold to 0.02
GRN_BETWEEN_TOO_MANY_INDEX_MATCH_RATIO=0.02
If customized with the environment variables, the threshold applies to all queries, but with this feature we can specify it per query.
The details are as follows. We can specify the threshold by using the {"too_many_index_match_ratio": x.xx} option. The value type of this option is double
.table_create Memos TABLE_HASH_KEY ShortText column_create Memos timestamp COLUMN_SCALAR Time table_create Times TABLE_PAT_KEY Time column_create Times memos_timestamp COLUMN_INDEX Memos timestamp load --table Memos [ {"_key": "001", "timestamp": "2014-11-10 07:25:23"}, {"_key": "002", "timestamp": "2014-11-10 07:25:24"}, {"_key": "003", "timestamp": "2014-11-10 07:25:25"}, {"_key": "004", "timestamp": "2014-11-10 07:25:26"}, {"_key": "005", "timestamp": "2014-11-10 07:25:27"}, {"_key": "006", "timestamp": "2014-11-10 07:25:28"}, {"_key": "007", "timestamp": "2014-11-10 07:25:29"}, {"_key": "008", "timestamp": "2014-11-10 07:25:30"}, {"_key": "009", "timestamp": "2014-11-10 07:25:31"}, {"_key": "010", "timestamp": "2014-11-10 07:25:32"}, {"_key": "011", "timestamp": "2014-11-10 07:25:33"}, {"_key": "012", "timestamp": "2014-11-10 07:25:34"}, {"_key": "013", "timestamp": "2014-11-10 07:25:35"}, {"_key": "014", "timestamp": "2014-11-10 07:25:36"}, {"_key": "015", "timestamp": "2014-11-10 07:25:37"}, {"_key": "016", "timestamp": "2014-11-10 07:25:38"}, {"_key": "017", "timestamp": "2014-11-10 07:25:39"}, {"_key": "018", "timestamp": "2014-11-10 07:25:40"}, {"_key": "019", "timestamp": "2014-11-10 07:25:41"}, {"_key": "020", "timestamp": "2014-11-10 07:25:42"}, {"_key": "021", "timestamp": "2014-11-10 07:25:43"}, {"_key": "022", "timestamp": "2014-11-10 07:25:44"}, {"_key": "023", "timestamp": "2014-11-10 07:25:45"}, {"_key": "024", "timestamp": "2014-11-10 07:25:46"}, {"_key": "025", "timestamp": "2014-11-10 07:25:47"}, {"_key": "026", "timestamp": "2014-11-10 07:25:48"}, {"_key": "027", "timestamp": "2014-11-10 07:25:49"}, {"_key": "028", "timestamp": "2014-11-10 07:25:50"}, {"_key": "029", "timestamp": "2014-11-10 07:25:51"}, {"_key": "030", "timestamp": "2014-11-10 07:25:52"}, {"_key": "031", "timestamp": "2014-11-10 07:25:53"}, {"_key": "032", "timestamp": "2014-11-10 07:25:54"}, {"_key": "033", "timestamp": "2014-11-10 07:25:55"}, {"_key": "034", "timestamp": "2014-11-10 07:25:56"}, {"_key": "035", "timestamp": "2014-11-10 07:25:57"}, {"_key": "036", "timestamp": "2014-11-10 07:25:58"}, {"_key": "037", "timestamp": "2014-11-10 07:25:59"}, {"_key": "038", "timestamp": "2014-11-10 07:26:00"}, {"_key": "039", "timestamp": "2014-11-10 07:26:01"}, {"_key": "040", "timestamp": "2014-11-10 07:26:02"}, {"_key": "041", "timestamp": "2014-11-10 07:26:03"}, {"_key": "042", "timestamp": "2014-11-10 07:26:04"}, {"_key": "043", "timestamp": "2014-11-10 07:26:05"}, {"_key": "044", "timestamp": "2014-11-10 07:26:06"}, {"_key": "045", "timestamp": "2014-11-10 07:26:07"}, {"_key": "046", "timestamp": "2014-11-10 07:26:08"}, {"_key": "047", "timestamp": "2014-11-10 07:26:09"}, {"_key": "048", "timestamp": "2014-11-10 07:26:10"}, {"_key": "049", "timestamp": "2014-11-10 07:26:11"}, {"_key": "050", "timestamp": "2014-11-10 07:26:12"} ] select Memos \ --filter '_key == "003" && \ between(timestamp, \ "2014-11-10 07:25:24", \ "include", \ "2014-11-10 07:27:26", \ "exclude", \ {"too_many_index_match_ratio": 0.03})'
table_create Tags TABLE_HASH_KEY ShortText table_create Memos TABLE_HASH_KEY ShortText column_create Memos tag COLUMN_SCALAR Tags load --table Memos [ {"_key": "Rroonga is fast!", "tag": "Rroonga"}, {"_key": "Groonga is fast!", "tag": "Groonga"}, {"_key": "Mroonga is fast!", "tag": "Mroonga"}, {"_key": "Groonga sticker!", "tag": "Groonga"}, {"_key": "Groonga is good!", "tag": "Groonga"} ] column_create Tags memos_tag COLUMN_INDEX Memos tag select \ Memos \ --filter '_id >= 3 && \ in_values(tag, \ "Groonga", \ {"too_many_index_match_ratio": 0.7})' \ --output_columns _id,_score,_key,tag
[between] Added support for GRN_EXPR_OPTIMIZE=yes.
between() now supports optimizing the order of evaluation of a conditional expression.
[query_parallel_or][query] Added support for specifying groups of match_columns as a vector. [GitHub#1238][Patched by naoa]
We can use a vector in match_columns of query and query_parallel_or
as below.table_create Users TABLE_NO_KEY column_create Users name COLUMN_SCALAR ShortText column_create Users memo COLUMN_SCALAR ShortText column_create Users tag COLUMN_SCALAR ShortText table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenNgram \ --normalizer NormalizerNFKC130 column_create Terms name COLUMN_INDEX|WITH_POSITION Users name column_create Terms memo COLUMN_INDEX|WITH_POSITION Users memo column_create Terms tag COLUMN_INDEX|WITH_POSITION Users tag load --table Users [ {"name": "Alice", "memo": "Groonga user", "tag": "Groonga"}, {"name": "Bob", "memo": "Rroonga user", "tag": "Rroonga"} ] select Users \ --output_columns _score,name \ --filter 'query(["name * 100", "memo", "tag * 10"], \ "Alice OR Groonga")'
[select] Added support for section and weight in prefix search. [GitHub#1240][Patched by naoa]
We can use a multi-column index and adjust scores in prefix search.
table_create Memos TABLE_NO_KEY column_create Memos title COLUMN_SCALAR ShortText column_create Memos tags COLUMN_VECTOR ShortText table_create Terms TABLE_PAT_KEY ShortText column_create Terms index COLUMN_INDEX|WITH_SECTION Memos title,tags load --table Memos [ {"title": "Groonga", "tags": ["Groonga"]}, {"title": "Rroonga", "tags": ["Groonga", "Rroonga", "Ruby"]}, {"title": "Mroonga", "tags": ["Groonga", "Mroonga", "MySQL"]} ] select Memos \ --match_columns "Terms.index.title * 2" \ --query 'G*' \ --output_columns title,tags,_score [ [ 0, 0.0, 0.0 ], [ [ [ 1 ], [ [ "title", "ShortText" ], [ "tags", "ShortText" ], [ "_score", "Int32" ] ], [ "Groonga", [ "Groonga" ], 2 ] ] ] ]
[grndb] Added support for closing used objects immediately in grndb recover.
We can reduce memory usage by this. This may decrease performance, but it will be acceptable.
Note that grndb check doesn't close used objects immediately yet.
[query_parallel_or][query] Added support for specifying scorer_tf_idf in match_columns
as below.table_create Tags TABLE_HASH_KEY ShortText table_create Users TABLE_HASH_KEY ShortText column_create Users tags COLUMN_VECTOR Tags load --table Users [ {"_key": "Alice", "tags": ["beginner", "active"]}, {"_key": "Bob", "tags": ["expert", "passive"]}, {"_key": "Chris", "tags": ["beginner", "passive"]} ] column_create Tags users COLUMN_INDEX Users tags select Users \ --output_columns _key,_score \ --sort_keys _id \ --command_version 3 \ --filter 'query_parallel_or("scorer_tf_idf(tags)", \ "beginner active")' { "header": { "return_code": 0, "start_time": 0.0, "elapsed_time": 0.0 }, "body": { "n_hits": 1, "columns": [ { "name": "_key", "type": "ShortText" }, { "name": "_score", "type": "Float" } ], "records": [ [ "Alice", 2.098612308502197 ] ] } }
[query_expand] Added support for weighted increment, decrement, and negative.
We can specify weight against expanded words.
If we want to increment the score, we use >. If we want to decrement the score, we use <.
We can specify the quantity of the score as a number. We can also use negative numbers.
table_create TermExpansions TABLE_NO_KEY column_create TermExpansions term COLUMN_SCALAR ShortText column_create TermExpansions expansions COLUMN_VECTOR ShortText load --table TermExpansions [ {"term": "Rroonga", "expansions": ["Rroonga", "Ruby Groonga"]} ] query_expand TermExpansions "Groonga <-0.2Rroonga Mroonga" \ --term_column term \ --expanded_term_column expansions [[0,0.0,0.0],"Groonga <-0.2((Rroonga) OR (Ruby Groonga)) Mroonga"]
[httpd] Updated bundled nginx to 1.21.1.
Updated bundled Apache Arrow to 5.0.0.
[Ubuntu] Dropped Ubuntu 20.10 (Groovy Gorilla) support.
Because Ubuntu 20.10 reached EOL on July 22, 2021.
Fixes#
[query_parallel_or][query] Fixed a bug that if we specify query_options and other options, the other options are ignored.
For example, the "default_operator": "OR"
option had been ignored in the following case.plugin_register token_filters/stop_word table_create Memos TABLE_NO_KEY column_create Memos content COLUMN_SCALAR ShortText table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto \ --token_filters TokenFilterStopWord column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content column_create Terms is_stop_word COLUMN_SCALAR Bool load --table Terms [ {"_key": "and", "is_stop_word": true} ] load --table Memos [ {"content": "Hello"}, {"content": "Hello and Good-bye"}, {"content": "and"}, {"content": "Good-bye"} ] select Memos \ --filter 'query_parallel_or( \ "content", \ "Hello and", \ {"default_operator": "OR", \ "options": {"TokenFilterStopWord.enable": false}})' \ --match_escalation_threshold -1 \ --sort_keys -_score [ [ 0, 0.0, 0.0 ], [ [ [ 1 ], [ [ "_id", "UInt32" ], [ "content", "ShortText" ] ], [ 2, "Hello and Good-bye" ] ] ] ]
Known Issues#
Currently, Groonga has a bug that data may be corrupted when we execute many additions, deletions, and updates against a vector column.
[The browser based administration tool] Currently, Groonga has a bug that a search query entered in non-administration mode is sent even if we check the checkbox for the administration mode of a record list.
*< and *> are only valid when we use query() on the right side of a filter condition. If we specify them as below, *< and *> work as &&.
'content @ "Groonga" *< content @ "Mroonga"'
If we repeatedly remove data and load it again, Groonga may not return records that should match.
Thanks#
naoa
Release 11.0.4 - 2021-06-29#
Improvements#
[Normalizer] Added support for customized normalizers.
To use this feature, we define a table for normalization. We can then normalize by using that table. In other words, we can use a customized normalizer.
For example, we define that “S” is normalized to “5” in the following example. The Substitutions
table is for nromalize.table_create Substitutions TABLE_PAT_KEY ShortText column_create Substitutions substituted COLUMN_SCALAR ShortText load --table Substitutions [ {"_key": "S", "substituted": "5"} ] table_create TelLists TABLE_NO_KEY column_create TelLists tel COLUMN_SCALAR ShortText table_create Terms TABLE_HASH_KEY ShortText \ --default_tokenizer TokenNgram \ --normalizer 'NormalizerTable("column", "Substitutions.substituted", \ "report_source_offset", true)' column_create Terms tel_index COLUMN_INDEX|WITH_POSITION TelLists tel load --table TelLists [ {"tel": "03-4S-1234"} ] select TelLists --filter 'tel @ "03-45-1234"' [ [ 0, 1624686303.538532, 0.001319169998168945 ], [ [ [ 1 ], [ [ "_id", "UInt32" ], [ "tel", "ShortText" ] ], [ 1, "03-4S-1234" ] ] ] ]
For example, we can register in the table words that are easily misrecognized when we input handwritten data. By this, we can normalize incorrect data to correct data.
Note that we need to reconstruct the index if we update the table used for normalization.
Added a new command object_warm.
This command loads Groonga's DB into the OS's page cache.
If we have never started Groonga since the OS started, Groonga's DB is not in the OS's page cache when Groonga runs for the first time. Therefore, the first operation against Groonga is slow.
If we execute this command in advance, the first operation against Groonga is fast. On Linux, we can achieve the same effect by executing cat *.db > /dev/null. However, we could not do the same thing on Windows until now.
By using this command, we can load Groonga's DB into the OS's page cache on both Linux and Windows. We can also do that per table, column, and index. Therefore, we can load only the tables, columns, and indexes that we often use into the OS's page cache.
We can execute this command against various targets as below.
If we specify object_warm --name index_name, the index is loaded into the OS's page cache.
If we specify object_warm --name column_name, the column is loaded into the OS's page cache.
If we specify object_warm --name table_name, the table is loaded into the OS's page cache.
If we specify object_warm, the whole Groonga database is loaded into the OS's page cache.
However, note that if the OS has no free memory, this command has no effect.
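For illustration, a minimal sketch follows; Entries and Entries.content are hypothetical object names, and addressing a column as Table.column is an assumption:
# Load a whole (hypothetical) table into the OS's page cache.
object_warm --name Entries
# Load a single column of that table (assumed to be addressed as Table.column).
object_warm --name Entries.content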
[select] Added support for adjusting the score of a specific record in --filter.
We can adjust the score of a specific record by using an operator named *~. *~ is a logical operator like && and ||. Therefore, we can use *~ like && and ||. The default weight of *~ is -1.
Therefore, for example, 'content @ "Groonga" *~ content @ "Mroonga"' means the following operations:
Extract records that match 'content @ "Groonga"' and 'content @ "Mroonga"'.
Add a score as below:
Calculate the score (a) of 'content @ "Groonga"'.
Calculate the score (b) of 'content @ "Mroonga"'.
b's score is multiplied by -1 by *~.
The score of this record is a + b. Therefore, if a's score is 1 and b's score is 1, the score of this record is 1 + (1 * -1) = 0.
Then, we can specify the score quantity by *~${score_quantity}.
In particular, the following query adjusts the score of matching records by the following condition ('content @ "Groonga" *~2.5 content @ "Mroonga"'
).table_create Memos TABLE_NO_KEY column_create Memos content COLUMN_SCALAR ShortText table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto column_create Terms index COLUMN_INDEX|WITH_POSITION Memos content load --table Memos [ {"content": "Groonga is a full text search engine."}, {"content": "Rroonga is the Ruby bindings of Groonga."}, {"content": "Mroonga is a MySQL storage engine based of Groonga."} ] select Memos \ --command_version 3 \ --filter 'content @ "Groonga" *~2.5 content @ "Mroonga"' \ --output_columns 'content, _score' \ --sort_keys -_score,_id { "header": { "return_code": 0, "start_time": 1624605205.641078, "elapsed_time": 0.002965450286865234 }, "body": { "n_hits": 3, "columns": [ { "name": "content", "type": "ShortText" }, { "name": "_score", "type": "Float" } ], "records": [ [ "Groonga is a full text search engine.", 1.0 ], [ "Rroonga is the Ruby bindings of Groonga.", 1.0 ], [ "Mroonga is a MySQL storage engine based of Groonga.", -1.5 ] ] } }
We can do the same thing by also using adjuster. If we use adjuster, we need to write both a --filter condition and an --adjuster condition in our application, but with this improvement we only write a --filter condition.
We can also describe the filter condition as below by using query().
--filter 'content @ "Groonga" *~2.5 content @ "Mroonga"'
[select] Added support for && with weight.
We can use && with weight by using *< or *>. The default weight of *< is 0.5. The default weight of *> is 2.0.
We can specify the score quantity by *<${score_quantity} and *>${score_quantity}. Then, if we specify *<${score_quantity}, the plus or minus sign of ${score_quantity} is reversed.
For example, 'content @ "Groonga" *<2.5 query("content", "MySQL")' works as below:
Extract records that match 'content @ "Groonga"' and 'content @ "Mroonga"'.
Add a score as below:
Calculate the score (a) of 'content @ "Groonga"'.
Calculate the score (b) of query("content", "MySQL").
b's score is multiplied by -2.5 by *<.
The score of this record is a + b. Therefore, if a's score is 1 and b's score is 1, the score of this record is 1 + (1 * -2.5) = -1.5.
In particular, the following query adjusts the score of matching records by the following condition ('content @ "Groonga" *<2.5 query("content", "Mroonga")'
).table_create Memos TABLE_NO_KEY column_create Memos content COLUMN_SCALAR ShortText table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto column_create Terms index COLUMN_INDEX|WITH_POSITION Memos content load --table Memos [ {"content": "Groonga is a full text search engine."}, {"content": "Rroonga is the Ruby bindings of Groonga."}, {"content": "Mroonga is a MySQL storage engine based of Groonga."} ] select Memos \ --command_version 3 \ --filter 'content @ "Groonga" *<2.5 query("content", "Mroonga")' \ --output_columns 'content, _score' \ --sort_keys -_score,_id { "header": { "return_code": 0, "start_time": 1624605205.641078, "elapsed_time": 0.002965450286865234 }, "body": { "n_hits": 3, "columns": [ { "name": "content", "type": "ShortText" }, { "name": "_score", "type": "Float" } ], "records": [ [ "Groonga is a full text search engine.", 1.0 ], [ "Rroonga is the Ruby bindings of Groonga.", 1.0 ], [ "Mroonga is a MySQL storage engine based of Groonga.", -1.5 ] ] } }
[Log] Added support for outputting to stdout and stderr.
[Process log] and [Query log] now support output to stdout and stderr.
If we specify --log-path - or --query-log-path -, Groonga outputs the log to stdout.
If we specify --log-path + or --query-log-path +, Groonga outputs the log to stderr.
[Process log] is for all of Groonga's work. [Query log] is just for query processing.
This feature is useful when we run Groonga on Docker. Docker records stdout and stderr by default. Therefore, we don't need to log into the Docker environment to get Groonga's logs.
For example, this feature is useful in the following case:
If we want to analyze slow queries of Groonga on Docker.
If we specify --query-log-path - when starting up Groonga, we can analyze slow queries just by executing the following command.
docker logs ${container_name} | groonga-query-log-analyze
By this, we can easily analyze slow queries with the query log output from Groonga on Docker.
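For illustration, a minimal startup sketch follows; the database path, the HTTP protocol choice, and the -s server flag are assumptions, and the relevant parts are the two log path options:
# Send both the process log and the query log to stdout so that Docker captures them.
groonga --log-path - --query-log-path - --protocol http -s /path/to/db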
[Documentation] Filled in missing documentation of string_substring. [GitHub#1209][Patched by Takashi Hashida]
Known Issues#
Currently, Groonga has a bug that data may be corrupted when we execute many additions, deletions, and updates against a vector column.
[The browser based administration tool] Currently, Groonga has a bug that a search query entered in non-administration mode is sent even if we check the checkbox for the administration mode of a record list.
*< and *> are only valid when we use query() on the right side of a filter condition. If we specify them as below, *< and *> work as &&.
'content @ "Groonga" *< content @ "Mroonga"'
Thanks#
Takashi Hashida
Release 11.0.3 - 2021-05-29#
Improvements#
[query] Added support for ignoring TokenFilterStem per query.
TokenFilterStem enables searching by a stem. For example, the develop, developing, developed and develops tokens are all stemmed as develop. So we can find develop, developing and developed with a develops query.
In this release, we are able to search without TokenFilterStem
in only a specific query as below.plugin_register token_filters/stem table_create Memos TABLE_NO_KEY column_create Memos content COLUMN_SCALAR ShortText table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto \ --token_filters 'TokenFilterStem("keep_original", true)' column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content load --table Memos [ {"content": "I develop Groonga"}, {"content": "I'm developing Groonga"}, {"content": "I developed Groonga"} ] select Memos \ --match_columns content \ --query '"developed groonga"' \ --query_options '{"TokenFilterStem.enable": false}' [ [ 0, 0.0, 0.0 ], [ [ [ 1 ], [ [ "_id", "UInt32" ], [ "content", "ShortText" ] ], [ 3, "I developed Groonga" ] ] ] ]
This feature is useful when users generally want to search by a stemmed word but sometimes want to search by an exact (not stemmed) word, as below.
If Groonga returns many results when searching by a stemmed word.
If TokenFilterStem returns a wrong stemming result.
If we want to find only records that have an exact (not stemmed) word.
[query] Added support for ignoring TokenFilterStopWord per query.
TokenFilterStopWord searches without the stop words that we registered beforehand. It is used to reduce search noise by ignoring frequent words (e.g., and, is, and so on).
However, we sometimes want to search including these words only in a specific query. In this release, we are able to search without TokenFilterStopWord
in only a specific query as below.plugin_register token_filters/stop_word table_create Memos TABLE_NO_KEY column_create Memos content COLUMN_SCALAR ShortText table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto \ --token_filters TokenFilterStopWord column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content column_create Terms is_stop_word COLUMN_SCALAR Bool load --table Terms [ {"_key": "and", "is_stop_word": true} ] load --table Memos [ {"content": "Hello"}, {"content": "Hello and Good-bye"}, {"content": "Good-bye"} ] select Memos \ --match_columns content \ --query "Hello and" \ --query_options '{"TokenFilterStopWord.enable": false}' \ --match_escalation_threshold -1 \ --sort_keys -_score [ [ 0, 0.0, 0.0 ], [ [ [ 1 ], [ [ "_id", "UInt32" ], [ "content", "ShortText" ] ], [ 2, "Hello and Good-bye" ] ] ] ]
In the above example, we specify TokenFilterStopWord.enable by using --query_options, but we can also specify it by using {"options": {"TokenFilterStopWord.enable": false}}
as below.plugin_register token_filters/stop_word table_create Memos TABLE_NO_KEY column_create Memos content COLUMN_SCALAR ShortText table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto \ --token_filters TokenFilterStopWord column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content column_create Terms is_stop_word COLUMN_SCALAR Bool load --table Terms [ {"_key": "and", "is_stop_word": true} ] load --table Memos [ {"content": "Hello"}, {"content": "Hello and Good-bye"}, {"content": "Good-bye"} ] select Memos \ --filter 'query("content", \ "Hello and", \ {"options": {"TokenFilterStopWord.enable": false}})' \ --match_escalation_threshold -1 \ --sort_keys -_score [ [ 0, 0.0, 0.0 ], [ [ [ 1 ], [ [ "_id", "UInt32" ], [ "content", "ShortText" ] ], [ 2, "Hello and Good-bye" ] ] ] ]
This feature is useful if Groonga can't return correct results unless we search by keywords that include commonly used words (e.g., searching for a song title, a shop name, and so on).
[Normalizers][NormalizerNFKC] Added a new option remove_new_line.
If we want to normalize the key of a table that stores data, we set a normalizer to it. However, normally, normalizers remove new lines.
Groonga can't handle a key that is only a new line.
We can register data that is only a new line as a key by using this option.
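For illustration, a minimal sketch with the normalize command follows; using NormalizerNFKC130 and passing false here to keep new lines are assumptions, so check the NormalizerNFKC option reference for the exact usage:
# Assumed usage: disable new-line removal so that a value that is only a new line survives normalization.
normalize 'NormalizerNFKC130("remove_new_line", false)' "a\nb" WITH_TYPES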
[string_slice] Added a new function string_slice(). [GitHub#1177][Patched by Takashi Hashida]
string_slice() extracts a substring of a string.
To enable this function, we need to register the functions/string plugin.
We can use two different extraction methods depending on the arguments, as below.
Extraction by position:
plugin_register functions/string table_create Memos TABLE_HASH_KEY ShortText load --table Memos [ {"_key": "Groonga"} ] select Memos --output_columns '_key, string_slice(_key, 2, 3)' [ [ 0, 1337566253.89858, 0.000355720520019531 ], [ [ [ 1 ], [ [ "_key", "ShortText" ], [ "string_slice", null ] ], [ "Groonga", "oon" ] ] ] ]
Extraction by regular expression:
plugin_register functions/string table_create Memos TABLE_HASH_KEY ShortText load --table Memos [ {"_key": "Groonga"} ] select Memos --output_columns '_key, string_slice(_key, "(Gro+)(.*)", 2)' [ [ 0, 1337566253.89858, 0.000355720520019531 ], [ [ [ 1 ], [ [ "_key", "ShortText" ], [ "string_slice", null ] ], [ "Groonga", "nga" ] ] ] ]
[Ubuntu] Dropped support for Ubuntu 16.04 LTS (Xenial Xerus).
Added EditorConfig for Visual Studio. [GitHub#1191][Patched by Takashi Hashida]
Most settings are for Visual Studio only.
[httpd] Updated bundled nginx to 1.20.1.
Contains a security fix for CVE-2021-23017.
Fixes#
Fixed a bug that Groonga might not return the result of a search query if we sent many search queries while a tokenizer, normalizer, or token filter that supports options was used.
Known Issues#
Currently, Groonga has a bug that data may be corrupted when we execute many additions, deletions, and updates against a vector column.
[The browser based administration tool] Currently, Groonga has a bug that a search query entered in non-administration mode is sent even if we check the checkbox for the administration mode of a record list.
Thanks#
Takashi Hashida
Release 11.0.2 - 2021-05-10#
Improvements#
[Documentation] Removed a reference to the ruby_load command. [GitHub#1172][Patched by Anthony M. Cook]
Because this command has already been deleted.
[Debian GNU/Linux] Added support for Debian 11 (Bullseye).
[select] Added support for --post_filter.
We can use post_filter to filter by filtered
stage dynamic columns as below.table_create Items TABLE_NO_KEY column_create Items price COLUMN_SCALAR UInt32 load --table Items [ {"price": 100}, {"price": 150}, {"price": 200}, {"price": 250}, {"price": 300} ] select Items \ --filter "price >= 150" \ --columns[price_with_tax].stage filtered \ --columns[price_with_tax].type UInt32 \ --columns[price_with_tax].flags COLUMN_SCALAR \ --columns[price_with_tax].value "price * 1.1" \ --post_filter "price_with_tax <= 250" [ [ 0, 0.0, 0.0 ], [ [ [ 2 ], [ [ "_id", "UInt32" ], [ "price_with_tax", "UInt32" ], [ "price", "UInt32" ] ], [ 2, 165, 150 ], [ 3, 220, 200 ] ] ] ]
[select] Added support for
--slices[].post_filter
. We can use
post_filter
to filter the result of --slices[].filter
as below.table_create Items TABLE_NO_KEY column_create Items price COLUMN_SCALAR UInt32 load --table Items [ {"price": 100}, {"price": 200}, {"price": 300}, {"price": 1000}, {"price": 2000}, {"price": 3000} ] select Items \ --slices[expensive].filter 'price >= 1000' \ --slices[expensive].post_filter 'price < 3000' [ [ 0, 0.0, 0.0 ], [ [ [ 6 ], [ [ "_id", "UInt32" ], [ "price", "UInt32" ] ], [ 1, 100 ], [ 2, 200 ], [ 3, 300 ], [ 4, 1000 ], [ 5, 2000 ], [ 6, 3000 ] ], { "expensive": [ [ 2 ], [ [ "_id", "UInt32" ], [ "price", "UInt32" ] ], [ 4, 1000 ], [ 5, 2000 ] ] } ] ]
[select] Added support for describing an expression in
--sort_keys
. We can now write an expression in
--sort_keys
; if the expression contains nonexistent keys, they are ignored and a warning is output to the log.
With this, for example, we can specify the value of an element of a
VECTOR COLUMN
in --sort_keys
and sort a result by it. Until now, we could sort a result by an element of a
VECTOR COLUMN
only by using a dynamic column. With this feature, we can sort a result by an element of a VECTOR COLUMN
without using a dynamic column.table_create Values TABLE_NO_KEY column_create Values numbers COLUMN_VECTOR Int32 load --table Values [ {"numbers": [127, 128, 129]}, {"numbers": [126, 255]}, {"numbers": [128, -254]} ] select Values --sort_keys 'numbers[1]' --output_columns numbers [ [ 0, 0.0, 0.0 ], [ [ [ 3 ], [ [ "numbers", "Int32" ] ], [ [ 128, -254 ] ], [ [ 127, 128, 129 ] ], [ [ 126, 255 ] ] ] ] ]
[Token filters] Added support for multiple token filters with options.
We can specify multiple token filters with options like
--token_filters 'TokenFilterStopWord("column", "ignore"), TokenFilterNFKC130("unify_kana", true)'
. [Github#mroonga/mroonga#399][Reported by MASUDA Kazuhiro]
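For example, a minimal sketch of creating a lexicon with that token filter list could look as follows; the table and column names are illustrative and not taken from the release notes, and the "column" option points TokenFilterStopWord at a Bool column named ignore:
plugin_register token_filters/stop_word
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText
# Multiple token filters, each with its own options.
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenBigram \
  --normalizer NormalizerAuto \
  --token_filters 'TokenFilterStopWord("column", "ignore"), TokenFilterNFKC130("unify_kana", true)'
# The stop word flag column named by the "column" option above.
column_create Terms ignore COLUMN_SCALAR Bool
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
load --table Terms
[
{"_key": "and", "ignore": true}
]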
[query] Added support for a dynamic column of
result_set
stage with a complex expression. A complex expression is an expression that needs temporary result sets internally, like the following one.
'(true && query("name * 10", "ali", {"score_column": "ali_score"})) || \ (true && query("name * 2", "li", {"score_column": "li_score"}))'
In the above expression, temporary result sets are used to store the result of evaluating
true
. Until now, we could use a value of a dynamic column of the
result_set
stage only in an expression that doesn't need temporary result sets internally, like the following expression.'(query("name * 10", "ali", {"score_column": "ali_score"})) || \ (query("name * 2", "li", {"score_column": "li_score"}))'
In this release, for example, we can set a value to
li_score
as below. (The value of li_score
had been 0
in previous versions, because the second expression could not access the dynamic column.)table_create Users TABLE_NO_KEY column_create Users name COLUMN_SCALAR ShortText table_create Lexicon TABLE_HASH_KEY ShortText \ --default_tokenizer TokenBigramSplitSymbolAlphaDigit \ --normalizer NormalizerAuto column_create Lexicon users_name COLUMN_INDEX|WITH_POSITION Users name load --table Users [ {"name": "Alice"}, {"name": "Alisa"}, {"name": "Bob"} ] select Users \ --columns[ali_score].stage result_set \ --columns[ali_score].type Float \ --columns[ali_score].flags COLUMN_SCALAR \ --columns[li_score].stage result_set \ --columns[li_score].type Float \ --columns[li_score].flags COLUMN_SCALAR \ --output_columns name,_score,ali_score,li_score \ --filter '(true && query("name * 10", "ali", {"score_column": "ali_score"})) || \ (true && query("name * 2", "li", {"score_column": "li_score"}))' [ [ 0, 0.0, 0.0 ], [ [ [ 2 ], [ [ "name", "ShortText" ], [ "_score", "Int32" ], [ "ali_score", "Float" ], [ "li_score", "Float" ] ], [ "Alice", 14, 10.0, 2.0 ], [ "Alisa", 14, 10.0, 2.0 ] ] ] ]
We also added support for a dynamic vector column of
result_set
stage as below.table_create Users TABLE_NO_KEY column_create Users name COLUMN_SCALAR ShortText table_create Lexicon TABLE_HASH_KEY ShortText \ --default_tokenizer TokenBigramSplitSymbolAlphaDigit \ --normalizer NormalizerAuto column_create Lexicon users_name COLUMN_INDEX|WITH_POSITION Users name load --table Users [ {"name": "Alice"}, {"name": "Alisa"}, {"name": "Bob"} ] select Users \ --columns[tags].stage result_set \ --columns[tags].type ShortText \ --columns[tags].flags COLUMN_VECTOR \ --output_columns name,tags \ --filter '(true && query("name", "al", {"tags": ["al"], "tags_column": "tags"})) || \ (true && query("name", "sa", {"tags": ["sa"], "tags_column": "tags"}))' [ [ 0, 0.0, 0.0 ], [ [ [ 2 ], [ [ "name", "ShortText" ], [ "tags", "ShortText" ] ], [ "Alice", [ "al" ] ], [ "Alisa", [ "al", "sa" ] ] ] ] ]
If we use a dynamic vector column, the stored value is built by appending the values from each element.
[Ubuntu] Added support for Ubuntu 21.04 (Hirsute Hippo).
[httpd] Updated bundled nginx to 1.19.10.
Known Issues#
Currently, Groonga has a bug where data may be corrupted when we execute many additions, deletions, and updates against a vector column.
[The browser based administration tool] Currently, Groonga has a bug where a search query entered in non-administration mode is sent even if we check the checkbox for the administration mode of the record list. [GitHub#1186][Reported by poti]
Thanks#
Anthony M. Cook
MASUDA Kazuhiro
poti
Release 11.0.1 - 2021-03-31#
Improvements#
[Debian GNU/Linux] Added support for an ARM64 package.
[select] Added support for customizing the weight adjustment for each keyword.
Until now, we needed to specify
<
or >
for all keywords to adjust scores, because the default weight adjustment (6 or 4) is larger than the default score (1). Therefore, for example, in
A <B
, "A"'s weight is 1 and "B"'s weight is 4: "B"'s decremented weight (4) is still larger than "A"'s unadjusted weight (1). This does not work as expected, so we needed to specify
>A <B
to give "B" a smaller weight than "A"; "A"'s weight is 6 and "B"'s weight is 4 in
>A <B
.
Since this release, we can customize the weight adjustment for each keyword by specifying
<${WEIGHT}
or >${WEIGHT}
only for the target keywords. For example, "A"'s weight is 1 and "B"'s weight is 0.9 in
A <0.1B
("B"'s weight is decremented by 0.1); see the sketch after this item. However, note that these forms (
>${WEIGHT}...
, <${WEIGHT}...
, and ~${WEIGHT}...
) are incompatible.
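The following is a minimal sketch of the new per-keyword weight form in query syntax; the table, column, and data are illustrative and not part of the release notes:
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenBigram \
  --normalizer NormalizerAuto
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
load --table Memos
[
{"content": "Groonga"},
{"content": "Groonga Mroonga"},
{"content": "Mroonga"}
]
# "Groonga" keeps the default weight (1); "Mroonga"'s weight is decremented by 0.1.
select Memos \
  --match_columns content \
  --query 'Groonga <0.1Mroonga' \
  --output_columns content,_score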
[select] Added support for outputting
Float
and Float32
values in Apache Arrow format. For example, Groonga outputs them as below.
table_create Data TABLE_NO_KEY column_create Data float COLUMN_SCALAR Float load --table Data [ {"float": 1.1} ] select Data \ --command_version 3 \ --output_type apache-arrow return_code: int32 start_time: timestamp[ns] elapsed_time: double -- metadata -- GROONGA:data_type: metadata return_code start_time elapsed_time 0 0 1970-01-01T09:00:00+09:00 0.000000 ======================================== _id: uint32 float: double -- metadata -- GROONGA:n_hits: 1 _id float 0 1 1.100000
[select] Added support for getting reference destination data via an index column when outputting a result.
Until now, Groonga returned an unintended value when we specified an output value like
index_column.xxx
. For example, the value of --columns[tags].value purchases.tag
was ["apple",["many"]],["banana",["man"]],["cacao",["man"]]
in the following example. In this case, the expected value was ["apple",["man","many"]],["banana",["man"]],["cacao",["woman"]]
. In this release, we can get a correct reference destination data via index column as below.table_create Products TABLE_PAT_KEY ShortText table_create Purchases TABLE_NO_KEY column_create Purchases product COLUMN_SCALAR Products column_create Purchases tag COLUMN_SCALAR ShortText column_create Products purchases COLUMN_INDEX Purchases product load --table Products [ {"_key": "apple"}, {"_key": "banana"}, {"_key": "cacao"} ] load --table Purchases [ {"product": "apple", "tag": "man"}, {"product": "banana", "tag": "man"}, {"product": "cacao", "tag": "woman"}, {"product": "apple", "tag": "many"} ] select Products \ --columns[tags].stage output \ --columns[tags].flags COLUMN_VECTOR \ --columns[tags].type ShortText \ --columns[tags].value purchases.tag \ --output_columns _key,tags [ [ 0, 0.0, 0.0 ], [ [ [ 3 ], [ [ "_key", "ShortText" ], [ "tags", "ShortText" ] ], [ "apple", [ "man", "many" ] ], [ "banana", [ "man" ] ], [ "cacao", [ "woman" ] ] ] ] ]
[select] Added support for specifying an index column directly as a part of a nested index.
We can search the source table after filtering by using
index_column.except_source_column
. For example, we specify comments.content
when searching in the following example. In this case, this query first executes a full text search against the content
column of the Comments
table, then fetches the records of the Articles table that refer to the already searched records of the Comments table.table_create Articles TABLE_HASH_KEY ShortText table_create Comments TABLE_NO_KEY column_create Comments article COLUMN_SCALAR Articles column_create Comments content COLUMN_SCALAR ShortText column_create Articles content COLUMN_SCALAR Text column_create Articles comments COLUMN_INDEX Comments article table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerNFKC130 column_create Terms articles_content COLUMN_INDEX|WITH_POSITION \ Articles content column_create Terms comments_content COLUMN_INDEX|WITH_POSITION \ Comments content load --table Articles [ {"_key": "article-1", "content": "Groonga is fast!"}, {"_key": "article-2", "content": "Groonga is useful!"}, {"_key": "article-3", "content": "Mroonga is fast!"} ] load --table Comments [ {"article": "article-1", "content": "I'm using Groonga too!"}, {"article": "article-3", "content": "I'm using Mroonga!"}, {"article": "article-1", "content": "I'm using PGroonga!"} ] select Articles --match_columns comments.content --query groonga \ --output_columns "_key, _score, comments.content" [ [ 0, 0.0, 0.0 ], [ [ [ 1 ], [ [ "_key", "ShortText" ], [ "_score", "Int32" ], [ "comments.content", "ShortText" ] ], [ "article-1", 1, [ "I'm using Groonga too!", "I'm using PGroonga!" ] ] ] ] ]
[load] Added support for loading reference vector with inline object literal.
For example, we can load data like
"key" : "[ { "key" : "value", ..., "key" : "value" } ]"
as below.table_create Purchases TABLE_NO_KEY column_create Purchases item COLUMN_SCALAR ShortText column_create Purchases price COLUMN_SCALAR UInt32 table_create Settlements TABLE_HASH_KEY ShortText column_create Settlements purchases COLUMN_VECTOR Purchases column_create Purchases settlements_purchases COLUMN_INDEX Settlements purchases load --table Settlements [ { "_key": "super market", "purchases": [ {"item": "apple", "price": 100}, {"item": "milk", "price": 200} ] }, { "_key": "shoes shop", "purchases": [ {"item": "sneakers", "price": 3000} ] } ]
This feature makes it easier to add JSON data into reference columns.
Currently, this feature is only supported with JSON input.
[load] Added support for loading reference vector from JSON text.
We can load data into a reference vector from a source table with JSON text as below.
table_create Purchases TABLE_HASH_KEY ShortText column_create Purchases item COLUMN_SCALAR ShortText column_create Purchases price COLUMN_SCALAR UInt32 table_create Settlements TABLE_HASH_KEY ShortText column_create Settlements purchases COLUMN_VECTOR Purchases column_create Purchases settlements_purchases COLUMN_INDEX Settlements purchases load --table Settlements [ { "_key": "super market", "purchases": "[{\"_key\": \"super market-1\", \"item\": \"apple\", \"price\": 100}, {\"_key\": \"super market-2\", \"item\": \"milk\", \"price\": 200}]" }, { "_key": "shoes shop", "purchases": "[{\"_key\": \"shoes shop-1\", \"item\": \"sneakers\", \"price\": 3000}]" } ] dump \ --dump_plugins no \ --dump_schema no load --table Purchases [ ["_key","item","price"], ["super market-1","apple",100], ["super market-2","milk",200], ["shoes shop-1","sneakers",3000] ] load --table Settlements [ ["_key","purchases"], ["super market",["super market-1","super market-2"]], ["shoes shop",["shoes shop-1"]] ] column_create Purchases settlements_purchases COLUMN_INDEX Settlements purchases
Currently, this feature doesn't support nested reference records.
[Windows] Added support for UNIX epoch for
time_classify_*
functions. Groonga handles timestamps in local time. Therefore, for example, if we input the UNIX epoch in Japan, the input time is 9 hours before the UNIX epoch.
The Windows API reports an error when we input a time before the UNIX epoch.
We can use the UNIX epoch in
time_classify_*
functions as below in this release.plugin_register functions/time table_create Timestamps TABLE_PAT_KEY Time load --table Timestamps [ {"_key": 0}, {"_key": "2016-05-06 00:00:00.000001"}, {"_key": "2016-05-06 23:59:59.999999"}, {"_key": "2016-05-07 00:00:00.000000"}, {"_key": "2016-05-07 00:00:00.000001"}, {"_key": "2016-05-08 23:59:59.999999"}, {"_key": "2016-05-08 00:00:00.000000"} ] select Timestamps \ --sortby _id \ --limit -1 \ --output_columns '_key, time_classify_day_of_week(_key)' [ [ 0, 0.0, 0.0 ], [ [ [ 7 ], [ [ "_key", "Time" ], [ "time_classify_day_of_week", null ] ], [ 0.0, 4 ], [ 1462460400.000001, 5 ], [ 1462546799.999999, 5 ], [ 1462546800.0, 6 ], [ 1462546800.000001, 6 ], [ 1462719599.999999, 0 ], [ 1462633200.0, 0 ] ] ] ]
[query_parallel_or] Added a new function for processing queries in parallel.
query_parallel_or
requires Apache Arrow to process queries in parallel. If Apache Arrow is not enabled, query_parallel_or
processes queries sequentially. query_parallel_or
processes combinations of match_columns
and query_string
in parallel. The syntax of
query_parallel_or
is as follows:query_parallel_or(match_columns, query_string1, query_string2, ..., query_stringN, {"option": "value", ...})
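For example, a minimal sketch of calling it from --filter could look as follows; the schema and data are illustrative, and with Apache Arrow support the two query strings are processed in parallel, otherwise sequentially:
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenBigram \
  --normalizer NormalizerAuto
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
load --table Memos
[
{"content": "Groonga is fast"},
{"content": "Mroonga is a MySQL storage engine"},
{"content": "PGroonga is a PostgreSQL extension"}
]
# OR of the two query strings against the "content" match column.
select Memos \
  --output_columns content,_score \
  --filter 'query_parallel_or("content", "groonga", "mroonga")'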
[select] Added support for ignoring nonexistent sort keys.
Until now, Groonga output an error when we specified nonexistent sort keys. However, since this release, Groonga ignores nonexistent sort keys instead of reporting an error (see the sketch below).
This feature was implemented for consistency, because we already ignore invalid values in
output_columns
and most invalid values in sort_keys
.
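As an illustration (the table and data below are not from the release notes), a select with a nonexistent sort key now just ignores it instead of failing:
table_create Logs TABLE_NO_KEY
column_create Logs message COLUMN_SCALAR ShortText
load --table Logs
[
{"message": "hello"},
{"message": "world"}
]
# "nonexistent" is not a column of Logs: it is ignored and only a warning is logged.
select Logs \
  --sort_keys nonexistent \
  --output_columns message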
[select] Added support for ignoring nonexistent tables in
drilldowns[].table
. [GitHub#1169][Reported by naoa] Until now, Groonga output an error when we specified nonexistent tables in
drilldowns[].table
. However, since this release, Groonga ignores nonexistent tables in drilldowns[].table
instead of reporting an error. This feature was implemented for consistency, because we already ignore invalid values in
output_columns
and most invalid values in sort_keys
.
[httpd] Updated bundled nginx to 1.19.8.
Fixes#
[reference_acquire] Fixed a bug where Groonga crashed when a table's reference was acquired and a column was added to the table before the auto release happened.
This was because the added column's reference wasn't acquired, but it was released on auto release. A sketch of the sequence follows.
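For reference, the sequence that used to crash looked roughly like the following sketch; it assumes the --auto_release_count parameter of reference_acquire, and the table and column names are illustrative:
table_create Memos TABLE_NO_KEY
# Acquire a reference to Memos that is released automatically after the next two commands.
reference_acquire --target_name Memos --auto_release_count 2
# Adding a column before the auto release happened used to crash on release,
# because the new column's reference was never acquired.
column_create Memos content COLUMN_SCALAR ShortText
status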
[Windows] Fixed a bug where one or more processes failed to output a backtrace on SEGV when a new backtrace logging process started while another backtrace logging process was running in another thread.
Known Issues#
Currently, Groonga has a bug where data may be corrupted when we execute many additions, deletions, and updates against a vector column.
Thanks#
naoa
Release 11.0.0 - 2021-02-09#
This is a major version up! But it keeps backward compatibility. We can upgrade to 11.0.0 without rebuilding the database.
Improvements#
[select] Added support for outputting values of a scalar column and a vector column via a nested index.
A nested index has the structure below.
table_create Products TABLE_PAT_KEY ShortText table_create Purchases TABLE_NO_KEY column_create Purchases product COLUMN_SCALAR Products column_create Purchases tag COLUMN_SCALAR ShortText column_create Products purchases COLUMN_INDEX Purchases product
The
Products.purchases
column is an index of Purchases.product
column in the above example. Also, Purchases.product
is a reference to the Products
table. Until now, we could not get the correct search result when searching via a nested index.
Until now, the result was as follows. We can see that
{"product": "apple", "tag": "man"}
is not output.table_create Products TABLE_PAT_KEY ShortText table_create Purchases TABLE_NO_KEY column_create Purchases product COLUMN_SCALAR Products column_create Purchases tag COLUMN_SCALAR ShortText column_create Products purchases COLUMN_INDEX Purchases product load --table Products [ {"_key": "apple"}, {"_key": "banana"}, {"_key": "cacao"} ] load --table Purchases [ {"product": "apple", "tag": "man"}, {"product": "banana", "tag": "man"}, {"product": "cacao", "tag": "woman"}, {"product": "apple", "tag": "many"} ] select Products \ --output_columns _key,purchases.tag [ [ 0, 1612504193.380738, 0.0002026557922363281 ], [ [ [ 3 ], [ [ "_key", "ShortText" ], [ "purchases.tag", "ShortText" ] ], [ "apple", "many" ], [ "banana", "man" ], [ "cacao", "man" ] ] ] ]
The result will be as follows from this release. We can see that
{"product": "apple", "tag": "man"}
is output.select Products \ --output_columns _key,purchases.tag [ [ 0, 0.0, 0.0 ], [ [ [ 3 ], [ [ "_key", "ShortText" ], [ "purchases.tag", "ShortText" ] ], [ "apple", [ "man", "many" ] ], [ "banana", [ "man" ] ], [ "cacao", [ "woman" ] ] ] ] ]
[Windows] Dropped support for the Windows packages that we had cross-compiled by using MinGW on Linux.
Because there probably aren't many people who use them.
Until now, we had provided these packages under the following names.
groonga-x.x.x-x86.exe
groonga-x.x.x-x86.zip
groonga-x.x.x-x64.exe
groonga-x.x.x-x64.zip
From now on, we use the following packages for Windows.
groonga-latest-x86-vs2019-with-vcruntime.zip
groonga-latest-x64-vs2019-with-vcruntime.zip
If the system already has the Microsoft Visual C++ Runtime Library installed, we suggest using the following packages.
groonga-latest-x86-vs2019.zip
groonga-latest-x64-vs2019.zip
Fixes#
Fixed a bug where an index could be corrupted when Groonga executed many additions, deletions, and updates of information in it.
This bug occurs even when we only execute many deletions of information from an index. However, it doesn't occur when we only execute many additions of information to an index.
We can repair an index corrupted by this bug by reconstructing it.
This bug isn't detected unless we reference the broken index. Therefore, some of our indexes may have already been broken.
We can use the [index_column_diff] command to confirm whether an index has already been broken or not.
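For example, a minimal check of a single index column could look like the following sketch; Terms and memos_content are illustrative names for a lexicon table and its index column:
# Lists the differences between the index column and its source data.
# An empty result means no corruption was found for this index column.
index_column_diff Terms memos_content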