Ocaml index with occurrences #76

Merged
merged 12 commits into from
Aug 15, 2024
2 changes: 1 addition & 1 deletion merlin-lib.opam
@@ -11,7 +11,7 @@ build: [
 ]
 depends: [
   "ocaml" {>= "5.1" & < "5.2"}
-  "dune" {>= "2.9.0"}
+  "dune" {>= "3.0.0"}
   "csexp" {>= "1.5.1"}
   "menhir" {dev & = "20210419"}
   "menhirLib" {dev & = "20210419"}
33 changes: 33 additions & 0 deletions ocaml-index.opam
@@ -0,0 +1,33 @@
# This file is generated by dune, edit dune-project instead
opam-version: "2.0"
synopsis: "A tool that indexes value usages from cmt files"
description:
"ocaml-index should integrate with the build system to index codebases and allow tools such as Merlin to perform project-wide occurrences queries."
maintainer: ["[email protected]"]
authors: ["[email protected]"]
license: "MIT"
homepage: "https://github.com/ocaml/merlin/ocaml-index"
bug-reports: "https://github.com/ocaml/merlin/issues"
depends: [
  "dune" {>= "3.0.0"}
  "ocaml" {>= "5.1" & < "5.2"}
  "merlin-lib" {>= "4.9"}
  "odoc" {with-doc}
]
build: [
  ["dune" "subst"] {dev}
  [
    "dune"
    "build"
    "-p"
    name
    "-j"
    jobs
    "--promote-install-files=false"
    "@install"
    "@runtest" {with-test}
    "@doc" {with-doc}
  ]
  ["dune" "install" "-p" name "--create-install-files" name]
]
dev-repo: "git+https://github.com/ocaml/merlin.git"
3 changes: 2 additions & 1 deletion src/index-format/index_format.ml
@@ -31,6 +31,7 @@ type index = {
   approximated : Lid_set.t Uid_map.t;
   cu_shape : (string, Shape.t) Hashtbl.t;
   stats : stat Stats.t;
+  root_directory: string option;
 }
 
 let pp_partials (fmt : Format.formatter) (partials : Lid_set.t Uid_map.t) =
@@ -80,7 +81,7 @@ let write ~file index =
       output_string oc magic_number;
       output_value oc (index : index))
 
-type file_content = Cmt of Cmt_format.cmt_infos | Index of index | Unknown
+type file_content = Cmt of Cmt_format.cmt_infos | Cms of Cms_format.cms_infos | Index of index | Unknown
 
 let read ~file =
   let ic = open_in_bin file in
2 changes: 2 additions & 0 deletions src/index-format/index_format.mli
@@ -19,6 +19,7 @@ type index = {
   approximated : Lid_set.t Uid_map.t;
   cu_shape : (string, Shape.t) Hashtbl.t;
   stats : stat Stats.t;
+  root_directory: string option;
 }
 
 val pp : Format.formatter -> index -> unit
@@ -29,6 +30,7 @@ val add : Lid_set.t Uid_map.t -> Shape.Uid.t -> Lid_set.t -> Lid_set.t Uid_map.t
 
 type file_content =
   | Cmt of Cmt_format.cmt_infos
+  | Cms of Cms_format.cms_infos
   | Index of index
   | Unknown
 
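With `Cms` added, `Index_format.read` now distinguishes four kinds of file. The `write` function shown above emits `magic_number` before the serialized index, so one plausible way a reader could tell formats apart is by checking a leading magic string. The sketch below shows that idea under loudly stated assumptions: the magic strings are hypothetical placeholders (not the real cmt, cms or index values), the `kind` type stands in for `file_content`, and the real `read` may work quite differently.

```ocaml
(* Hypothetical sketch: classify a file by its leading magic string.
   The magic strings are placeholders, NOT the real format values. *)
type kind = Cmt | Cms | Index | Unknown

let magic_numbers =
  [ ("HYPO_CMT_MAGIC", Cmt); ("HYPO_CMS_MAGIC", Cms); ("HYPO_IDX_MAGIC", Index) ]

(* Read at most [n] bytes from the start of [file], looping because
   [input] may return fewer bytes than requested. *)
let read_prefix file n =
  let ic = open_in_bin file in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      let buf = Bytes.create n in
      let rec fill off =
        if off >= n then off
        else
          let k = input ic buf off (n - off) in
          if k = 0 then off else fill (off + k)
      in
      let len = fill 0 in
      Bytes.sub_string buf 0 len)

(* Return the kind whose magic string prefixes the file, or [Unknown]. *)
let classify file =
  let len =
    List.fold_left
      (fun m (magic, _) -> max m (String.length magic))
      0 magic_numbers
  in
  let prefix = read_prefix file len in
  let starts_with magic =
    String.length prefix >= String.length magic
    && String.sub prefix 0 (String.length magic) = magic
  in
  match List.find_opt (fun (magic, _) -> starts_with magic) magic_numbers with
  | Some (_, kind) -> kind
  | None -> Unknown
```

Files that are too short or start with an unrecognized prefix fall through to `Unknown`, mirroring the catch-all constructor in `file_content`.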
10 changes: 10 additions & 0 deletions src/ocaml-index/CHANGES.md
@@ -0,0 +1,10 @@
1.0 (2024-06-18)
----------------

### Added

- Initial release.
- The `aggregate` command that finishes reduction of shapes in cmt files and
  stores the output in a single index file.
- The `stats` command that prints information about an index file.
- The `dump` command that prints all locs of an index.
62 changes: 62 additions & 0 deletions src/ocaml-index/README.md
@@ -0,0 +1,62 @@
# ocaml-index

Ocaml-index is a tool that indexes values from CMT files. Its current purpose is
to provide project-wide occurrences for OCaml codebases. The tool iterates over
the given cmt's occurrence list (`cmt_ident_occurrences`) and determines the
definition of every value found in it. It then writes an index to disk where
values corresponding to the same definition are grouped together. The tool can
also take multiple input files, index them and merge the results into a single
index.
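The grouping described above amounts to a map from each definition to the set of locations that refer to it; `index_format.mli` in this PR declares an `add : Lid_set.t Uid_map.t -> Shape.Uid.t -> Lid_set.t -> Lid_set.t Uid_map.t` function in that spirit. The following is a minimal, hypothetical sketch of the idea only: definitions are plain strings and locations are `(file, line)` pairs instead of `Shape.Uid.t` and `Lid_set.t`, and the names `add_occurrence` and `of_occurrences` are invented for illustration.

```ocaml
(* Simplified, hypothetical model of an index: definitions are strings and
   locations are (file, line) pairs. The real index uses Shape.Uid.t keys
   and Lid_set.t values instead. *)
module Uid_map = Map.Make (String)

module Loc_set = Set.Make (struct
  type t = string * int (* file name, line number *)

  let compare = compare
end)

type index = Loc_set.t Uid_map.t

(* Record one occurrence: add [loc] to the locations of definition [def]. *)
let add_occurrence ~def ~loc (index : index) : index =
  Uid_map.update def
    (function
      | None -> Some (Loc_set.singleton loc)
      | Some locs -> Some (Loc_set.add loc locs))
    index

(* Group (definition, location) pairs, assuming every occurrence has
   already been resolved to its definition. *)
let of_occurrences (occs : (string * (string * int)) list) : index =
  List.fold_left
    (fun index (def, loc) -> add_occurrence ~def ~loc index)
    Uid_map.empty occs

(* Usage: two uses of A.f in different files end up under one key. *)
let example =
  of_occurrences
    [ ("A.f", ("a.ml", 1)); ("A.f", ("b.ml", 7)); ("B.g", ("b.ml", 3)) ]
```

The hard part in the real tool, resolving each occurrence to its definition through shapes, is assumed to have happened before `of_occurrences` is called.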

# Usage

## Process cmt files and merge indexes


> ocaml-index aggregate [-o _output_file_] _cmt_file_ ... _index_file_ ... [-I _dir_] ... [--no-cmt-load-path]


- Input `cmt` files are indexed and merged into the final output.
- Input index files are merged directly into the output.
- If no input file is provided, an empty index is created.
- The default output file name is `project.ocaml-index`.
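Merging index files can be pictured as a union of definition-to-locations maps. This is a hypothetical sketch with string keys and string locations, not the tool's actual merge code; `merge` and `merge_all` are illustrative names. Starting the fold from the empty map is one way to obtain the "no inputs gives an empty index" behaviour listed above.

```ocaml
(* Hypothetical sketch: definitions and locations are both strings here. *)
module Uid_map = Map.Make (String)
module Loc_set = Set.Make (String)

type index = Loc_set.t Uid_map.t

(* Union two indexes; a definition present in both keeps all its locations. *)
let merge (a : index) (b : index) : index =
  Uid_map.union (fun _def la lb -> Some (Loc_set.union la lb)) a b

(* Folding from the empty map means an empty input list yields an empty
   index. *)
let merge_all (indexes : index list) : index =
  List.fold_left merge Uid_map.empty indexes

(* Usage: two small indexes sharing the definition "f". *)
let i1 = Uid_map.singleton "f" (Loc_set.singleton "a.ml:1")

let i2 =
  Uid_map.empty
  |> Uid_map.add "f" (Loc_set.singleton "b.ml:2")
  |> Uid_map.add "g" (Loc_set.singleton "c.ml:3")
```

Because set union is associative and commutative, the order in which input indexes are merged does not change the result in this model.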

### Load path:
Identifying definitions while processing `cmt` files may require loading the
`cmt` files of any transitive dependency of the compilation unit. By default,
the `cmt_load_path` of the first input `cmt` file is used to search for these
other units. More paths can be added to the load path with the `-I` option.
Using the cmt's load path can be disabled with the `--no-cmt-load-path`
option.
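The interaction between `-I` and `--no-cmt-load-path` can be sketched as follows. This is a hypothetical illustration: the functions `load_path` and `find_in_path` are invented names, and placing `-I` directories before the cmt's own load path is an assumption, not something this PR specifies.

```ocaml
(* Hypothetical sketch: the ordering (-I directories first, then the cmt's
   own load path) is an ASSUMPTION, not taken from ocaml-index. *)
let load_path ~no_cmt_load_path ~extra_dirs ~cmt_load_path =
  if no_cmt_load_path then extra_dirs else extra_dirs @ cmt_load_path

(* Return the first [dir/file] that exists, searching [dirs] in order. *)
let find_in_path ~dirs file =
  List.find_map
    (fun dir ->
      let path = Filename.concat dir file in
      if Sys.file_exists path then Some path else None)
    dirs
```

With `--no-cmt-load-path`, only the directories given through `-I` remain searchable in this model.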

### Paths:
By default, the paths stored in the cmt's locations are relative to the
directory where the compiler was called. For build systems that do not always
call the compiler from the same root folder, it might be useful to rewrite
these paths.

Using the `--root <path>` option stores the given path in the output file.
Additionally, the `--rewrite-root` option prepends `root` to all paths in
indexed locations.

[Note: this feature is not used in the reference Dune rules; it might evolve in
the future if needed.]
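A minimal sketch of the root handling described above. `root_of_arg` mirrors how `ocaml_index.ml` in this PR turns an empty `--root` string into `None`; `rewrite_path` is a hypothetical helper, and leaving already-absolute paths untouched is an assumption of this sketch rather than documented behaviour.

```ocaml
(* An empty --root argument means "no root", as in the CLI code of this PR. *)
let root_of_arg arg = if String.equal "" arg then None else Some arg

(* Hypothetical: prepend [root] to a location's file name when
   --rewrite-root is set. Skipping absolute paths is an ASSUMPTION. *)
let rewrite_path ~root ~rewrite_root path =
  match root with
  | Some root when rewrite_root && Filename.is_relative path ->
      Filename.concat root path
  | _ -> path
```

Without `--rewrite-root`, the root is only recorded (in the new `root_directory` field) and locations keep their original relative paths.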

## Querying indexes

The tool does not provide actual queries, but one can dump an entire index:

> ocaml-index dump _index_file_ ...

Or print summary statistics about it, such as the number of definitions and
locations it stores:

> ocaml-index stats _index_file_ ...

```bash
$ ocaml-index stats _build/default/src/dune_rules/.dune_rules.objs/cctx.ocaml-index
Index ".../cctx.ocaml-index" contains:
- 28083 definitions
- 86850 locations
- 0 approximated definitions
- 0 compilation units shapes
```
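The "definitions" and "locations" counts relate simply: definitions is the number of keys, locations the sum of the per-definition set sizes. The `Stats` branch of `ocaml_index.ml` in this PR folds over `defs` in this way. The sketch below reproduces the computation with hypothetical string-keyed stand-ins for `Uid_map` and `Lid_set`.

```ocaml
(* Hypothetical stand-ins: string keys for Shape.Uid.t, string locations
   for located identifiers. *)
module Uid_map = Map.Make (String)
module Loc_set = Set.Make (String)

(* Number of distinct definitions. *)
let count_definitions defs = Uid_map.cardinal defs

(* Total number of locations across all definitions. *)
let count_locations defs =
  Uid_map.fold (fun _uid locs acc -> acc + Loc_set.cardinal locs) defs 0

let example =
  Uid_map.empty
  |> Uid_map.add "f" (Loc_set.of_list [ "a.ml:1"; "b.ml:4" ])
  |> Uid_map.add "g" (Loc_set.singleton "c.ml:9")
```

For `example`, this gives 2 definitions and 3 locations.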
9 changes: 9 additions & 0 deletions src/ocaml-index/bin/dune
@@ -0,0 +1,9 @@
(executable
 (name ocaml_index)
 (public_name ocaml-index)
 (package ocaml-index)
 (libraries lib ocaml_typing merlin_index_format)
 (flags
  :standard
  -open Ocaml_typing
  -open Merlin_index_format))
108 changes: 108 additions & 0 deletions src/ocaml-index/bin/ocaml_index.ml
@@ -0,0 +1,108 @@
(** The indexer's binary *)

open Lib

let usage_msg =
  "ocaml-index [COMMAND] [--verbose] <file1> [<file2>] ... -o <output>"

let verbose = ref false
let debug = ref false
let input_files = ref []
let build_path = ref []
let output_file = ref "project.ocaml-index"
let root = ref ""
let rewrite_root = ref false
let store_shapes = ref false
let do_not_use_cmt_loadpath = ref false

type command = Aggregate | Dump | Stats | Gather_shapes

let parse_command = function
  | "aggregate" -> Some Aggregate
  | "dump" -> Some Dump
  | "stats" -> Some Stats
  | "gather-shapes" -> Some Gather_shapes
  | _ -> None

let command = ref None

let anon_fun arg =
  match !command with
  | None -> (
      match parse_command arg with
      | Some cmd -> command := Some cmd
      | None ->
          command := Some Aggregate;
          input_files := arg :: !input_files)
  | Some _ -> input_files := arg :: !input_files

let speclist =
  [
    ("--verbose", Arg.Set verbose, "Output more information");
    ("--debug", Arg.Set debug, "Output debugging information");
    ("-o", Arg.Set_string output_file, "Set output file name");
    ( "--root",
      Arg.Set_string root,
      "Set the root path for all relative locations" );
    ( "--rewrite-root",
      Arg.Set rewrite_root,
      "Rewrite locations paths using the provided root" );
    ( "--store-shapes",
      Arg.Set store_shapes,
      "Aggregate input-indexes shapes and store them in the new index" );
    ( "-I",
      Arg.String (fun arg -> build_path := arg :: !build_path),
      "An extra directory to add to the load path" );
    ( "--no-cmt-load-path",
      Arg.Set do_not_use_cmt_loadpath,
      "Do not initialize the load path with the paths found in the first input \
       cmt file" );
  ]

let set_log_level debug verbose =
  Log.set_log_level Error;
  if verbose then Log.set_log_level Warning;
  if debug then Log.set_log_level Debug

let () =
  Arg.parse speclist anon_fun usage_msg;
  set_log_level !debug !verbose;
  (match !command with
  | Some Aggregate ->
      let root = if String.equal "" !root then None else Some !root in
      Index.from_files ~store_shapes:!store_shapes ~root
        ~rewrite_root:!rewrite_root ~output_file:!output_file
        ~build_path:!build_path
        ~do_not_use_cmt_loadpath:!do_not_use_cmt_loadpath !input_files
  | Some Dump ->
      List.iter
        (fun file ->
          Index_format.(read_exn ~file |> pp Format.std_formatter))
        !input_files
  | Some Gather_shapes ->
      Index.gather_shapes ~output_file:!output_file !input_files
  | Some Stats ->
      List.iter
        (fun file ->
          let open Merlin_index_format.Index_format in
          let { defs; approximated; cu_shape; root_directory; _ } =
            read_exn ~file
          in
          Printf.printf
            "Index %S contains:\n\
             - %i definitions\n\
             - %i locations\n\
             - %i approximated definitions\n\
             - %i compilation units shapes\n\
             - root dir: %s\n\n"
            file (Uid_map.cardinal defs)
            (Uid_map.fold
               (fun _uid locs acc -> acc + Lid_set.cardinal locs)
               defs 0)
            (Uid_map.cardinal approximated)
            (Hashtbl.length cu_shape)
            (Option.value ~default:"none" root_directory))
        !input_files
  | _ -> Printf.printf "Nothing to do.\n%!");
  exit 0
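In `ocaml_index.ml`, `anon_fun` treats the first anonymous argument specially: if it names a command, that command is selected; otherwise the command defaults to `Aggregate` and the argument is kept as an input file. The pure restatement below (the name `classify` is hypothetical) makes two consequences easy to check: input files accumulate in reverse order, exactly as `input_files := arg :: !input_files` does, and a second command-like word is treated as an ordinary input file.

```ocaml
(* Pure restatement of the anon_fun logic, for illustration only. *)
type command = Aggregate | Dump | Stats | Gather_shapes

let parse_command = function
  | "aggregate" -> Some Aggregate
  | "dump" -> Some Dump
  | "stats" -> Some Stats
  | "gather-shapes" -> Some Gather_shapes
  | _ -> None

(* Returns the selected command (if any) and the input files, most recent
   first, like the [input_files] ref. *)
let classify (args : string list) : command option * string list =
  let step (command, files) arg =
    match command with
    | None -> (
        match parse_command arg with
        | Some cmd -> (Some cmd, files)
        | None -> (Some Aggregate, arg :: files))
    | Some _ -> (command, arg :: files)
  in
  List.fold_left step (None, []) args
```

An empty argument list leaves the command unset, which corresponds to the `Nothing to do.` branch of the main function.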
5 changes: 5 additions & 0 deletions src/ocaml-index/lib/cache.ml
@@ -0,0 +1,5 @@
include File_cache.Make (struct
  type t = Index_format.file_content
  let read file = Index_format.read ~file
  let cache_name = "Index_cache"
end)
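`cache.ml` instantiates Merlin's `File_cache.Make` functor with a module providing `t`, `read` and `cache_name`. The functor's implementation is not part of this PR, so the following is only a hypothetical sketch of the general pattern: a memoizing reader keyed by file name. Unlike the real `File_cache`, which may also track file changes, this version never invalidates entries on its own, and the `clear` function and `Demo_cache` module are invented for illustration.

```ocaml
(* Hypothetical functor input, shaped like the struct passed in cache.ml. *)
module type INPUT = sig
  type t
  val read : string -> t
  val cache_name : string
end

(* Memoize [Input.read] by file name; entries live until [clear]. *)
module Make (Input : INPUT) : sig
  val read : string -> Input.t
  val clear : unit -> unit
end = struct
  let table : (string, Input.t) Hashtbl.t = Hashtbl.create 17

  let read file =
    match Hashtbl.find_opt table file with
    | Some v -> v
    | None ->
        let v = Input.read file in
        Hashtbl.replace table file v;
        v

  let clear () = Hashtbl.reset table
end

(* Usage: a reader that counts how often it is actually called. *)
let reads = ref 0

module Demo_cache = Make (struct
  type t = int

  let read file =
    incr reads;
    String.length file

  let cache_name = "Demo_cache"
end)
```

Repeated `Demo_cache.read` calls on the same name hit the table, so the underlying reader runs once until `clear` is called.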
17 changes: 17 additions & 0 deletions src/ocaml-index/lib/dune
@@ -0,0 +1,17 @@
(library
 (name lib)
 (libraries
  ocaml_typing
  ocaml_parsing
  ocaml_utils
  merlin_utils
  merlin_analysis
  merlin_index_format)
 (flags
  :standard
  -open Ocaml_typing
  -open Ocaml_parsing
  -open Ocaml_utils
  -open Merlin_utils
  -open Merlin_analysis
  -open Merlin_index_format))